14
Global Etymologies
John D. Bengtson and Merritt Ruhlen
If the strength of Indo-European studies
is largely based on the existence,
in a few instances at least,
of very old sources, the strength
of Amerindian studies is simply
the vast number of languages.
Thus synchronic breadth becomes
the source of diachronic depth.
—Joseph H. Greenberg (1987)
How does one know that two languages are related? Or that two language
families are related? Every linguist purports to know the answers to these
questions, but the answers vary surprisingly from one linguist to another. And
the divergence of views concerning what is actually known is even greater than
that exhibited on the question of how one arrives at this body of information.
This is not a particularly satisfactory state of affairs. In what follows we
will explore these questions in a global context. We conclude that, despite
the generally antipathetic or agnostic stance of most linguists, the case for
monogenesis of extant (and attested extinct) languages is quite strong. We
will present evidence that we feel can only be explained genetically (i.e. as
the result of common origin), but we will also attempt to answer some of the
criticism that has been leveled at work such as ours for over a century.
THE BASIS OF LINGUISTIC TAXONOMY
That ordinary words form the basis of linguistic taxonomy is a direct conse-
quence of the fundamental property of human language, the arbitrary relation-
ship between sound and meaning. Since all sequences of sounds are equally
well suited to represent any meaning, there is no tendency or predisposition
for certain sounds or sound sequences to be associated with certain meanings
(leaving aside onomatopoeia, which in any event is irrelevant for classifica-
tion). In classifying languages genetically we seek, among the available lexical
and grammatical formatives, similarities that involve both sound and mean-
ing. Typological similarities, involving sound alone or meaning alone, do not
yield reliable results.
The fundamental principles of taxonomy are not specific to linguistics, but
are, rather, as applicable in fields as disparate as molecular biology, botany,
ethnology, and astronomy. When one identifies similarities among molecular
structures, plants, human societies, or stars, the origin of such similarities can
be explained only by one of three mechanisms: (1) common origin, (2) borrow-
ing, or (3) convergence. To demonstrate that two languages (or language fam-
ilies) are related, it is thus sufficient to show that their shared similarities are
not the result of either borrowing or convergence. As regards convergence—
the manifestation of motivated or accidental resemblances—linguists are in a
more favorable situation than are biologists. In biology, convergence may be
accidental, but is more often motivated by the environment; it is not by ac-
cident that bats resemble birds, or that dolphins resemble fish. In linguistics,
by contrast, where the sound/meaning association is arbitrary, convergence is
always accidental.
It is seldom emphasized that similarities between language families are
themselves susceptible to the same three explanations. That we so seldom see
mention of this corollary principle is largely because twentieth-century histori-
cal linguistics has been laboring under the delusion that language families like
Indo-European share no cognates with other families, thus offering nothing
to compare. At this level, it is alleged, similarities simply do not exist.
What is striking is that this position—for which considerable evidence to
the contrary existed already at the start of this century (Trombetti 1905) and
which on a priori grounds seems most unlikely (Ruhlen 1988a)—came to be
almost universally accepted by linguists, most of whom have never investi-
gated the question themselves. Those few scholars who have actually investi-
gated the question, such as Trombetti (1905), Swadesh (1960), and Greenberg
(1987), have tended to favor monogenesis of extant languages. Even Edward
Sapir, often considered an exemplar of linguistic sobriety (despite his alleged
excesses in the Americas), looked favorably upon the work of Trombetti, as
seen in a letter to Kroeber in 1924: “There is much excellent material and
good sense in Trombetti in spite of his being a frenzied monogenist. I am
not so sure that his standpoint is less sound than the usual ‘conservative’
one” (quoted in Golla 1984: 420). We maintain that a comparison of the
world’s language families without preconception reveals numerous widespread
elements that can only be reasonably explained as the result of common origin.
BORROWING
Linguists employ a number of well-known techniques to distinguish bor-
rowed words from inherited items. Most important, clearly, is the fact that
basic vocabulary, as defined by Dolgopolsky (1964) and others, is highly resis-
tant to borrowing. Though it is no doubt true that any word may on occasion
be borrowed by one language from another, it is equally true that such basic
items as pronouns and body parts are rarely borrowed. Furthermore, borrow-
ing takes place between two languages, at a particular time and place, not
between language families, across broad expanses of time and place. Thus
to attribute the global similarities we document here to borrowing would be
ludicrous. And as regards the alleged cases of mass borrowing in the Amer-
icas (the so-called “Pan-Americanisms”), Greenberg (1990: 11) quite rightly
protests “that basic words and pronouns could be borrowed from Tierra del
Fuego to British Columbia . . . is so utterly improbable that it hardly needs
discussion.” It seems to us even less likely that basic vocabulary—the grist for
most of the etymologies we offer herein—could have been borrowed from one
language to another all the way from Africa across Eurasia to South America.
CONVERGENCE
A common criticism of work like ours is that, with around 5,000 languages
to choose from, it cannot be too hard to find a word in some African lan-
guage that is semantically and phonologically similar to, or even identical
with, some word in an American Indian language.[1] There are so many
possibilities, runs this argument, that one can hardly fail to find accidental
“look-alikes” everywhere (Goddard 1979, Campbell 1988). But this sort of
mindless search is exactly the reverse of how the comparative method proceeds.
The units we are comparing are language families, not individual languages (a
language isolate like Basque has traditionally been considered, taxonomically,
a family consisting of a single language). Specifically, we will be comparing
items in the following 32 taxa, each of which we believe is a genetically
valid group at some level of the classification: Khoisan, Niger-Congo,
Kordofanian, Nilo-Saharan, Afro-Asiatic, Kartvelian, Indo-European, Uralic,
Dravidian, Turkic, Mongolian, Tungus, Korean, Japanese-Ryukyuan, Ainu, Gilyak,
Chukchi-Kamchatkan, Eskimo-Aleut, Caucasian, Basque, Burushaski, Yeniseian,
Sino-Tibetan, Na-Dene, Indo-Pacific, Australian, Nahali, Austroasiatic,
Miao-Yao, Daic (= Kadai), Austronesian, and Amerind.

[1] For a more fundamental discussion of convergence, see Chapter 2.

One may legitimately wonder why, for the most part, we are comparing
relatively low-level families like Indo-European and Sino-Tibetan rather than
higher-level taxa like Eurasiatic/Nostratic and Dene-Caucasian, especially
since both of us support the validity of these higher-level families (Bengtson
1991a,b, Ruhlen 1990a). We do this to emphasize that higher-level groupings
do not require the prior working out of all the intermediate nodes, contrary
to the opinion of most Amerindian specialists (the field is all but bereft of
generalists!). As is well known, both Indo-European and Austronesian were
recognized as families from the early years of their investigation, long be-
fore specialists had reconstructed all their intermediate levels (a task that is,
of course, still incomplete). In taxonomy it is a commonplace that higher-
level groupings are often more obvious—and easier to demonstrate—than are
lower-level nodes. We maintain that this is particularly so when one consid-
ers the entire world. Current contrary opinion notwithstanding, it is really
fairly simple to show that all the world’s language families are related, as we
shall see in the etymologies that follow. Discovering the correct intermedi-
ate groupings of the tree—the subgrouping of the entire human family—is a
much more difficult task, and one that has only begun. Exactly the same is
true of Amerind, which itself is a well-defined taxon (Greenberg 1987, Ruhlen
1991a); the subgrouping within Amerind involves far more difficult analyses
and taxonomic decisions (Ruhlen 1991c).
Each of our 32 genetic groups is defined by a set of etymologies that
connects grammatical and lexical items presumed to be cognate within that
group; the postulated membership and putative subgrouping within each of
these groups is given in Ruhlen (1987a). The precise number of etymologies
defining each of the 32 groups ranges from several thousand (for close-knit
and/or well-documented groups like Dravidian or Indo-European) to several
dozen (for ancient and/or poorly studied groups like Indo-Pacific or Aus-
tralian). For the most part the many etymologies defining each group have
been discovered independently, by different scholars. (In this regard Green-
berg’s work—in Africa, New Guinea, and the Americas—represents an excep-
tion to the rule.) So instead of drawing our etymologies from thousands of
languages, each containing thousands of words, we are, rather, limited to less
than three-dozen families, some of which have no more than a few hundred
identifiable cognates. The pool of possibilities is thus greatly reduced, and
accidental look-alikes will be few.
We believe that the failure of our critics to appreciate the truly minuscule
probability of accidental similarities is the chief impediment to their under-
standing why all the world’s languages must derive from a common origin.
Accordingly, let us consider this question in some detail. Each of the etymolo-
gies we cite involves at least a half-dozen of the 32 supposedly independent
families, precisely because the probability of finding the same accidental re-
semblance in six different families is close to zero. The multiplication of the
(im)probabilities of accidental resemblance, as more and more families are
considered, quickly assures the attentive taxonomist that similarities shared
by numerous families, often separated by vast distances, cannot be due to
chance. This crucial point has been emphasized by Collinder (1949), Green-
berg (1957, 1963, 1987), and Dolgopolsky (1964), among others, but even
Trombetti (1905) was well aware of the statistical importance of attestation
in multiple families, rather than in just two. The biologist Richard Dawkins
(1987: 274) makes the same point: “Convergent evolution is really a special
kind of coincidence. The thing about coincidences is that, even if they happen
once, they are far less likely to happen twice. And even less likely to happen
three times. By taking more and more separate protein molecules, we can all
but eliminate coincidence.”
To see just how unlikely accidental look-alikes really are, let us consider
two languages that each have just seven consonants and three vowels:
    Consonants:  p  t  k  s  m  n  l
    Vowels:      i  u  a
With a few notable exceptions the vast majority of the world’s languages show
at least these phonological distinctions. Yet even this minimal inventory is
capable of producing 147 CVC roots, as shown in Table 5. The probability
of accidental phonological identity is only 1/147, though the probability of
accidental phonological resemblance might be 2/147, 3/147, etc., depending on
how many other phonological shapes in Table 5 are deemed sufficiently similar.
A perusal of Table 5 suggests, however, that most of these putative roots
are quite distinct phonologically and are not readily connected by common
phonological processes.
TABLE 5. Possible CVC Roots for a Language with Seven Consonants and Three Vowels

KAK  LAK  MAK  NAK  PAK  SAK  TAK
KAL  LAL  MAL  NAL  PAL  SAL  TAL
KAM  LAM  MAM  NAM  PAM  SAM  TAM
KAN  LAN  MAN  NAN  PAN  SAN  TAN
KAP  LAP  MAP  NAP  PAP  SAP  TAP
KAS  LAS  MAS  NAS  PAS  SAS  TAS
KAT  LAT  MAT  NAT  PAT  SAT  TAT
KIK  LIK  MIK  NIK  PIK  SIK  TIK
KIL  LIL  MIL  NIL  PIL  SIL  TIL
KIM  LIM  MIM  NIM  PIM  SIM  TIM
KIN  LIN  MIN  NIN  PIN  SIN  TIN
KIP  LIP  MIP  NIP  PIP  SIP  TIP
KIS  LIS  MIS  NIS  PIS  SIS  TIS
KIT  LIT  MIT  NIT  PIT  SIT  TIT
KUK  LUK  MUK  NUK  PUK  SUK  TUK
KUL  LUL  MUL  NUL  PUL  SUL  TUL
KUM  LUM  MUM  NUM  PUM  SUM  TUM
KUN  LUN  MUN  NUN  PUN  SUN  TUN
KUP  LUP  MUP  NUP  PUP  SUP  TUP
KUS  LUS  MUS  NUS  PUS  SUS  TUS
KUT  LUT  MUT  NUT  PUT  SUT  TUT

Now were we to compare two languages with a more typical phonemic
inventory, say, fourteen consonants and five vowels,
    Consonants:  p  t  k  b  d  g  č  s  m  n  l  r  j  w
    Vowels:      i  u  e  o  a
we would find that the number of possible CVC roots in each language jumps
to 980. Again, of course, the probability of chance resemblance will depend
on certain phonological assumptions, but precious few accidental identities or
resemblances, vis-à-vis the stock of some other language or group of languages,
could be expected.
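Both root counts can be checked by brute-force enumeration. The following is a minimal sketch in Python, not part of the original argument: the function name cvc_roots and the variable names are illustrative assumptions, and every consonant is treated as a single symbol that may occur in either position.

```python
from itertools import product


def cvc_roots(consonants, vowels):
    """Return every consonant-vowel-consonant string over the given inventory."""
    return [c1 + v + c2 for c1, v, c2 in product(consonants, vowels, consonants)]


# Minimal inventory from the text: seven consonants, three vowels.
small = cvc_roots("ptksmnl", "iua")

# Larger inventory from the text: fourteen consonants, five vowels.
large = cvc_roots(
    ["p", "t", "k", "b", "d", "g", "č", "s", "m", "n", "l", "r", "j", "w"],
    ["i", "u", "e", "o", "a"],
)

print(len(small))  # 7 * 3 * 7
print(len(large))  # 14 * 5 * 14

# Chance of hitting one given root exactly (k = 1), or of a looser
# "resemblance" if k shapes are counted as similar enough.
# The choice of k is an assumption, exactly as in the text.
for k in (1, 2, 3):
    print(k, k / len(small))
```

Enumerating the small inventory reproduces the 147 shapes of Table 5, and the larger inventory gives 14 x 5 x 14 = 980. The final loop simply evaluates k/147 for the first few values of k, the number of shapes one is willing to count as similar.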
One may appreciate just how unlikely an explanation of chance resemblance
—independent development in each family—really is by considering the prob-
ability that the resemblances noted in etymology 21 (below) arose by conver-
gence. We have chosen this etymology for our argument because the meaning
involved is rarely borrowed and has no onomatopoeic connections. It thus
offers a clear case, where the similarities must be due either to common origin
or to accidental convergence. Let us try to calculate the probability that these
similarities arose independently. To do this we must make certain assumptions,
and at each such stage we shall adopt a minimalist approach that in fact
overestimates the probability of a chance match. Let us assume, as we did above, that
each language family uses only seven consonants and three vowels, yielding
the 147 syllable types shown in Table 5. What, then, is the probability that
two languages will accidentally match for a particular semantic/phonological
domain, in the present case ‘female genitalia’? Clearly it is 1/147 or .007.
Whatever the form that appears in the first language family, the second
family has only one chance in 147 of matching it. And the probability that a
third family will offer a match will be (1/147)^2 or .000049; that of a fourth
family, (1/147)^3 or .0000003; and so forth. In the etymology we give, 14 of
the 32 taxa show apparent cognates, though the evidence is for the moment
slim in Australian and the vowel in Austronesian (and many Amerind forms)
is e rather than the expected u. But if we ignore these details, then the prob-
ability that the particular sound/meaning correlation “PUT/female genitals”
arose independently fourteen times will be (1/147)^13, or about one chance in
ten octillion, by our rough calculations. We feel that this qualifies as a long
shot; certainly descent from a common source is the more likely explanation.
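The arithmetic of this paragraph can be reproduced in a few lines. The sketch below is illustrative only and rests on the same simplifying assumptions as the text: 147 equally likely root shapes per family, statistically independent families, and exact matches only. The function name chance_match_probability is hypothetical.

```python
from fractions import Fraction

ROOT_SHAPES = 147  # 7 consonants x 3 vowels x 7 consonants, as in Table 5


def chance_match_probability(families, shapes=ROOT_SHAPES):
    """Probability that `families` independent families all show the same root.

    The first family may have any form; each of the remaining
    (families - 1) must match it, each with probability 1/shapes.
    """
    if families < 1:
        raise ValueError("need at least one family")
    return Fraction(1, shapes) ** (families - 1)


# Two, three, and four families, then the fourteen taxa cited in the text.
for n in (2, 3, 4, 14):
    p = chance_match_probability(n)
    print(f"{n:2d} families: 1 chance in {float(1 / p):.3g}")
```

For fourteen families the odds come out on the order of one in 10^28, in line with the rough figure of ten octillion. Exact fractions are used so that no rounding enters before the final display step.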
The foregoing constitutes what we consider to be the basis of genetic classi-
fication in linguistics. The application of these basic principles to the world’s
language families leads inevitably, in our opinion, to the conclusion that they
all derive from a single source, as suggested by the 27 etymologies presented
below. We have not yet dealt, however, with a number of other topics that
in the minds of many linguists are inextricably tied up with taxonomy, ques-
tions like reconstruction, sound correspondences, and the like. We believe that
these topics are not in fact of crucial importance in linguistic taxonomy, and
that mixing the basic taxonomic principles with these other factors has led
to much of the current confusion that we see concerning the classification of
the world’s languages. So that these ancillary topics not be invoked yet again,
by those opposed to global comparisons, we will take them up one by one
and explain why they are not relevant to our enterprise. Let us begin with a
topic that is at the heart of many current disputes, the alleged incompatibility
between Greenberg’s method of multilateral comparison and the traditional
methods of comparative linguistics.
MULTILATERAL COMPARISON VS. THE TRADITIONAL METHOD
Many linguists feel that Greenberg’s use of what he calls multilateral com-
parison to classify languages in various parts of the world is incompatible
with—or even antagonistic to—the methods of traditional historical linguis-
tics, which emphasize reconstruction and sound correspondences (about which,
see below). Thus, Bynon (1977: 271) claims that “the use of basic vocabulary
comparison not simply as a preliminary to reconstruction but as a substitute
for it is more controversial. . . . Traditional historical linguists . . . have not
been slow in pointing out the inaccuracies which are bound to result from a
reliance on mere similarity of form assessed intuitively and unsubstantiated
by reconstruction.” In a similar vein, Anna Morpurgo Davies (1989: 167)
objects that “we do not yet know whether superfamilies outlined in this way
have the same properties as families established with the standard compara-
tive method. If they do not, there is a serious risk that the whole concept of
superfamily is vacuous.” And Derbyshire and Pullum (1991: 13) find Green-
berg’s Amerind hypothesis “startling, to say the least, when judged in terms
of the standard methodology . . . .”
The confusion displayed in the previous three quotes (and one could give
many others) results from a failure to realize that the comparative method
consists essentially of two stages. The first stage is classification, which is re-
ally no different from what Greenberg calls multilateral comparison. The sec-
ond stage, which might be called historical linguistics, involves family-internal
questions such as sound correspondences and reconstruction. In practice,
there is no name for this second stage simply because the two stages are seldom
distinguished in the basic handbooks on historical linguistics, in which, almost
without exception, the initial stage, classification, is overlooked (Bynon 1977,
Hock 1986, Anttila 1989). Also overlooked in these basic texts are language
families other than Indo-European. The origin of this anomaly—which knows
no parallel in the biological world—is a consequence of the primogeniture
of Indo-European in the pantheon of identified families, and the subsequent
elaboration of the family by Europeans in the nineteenth century.
That the initial stage of comparative linguistics, classification, is so system-
atically overlooked today lies in the origin of the Indo-European concept itself.
When Sir William Jones announced in 1786 that Sanskrit, Greek, and Latin—
and probably Gothic and Celtic as well—had all “sprung from some common
source,” he essentially resolved the first stage of comparative linguistics at the
outset: he identified five branches of Indo-European and hypothesized that all
five were altered later forms of a single language that no longer existed. What
was left unstated in Jones’s historic formulation was the fact that languages
such as Arabic, Hebrew, and Turkish—languages that Jones knew well—were