The Enigma of Amerindian Origins

Ever since Christopher Columbus (1451–1506) landed on Guanahani in the Bahamas on October 12, 1492, the question has been asked: who were the people he found there? In 1537, Pope Paul III (1468–1549) solemnly recognized that they were humans. But where did they come from? José de Acosta, a Spanish Jesuit who lived in America from 1572 to 1587, gave a surprisingly accurate account in a book published in 1590: they would have traveled by land from Asia! During the ensuing 420 years, however, many other, sometimes fantastic, hypotheses were proposed instead: that they were descendants of Israel’s lost tribes, that they came from distant continents like Europe or Oceania, or even that all humankind descended from them (reviews in Salzano and Callegari-Jacques 1988; Crawford 1998; Lavallée 2000).

A Synthesis, as Arrived at in 2007

My last review of the subject (Salzano 2007) considered a vast array of evidence, including: (a) geology and archeology; (b) paleoanthropology and morphology; (c) linguistics; (d) genetic markers (blood groups and proteins; mitochondrial DNA, Y chromosome, X chromosome, and autosomes); and (e) viruses, bacteria, and fungi. The most likely scenario that emerged from this exercise was that a single major migration occurred, originating from the Altai Mountains of Southern Siberia, without significant discontinuities in time. These first migrants would have entered the continent at least 15,000 years ago, probably using the Pacific coast route.

Update—Modeling

Four post-2007 studies that quantitatively examined the postulated colonization process are considered here. The first (Lanata et al. 2008) examined the demographic conditions that would be necessary for an early human dispersal in the Americas. The authors developed a formula based on an earlier one derived by Sir Ronald A. Fisher (1890–1962). It takes into consideration the intrinsic maximum growth rate, the environment’s carrying capacity, and the diffusion rate. Space was divided into cells to allow multidirectional expansion, and birth–growth–death variables were assigned to each individual. Paleovegetation maps were used to simulate the past environments.
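To give a feeling for how growth and diffusion combine in this family of models, the sketch below uses the classical Fisher reaction-diffusion equation, for which the population front advances at an asymptotic speed of 2·sqrt(r·D), where r is the intrinsic growth rate and D the diffusion coefficient. This is not the formula developed by Lanata et al. (2008), which is not reproduced in this text; the function names and all numerical values are hypothetical and were chosen only for illustration.

```python
import math


def fisher_front_speed(growth_rate, diffusion):
    """Asymptotic front speed of the classical Fisher equation.

    growth_rate: intrinsic maximum growth rate r (per year)
    diffusion:   diffusion coefficient D (km^2 per year)
    In the classical equation the carrying capacity sets the density
    behind the front but does not enter the front speed.
    """
    if growth_rate < 0 or diffusion < 0:
        raise ValueError("growth_rate and diffusion must be non-negative")
    return 2.0 * math.sqrt(growth_rate * diffusion)


def years_to_cover(distance_km, growth_rate, diffusion):
    """Years needed for the front to cover distance_km at constant speed."""
    speed = fisher_front_speed(growth_rate, diffusion)
    if speed == 0:
        return math.inf
    return distance_km / speed


# Hypothetical inputs (assumptions, not values from Lanata et al. 2008).
r = 0.01            # 1% growth per year
D = 100.0           # km^2 per year
route_km = 16000.0  # rough length of a north-to-south route

print(fisher_front_speed(r, D))        # front speed in km per year
print(years_to_cover(route_km, r, D))  # travel time in years
```

Because the speed depends on the square root of the growth rate, doubling r shortens the travel time by a factor of about 1.4 rather than 2, which is one reason why simulated arrival dates are sensitive to the growth rate assumed.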

The simulations started with 80 people in Alaska, 18,000 years before present (BP). The best scenario obtained involved two population reductions (bottlenecks), one at the beginning of the process and one in Central America, and population hotspots before the second bottleneck and in Amazonia. With a 5% annual population growth, humans would reach Tierra del Fuego in 13,000 years. Growth rates of 2% would lead to unlikely estimates.

Linguistics was used as the basis for the second model (Nichols 2008). Spread rates and ages for 50 language families and subfamilies were obtained. Different rates were applied to low and mid-latitudes, and the value found for the entry date was 22,300 years BP. The conclusion was that the ancestors of the people of the Monte Verde archeological site in Chile (dated at 14,500 years BP) had entered North America and were south of the glacial limits well before the end of glaciation, in fact during its very height. Johanna Nichols (2008) indicated that this estimate would be consistent with other linguistic evidence, which includes the large number of irreducible indigenous American language families, typical numbers of descendant families per ancestor language, structural diversity, and a later stratum of linguistic immigration to western North America.

The two other approaches involved Bayesian methods. Kitchen et al. (2008) used mtDNA and restricted nuclear data, applying to them the isolation-with-migration structured coalescent model. The results suggested a three-stage colonization scenario for the peopling of the Americas. Fagundes et al. (2008a) criticized this analysis, so the authors revised their dataset, included new information, and reassessed their conclusions (Mulligan et al. 2008). After leaving Asia, Amerind ancestors would have remained in Beringia for 7,000–15,000 years with a stable population size, followed by a rapid expansion into the Americas 16,000 years BP through the interior ice-free corridor or along the coast. The founder group would have consisted of 1,000–2,000 “effective” individuals (that is, those who contributed to the gene pool of subsequent generations). This scenario is basically that of the “out-of-Beringia” model proposed by Bonatto and Salzano (1997a, b).

Ray et al. (2010), on the other hand, considered 401 autosomal microsatellite loci typed in 29 Native American populations. Using an approximate Bayesian computation framework, they found, surprisingly, that a single wave or two discrete waves of migration from Asia would be highly inconsistent with the observed levels of genetic diversity. The data would be best explained by a model involving recurrent gene flow between Asia and America after the initial colonization, estimated to have started with 100 individuals 13,000 years BP.
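The logic of approximate Bayesian computation can be shown with a toy rejection sampler: draw a parameter from the prior, simulate a summary statistic, and keep the draw only if the simulated value is close to the observed one. The sketch below is not the model of Ray et al. (2010). The simulator (heterozygosity decaying by drift, with Gaussian noise standing in for sampling variance), the prior bounds, the tolerance, and the "observed" value are all assumptions made for illustration.

```python
import random


def simulate_heterozygosity(n_eff, generations, h0, rng, n_loci=50):
    """Toy simulator: mean heterozygosity over loci after genetic drift.

    The expected value is h0 * (1 - 1/(2*n_eff)) ** generations.
    Per-locus Gaussian noise (sd 0.05) is an arbitrary stand-in for
    sampling variance; values are clipped to the interval [0, 1].
    """
    expected = h0 * (1.0 - 1.0 / (2.0 * n_eff)) ** generations
    values = [min(1.0, max(0.0, rng.gauss(expected, 0.05)))
              for _ in range(n_loci)]
    return sum(values) / n_loci


def abc_rejection(observed, prior_low, prior_high, generations, h0,
                  tolerance, n_draws, seed=0):
    """Return the prior draws whose simulated summary is within tolerance."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        n_eff = rng.uniform(prior_low, prior_high)  # draw from the prior
        simulated = simulate_heterozygosity(n_eff, generations, h0, rng)
        if abs(simulated - observed) <= tolerance:  # compare summaries
            accepted.append(n_eff)
    return accepted


# Hypothetical "observed" heterozygosity and settings (assumptions).
accepted = abc_rejection(observed=0.655, prior_low=50, prior_high=2000,
                         generations=200, h0=0.8,
                         tolerance=0.01, n_draws=5000)
posterior_mean = sum(accepted) / len(accepted)
print(len(accepted), round(posterior_mean))
```

The accepted draws approximate the posterior distribution of the effective size under these assumptions. Competing scenarios (one wave, two waves, recurrent gene flow) can be compared in the same spirit by simulating each and asking which one reproduces the observed summaries most often.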

Update—Archeology and Paleoanthropology, Comparison with other Data

The most comprehensive post-2007 survey of the subject is David J. Meltzer’s book (Meltzer 2009). In 446 pages, he aptly considered both archeological and non-archeological data, examining what we know of the first Americans and the methods used by archeologists, geologists, linguists, physical anthropologists, and geneticists to evaluate the problem. He contends that a pre-Clovis presence in the Americas should now be considered a reality.

Evidence concerning the two competing entry routes into the Americas (the ice-free corridor and the coastal models) was examined by Nicole M. Waguespack (2007), but she did not favor either alternative. The number of waves was considered by Francisco Rothhammer and Tom D. Dillehay (2009). Their conclusion is that South America was probably colonized between at least 15,000 and 13,500 years BP, most likely by just one migration wave. Another review (Goebel et al. 2008) asserted that the most parsimonious explanation for the available genetic, archeological, and environmental evidence is that humans colonized the Americas around 15,000 years BP.

The Clovis/non-Clovis debate was evaluated in a paper that revised the Clovis time range to 13,250–12,800 calendar years BP and concluded that humans already lived in the Americas before those who used this technology (Waters and Stafford 2007). In addition, pre-Clovis human coprolites found at the Paisley 5 Mile Point Caves, south-central Oregon, indicated human presence there at 14,270–14,000 calendar years BP (Gilbert et al. 2008). Dillehay et al. (2008), in turn, directly dated nine species of marine algae from the Monte Verde site to between 14,220 and 13,980 calendar years BP. They favor the view that the early settlement of South America was along the Pacific Coast.

Walter A. Neves et al. (2007a, b) studied 30 early Holocene specimens recovered from Sumidouro Cave (Lagoa Santa region, central Brazil) and 74 human skulls dated between 13,000 and 3,500 calendar years BP. The two sets showed remarkable similarities to each other and clear differences from the morphology of present-day Amerindians, and were interpreted as new evidence for the Two Main Biological Components Model advocated by these authors. The Paleoamerican crania of East Central Argentina dated from 9,000 to 400 calendar years BP showed, however, the same mtDNA haplogroups as later populations with contemporaneous Amerindian morphology (Perez et al. 2009). Generally, the geographical differentiation of the facial skeleton in South America’s extreme south agrees with that obtained from mtDNA haplogroup frequencies (Perez et al. 2007). The results from an individual dated to 10,300 calendar years BP from On Your Knees Cave, Prince of Wales Island, Alaska, in turn, revealed a new specific allele arrangement, called subhaplogroup D, whose diversity suggested that new calibrations of the mtDNA clock should be considered (Kemp et al. 2007). Based on a much larger amount of data (576 late Pleistocene/early Holocene and modern skulls submitted to a geometric morphometric analysis), Rolando González-José et al. (2008) suggested that the classical Paleoamerican and Mongoloid craniofacial patterns should be viewed as extremes of a continuous morphological variation. They also built a model considering the genetic and physical anthropology data: a founder population living in Beringia 26,000 to 18,000 years BP, characterized by high craniofacial diversity, founder mtDNA and Y chromosome lineages, and some private autosomal alleles, would have expanded and entered America. Afterwards, a more recent circumarctic gene flow would have enabled the diffusion of characteristics from Asia to America and vice versa.

Update—Genetic/Genomic and Evolutionary Approaches

Evolutionary population studies can be made through either uniparental (mtDNA, Y chromosome) or biparental (autosomal) markers. The first approach avoids the problem of recombination and is therefore especially useful in phylogeographical analyses. But these markers represent just a restricted sample of the whole genome; conclusions based exclusively on them should therefore be viewed with caution. Wang et al. (2007) performed a study involving 678 autosomal microsatellite loci, genotyped in 422 widely sampled Native Americans from 24 populations. This material was compared with data available from 54 other indigenous populations spread all over the world. The results can be summarized as follows: (a) gradients of decreasing genetic diversity as a function of geographic distance, both from the Bering Strait and from Siberia, were found; (b) there was a higher level of diversity and a lower level of population structure in western, as compared to eastern, South America; (c) there was a suggestion that coastal routes were easier than inland ones for the migrating people; and (d) there was partial agreement, on a local scale, between genetic and linguistic similarity.
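Result (a) above can be made concrete with a small calculation: gene diversity (expected heterozygosity) at a locus is one minus the sum of the squared allele frequencies, and a gradient is detected as a negative slope when diversity is regressed on distance from the presumed entry point. The sketch below is only an illustration of that reasoning. The three populations, their distances, and their allele frequencies are invented, and the analysis of Wang et al. (2007) was far more elaborate.

```python
def expected_heterozygosity(allele_freqs):
    """Gene diversity at one locus: 1 minus the sum of squared frequencies.

    The small-sample correction n/(n-1) is omitted for simplicity.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Invented example: distance from the Bering Strait (km) and allele
# frequencies at a single locus in three hypothetical populations.
populations = [
    (2000.0, [0.25, 0.25, 0.25, 0.25]),
    (8000.0, [0.40, 0.30, 0.20, 0.10]),
    (14000.0, [0.60, 0.20, 0.10, 0.10]),
]

distances = [d for d, _ in populations]
diversities = [expected_heterozygosity(f) for _, f in populations]
slope = least_squares_slope(distances, diversities)

print(diversities)
print(slope)  # negative when diversity falls with distance
```

In practice the diversity values would be averaged over many loci, and distances would be measured along plausible migration routes rather than as straight lines.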

Lewis (2010) considered this set of data with some additions (29 Native American populations plus the Siberian Tundra Nentsi and Yakut). Using the hierarchical model developed by Jeffrey C. Long (see Lewis and Long 2008), he found (a) a basal position in the topologies of Central Americans as compared to South American populations; (b) similar levels of variation in western and eastern South American regions, contrary to the conclusions in Wang et al. (2007); and (c) suggestion of a major bottleneck or founder effect in North America before the peopling of Central and South America.

It is not easy to find alleles that are almost or strictly private to a continental population, but two have been observed in Amerindians. Schroeder et al. (2009) found a frequency of 30% of a nine-repeat allele at microsatellite D9S1120 in Native American and Western Beringian populations and only 7% elsewhere. The allele occurs in the same haplotypic background (“the American Modal Haplotype”) except for a few cases attributed to recombination. The mean time to the allele’s most recent common ancestor was calculated to be 7,325 to 39,900 years, and the authors suggested that the findings support a single founding population for the prehistoric colonization of the continent.

The other variant, in the ABCA1 (ATP-binding cassette transporter A1) gene, seems to be exclusive to Native American individuals. Its C230 allele was found in 29 of 36 Amerindian groups (range 0–31%, average 12%) but in none of 863 individuals from several other ethnic groups. The allele was also found in a single haplotype, and C230-bearing chromosomes presented longer relative haplotype extension compared to others. This functional variant, which is a major determinant of HDL-C levels, may have contributed to the adaptive evolution of Native American populations (Acuña-Alonzo et al. 2010).

Extensive investigations of Amerindian mtDNA variation can be summarized as follows: (a) the variability of 623 complete mtDNAs from the Americas and Asia suggested more genetic diversity within the founder population than was previously found, a pause in Beringia, and a swift migration southward; there were also indications of bidirectional gene flow between Siberia and the North American Arctic (Tamm et al. 2007); (b) a total of 515 Arctic Siberian mtDNA samples, 84 of them completely sequenced, indicated that the direct ancestors of the Paleosiberian-speaking Yukaghir were originally from southern Siberia and that A2 originated in situ in Alaska, disclosed a new founding lineage (D10), and identified two refugial sources in the Altai-Sayan and mid-lower Amur, suggesting more than one founding Native American population (Volodko et al. 2008); (c) 171 complete sequences of the four pan-American A2, B2, C1, and D1 haplogroups provided coalescence times ranging from 18,000 to 21,000 years (Achilli et al. 2008); (d) 86 complete mitochondrial genomes and data on haplogroups A–D and X indicated: (1) a single founding population; (2) that the initial differentiation from Asian populations ended with a moderate bottleneck in Beringia during the Last Glacial Maximum (LGM), 23,000–19,000 years ago; and (3) that a strong population expansion started toward the end of the LGM, 18,000 years ago, and finished 15,000 years ago (Fagundes et al. 2008b); and (e) 69 entire mtDNAs suggested two almost concomitant paths of migration from Beringia, 15,000 to 17,000 years ago. These are represented by two rare mtDNA haplogroups: D4h3, which would have spread along the Pacific coast, and X2a, which would have traveled along the ice-free corridor (Perego et al. 2009).

Why Migrate?

Heterogeneity in range of dispersal is widespread in nature and the distributions observed are commonly leptokurtic, with a semi-logarithmic curve which approximates to a straight line of negative slope (Bateman 1963). Human mobility patterns have several peculiarities due to the evolutionary interaction between biology and culture (Sutter 1963; González et al. 2008), but I will focus my attention on a recent association found by our group (Tovo-Rodrigues et al. 2010).

The dopamine receptor D4 (DRD4) gene is one of the most variable in the human genome. It contains an expressed variable number of tandem repeats (VNTR) of 48 base pairs, as well as single nucleotide polymorphisms within these repeats. In humans, the VNTR variability in this region ranges from 2 to 11 repeats. The seven-repeat (7R) allele is one of the most frequent, its prevalence varying in different populations from 0% to 78%. It is most prevalent in South Amerindians and is virtually absent in Asian populations.

Biochemical analysis has shown that the 7R protein has a threefold blunted ability to reduce forskolin-stimulated cAMP levels when compared to the most frequent variant, the 4R protein. The 7R allele has also been associated with a novelty-seeking personality dimension, impulsivity, and hyperactivity. High linkage disequilibrium with neighboring alleles, a high ratio of non-synonymous to synonymous changes, and a rapid increase in its frequency suggest that this allele is under the influence of positive selection. The 7R mutation probably arose before the Upper Paleolithic, between 40,000 and 50,000 years ago, and may have influenced the out-of-Africa exodus (Ding et al. 2002; Wang et al. 2004).

We investigated the 7R distribution in South Amerindians (18 populations, 568 individuals) and found a higher frequency in populations with a recent past of subsistence based on hunting and gathering (average 58%) than in agriculturalists (48%). Exploratory behavior influenced by this allele would be adaptive in nomadic, hunter-gatherer populations, since it would allow better resource exploitation. In contrast, in sedentary agricultural societies such behavior would involve social costs, due to the intensive methods of land use, which favor permanence (Tovo-Rodrigues et al. 2010). This conclusion reinforces a previous independent suggestion of an association between 7R and migration (Chen et al. 1999). Of course, further studies in populations with these subsistence patterns in other regions of the Americas and elsewhere are needed to substantiate this claim.

History, Language, and Genetics

In the past two decades, several attempts have been made to disclose the Amerindian genetic structure using diverse sets of protein and DNA systems. To cite just one example, Sandoval et al. (2009) suggested that genetic divergence predated linguistic diversification in Mexico. A recent investigation by our group (Callegari-Jacques et al. 2011) considered the distribution of 11 short tandem repeats (STRs) located on different autosomes in 30 South Amerindian populations, for a total of 948 individuals, looking for associations that would indicate prehistoric movements.

The results were subjected to extensive statistical analysis which included the Garza-Williamson index to detect past bottlenecks, as well as tests of specific hypotheses by analysis of molecular variance, pairwise genetic distances, and Generalized Hierarchical Modeling.
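For readers unfamiliar with the Garza-Williamson index, it is the ratio between the number of alleles observed at a microsatellite locus and the number of allelic states spanned by their size range. A bottleneck tends to remove rare alleles faster than it narrows the range, leaving gaps and lowering the ratio. The sketch below is a minimal single-locus version; the function name and the allele sizes are hypothetical, and it assumes that allele sizes differ by whole numbers of repeats.

```python
def garza_williamson(allele_sizes_bp, repeat_length_bp):
    """Garza-Williamson ratio for one microsatellite locus.

    allele_sizes_bp:  sizes (base pairs) of the alleles seen in the sample
    repeat_length_bp: length of the repeat motif (base pairs)
    Returns k / (R + 1), where k is the number of distinct alleles and
    R is the size range expressed in repeat units.
    """
    distinct = sorted(set(allele_sizes_bp))
    if not distinct:
        raise ValueError("at least one allele is required")
    if repeat_length_bp <= 0:
        raise ValueError("repeat_length_bp must be positive")
    k = len(distinct)
    # Assumes sizes differ by whole repeats, so integer division is exact.
    range_in_repeats = (distinct[-1] - distinct[0]) // repeat_length_bp
    return k / (range_in_repeats + 1)


# Hypothetical tetranucleotide locus (4 bp repeat).
full_series = [100, 104, 108, 112, 116]  # every state in the range present
with_gaps = [100, 104, 116]              # intermediate alleles lost

print(garza_williamson(full_series, 4))  # no gaps in the size range
print(garza_williamson(with_gaps, 4))    # gaps lower the ratio
```

In an actual analysis the index is averaged over loci and compared with values expected for populations of stable size.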

No evidence of past bottlenecks was obtained with these data, and a clear division of the populations into three broad geographical areas (Andes, Amazonia, and the Southeast, which includes the Chaco and southern Brazil) was found.

Since extensive population movements have occurred after the European arrival, we decided to make a new historical/linguistic/genetic comparison examining putative sources of geographical origin of the languages spoken by different Amerindian groups and comparing their history with the genetic constitution of people living nearby. The selected groups and samples used were as follows: Maipurean (two Matsiguenga samples, n = 55), Tupi-Mondé (Gavião, Zoró, Surui, n = 77), Tupi-Guarani (two Wayampi samples, Emerillon, Zoé, Urubu-Kaapor, Awá-Guajá, Parakanã, n = 250), Cariban (Ka’lina, Tiriyó, Apalai, Wai Wai, n = 119), Central Jean (Xavante, n = 34), Southern Jean (Kaingang, n = 50), Matacoan (two Whichi samples, n = 54), Guaykuruan (Pilagá, two Toba samples, n = 74), Zamucoan (Ayoreo, n = 48), and Barbacoan (Cayapa, n = 15).

Table 1 presents the dates for the splits and dispersions of these language groups, as inferred by our linguistic colleagues, and Table 2 the tests of hypotheses based on the genetic data. The genetic relationships among people who speak three important South American language groups (Tupian, Cariban, and Maipurean) confirmed their common identity, as verified previously, with the Tupi showing a higher genetic affinity to the Cariban rather than to the Maipurean (Salzano et al. 2005). The Chaco groups also clustered together. Other analyses placed the Central Jean near to the Chaco languages, in line with their peripheral position as compared to others of the continent. Barbacoan remained isolated, as expected.

Table 1 Splits and dispersions, South Amerindian language groups
Table 2 Generalized Hierarchical Modeling of past language-related population structure based on 11 STR loci

Coda

Broadly, the studies reviewed here clearly established an early beginning for the American prehistoric colonization process, although exact dates vary in different evaluations. The previous intermediate stay in Beringia, as postulated by the Out of Beringia model proposed several years ago (Bonatto and Salzano 1997a, b), now seems to be clearly validated. But the one-wave-only pattern of migration has been questioned again. The possibility of circumarctic gene flow in both directions between Asian and North American populations has been suggested more than once and should be checked. Other details about the number of founders and the lineages carried by them, specific population bottlenecks, and other genetic-evolutionary events that occurred during the colonization process deserve further attention.

It is characteristic of the scientific method to test specific hypotheses that lead to successive approximations of reality. The tools available for the analysis of the problem of Amerindian origins are far better now than those we had even just a decade ago. We can be confident that at least the broad outlines of the process have been identified, but many questions remain. Whether, in this or any other problem, we will ever reach the truth is debatable. But we can try!