Complete mitochondrial genomes of Thai and Lao populations indicate an ancient origin of Austroasiatic groups and demic diffusion in the spread of Tai–Kadai languages
The Tai–Kadai (TK) language family is thought to have originated in southern China and spread to Thailand and Laos, but it is not clear if TK languages spread by demic diffusion (i.e., a migration of people from southern China) or by cultural diffusion, with native Austroasiatic (AA) speakers switching to TK languages. To address this and other questions, we obtained 1234 complete mtDNA genome sequences from 51 TK and AA groups from Thailand and Laos. We find high genetic heterogeneity across the region, with 212 different haplogroups, and significant genetic differentiation among different samples from the same ethnolinguistic group. TK groups are more genetically homogeneous than AA groups, with the latter exhibiting more ancient/basal mtDNA lineages, and showing more drift effects. Modeling of demic diffusion, cultural diffusion, and admixture scenarios consistently supports the spread of TK languages by demic diffusion.
There is also considerable linguistic diversity, with five language families [Tai–Kadai (TK), Austroasiatic (AA), Sino–Tibetan (ST), Hmong–Mien (HM) and Austronesian (AN)] spoken in the area. Most people speak TK languages (94.40% in Thailand and 69.60% in Laos), while AA is the second most common language family (4.10% in Thailand and 22.70% in Laos) (Lewis et al. 2016). However, the AA family is more diverse (27 languages in Thailand and 47 languages in Laos) than TK (16 languages in Thailand and 21 languages in Laos). The ST and HM families are concentrated in northern and northwestern Thailand as well as northern and central Laos (ST: 19 languages in Thailand and 11 languages in Laos; HM: 3 languages in Thailand and 4 languages in Laos). The AN family is restricted to southern Thailand, with just six languages (Lewis et al. 2016). Both major families (AA and TK) are widespread across Asia: there are 167 AA languages spoken by ~102 million people from South Asia (Bangladesh and India) to southern China and MSEA, including Malaysia, and 92 TK languages spoken by ~80 million people in northeast India, southern China, Vietnam, Myanmar, Cambodia, Thailand and Laos (Lewis et al. 2016). Although the origin and spread of AA are debated (Chaubey et al. 2011; Diffloth 2005), AA people are generally considered to be descended from the earliest inhabitants of the region (Condominas 1990; Penth 2000). TK is generally considered to have arisen in southeast China prior to 2.5 kya and then spread to SEA between 1 and 2 kya (O’Connor 1995; Pittayaporn 2014).
Although archaeological and linguistic evidence points to an expansion from southern China, physical anthropological studies indicate that present-day Thai people resemble ancient people (Sangvichien 1966) as well as modern AA people in northern Thailand (Nakbunlung 1994). There are therefore two competing hypotheses concerning the origin of the modern Thai/Lao TK people: (1) a demic expansion of people from southern China that brought their genes, culture, and language to Thailand/Laos; or (2) a cultural diffusion from southern China that resulted in native AA people adopting the TK language and culture. This general question of demic vs. cultural diffusion is a long-standing one for expansions in other parts of the world, particularly those involving languages and/or agricultural practices, e.g., expansions associated with Indo-European, Bantu, Han and Austronesian languages (Ammerman and Cavalli-Sforza 1994; Battaglia et al. 2009; Chikhi et al. 2002; Diamond and Bellwood 2003; Pakendorf et al. 2011; Peng et al. 2010; Sokal et al. 1991; Wen et al. 2004). While genetic studies have proven informative in distinguishing between demic and cultural diffusion in these other contexts, they have not yet been applied to this question with respect to TK people. In particular, previous mitochondrial (mt) DNA studies of Thai/Lao populations were too limited to address this question via phylogenetic or simulation-based analyses (Bodner et al. 2011; Kutanan et al. 2011, 2014). Therefore, to address the role of demic vs. cultural diffusion in the origins of the TK people, as well as to investigate other aspects of Thai/Lao prehistory, we analyze here 1234 complete mtDNA genome sequences from 51 Thai/Lao populations, comprising a comprehensive sampling of TK and AA genetic diversity.
Blood or buccal samples were collected with informed consent from 1234 unrelated subjects belonging to 51 populations that were classified into 23 ethnolinguistic groups (Fig. 1; Table S1 in Online Resource 1). All groups speak either AA or TK languages and all are from Thailand, with the exception of two populations from Laos.
MtDNA sequencing and multiple alignment
DNA was isolated from blood samples as described previously (Seielstad et al. 1999) and from buccal cells with the Gentra Puregene Buccal Cell Kit (Qiagen). Sequencing libraries were constructed using a multiplex protocol for the Illumina Genome Analyzer platform (Meyer and Kircher 2010) and enriched for mtDNA as described previously (Maricic et al. 2010). Several Illumina platforms and read lengths were employed, with post-processing using Illumina software and the Improved Base Identification System (Kircher et al. 2010). Sequencing reads were mapped to the revised Cambridge Reference Sequence (Andrews et al. 1999) with the software MIA (Briggs et al. 2009), which is implemented in an in-house sequence assembly–analysis pipeline for calling consensus sequences and detecting mtDNA heteroplasmy (Li and Stoneking 2012). A multiple sequence alignment of the sequences together with the Reconstructed Sapiens Reference Sequence (RSRS) (Behar et al. 2012) was generated with MAFFT 7.271 (Katoh and Standley 2013).
The aligned sequences were assigned to haplogroups using HaploGrep (Kloss-Brandstätter et al. 2010) with PhyloTree mtDNA tree build 17 (van Oven and Kayser 2009), and haplogroup assignments were re-checked with MitoTool (Fan and Yao 2011). The software Arlequin (Excoffier and Lischer 2010) was used for the following analyses: measures of genetic diversity, pairwise genetic distances (Φst, pairwise differences), analysis of molecular variance (AMOVA), and a Mantel test comparing genetic and geographic distances between populations. For the Mantel test, we computed three types of geographic distance: great-circle distance, least-cost path distance, and resistance distance. The great-circle distance matrix was generated with Geographic Distance Matrix Generator v1.2.3 (Ersts 2006); the least-cost path and resistance distance matrices were computed, respectively, with the function costDistance in the R package gdistance (van Etten 2012) and with CIRCUITSCAPE (McRae 2006), both based on a constructed cost-surface raster. To create this cost-surface raster, we used the function mosaic from the R package raster (Hijmans and van Etten 2013) in R 3.2.0 to merge two data layers: a 30-arc-second elevation grid from the WorldClim database (Hijmans et al. 2005) and vector files of major rivers in Thailand and Laos obtained from Natural Earth. The merged raster was then reclassified into a cost surface using parameters known to affect human movement, e.g., mountains, terrain and rivers (Tassi et al. 2015).
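The Mantel test itself was run in Arlequin. As an illustration only, the following minimal Python sketch computes great-circle (haversine) distances from site coordinates and runs a simple permutation Mantel test between a genetic distance matrix (e.g., pairwise Φst) and a geographic one. It is not Arlequin's implementation: the Earth radius of 6371 km, the one-sided test, the permutation count and the example coordinates and Φst values are assumptions, and the least-cost path and resistance distances are not reproduced.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance (km) between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_matrix(coords):
    """Symmetric great-circle distance matrix for a list of (lat, lon) tuples."""
    return [[great_circle_km(*p, *q) for q in coords] for p in coords]


def _upper(m, order):
    """Upper-triangle entries of m after relabelling rows and columns by `order`."""
    n = len(m)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(gen, geo, permutations=999, seed=1):
    """Permutation Mantel test; returns (r, one-sided P for a positive association)."""
    n = len(gen)
    identity = list(range(n))
    geo_vals = _upper(geo, identity)
    r_obs = _pearson(_upper(gen, identity), geo_vals)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute population labels of the genetic matrix only
        if _pearson(_upper(gen, perm), geo_vals) >= r_obs:
            as_extreme += 1
    return r_obs, (as_extreme + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical sampling sites (lat, lon) and a hypothetical Phi_st matrix
    sites = [(18.79, 98.98), (16.44, 102.83), (17.97, 102.61), (15.24, 104.85)]
    geo = geo_matrix(sites)
    phi_st = [[0.00, 0.05, 0.04, 0.07],
              [0.05, 0.00, 0.02, 0.03],
              [0.04, 0.02, 0.00, 0.04],
              [0.07, 0.03, 0.04, 0.00]]
    print(mantel(phi_st, geo))
```

With only four populations there are just 24 distinct label permutations, so the P value in this toy example is necessarily coarse; with 51 populations the permutation distribution is far richer.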
Nonparametric multidimensional scaling (MDS) based on Φst values, as well as correspondence analysis (CA) of haplogroup counts, was performed using STATISTICA 10.0 (StatSoft, Inc., USA).
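The CA was run in STATISTICA. To illustrate the general technique, the sketch below implements simple correspondence analysis with NumPy (an assumed dependency): it forms the standardized residuals of a populations × haplogroups count table under independence, takes their singular value decomposition, and returns principal coordinates. The toy count table is hypothetical, and the signs of the axes may differ from those produced by other software.

```python
import numpy as np


def correspondence_analysis(counts, n_dims=2):
    """Simple CA of a contingency table (rows = populations, columns = haplogroups).

    Returns row principal coordinates, column principal coordinates and the
    principal inertias (squared singular values).
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    expected = np.outer(r, c)
    # Standardized residuals D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, :n_dims] * sv[:n_dims] / np.sqrt(r)[:, None]
    cols = Vt.T[:, :n_dims] * sv[:n_dims] / np.sqrt(c)[:, None]
    return rows, cols, sv ** 2


if __name__ == "__main__":
    # Hypothetical haplogroup counts for three populations and four haplogroups
    table = [[10, 5, 0, 3],
             [2, 8, 6, 1],
             [0, 3, 12, 4]]
    row_coords, col_coords, inertia = correspondence_analysis(table)
    print(row_coords)
    print(inertia.sum())  # total inertia = chi-square statistic / grand total
```

The total inertia equals the chi-square statistic of the table divided by the grand total, which is a convenient sanity check on any CA implementation.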
BEAST 1.8 was used to construct Bayesian skyline plots (BSPs) by population and maximum clade credibility (MCC) trees by haplogroup, based on Bayesian Markov chain Monte Carlo (MCMC) analyses. jModelTest 2.1.7 (Darriba et al. 2012) was used to choose the best-fitting substitution model when creating the BEAST input files with BEAUti v1.8 (Drummond et al. 2012). BSP calculations were conducted with the data partitioned into coding and noncoding regions, with respective mutation rates of 1.708 × 10⁻⁸ and 9.883 × 10⁻⁸ substitutions per site per year (Soares et al. 2009). Tracer 1.6 was used to visualize the BSPs. For the Bayesian MCMC estimates (BE) and credible intervals (CI) of haplogroup coalescence times, the RSRS was used to root the mtDNA tree. The Bayesian MCC trees from the BEAST runs were assembled with TreeAnnotator and drawn with FigTree v1.4.0. To check the clustering of sequences by haplogroup, median-joining networks were constructed without pre- or post-processing steps in Network 4.11 and visualized in Network Publisher (Fluxus Technology, http://www.fluxus-engineering.com). Contour maps were generated with Golden Software Surfer 10.0 (Golden Software Inc., USA).
The newly generated 1234 mtDNA sequences were compared with a reference data set comprising 2129 Asian mtDNA genomes representing 62 populations retrieved from the literature (Table S2 in Online Resource 1). A neighbor-joining (NJ) tree (Saitou and Nei 1987) based on the Φst distances was generated by MEGA 7 (Kumar et al. 2016).
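The NJ tree itself was built in MEGA 7. To make the algorithm concrete, here is a compact pure-Python sketch of Saitou and Nei's neighbor-joining applied to a symmetric distance matrix such as pairwise Φst values. It is an illustration under stated assumptions: ties in the Q criterion are broken by input order, negative branch lengths are reported as zero (a common convention, not necessarily MEGA's), and the labels and distances in the example are hypothetical.

```python
def neighbor_joining(labels, dist):
    """Neighbor-joining (Saitou and Nei 1987); returns an unrooted tree in Newick format."""
    if len(labels) < 3:
        raise ValueError("neighbor-joining needs at least three taxa")
    # Distances keyed by node name; internal nodes are named by their Newick subtree
    D = {a: {b: float(dist[i][j]) for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    nodes = list(labels)

    def fmt(x):
        return f"{max(x, 0.0):.4g}"  # clip negative branch lengths to zero

    while len(nodes) > 3:
        n = len(nodes)
        totals = {a: sum(D[a][b] for b in nodes) for a in nodes}
        # Choose the pair (a, b) minimising Q(a, b) = (n - 2) d(a, b) - R(a) - R(b)
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * D[a][b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to a and b
        la = 0.5 * D[a][b] + (totals[a] - totals[b]) / (2 * (n - 2))
        lb = D[a][b] - la
        u = f"({a}:{fmt(la)},{b}:{fmt(lb)})"
        D[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                D[u][k] = D[k][u] = 0.5 * (D[a][k] + D[b][k] - D[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [u]

    # Join the last three nodes at a single internal node
    a, b, c = nodes
    la = 0.5 * (D[a][b] + D[a][c] - D[b][c])
    lb = 0.5 * (D[a][b] + D[b][c] - D[a][c])
    lc = 0.5 * (D[a][c] + D[b][c] - D[a][b])
    return f"({a}:{fmt(la)},{b}:{fmt(lb)},{c}:{fmt(lc)});"


if __name__ == "__main__":
    # Hypothetical additive distances for four populations
    names = ["A", "B", "C", "D"]
    d = [[0, 5, 7, 8],
         [5, 0, 8, 9],
         [7, 8, 0, 9],
         [8, 9, 9, 0]]
    print(neighbor_joining(names, d))  # (C:4,D:5,(A:2,B:3):1);
```

Because the example distances are additive, NJ recovers the generating tree exactly; Φst matrices from real populations are generally not additive, so branch lengths there are approximations.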
Approximate Bayesian computation (ABC) was employed to choose the best-supported hypothesis about the maternal origins of the Thai and Lao populations. Owing to the different local histories of each region, three different mtDNA data sets from TK and AA groups, as well as prior parameters (e.g., divergence times), were used in the simulations. As the origin time of prehistoric TK-speaking groups is unknown, we assumed, similar to a previous study (Sun et al. 2013), that the Tai were present in southern China by ~3 kya. Some prehistoric TK groups then separated from their common ancestor with the Chinese Dai in their homeland in southern China and spread southward to the area of present-day Thailand in the last 1–2 kya (O’Connor 1995; Penth 2000; Pittayaporn 2014). Some TK groups eventually reached northern Thailand, where LW groups are the native inhabitants, and founded the Lanna kingdom around the end of the thirteenth century A.D. (Condominas 1990). The KM people, who form the majority of the northern Thai, could therefore be descended from LW groups or admixed with them, and their origin as a distinct group should date to this time. We therefore conducted the first analysis by pooling ten KM populations (KM1–KM10) and combining the three AA-speaking Lawa groups (LW1–LW3), using the Xishuangbanna Dai as a representative of the Tai source from southern China (Diroma et al. 2014). Although the IS and LA people today constitute the vast majority of the populations of northeastern Thailand and Laos, respectively, they share an ethnic identity, and the historical homeland of the Lao Isan is in Laos (Schliesinger 2001). Allowing for differences in both the migration routes and the timing of prehistoric TK groups, migrants from further north into the area of present-day Laos would have encountered the KH, one of the predominant AA peoples in SEA, who established the Angkorian state around 1.2 kya (Higham 2014).
In addition, SU, KA, BU and SO are other AA groups in the area of present-day Laos whose ancestors could have interacted with TK groups. In the second analysis, therefore, the Xishuangbanna Dai is used as the Tai source, all AA groups (KH1–KH2, SU, KA, BU and SO) are combined, and the TK-speaking Lao groups (LA1–LA2 and IS1–IS4) are pooled. In the last analysis, we focus on the IS, a Lao group that migrated to northeastern Thailand relatively recently, approximately 250 ya; biculturalism between KH and IS in northeastern Thailand has been documented (Vail 2007). One potential scenario is that the IS (IS1–IS4) diverged from the LA (LA1–LA2) without any genetic contact with the KH (KH1–KH2); a second scenario is that the IS admixed with the KH after diverging from the LA. Although an origin of the IS from the KH is unlikely, we also investigated this scenario.
The simulated data sets were generated with the software package ABCtoolbox (Wegmann et al. 2010). Posterior probabilities were calculated with two different approaches: the acceptance–rejection procedure (AR) (Pritchard et al. 1999) and weighted multinomial logistic regression (LR) (Beaumont 2008). The former considers only a certain number of “best” simulations (those closest to the observed data) and simply counts the proportion of retained simulations generated under each model. This approach has been shown to be reliable when a few hundred retained simulations fit the observed data well (Beaumont 2008); we therefore retained the best 100, 200 and 500 simulations. In the latter approach, a logistic regression is fitted in which the model is the categorical dependent variable and the summary statistics are the predictor variables. The regression is local around the vector of observed summary statistics, and the probability of each model is estimated at the point corresponding to the observed vector. The β coefficients of the regression were estimated by maximum likelihood, considering different numbers of retained simulations (50,000, 100,000 and 150,000). Posterior probabilities for each model were calculated using modified R scripts (http://code.google.com/p/popabc/source/browse/#svn%2Ftrunk%2Fscripts). The following summary statistics were employed: the number of haplotypes, haplotype diversity, total number of segregating sites, number of private segregating sites, Tajima’s D, and mean number of pairwise differences for each population, as well as the mean number of differences between pairs of populations and pairwise Φst.
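The AR step itself was carried out with ABCtoolbox output and R scripts. As an illustration of the idea only, the Python sketch below performs rejection-based model choice: each summary statistic is scaled by its standard deviation across all simulations (an assumed, though common, choice), simulations are ranked by Euclidean distance to the observed statistics, and the posterior probability of each model is the fraction of the k retained simulations generated under it. Model labels and numbers are hypothetical, and the weighted logistic-regression (LR) step is not reproduced.

```python
import math
import random
from collections import Counter


def abc_model_choice(observed, simulations, k):
    """Acceptance-rejection ABC model choice.

    observed: summary statistics of the observed data.
    simulations: (model_label, statistics) pairs simulated from the priors.
    k: number of closest simulations to retain.
    Returns {model_label: posterior probability}.
    """
    # Scale each statistic by its standard deviation across all simulations
    sds = []
    for s in range(len(observed)):
        values = [stats[s] for _, stats in simulations]
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        sds.append(sd if sd > 0 else 1.0)

    def distance(stats):
        return math.sqrt(sum(((x - o) / sd) ** 2 for x, o, sd in zip(stats, observed, sds)))

    retained = sorted(simulations, key=lambda sim: distance(sim[1]))[:k]
    counts = Counter(label for label, _ in retained)
    models = sorted({label for label, _ in simulations})
    return {m: counts.get(m, 0) / k for m in models}


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical two-statistic simulations (e.g. MPD and haplotype diversity) per model
    sims = [("demic", [rng.gauss(30, 4), rng.gauss(0.98, 0.01)]) for _ in range(5000)]
    sims += [("cultural", [rng.gauss(38, 4), rng.gauss(0.95, 0.01)]) for _ in range(5000)]
    for k in (100, 200, 500):
        print(k, abc_model_choice([31.0, 0.975], sims, k))
```

Running the choice at several values of k, as in the usage lines, mirrors the check that model support is stable across the number of retained simulations.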
The distribution of simulated data under different models with respect to the observed data was evaluated by a visual inspection of a principal component analysis (PCA) of the best 1000 (or 5000) simulations for each model, using the PCA function implemented in the R package FactoMineR (Husson et al. 2007).
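The PCA check was done with FactoMineR in R. A rough NumPy equivalent (NumPy is an assumed dependency) is sketched below: retained simulations from all models are pooled, each statistic is centred and scaled to unit variance, and both the simulations and the observed vector are projected onto the leading principal components, so one can see whether the observed point falls inside the simulated clouds. The scaling choice and the simulated values are assumptions, and plotting is omitted.

```python
import numpy as np


def pca_projection(simulated, observed, n_components=2):
    """Project simulated and observed summary statistics onto leading principal components.

    simulated: (n_sims, n_stats) array pooled over models; observed: (n_stats,) vector.
    Statistics are centred and scaled to unit variance before the decomposition.
    """
    X = np.asarray(simulated, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                       # leave constant statistics unscaled
    Z = (X - mean) / sd
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    axes = Vt[:n_components].T              # loadings of the leading components
    sim_scores = Z @ axes
    obs_scores = ((np.asarray(observed, dtype=float) - mean) / sd) @ axes
    return sim_scores, obs_scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical summary statistics for the best simulations under two models
    demic = rng.normal(0.0, 1.0, size=(1000, 4))
    cultural = rng.normal(1.5, 1.0, size=(1000, 4))
    sim_pc, obs_pc = pca_projection(np.vstack([demic, cultural]), [0.1, -0.2, 0.0, 0.3])
    print(sim_pc[:3], obs_pc)
```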
The power to infer the correct model in all tests was estimated by generating 1000 pseudo-observed data sets under each analyzed model, with parameter values randomly drawn from the corresponding prior distributions. These pseudo-observed data sets were analyzed with the same ABC framework applied in the model selection (i.e., logistic regression with 50,000 retained simulations). The three sets of models were considered separately. For each model, we evaluated the proportion of cases in which the true model was correctly chosen (true positives), as well as the proportion of cases in which the model selection procedure assigned the highest support to one of the other two tested models (false positives), using a posterior probability threshold of 0.5 to assign support.
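The power analysis reduces to tabulating model assignments for the pseudo-observed data sets (PODs). The sketch below is one possible reading of the procedure described above, not the authors' script: a POD is assigned to its highest-posterior model only if that posterior exceeds 0.5; the true positive rate of a model is the fraction of its own PODs assigned to it, and its false positive rate is the fraction of PODs simulated under the other models that are wrongly assigned to it. The strict ">" threshold and the example posteriors are assumptions.

```python
from collections import defaultdict


def power_table(pods, threshold=0.5):
    """pods: list of (true_model, {model: posterior}) pairs from pseudo-observed data sets.

    A POD is assigned to the model with the highest posterior only if that posterior
    exceeds `threshold`; otherwise it is left unassigned.
    Returns {model: (true_positive_rate, false_positive_rate)}.
    """
    models = sorted({m for _, post in pods for m in post})
    n_true = defaultdict(int)       # PODs simulated under each model
    hits = defaultdict(int)         # PODs correctly assigned to their true model
    false_calls = defaultdict(int)  # PODs from other models wrongly assigned to this model
    for true_model, post in pods:
        n_true[true_model] += 1
        best = max(post, key=post.get)
        chosen = best if post[best] > threshold else None
        if chosen == true_model:
            hits[true_model] += 1
        elif chosen is not None:
            false_calls[chosen] += 1
    total = len(pods)
    table = {}
    for m in models:
        n_other = total - n_true[m]
        tpr = hits[m] / n_true[m] if n_true[m] else float("nan")
        fpr = false_calls[m] / n_other if n_other else float("nan")
        table[m] = (tpr, fpr)
    return table


if __name__ == "__main__":
    # Hypothetical posteriors for five PODs
    pods = [("demic", {"demic": 0.8, "cultural": 0.1, "admixture": 0.1}),
            ("demic", {"demic": 0.4, "cultural": 0.35, "admixture": 0.25}),
            ("cultural", {"demic": 0.6, "cultural": 0.3, "admixture": 0.1}),
            ("cultural", {"demic": 0.1, "cultural": 0.9, "admixture": 0.0}),
            ("admixture", {"demic": 0.2, "cultural": 0.2, "admixture": 0.6})]
    print(power_table(pods))
```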
Genetic diversity is higher in TK than in AA groups
For the 1234 mtDNA genome sequences obtained (deposited in GenBank under accession numbers KX456435–KX457668), there are 761 distinct sequences (haplotypes) belonging to 212 haplogroups (Table S3 in Online Resource 1). Details concerning sequencing results and sequence coverage are provided in Online Resource 2. Summary statistics of genetic diversity in each population are provided in Table S1. Haplotype diversity (h) ranges from 1.00 in the LA2 group (see Fig. 1 for population locations and abbreviations) to 0.80 in the TN2 group. The SK, BO and TN1 groups also exhibit h values somewhat lower than those of the remaining populations; the same trend is observed for haplogroup diversity, which is relatively high in almost all populations except TN1, TN2, SK and BO. Both nucleotide diversity (π) and mean number of pairwise differences (MPD) are lowest in the TN1 group (0.0013 and 21.41, respectively) and highest in the MO2 group (0.0026 and 42.6, respectively).
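These statistics were computed with Arlequin. The Python sketch below shows the textbook estimators behind them for a single population: Nei's unbiased haplotype diversity h = n/(n − 1)(1 − Σp²), MPD as the average number of differences over all sequence pairs, π as MPD divided by alignment length, the number of segregating sites S, and Tajima's D (one of the ABC summary statistics). It assumes aligned, equal-length sequences and treats gaps and ambiguous bases as ordinary characters, which Arlequin handles differently; the toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tajimas_d(n, S, mpd):
    """Tajima's D from sample size n, number of segregating sites S and MPD."""
    if S == 0:
        return float("nan")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (mpd - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


def diversity_stats(seqs):
    """Within-population diversity statistics for aligned, equal-length sequences."""
    n = len(seqs)
    length = len(seqs[0])
    # Haplotype diversity, Nei's unbiased estimator
    freqs = [count / n for count in Counter(seqs).values()]
    h = n / (n - 1) * (1 - sum(p * p for p in freqs))
    # Mean number of pairwise differences (MPD) and nucleotide diversity (pi)
    diffs = [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)]
    mpd = sum(diffs) / len(diffs)
    pi = mpd / length
    # Segregating sites: alignment columns showing more than one state
    S = sum(len(set(col)) > 1 for col in zip(*seqs))
    return {"h": h, "MPD": mpd, "pi": pi, "S": S, "TajimaD": tajimas_d(n, S, mpd)}


if __name__ == "__main__":
    print(diversity_stats(["AAAA", "AAAT", "AATT", "AAAA"]))
```

For complete mtDNA genomes the alignment length is about 16.5 kb, which is why MPD values in the twenties to forties correspond to π values of the order of 10⁻³.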
Haplotype and haplogroup diversity values, as well as the number of segregating sites, are significantly higher for TK than for AA groups (Mann–Whitney U tests: h: Z = 3.34, P = 0.0008; haplogroup diversity: Z = 3.53, P = 0.0004; number of segregating sites: Z = 2.85, P = 0.0044). However, π values do not differ significantly between AA and TK groups (Z = 1.45, P = 0.15).
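The Z values reported above correspond to the normal approximation of the Mann–Whitney U test. The following stdlib-only Python sketch computes U, Z and a two-sided P; mid-ranks for ties and the tie-corrected variance are used and no continuity correction is applied, which are assumptions, since the exact settings of the original analysis are not stated. As a check on the arithmetic, a two-sided normal P for Z = 3.34 is about 0.0008, matching the value reported for h.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard normal deviate z."""
    return math.erfc(abs(z) / math.sqrt(2))


def mann_whitney(x, y):
    """Mann–Whitney U test, normal approximation with tie correction.

    Returns (U for sample x, Z, two-sided P); Z > 0 means x tends to be larger.
    """
    values = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mid-rank of sorted positions i..j (1-based)
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, z, two_sided_p(z)


if __name__ == "__main__":
    # Hypothetical haplotype diversities for TK and AA populations
    tk = [0.99, 0.98, 1.00, 0.97, 0.99, 0.98]
    aa = [0.80, 0.93, 0.95, 0.90, 0.97, 0.88]
    print(mann_whitney(tk, aa))
    print(round(two_sided_p(3.34), 4))  # 0.0008
```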
Greater genetic heterogeneity of AA groups
Table 1 Analysis of molecular variance (AMOVA) results (columns include: Number of groups; Among population (within group))
Greater genetic homogeneity among the TK populations was also reflected in the haplotype sharing analysis (Table S4 in Online Resource 1), which showed that they shared more haplotypes than the AA populations did. In particular, the various KM populations shared a number of haplotypes, as did the PU populations, indicating some recent genetic exchange or shared ancestry among populations within the same ethnolinguistic group. The highest number of shared haplotypes is five, observed between KM5 and KM6 and among PU2–PU4. Many haplotypes in the PU are shared with almost all of the other TK populations. Among the AA populations, TN2 and TN3 share four haplotypes despite the relatively large genetic differences between them. Overall, only four populations (IS3, SK, MO1 and MO4) did not share any haplotypes with any other population.
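Haplotype sharing of the kind summarized in Table S4 can be tabulated by intersecting the sets of distinct haplotypes in each pair of populations. The Python sketch below assumes that two sequences represent the same haplotype only if they are identical strings (missing data are not handled); the population codes are taken from the text, but the sequences are hypothetical.

```python
from itertools import combinations


def shared_haplotypes(populations):
    """Count distinct haplotypes shared by each pair of populations.

    populations: {population code: list of aligned sequences}.
    Returns ({(pop_a, pop_b): number of shared haplotypes},
             sorted list of populations sharing no haplotype with any other).
    """
    haplos = {pop: set(seqs) for pop, seqs in populations.items()}
    shared = {(a, b): len(haplos[a] & haplos[b]) for a, b in combinations(sorted(haplos), 2)}
    isolated = sorted(
        pop for pop in haplos
        if all(count == 0 for pair, count in shared.items() if pop in pair)
    )
    return shared, isolated


if __name__ == "__main__":
    pops = {"KM5": ["ACGT", "ACGA", "TCGA"],
            "KM6": ["ACGT", "ACGA", "GCGA"],
            "SK": ["TTTT", "TTTA"]}
    print(shared_haplotypes(pops))
```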
Significant genetic differentiation within ethnolinguistic groups
Surprisingly, we observed striking and significant genetic differences between populations that belong to the same ethnolinguistic group but were sampled from different locations. This can be seen in the MDS analysis (Fig. 2a, b), in which two of the three most extreme outliers are from the same ethnolinguistic group, namely two of the three AA-speaking H’tin groups, TN1 and TN2 (the third outlier is the SK, a TK-speaking group from northeastern Thailand). In fact, the MDS analysis shows that in many cases, populations from the same ethnolinguistic group are not genetically similar. This is further indicated by an AMOVA for each ethnolinguistic group that was sampled from multiple locations (Table 1); in all such instances, the among-population variance component is significantly different from zero. This unexpectedly high degree of heterogeneity within ethnolinguistic groups contributes to the lack of correspondence between the genetic structure of the Thai/Lao populations and their geographic/linguistic relationships.
Relationships with other Asian populations
Table 2 Bayesian estimates (BE) of coalescence times, with 95% credible intervals (CI), for each haplogroup
The haplogroup profiles by population emphasize the greater genetic heterogeneity in AA groups than in TK groups (Fig. 1; Table S3). Some AA groups have extremely high frequencies of particular haplogroups, indicating the pronounced effect of genetic drift; examples include: R9b2 with a frequency of 32.00% in TN2; R22 with frequencies of 17.39% in BO and 20.00% in SU; D4 with frequencies of 28.00% in MO1, 31.81% in MO5, 22.73% in LW1, and 20.00% in PL; and B6a with a frequency of 72.00% in TN1. Overall, the greater heterogeneity in haplogroup distribution and pronounced haplogroup frequency differences are consistent with an older presence of AA groups in Thailand.
Some haplogroups prevalent in South Asia also occur in some AA groups, especially the Mon groups. These include D4, mentioned above, as well as W3a1b, which is reported here for the first time in MSEA. W3a1b was found in two Mon populations (24.00% in MO1 and 4.35% in MO2); these haplogroups provide further evidence for genetic connections between these Mon groups and South Asia.
Although many haplogroups are shared between MSEA and ISEA, there are distinct differences in the distribution of some sublineages. For example, haplogroup B4 is widespread throughout SEA; in our study, it is almost entirely restricted to TK groups (Fig. 1; Table S3), where it occurs as three primary sublineages, namely B4b1a2a, B4a1c4 and B4c2, all of which have been reported previously in MSEA (Peng et al. 2010; Zhang et al. 2013). Several other B4 sublineages characteristic of Taiwan (e.g., B4b1a2h, B4b1a2f and B4b1a2g) (Ko et al. 2014), the Philippines (e.g., B4b1a2b, B4b1a2c and B4b1a2d) (Gunnarsdottir et al. 2011) and Oceania (e.g., B4a1a1a) (Duggan et al. 2014) were not found in our study, in agreement with previous studies (Summerer et al. 2014; Zhang et al. 2013). Overall, the lack of sharing of recent sublineages indicates a lack of recent contact between MSEA and ISEA (Fig. S4).
Finally, the more extensive sampling of Thai/Laos mtDNA sequences in this study has resulted in much deeper ages for some haplogroups that were poorly sampled in previous studies. For example, we estimate that haplogroups R9b and R22 both coalesce at ~39 kya (Table 2), compared to previous estimates of ~29 kya (Hill et al. 2006) and ~19 kya (Zhang et al. 2013), respectively. Moreover, while R9b and R22 have been suggested to originate in southern China (Hill et al. 2006) and ISEA (Hill et al. 2007; Zhang et al. 2013), respectively, northeastern Thailand is also a potential source for these haplogroups (Fig. S5).
Population size change trends over time
Testing models of demic diffusion vs. cultural diffusion vs. admixture
Although these three demographic scenarios apply to all TK people, archaeological, linguistic and historical evidence clearly indicates potential differences in local history and demography, especially between groups from northern and northeastern Thailand (Penth 2000; Schliesinger 2001). We therefore tested all three demographic scenarios with ABC analyses of three different data sets: (1) northern Thai people (Khon Mueang, KM); (2) ethnic Lao, including northeastern Thai people (Lao Isan, IS) and Laotians (LA); and (3) Lao Isan alone (to infer the history of this specific population, for reasons detailed in the “Methods” section). In each analysis, we used AA populations for comparison and set priors for some parameters (e.g., divergence and admixture times) based on historical evidence, as detailed in the “Methods” section.
In general, the ABC analyses show that in all cases the observed data fall within the range of the simulated data (Fig. S7 in Online Resource 3), and the results of the model selection are consistent among different thresholds, i.e., the different numbers of simulations retained to fit the logistic regression. The highest posterior probabilities in both approaches, AR (0.70–0.74) and LR (0.84–0.86), support the demic diffusion model for the northern Thai KM (Fig. 6). Even though the AA-speaking LW groups have interacted culturally with the KM (Condominas 1990; Penth 2000), they are not the maternal ancestors of the KM. The test of ethnic Lao (IS and LA; data set 2) shows the same trend in supporting the demic diffusion model, although it received higher support from LR (0.76–0.79) than from AR (0.56–0.63). The ethnic Lao are thus genetically distinct from the neighboring AA-speaking groups, including the KH, KA, SO, SU and BU groups. These two results for TK groups across a vast area of Thailand and Laos thus indicate a genetic origin of the TK in southern China, followed by a rapid population expansion from (presumably) a few groups to the current census size of around 50 million within 1–2 kya. For the last analysis, concerning the origin of the IS population, the demic diffusion and admixture models, which differ by the absence/presence of migration between KH and IS beginning ~250 years ago, cannot be distinguished: AR assigned a probability of about 0.55 to demic diffusion and about 0.45 to admixture, whereas LR assigned about 0.45 to demic diffusion and about 0.55 to admixture. In either event, this analysis does not support the purely cultural diffusion model.
The results of the power analysis for the three tested data sets indicate that the true positive rate is generally good, in particular for the demic diffusion model in the first two tests (which was unequivocally supported by the model selection procedures). The false positive rate is low (less than 0.05) in almost all comparisons for the selected model of the second test, and slightly higher (0.066) for the selected model of the first test (Table S5 in Online Resource 1). In sum, these results confirm the reliability of the posterior probabilities of the models.
The extensive and intensive sampling of complete mtDNA genomes in 51 AA and TK groups from Thailand and Laos reveals high genetic diversity, with a total of 212 haplogroups observed. The proposed autochthonous ancient lineages are B5a1d, B6a, R22, R9b and F1f; the many basal lineages detected in this study suggest that the area of present-day Thailand and Laos may have been an ancient migratory route for modern humans, in accordance with the finding that the oldest modern human remains in East Asia are from Tam Pa Ling Cave in Laos (Demeter et al. 2012). Previous studies have suggested Myanmar (Li et al. 2015) and Cambodia (Zhang et al. 2013) as corridors for initial settlers, assuming travel along river valleys; our results indicate that, in addition, early modern human groups may have migrated through the interior uplands, as also suggested by archaeological evidence from caves in the highlands (Pureepatpong 2006; Shoocondej 2006).
Several lines of evidence point to a more ancient presence of AA groups than of TK groups, including greater genetic heterogeneity and, on average, older maternal lineages, in keeping with previous studies (Kutanan et al. 2011, 2014; Srithawong et al. 2015). There are also distinct affinities between some AA groups (especially the Mon groups) and South Asia, where AA groups are also found. TK groups are less heterogeneous, tend to show more signs of population expansion, and more genetic affinities with southern Chinese groups than with AN groups. The modeling of different demographic scenarios for different groups of populations further supports a demic diffusion of the ancestors of TK groups from southern China. However, the BSP results do not indicate population expansions in the history of some TK groups, e.g., KM. A possible explanation for this discrepancy is that sample collection procedures can produce a spurious signal of population decline in BSP analyses (Heller et al. 2013). Moreover, in addition to the ABC analyses, there is other evidence for demic diffusion of TK groups, e.g., the genetic distance analyses and the distribution of particular haplogroups.
The genetic affinities between TK and AN groups are in keeping with linguistic affinities between the TK and AN language families (Sagart 2004) and may be explained by the hypothesis that aboriginal Taiwanese (i.e., the first Austronesians) are descended from a migration associated with rice and millet farming that began in northern China between 9000 and 11,000 years ago and spread both to Taiwan and into southern China and MSEA (Ko et al. 2014). Thus, according to this view, both AN and TK groups have a common origin that reflects this agricultural expansion and can be seen in both the genetic and the linguistic data. There are further genetic affinities between MSEA and ISEA, but no sharing of recent sublineages, in keeping with previous studies that suggested a pre-Austronesian migration from MSEA to ISEA (Jinam et al. 2012).
Finally, there is widespread and significant genetic heterogeneity among samples from the same ethnolinguistic group from different locations. This result holds for all cases where there was more than one sampling location per ethnolinguistic group (Table 1). It appears that this heterogeneity arises from various sources. In the hill tribes, such as the Lawa and H’tin, isolation and drift due to geography and cultural constraints (e.g., matrilocality) appear to be the major factor. For the lowland populations (MO, KH, IS, KM, and PU), recent gene flow with other groups seems to be the major factor. Overall, these results suggest that multiple samples from the same ethnolinguistic group should be obtained whenever feasible, especially for ethnolinguistic groups distributed across a wide geographic area.
In conclusion, this study provides a comprehensive data set of complete mtDNA genome sequences, which we have used to gain new insights into the history of Thai/Lao populations. Information from other genetic markers, e.g., Y-chromosome and genome-wide data, will provide further insights into the genetic history of this region.
Open access funding provided by Max Planck Society. We would like to thank all village chiefs and participants who donated their biological samples. We greatly appreciate the assistance of the following coordinators who assisted in collecting samples: Khamnikone Sipaseuth, Saksuriya Triyarach, Narongdech Mahasirikul, Praweena Maneerattanaroongroj, Suparat Srithawong, Kanokpohn Srithongdeang, Nattapol Poltham and Sukhum Ruangchai. We also thank Roland Schröder, Chiara Barbieri, Leonardo Arias Alvis, Enrico Macholdt and Sandra Oliveira from MPI-EVA for technical assistance and valuable advice. This study was primarily supported by the MPI-EVA and Research and Academic Affairs Promotion Fund (RAAPF) of Faculty of Science, Khon Kaen University.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
All procedures performed in studies involving human participants were approved by Chiang Mai University, Khon Kaen University, Naresuan University, and the Ethics Commission of the University of Leipzig Medical Faculty.
Informed consent was obtained from all individual participants included in the study.
- Ammerman AJ, Cavalli-Sforza LL (1994) The Neolithic transition and the genetics of populations in Europe. Princeton University Press, New Jersey
- Anderson D (1990) Lang Rong Rien rockshelter: a Pleistocene-early Holocene archaeological site from Krabi, Southwestern Thailand. University of Pennsylvania Press, Philadelphia
- Battaglia V, Fornarino S, Al-Zahery N, Olivieri A, Pala M, Myres NM, King RJ, Rootsi S, Marjanovic D, Primorac D, Hadziselimovic R, Vidovic S, Drobnic K, Durmishi N, Torroni A, Santachiara-Benerecetti AS, Underhill PA, Semino O (2009) Y-chromosomal evidence of the cultural diffusion of agriculture in Southeast Europe. Eur J Hum Genet 17(6):820–830
- Beaumont M (2008) Joint determination of topology, divergence time and immigration in population trees. In: Matsumura S, Forster P, Renfrew C (eds) Simulations, genetics and human prehistory. McDonald Institute for Archaeological Research, Cambridge, pp 135–154
- Briggs AW, Good JM, Green RE, Krause J, Maricic T, Stenzel U, Lalueza-Fox C, Rudan P, Brajkovic D, Kucan Z, Gusic I, Schmitz R, Doronichev VB, Golovanova LV, de la Rasilla M, Fortea J, Rosas A, Pääbo S (2009) Targeted retrieval and analysis of five Neanderthal mtDNA genomes. Science 325:318–321
- Chaubey G, Metspalu M, Choi Y, Mägi R, Romero IG, Soares P, van Oven M, Behar DM, Rootsi S, Hudjashov G, Mallick CB, Karmin M, Nelis M, Parik J, Reddy AG, Metspalu E, van Driem G, Xue Y, Tyler-Smith C, Thangaraj K, Singh L, Remm M, Richards MB, Lahr MM, Kayser M, Villems R, Kivisild T (2011) Population genetic structure in Indian Austroasiatic speakers: the role of landscape barriers and sex-specific admixture. Mol Biol Evol 28(2):1013–1024
- Condominas G (1990) From Lawa to Mon, from Saa’ to Thai. Research School of Pacific Studies, Australian National University, Canberra
- Demeter F, Shackelford LL, Bacon AM, Duringer P, Westaway K, Sayavongkhamdy T, Braga J, Sichanthongtip P, Khamdalavong P, Ponche JL, Wang H, Lundstrom C, Patole-Edoumba E, Karpoff AM (2012) Anatomically modern human in Southeast Asia (Laos) by 46 ka. Proc Natl Acad Sci USA 109(36):14375–14380
- Duggan A, Evans B, Friedlaender FR, Friedlaender JS, Koki G, Merriwether DA, Kayser M, Stoneking M (2014) Maternal history of Oceania from complete mtDNA genomes: contrasting ancient diversity with recent homogenization due to the Austronesian expansion. Am J Hum Genet 94(5):721–733
- Ersts PJ (2006) Geographic Distance Matrix Generator v1.2.3. American Museum of Natural History, Center for Biodiversity and Conservation
- Higham C (2014) Early mainland Southeast Asia: from first humans to Angkor. River Books Press, Bangkok
- Hijmans RJ, van Etten J (2013) raster: geographic data analysis and modeling. R package version 2:1–49
- Husson F, Josse J, Le S, Mazet J (2007) FactoMineR: factor analysis and data mining with R. R package version 1.04
- Jinam TA, Hong LC, Phipps ME, Stoneking M, Ameen M, Edo J, HUGO Pan-Asian SNP Consortium, Saitou N (2012) Evolutionary history of continental Southeast Asians: “early train” hypothesis based on genetic analysis of mitochondrial and autosomal DNA data. Mol Biol Evol 29:3513–3527
- Lewis MP, Simons GF, Fennig CD (2016) Ethnologue: languages of the world, 19th edn. SIL International, Dallas
- Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harbor Protoc 6:1–10
- Nakbunlung S (1994) Origins and biological affinities of the modern Thai population: an osteological perspective. Dissertation, University of Illinois
- Penth H (2000) A brief history of Lanna: civilizations of North Thailand. Silkworm Books, Chiang Mai
- Pittayaporn P (2014) Layers of Chinese loanwords in proto-southwestern Tai as evidence for the dating of the spread of southwestern Tai. Manusya J Humanit 20:47–68
- Pureepatpong N (2006) Recent investigations of early people (late Pleistocene to early Holocene) from Ban Rai and Tham Lod rock shelter sites, Pang Mapha district, Mae Hongson province, Northwestern Thailand. In: Bacus EA, Glover IC, Pigott VC (eds) Uncovering Southeast Asia’s past. NUS Press, Singapore, pp 38–45
- Schliesinger J (2001) Tai group of Thailand, volume 1: introduction and overview. White Lotus Press, Bangkok
- Shoocondej R (2006) Late Pleistocene activities at the Tham Lod rockshelter in highland Pang Mapha, Mae Hongson Province, Northwestern Thailand. In: Bacus EA, Glover IC, Pigott VC (eds) Uncovering Southeast Asia’s past. NUS Press, Singapore, pp 22–37
- Summerer M, Horst J, Erhart G, Weißensteiner H, Schönherr S, Pacher D, Forer L, Horst D, Manhart A, Horst B, Sanguansermsri T, Kloss-Brandstätter A (2014) Large-scale mitochondrial DNA analysis in Southeast Asia reveals evolutionary effects of cultural isolation in the multi-ethnic population of Myanmar. BMC Evol Biol 14:17
- van Etten J (2012) gdistance: distances and routes on geographical grids. R package version 1.1–4
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.