Mitochondrial DNA control region variation in a population sample from Thailand

Mitochondrial DNA (mtDNA) control region sequences from hair samples of 213 individuals from Thailand were analyzed using Sanger sequencing. A total of 170 different haplotypes were identified, of which 146 occurred only once (unique haplotypes). The dataset showed a random match probability of 0.87% and a haplotype diversity of 0.9960. The samples were assigned to 85 different haplogroups, with B5a, F1a1a, and M being the most frequent. Pairwise FST values between this and other Southeast and East Asian populations revealed significant but relatively low differences, indicating a close relationship. Heteroplasmic positions were observed in 12.2% of the hair samples, confirming the frequent occurrence of heteroplasmic positions in hair. This dataset will complement existing data as an mtDNA reference for forensic investigations.
Electronic supplementary material The online version of this article (10.1007/s00414-020-02303-2) contains supplementary material, which is available to authorized users.


Introduction
Mitochondrial DNA (mtDNA) analysis has become a routine approach in forensic casework where STR markers cannot be used. Estimating the frequency of an obtained mtDNA profile in the relevant population requires suitable population datasets. MtDNA data from Thailand have already been published [1-5].
However, these datasets are either limited to SNPs in the mitochondrial hypervariable regions 1 and 2 [1] or were collected with a different focus, such as language [2-4]. A forensic study of a population from Thailand was carried out in the northern province of Chiang Mai [5]. Given that the population of Thailand is composed of different ethnolinguistic groups [6], however, a regional population sample cannot be considered representative of the whole country.
In this paper, we present a population dataset of 213 individuals living in all four major regions of Thailand.

Samples
Hair samples were obtained from 213 unrelated individuals of both sexes living in southern, central, northern, and northeastern Thailand (Fig. 1, Supplementary Table S1). The samples were collected from volunteer donors and anonymized. Written informed consent was obtained from all participants. Ethical approval for mtDNA sequencing analysis was given by the Ethics Committee of the University of Freiburg, Germany (398/16).

DNA extraction, amplification, and sequencing
Total DNA was extracted from 10-12 hairs per individual with the MagCore® Genomic DNA Tissue Kit (RBC Bioscience, New Taipei City, Taiwan) at the University of Khon Kaen (Khon Kaen, Thailand). PCR and sequencing of the entire control region were performed at the University Medical Center Freiburg - University of Freiburg (Freiburg, Germany) as described in [7], using the primers given in Supplementary Tables S2 and S3.
Data quality was checked with the EMPOP tool NETWORK [14], and further quality control was performed by the EMPOP team at the Medical University of Innsbruck. All 213 sequences have been incorporated into the EMPOP database under accession number EMP00699.

Statistical analysis
Intra- and inter-population statistical analyses were performed with Arlequin v3.5.2.2 [15]. The numbers of different and unique haplotypes were counted, and genetic diversity indices of the population (random match probability, haplotype diversity, number of polymorphic positions, mean number of pairwise differences, and nucleotide diversity) were calculated. Length variants at nucleotide positions 16193, 309, and 573 were ignored in the statistical tests. The random match probability was calculated as the sum of squared haplotype frequencies.
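The indices listed above follow standard definitions. The Python sketch below shows one way to compute them from aligned control region sequences; it is not the Arlequin implementation, and it assumes equal-length, pre-aligned sequences with the ignored length variants already removed and no missing data. The function name `diversity_indices` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def diversity_indices(haplotypes):
    """Population summary statistics for aligned haplotypes (one per individual).

    Assumes equal-length, pre-aligned strings with ignored length variants
    removed and no missing data.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    freqs = [c / n for c in counts.values()]

    # Random match probability: sum of squared haplotype frequencies.
    rmp = sum(p * p for p in freqs)
    # Haplotype diversity (Nei's unbiased estimator): n/(n-1) * (1 - sum p_i^2).
    hap_div = n / (n - 1) * (1 - rmp)

    # Mean number of pairwise differences over all n*(n-1)/2 pairs of individuals.
    pairs = list(combinations(haplotypes, 2))
    mpd = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Polymorphic positions: sites where more than one base is observed.
    polymorphic = sum(1 for site in zip(*haplotypes) if len(set(site)) > 1)

    return {
        "n": n,
        "different": len(counts),
        "unique": sum(1 for c in counts.values() if c == 1),
        "rmp": rmp,
        "hap_div": hap_div,
        "polymorphic": polymorphic,
        "mpd": mpd,
        # Nucleotide diversity: mean pairwise differences per site.
        "nuc_div": mpd / len(haplotypes[0]),
    }


example = diversity_indices(["AAAA", "AAAA", "AAAT", "ATAT"])
print(example["rmp"], example["mpd"])  # 0.375 1.1666666666666667
```

Haplotype diversity and random match probability are linked by h = n/(n-1) × (1 - RMP). With n = 213 and RMP = 0.0087 this gives approximately 0.9960, in line with the haplotype diversity reported in the Results.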
We compared our data with six other Southeast and East Asian populations from recent studies [5, 16-20], considering only control region data (np 16024-576). We performed an analysis of molecular variance (AMOVA) and calculated genetic diversity indices for the additionally included studies, average pairwise differences between and within populations, and pairwise FST values.
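Restricting all datasets to the common range np 16024-576 has to account for the circular numbering of the revised Cambridge Reference Sequence (rCRS, 16569 bp), in which this range wraps past position 16569. A minimal Python sketch follows. It assumes profiles are lists of variants in a notation such as `16223T`, `263G`, or `315.1C` (insertions written as position.index), and it treats only insertions at 16193, 309, and 573 as the length variants to ignore; deletions and other notations are not handled. The helper names are hypothetical.

```python
import re

# Common comparison range np 16024-576 on the circular rCRS (16569 bp),
# i.e. 16024-16569 followed by 1-576.
CR_START, CR_END, GENOME_LEN = 16024, 576, 16569
# Positions whose length variants are ignored (homopolymeric C-stretches).
IGNORED_LENGTH_POS = {16193, 309, 573}

# Assumed notation: position, optional ".n" insertion index, then the base(s).
_VARIANT = re.compile(r"^(\d+)(\.\d+)?(.*)$")


def in_control_region(pos):
    """True if an rCRS position lies within np 16024-576 (wrapping past 16569)."""
    return CR_START <= pos <= GENOME_LEN or 1 <= pos <= CR_END


def filter_profile(variants):
    """Keep control region variants, dropping insertions at the ignored positions."""
    kept = []
    for v in variants:
        m = _VARIANT.match(v)
        if m is None:
            raise ValueError(f"unrecognized variant: {v!r}")
        pos, insertion = int(m.group(1)), m.group(2)
        if not in_control_region(pos):
            continue
        if insertion and pos in IGNORED_LENGTH_POS:
            continue  # length variant, e.g. 309.1C or 16193.1C
        kept.append(v)
    return kept


print(filter_profile(["16223T", "16193.1C", "263G", "309.1C", "315.1C", "750G"]))
# ['16223T', '263G', '315.1C']
```

Note that the insertion at 315 is kept, because only the three named positions are excluded.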

Results and discussion
We obtained 213 high-quality mtDNA control region sequences from Thailand to establish reference data (Supplementary Table S1). Summary statistics are presented in Table 1. Of the 170 different haplotypes, 146 were unique. The population sample had a random match probability of 0.87% and a haplotype diversity of 0.9960 ± 0.0013, revealing a high heterogeneity that makes the dataset useful for forensic analyses.
We compared the detected haplotypes with those of six earlier studies of Southeast and East Asian populations, including one with samples from Northern Thailand [5, 16-20]. A total of 44 of the 170 haplotypes (25.9%) were found in at least one other population (Supplementary Table S4). Accordingly, 126 haplotypes (74.1%) of our study, including the most common one (9 samples), were not observed in the other studies (cf. Supplementary Table S1).
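Counting shared and private haplotypes reduces to set operations once every haplotype is represented in a comparable form. The Python sketch below assumes each haplotype is encoded as a frozenset of variants that has already been restricted to the common comparison range; the function name and input layout are illustrative only.

```python
def sharing_stats(study, others):
    """Compare distinct haplotypes of one study against other population samples.

    study: list of haplotypes, each a frozenset of variants (one per individual)
    others: dict mapping population name -> list of haplotypes in the same form
    """
    own = set(study)  # distinct haplotypes of the study
    seen_elsewhere = set().union(*others.values())  # all haplotypes of other samples
    shared = own & seen_elsewhere
    return {
        "distinct": len(own),
        "shared": len(shared),
        "private": len(own - seen_elsewhere),
        "shared_pct": 100 * len(shared) / len(own),
    }


study = [frozenset({"16223T"}), frozenset({"16223T"}), frozenset({"16189C"}), frozenset()]
others = {"pop_a": [frozenset({"16223T"})], "pop_b": [frozenset({"16129A"})]}
print(sharing_stats(study, others))
```

Applied to the counts reported above, 44/170 gives 25.9% shared and 126/170 gives 74.1% private haplotypes.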

Haplogroup composition
The 213 samples from Thailand were assigned to 85 different haplogroups (Supplementary Table S1). Some sequences could not be assigned to a terminal branch of PhyloTree [13] and were instead assigned to their most recent common ancestor (MRCA), e.g., macrohaplogroup M (8.9%). The most frequent haplogroup assignments were B5a (9.4%), F1a1a (8.9%), and M (8.9%). All samples belong to macrohaplogroups R (50.7%), M (39.4%), and N (9.4%), except for one sample that could only be assigned to L3 as MRCA.

Genetic distances between Southeast and East Asian populations
We compared the genetic structure of our population sample from Thailand with that of the six other Southeast and East Asian population samples [5, 16-20]. The total number of samples was 1789. The analysis of molecular variance (AMOVA) revealed that 98.06% of the genetic variation is due to differences within populations; only 1.94% of the total genetic variance is caused by differences between populations (Table 2). Bodner et al. (2011) found an inter-population variance of only 0.84% in a very similar dataset, but considered only the hypervariable regions (HVS-I and HVS-II) and a regionally restricted population [16].
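To illustrate how the within/among split arises, the following is a minimal single-level AMOVA sketch in Python, following the pairwise-distance formulation of Excoffier et al. with the number of differences between sequences as the distance. It assumes aligned, equal-length sequences without missing data, and it omits the permutation test Arlequin uses to assess significance; all names are hypothetical.

```python
from itertools import combinations


def n_diff(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def ssd(seqs):
    """Sum of squared deviations: pairwise distances over unordered pairs, divided by n."""
    return sum(n_diff(a, b) for a, b in combinations(seqs, 2)) / len(seqs)


def amova_one_level(populations):
    """Single-level AMOVA (populations only, no higher grouping).

    populations: dict mapping population name -> list of aligned sequences.
    """
    samples = list(populations.values())
    P = len(samples)
    N = sum(len(s) for s in samples)

    ssd_total = ssd([x for s in samples for x in s])
    ssd_within = sum(ssd(s) for s in samples)
    ssd_among = ssd_total - ssd_within

    # Mean squared deviations and the sample-size coefficient n0.
    msd_among = ssd_among / (P - 1)
    msd_within = ssd_within / (N - P)
    n0 = (N - sum(len(s) ** 2 for s in samples) / N) / (P - 1)

    # Variance components; sigma_among can be slightly negative by chance.
    sigma_within = msd_within
    sigma_among = (msd_among - msd_within) / n0
    total = sigma_among + sigma_within
    return {
        "pct_among": 100 * sigma_among / total,
        "pct_within": 100 * sigma_within / total,
        "fst": sigma_among / total,  # fixation index from sequence distances
    }


print(amova_one_level({"A": ["AA", "AA"], "B": ["TT", "TT"]}))
```

In this toy input the two populations share no haplotypes and show no internal variation, so all variance falls between populations; real data such as the 1789 compared sequences fall at the opposite end, with most variance within populations.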
The mean number of pairwise differences (MPD) in the Thai population is 12.86, which lies within the range observed in other Asian populations (Table 3). The lowest MPD (11.75) was observed in the South Korean population, the highest (13.42) in the Northern Thai population.
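The within- and between-population pairwise differences can be sketched as follows. Π_X is the mean number of differences over pairs within a population, Π_XY is the mean over all cross-population pairs, and the corrected value Π_XY - (Π_X + Π_Y)/2 is the quantity Arlequin usually reports below the diagonal of such a table. The `fst` value returned here is a Hudson-style ratio estimator, swapped in for illustration: Arlequin derives its pairwise FST values from AMOVA, and the two need not coincide. Function names are hypothetical.

```python
from itertools import combinations, product


def n_diff(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def pi_within(seqs):
    """Mean pairwise differences within one population (Pi_X)."""
    pairs = list(combinations(seqs, 2))
    return sum(n_diff(a, b) for a, b in pairs) / len(pairs)


def pi_between(xs, ys):
    """Mean pairwise differences between two populations (Pi_XY)."""
    return sum(n_diff(a, b) for a, b in product(xs, ys)) / (len(xs) * len(ys))


def pairwise_summary(xs, ys):
    px, py, pxy = pi_within(xs), pi_within(ys), pi_between(xs, ys)
    mean_within = (px + py) / 2
    return {
        "pi_x": px,
        "pi_y": py,
        "pi_xy": pxy,
        "corrected": pxy - mean_within,  # net divergence between X and Y
        "fst": 1 - mean_within / pxy,    # Hudson-style ratio, not Arlequin's AMOVA FST
    }


print(pairwise_summary(["AA", "AA"], ["TT", "AT"]))
```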
Pairwise FST values between population samples were relatively low and similar, indicating a close relationship between the populations. Higher differentiation was found between the Southeast Asian populations (Thailand, Northern Thailand, Laos, Northern Vietnam, and Myanmar) and South Korea (FST 0.037-0.051), whereas differentiation between Hong Kong and South Korea was low (FST 0.016). All FST values between Thailand and the other population samples were significant (Table 4).

Heteroplasmy
Heteroplasmic positions were observed in 12.2% of the hair samples (Supplementary Table S5). In a former study of 691 hair shaft samples, the frequency of point heteroplasmy was 11.4% [21]. For the other Southeast and East Asian population studies, which used various types of tissue material, the percentage of samples with point heteroplasmy was calculated as follows: Northern Thailand 2.6% (blood), Laos 3.7% (blood), Hong Kong 8.5% (blood), Myanmar 8.6% (blood), South Korea 10.3% (blood and buccal swabs), and Northern Vietnam 15.0% (buccal swabs) [5, 16-20]. These values support the dependence of mtDNA heteroplasmy frequency on the analyzed tissue described in [7]. However, it should be noted that low-level heteroplasmic positions were not detectable with Sanger sequencing.

Conclusion
The sample of 213 mtDNA control region sequences will serve as a high-quality mtDNA reference for Thailand. Most of the detected haplotypes were not observed in the compared datasets and will complement the available data from Northern Thailand [5] and other Southeast Asian populations.
Recently, 960 complete mtDNA genomes from Thailand, originally sequenced to investigate anthropological questions [4], were also incorporated into the EMPOP database [14]. The growing body of Southeast Asian mtDNA sequence data is building a reliable forensic reference for this region.
Table 3 note: Above diagonal, average number of pairwise differences between populations (Π̂XY); diagonal elements, average number of pairwise differences within populations (Π̂X).
Authors' contributions All authors contributed to the study conception and design. UI collected the samples and extracted the DNA. DS performed mtDNA analysis, statistical analysis, and drafted the manuscript. JN evaluated the analysis and revised the manuscript. SLB designed the study. All authors contributed to the manuscript and approved the final version.
Data availability All haplotypes are provided in the supplementary material and have been submitted to EMPOP under accession number EMP00699.

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Ethics approval Ethical approval for mtDNA sequencing analysis was given by the Ethics Committee of the University of Freiburg, Germany (398/16).
Consent to participate Informed consent was obtained from all individual participants included in the study.

Consent for publication
The participants have consented to the submission of analyzed DNA results as anonymous data.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.