Introduction

Taiwan's population of about 23 million is heterogeneous, comprising 97.9% Han Chinese and 2.1% indigenous people. The Han Chinese are composed of Minnan (68.9%), Hakka (15%) and mainland Chinese (14%) who arrived from China after World War II. Minnan and Hakka are the so-called “Taiwanese”, who have immigrated to Taiwan from the southeast coast of China since the seventeenth century [1]. Physical and cultural anthropology has led to the classification of some 14 different indigenous Taiwanese peoples [2]: Ami, Paiwan, Atayal, Bunun, Truku, Rukai, Puyuma, Tsou, Saisiyat, Yami (Tau), Kavalan, Thao, Sakizaya and Sediq. The Paiwan, with a population of about 84,500, reside in the southern mountainous regions of Taiwan (Fig. 1) and belong to the Formosan sub-branch of the Austronesian language group [3, 4]. Linguistic and archaeological evidence suggests that Austronesian language speakers expanded early from Southern China into Taiwan [5, 6]. Because of the recent theory that Taiwan was the stepping-stone of the Austronesian demographic expansion, many anthropological studies have investigated the distribution of Y-STR haplotypes among Taiwanese indigenous people. A major limitation of these studies, however, has been their relatively small sample sizes. None of the previous publications on the male genetics of the Paiwan [6–14] comprises more than 50 chromosomes, which limits the information they can provide. Here, we present the largest Y-STR study on the Paiwan to date.

Fig. 1

Map of the region of residence of the Paiwan population in Taiwan (shaded). (Modified from http://www.apc.gov.tw/)

Materials and methods

After informed consent was obtained, buccal swab samples were collected from 208 unrelated male Paiwan volunteers residing in the southern mountainous regions of Taiwan. Genomic DNA was extracted and purified using the Genomic DNA Extraction Kit (Yeastern Company, Taipei County, Taiwan, ROC), and DNA concentrations were determined with the 7300 real-time PCR system using the Quantifiler™ Y Human Male DNA Quantification Kit (Applied Biosystems, Foster City, CA, USA).

Target DNA (0.1 ng/μL) was amplified by PCR using the AmpFlSTR® YFiler™ PCR Amplification Kit (Applied Biosystems) at the following loci: DYS19, DYS385ab, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635 and GATA H4. The efficacy of the kit was determined by the identification of a 9-marker European minimal haplotype (minHt; the haplotype set of DYS19, DYS385ab, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) and an 11-marker SWGDAM (Scientific Working Group on DNA Analysis Methods) core set (minHt plus DYS438 and DYS439). Amplification was performed in 25 μL reactions containing 10 μL target DNA, 5 μL primer set, 9.2 μL PCR reaction mix and 0.8 μL DNA polymerase, using a GeneAmp PCR System 9700 (Applied Biosystems) under the following conditions: 95°C for 11 min, followed by 30 cycles of 94°C for 1 min, 61°C for 1 min and 72°C for 1 min, with a final extension step at 60°C for 80 min. PCR products were separated by electrophoresis on the ABI Prism® 3100 Avant Genetic Analyzer (Applied Biosystems), together with the GeneScan-500 Internal Lane Size Standard (LIZ-500) to determine base-pair sizes. GeneMapper ID Software Version 3.1 (Applied Biosystems) was used to determine the allelic repeats by reference to the YFiler Allelic Ladder. Alleles were named as suggested by the DNA Commission of the International Society for Forensic Genetics [15].
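The reaction set-up and cycling program above can be checked with a short calculation. The following is a minimal sketch, not part of the published protocol: it assumes that 0.1 ng/μL is the concentration of the 10 μL of template added to each reaction, and all variable and function names are illustrative. Ramp times between temperature steps are ignored.

```python
# Reaction components in microlitres, as listed in the text.
REACTION_UL = {
    "target_dna": 10.0,
    "primer_set": 5.0,
    "pcr_reaction_mix": 9.2,
    "dna_polymerase": 0.8,
}

# Cycling program as (temperature in deg C, hold time in minutes, repeats).
INITIAL_STEP = (95, 11, 1)
CYCLE_STEPS = [(94, 1, 30), (61, 1, 30), (72, 1, 30)]
FINAL_EXTENSION = (60, 80, 1)

# Assumption: 0.1 ng/uL is the concentration of the template that is added.
TEMPLATE_NG_PER_UL = 0.1


def total_volume_ul(components):
    """Sum of all component volumes in one reaction."""
    return sum(components.values())


def template_mass_ng(components, ng_per_ul):
    """Mass of template DNA per reaction under the stated assumption."""
    return components["target_dna"] * ng_per_ul


def total_hold_minutes():
    """Total programmed hold time, ignoring ramping between steps."""
    steps = [INITIAL_STEP, *CYCLE_STEPS, FINAL_EXTENSION]
    return sum(minutes * repeats for _temp, minutes, repeats in steps)


print(round(total_volume_ul(REACTION_UL), 1), "uL per reaction")
print(round(template_mass_ng(REACTION_UL, TEMPLATE_NG_PER_UL), 2), "ng template")
print(total_hold_minutes(), "min programmed hold time")
```

Under these assumptions the listed volumes add up to the stated 25 μL, each reaction contains about 1 ng of template, and the program holds for 181 min in total.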

Allelic frequencies were estimated by direct counting. Gene diversity and haplotype diversity were calculated according to Nei [16]. Populations were compared by analysis of molecular variance (AMOVA). Pairwise values of Φst, an analogue of Fst that takes the evolutionary distance between individual haplotypes into account [17, 18], were calculated to measure genetic distances between the minimal 9-locus haplotypes of the Paiwan and published data from ten other populations from the region [19–27] (n = 2112), with statistical significance determined by a permutation test (10,000 replicates). We used the implementation of AMOVA provided at the YHRD website [28]. The DYS389I allele length was obtained by subtracting the shorter allele from the longer allele at DYS389I/II. To illustrate the relationship between populations based on pairwise Φst, an MDS plot was created using the “Population analysis” tools of the YHRD [28]. A median-joining network [29] based on 12-locus Y-STR haplotypes (omitting the complex repeats) was calculated using the NETWORK 2.0b software (available online, www.fluxus-engineering.com/sharenet.htm).
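The diversity calculation can be illustrated with a short sketch. It assumes the unbiased estimator commonly attributed to Nei, h = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of the i-th allele (for gene diversity) or haplotype (for haplotype diversity) among n chromosomes. The function name and the toy data are illustrative only and are not taken from the study.

```python
from collections import Counter


def nei_diversity(observations):
    """Unbiased diversity: n / (n - 1) * (1 - sum of squared frequencies).

    `observations` holds one hashable item per chromosome: an allele for
    gene diversity, or a tuple of alleles for haplotype diversity.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("at least two observations are required")
    counts = Counter(observations)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy data (hypothetical): alleles at one locus in four chromosomes.
toy_alleles = [13, 13, 14, 15]
print(f"gene diversity: {nei_diversity(toy_alleles):.4f}")

# Toy data (hypothetical): two-locus haplotypes in four chromosomes.
toy_haplotypes = [(13, 29), (13, 29), (14, 30), (15, 30)]
print(f"haplotype diversity: {nei_diversity(toy_haplotypes):.4f}")
```

The same function serves both indices because only the unit of observation changes, from a single allele to a whole haplotype.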

Results and discussion

The Y-STR allele frequencies of the Paiwan population are shown in Table S1. Forensic indices such as the locus diversity and haplotype diversity values were calculated from the allelic frequency for each locus. Of the 17 markers analyzed, DYS385ab showed the highest diversity (0.8354) and DYS438 the lowest (0.1014). A total of 135 haplotypes were identified in the 208 individuals studied, of which 102 were unique (Table S2). The overall observed haplotype diversity was 0.9922 ± 0.0010, and the discrimination capacity was 0.6490. The most frequent 17-locus haplotype was H122, which was identified in nine individuals (Table S2). The diversity values were markedly lower than those of the major population in Taiwan [30]. In addition, three intermediate or duplicated alleles at locus DYS448 (15.2; 16.2; and 15.2, 18), not yet observed in YHRD 3.0 (Release 29 from June 15th, 2009), were identified in this study. A further search of this YHRD release revealed no match among 17,384 haplotypes. Only one match (H102) was observed with the Atayal population [21]. A median-joining network analysis for the Paiwan sample based on the YFiler haplotypes (omitting the loci with mutations at different repeat blocks: DYS389II, DYS390, DYS385ab and DYS448) shows 36 different haplotypes occurring more than once. Most of these haplotypes fall into three clearly discernible clusters, A–C, which surround the predominant H43, H63 and H122 haplotypes (Table S2, Fig. 2). The network illustrates the reduced haplotype variability and the dominance of certain patriclans within the Paiwan population. Taken together, these data indicate a patrilocal type of residence of the Paiwan, who live in relative isolation in the southern mountainous regions of Taiwan.
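The reported discrimination capacity follows directly from the haplotype counts. The sketch below assumes the usual definition (number of different haplotypes divided by number of individuals) and takes "unique" to mean a haplotype observed in exactly one individual. The variable names are illustrative.

```python
N_INDIVIDUALS = 208   # typed individuals
N_HAPLOTYPES = 135    # different 17-locus haplotypes
N_UNIQUE = 102        # haplotypes assumed to be observed exactly once


def discrimination_capacity(n_haplotypes, n_individuals):
    """Different haplotypes divided by the number of individuals."""
    return n_haplotypes / n_individuals


dc = discrimination_capacity(N_HAPLOTYPES, N_INDIVIDUALS)

# Haplotypes seen more than once, and the individuals who carry them.
shared_haplotypes = N_HAPLOTYPES - N_UNIQUE
individuals_sharing = N_INDIVIDUALS - N_UNIQUE

print(f"discrimination capacity: {dc:.4f}")  # 0.6490
print(f"shared haplotypes: {shared_haplotypes}")
print(f"individuals carrying a shared haplotype: {individuals_sharing}")
```

On these counts, 33 of the 17-locus haplotypes are shared, and they account for 106 of the 208 individuals.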

Fig. 2

Median-joining network for 36 different 12-locus haplotypes (n > 2) from the Paiwan population. Three clusters surrounding dominant haplotypes (black circles) are clearly discernible

To analyze the relationship of the Paiwan tribal people to other Austronesian-speaking (Atayal, Malay and Filipino) and non-Austronesian-speaking neighbouring populations from Taiwan, South Eastern China and Japan, we compared the most widely studied minimal 9-locus haplotypes (minHt) of the Paiwan via AMOVA with published data from several reference populations. In total, 2112 reference samples from the following populations (all published and fully referenced in the YHRD) were used: 805 Taiwanese from Taiwan, YHRD Accession Numbers 003340 and YA003193 [19, 20]; 170 Atayal from Taiwan, YA003524 [21]; 109 Han Chinese from Minnan, YA003308 [22]; 76 Filipinos from Manila, YA003202 [23]; 211 Filipinos from Luzon, YA003206 [24]; 334 Malays from Malaysia, YA003278 [25]; 113 Bidayuh from Malaysia, YA003415 [26]; 103 Iban from Malaysia, YA003416 [26]; 104 Melanau from Malaysia, YA003417 [26]; and 87 Japanese from Okinawa, YA003205 [27]. All pairwise comparisons between the indigenous Paiwan and Atayal and the other populations show large Φst values and are significant at the 5% level (see Fig. 3, Table S3). As shown by AMOVA and illustrated in the MDS plot (Fig. 3), the Paiwan and Atayal indigenous tribes show little genetic relationship to the other Taiwanese, Chinese, Malaysian and Japanese populations tested. A close association between the Paiwan and the Tagalog- and Cebuano-speaking populations of the Philippines (which also belong to the Austronesian language group) was likewise not detectable with Y-STR markers. Surprisingly, although intermarriage between the early settlers from the mainland and pre-existing mountain tribes occurred [19], admixture between the male gene pools of Chinese settlers on Taiwan and indigenous people is not supported by our Y chromosome analysis. Accordingly, the studied native Paiwan population from Taiwan is seen as a population isolate with a male gene pool formed by caste endogamy and a patrilocal residence type, resulting in strong genetic drift [31].
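For readers unfamiliar with pairwise Φst, the following minimal sketch shows how such a value and its permutation p-value can be obtained from a one-way AMOVA with two populations. It is not the YHRD implementation used in this study. It assumes that the distance between two haplotypes is the sum of squared repeat-number differences, treats every locus as a single integer repeat count (so the unordered DYS385ab pair is not handled), and uses hypothetical toy haplotypes. All function names are illustrative.

```python
import random
from itertools import combinations


def sq_distance(h1, h2):
    """Sum of squared repeat-number differences between two haplotypes."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def ssd(haplotypes):
    """Sum of squared deviations: all pairwise distances over group size."""
    n = len(haplotypes)
    return sum(sq_distance(a, b) for a, b in combinations(haplotypes, 2)) / n


def phi_st(pop_a, pop_b):
    """Pairwise Phi_st from a one-way AMOVA with P = 2 populations."""
    n_a, n_b = len(pop_a), len(pop_b)
    n = n_a + n_b
    ssd_within = ssd(pop_a) + ssd(pop_b)
    ssd_among = ssd(pop_a + pop_b) - ssd_within
    msd_among = ssd_among               # divided by P - 1 = 1
    msd_within = ssd_within / (n - 2)   # divided by N - P
    n0 = n - (n_a ** 2 + n_b ** 2) / n  # average sample size term for P = 2
    var_among = (msd_among - msd_within) / n0
    total = var_among + msd_within
    return var_among / total if total else 0.0


def permutation_p(pop_a, pop_b, replicates=1000, seed=0):
    """Observed Phi_st and the share of permutations at least as large."""
    rng = random.Random(seed)
    observed = phi_st(pop_a, pop_b)
    pooled = pop_a + pop_b  # new list, so the inputs are not modified
    hits = 0
    for _ in range(replicates):
        rng.shuffle(pooled)
        if phi_st(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    return observed, (hits + 1) / (replicates + 1)


# Hypothetical two-locus haplotypes for two small populations.
pop_1 = [(14, 10), (14, 10), (14, 11), (15, 10), (14, 10), (14, 10)]
pop_2 = [(16, 12), (17, 12), (16, 13), (16, 12), (17, 13), (16, 12)]
value, p = permutation_p(pop_1, pop_2)
print(f"Phi_st = {value:.3f}, p = {p:.4f}")
```

With small samples the estimated among-population variance component can be negative, so slightly negative Φst values are possible for populations that do not differ; these are usually interpreted as zero.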

Fig. 3

MDS plot based on pairwise Φst values calculated for the Paiwan (filled circle) and ten reference populations (open circles)

While the national Y-STR database for the Taiwanese populations is growing and currently holds 1183 haplotypes (refer to www.yhrd.org, National databases, Taiwan), the heterogeneous population structure must be strictly observed by constructing separate database branches for major and minor subpopulations.