Genetic consequences of postglacial colonization by the endemic Yarkand hare ( Lepus yarkandensis ) of the arid Tarim Basin

Orogenesis of the Qinghai-Tibetan Plateau, which occurred in a stepwise manner, contributed to the extreme aridity of the Tarim Basin, resulting in vulnerable and unstable ecosystems. Quaternary climatic oscillations may have affected the ecosystems and, consequently, the distributions and genetic structuring of the Tarim Basin’s biota. We used nucleotide sequence data from 2 mitochondrial (mt) DNA genes (Cyt b and the D-loop) to test hypotheses associated with the matrilineal and demographical histories of the Tarim Basin’s endemic Yarkand hare ( Lepus yarkandensis ). Range-wide sampling involving 20 populations and 224 individuals detected 126 haplotypes that clustered into 5 major lineages in both the phylogenetic tree and median-joining network. P opulations from the northern and eastern Tarim Basin shared a similar history, as did those from the western and southern regions. Demographical analysis and genetic diversity estimations suggested that the western and southern regions might have served as glacial refugia for the Yarkand hare during Quaternary climatic oscillations. The distribution of the Yarkand hare, especially in the northern and eastern parts, probably represented 3 postglacial colonization events, dated to 0.21, 0.090 and 0.054 MYA, which corresponded to known interglacial periods . Given the relatively complete geographic isolation between the eastern and southern populations, the Yarkand hare likely dispersed during postglacial periods from the southwest to the north, and then onward to the east. The absence of water likely forced the species into refugia, and this differed from other Pleistocene bio-geographical drivers. The demographical and historical patterns have important implications for conservation.

Orogenesis of the QTP affected surrounding ecologies. For example, genealogical evaluations of largely aquatic frogs are associated with changes in river courses [21]. The phylogenetic history of Southeast Asian frogs in the tribe Paini was greatly affected by the Cenozoic Indo-Asian collision that resulted in the uplift of the QTP [22]. Other surrounding regions, such as the Tarim Basin, may have been similarly affected.
The Tarim Basin is located on the northern edge of the QTP in southern Xinjiang Uygur Autonomous Region, northwestern China ( Figure 1). With an area of ~560000 km 2 , it is the world's largest inland basin. To the north, it is bounded by the Tianshan Mountains, to the south by the Kunlun Mountains, to the west by the Pamir Plateau, and to the southeast by the Altun Mountains. The elevations in the basin range from 800 to 1300 m above sea level (a.s.l), whereas the average height of the surrounding mountains exceeds 4000 m a.s.l. The Taklamakan Desert (337000 km 2 ), which dominates much of the Tarim Basin, is encircled by intermittent oases. Being extremely arid, the basin's ecosystem is vulnerable and unstable [23][24][25][26]. During the Quaternary, montane glaciers surrounded the Tarim Basin, but these were absent within the Tarim Basin itself [27][28][29][30][31][32].
To what extent, if any, was the biota in the Tarim Basin affected by Quaternary glacial cycles?
The Yarkand hare (Lepus yarkandensis) only is associated with the system of oases that encircle the Taklamakan Desert [33]. It is smaller than other species of Chinese hares, and it appears to have adapted to tolerating extreme environmental stress. Populations of Yarkand hares are in decline because desert encroachment and human activities [34]. It is listed in the Second Category of State Key Protected Wildlife List (1988), and as Near Threatened in the 1996 IUCN Red List of Threatened Animals [35].
The Yarkand hare is a model organism for studying the effects of Quaternary climatic oscillations on the Tarim Basin biota. It speciated around 0.64 ± 0.26 million years ago (MYA) [36] and successfully survived the Quaternary. Although several species of Lepus were greatly affected by Pleistocene climatic fluctuations [11,[36][37][38][39][40], the impact of climatic oscillations on the demographic history and genetic diversity of the Yarkand hare is poorly understood [41,42]. Consequently, we analyzed mitochondrial DNA diversity from the species. Our primary null hypothesis (H 0ran ) states that genetic structuring is random, other than isolation by distance, and genetic diversity is equally distributed across the range. Several associated secondary null hypotheses are noted below.

Sample collection and laboratory methods
A total of 224 skin or muscle samples from Yarkand hares were obtained from 20 localities in the Tarim Basin, which almost completely covered the species range ( Figure 1 and Table S1). All collections were performed following animal use protocols approved by the Kunming Institute of Zoology Animal Care Committee.
Total DNA was extracted following standard procedures using proteinase K [43]. The complete Cytochrome oxidase b (Cyt b) gene and control region (D-loop) were amplified from all samples using degenerate primers. Primer pair 5′-CAACTACAAGAACCTAATG-3′ (forward primer) and  Table S1. 5′-CAGGGTAATAYACTATACTA-3′ (reverse primer) were used for Cyt b, and 5′-CAGAGATGGAGATYAACT-C-3′ (forward primer) and 5′-GCATGGGCTGATTAGTC-AT-3′ (reverse primer) for the D-loop. PCR amplification was performed as follows: 95°C initial hot start for 5 min; followed by 35 cycles of 94°C denaturation for 1 min, 51°C annealing for 1 min and 72°C extension for 1 min. A final extension was made at 72°C for 10 min. Amplified PCR products were purified and then sequenced in both directions on an ABI PRISM™ 3730 DNA sequencer, following the manufacturer's protocols. All sequences obtained were checked carefully for ambiguous scores and then queried by BLAST searches against GenBank to confirm homology [44]. The sequences were deposited in GenBank with the accessions [HM752273-HM752352] for Cyt b and [HM75-2353-HM752460] for D-loop.

Data analyses
The Cyt b and D-loop sequences were initially aligned using ClustalX 1.81 [45] and subsequently refined by visual inspection. Nucleotide diversity (π) and haplotype diversity (h) were determined using ARLEQUIN v.3.1 [46]. The Yarkand hare is very mobile; therefore, the 20 populations were divided into four geographical groups according to the potential river and desert barriers ( Figure 1). Each group was analyzed separately as follows: North (KC, SY, LT, KRL, YULI, AKS and AWT); East (RQ and QM); West (ATS, KS, BC and MGT); and South (YC, PS, GM, HT, CL, YT and NY). To examine genetic differentiation among these groups, pairwise F ST values were calculated and tested for significance using ARLEQUIN; Cyt b and D-loop were evaluated independently.
To detect haplotypic differentiation in the Yarkand hare, a median-joining network (MJN) was constructed with the software NETWORK 4.1.0.9 [47], based on concatenated Cyt b and D-loop sequences. This method outperforms minimum-spanning networks and statistical parsimony methods when networks have relatively divergent haplotypes and are characterized by numerous missing-node haplotypes [48], which was true for Yarkand hares. Analysis of molecular variance [49] (AMOVA; 10000 permutations) was computed to test for population structure using ARLEQUIN.
Possible historical population expansion events were assessed by mismatch distribution and Fu's F S value [50] using ARLEQUIN. Populations at demographic equilibrium, the null hypothesis (H 0eq ), were expected to show a multimodal distribution, and, alternatively (H 1eq ), populations that experienced a recent demographic expansion were expected to be unimodal [51]. Demographic expansion time was estimated from Cyt b sequences using the equations τ = 2ut and u = μk, where μ was the mutation rate for the whole sequence, t was the number of generations since the expansion, and k was the length of sequence used. A Cyt b diver-gence rate of 4% per MYA was assumed [36]. For the F S analyses, significance was based on 5000 simulated samplings.
The matrilineal history of the Yarkand hare was hypothesized using Bayesian inference as implemented in MrBayes v. 3.04b [52]. Lepus hainanus from [37] and Oryctolagus cuniculus from [38] were used as the outgroup. The program MODELTEST 3.7 [53] was used to find the optimal model of DNA substitution based on the Akaike information criterion [54]. Bayesian analyses began with random starting trees and ran for 5×10 6 generations, with Markov chains sampled every 100 generations. The first 1.25×10 6 generations were discarded as burn-in. The analysis was conducted twice to ensure that the Bayesian analyses were not trapped in local optima [55,56]. The remaining trees from both analyses were used to create a majority rule consensus tree, where the percent of samples recovering the same clade was assumed to represent the Bayesian posterior probability (BPP) of that clade.
Nodal support based on nonparametric bootstrap sampling (BS) used 1000 pseudoreplicates. The analysis was performed on a neighbor-joining (NJ) phenogram calculated using MEGA 4 [57]. Gaps and missing data were excluded in the NJ analysis, and the Maximum Composite Likelihood model was used for calculating genetic distances. Uncorrected genetic distances (p-distance) between, and within, major mtDNA lineages were calculated based on Cyt b data using MEGA 4 [57].

Sequence diversity
A total of 224 sequences consisting of 1140 base pairs were aligned for Cyt b. The average base frequencies were as follows: A=28.6%, G=12.5%, C=27.5% and T=31.4%. Nine transitions and one transversion were identified. Eighty haplotypes were discovered based on 107 polymorphic sites, of which 53 were potentially phylogenetically informative. The Cyt b sequences appeared to be of mitochondrial origin and not nuclear mitochondrial pseudogenes (numts), as the reading frame was intact and the third position base composition (A=36.4%, C=32.6%, G=2.0% and T=29%) was similar to the average in mammalian mtDNA [58] (A=39%, C=36%, G=3% and T=21%). For the D-loop, an alignment length of 553 base pairs was obtained for 224 fragments. The average base frequencies were as follows: A = 29.1%, G = 11.4%, C = 30.5% and T = 29%. Seventeen transitions and three transversions were identified. A total of 106 haplotypes were defined from 106 polymorphic sites, of which 93 were potentially phylogenetically informative. After concatenating the Cyt b and D-loop fragments (1693 bp (aligned)), 126 haplotypes were identified. Among 213 polymorphic sites, 146 were potentially phylogenetically informative. Separate phylogenetic analysis of Cyt b and D-loop found similar phylogenetic trees, suggesting that the D-loop fragment was also from mtDNA.
Nucleotide diversity values (π) ranged from 0.016 ± 0.010 (AKS) to 0.048 ± 0.049 (ATS) in the D-loop, and from 0.002 ± 0.001 (AKS) to 0.022±0.022 (MGT) in Cyt b (Table 1); π averaged 0.033 ± 0.016 for the D-loop and 0.008 ± 0.004 for Cyt b. Among the 4 geographical groups, π for the D-loop showed no significant differences. In contrast, π values for Cyt b in the West (0.011 ± 0.006) and South (0.010 ± 0.005) were almost twice as high as those observed in the North (0.005 ± 0.003) and East (0.006 ± 0.003). Haplotype diversity values (h) for the D-loop ranged from 0.524 ± 0.209 (PS) to 1.000 ± 0.500 (ATS, MGT, and CL), and for Cyt b from 0.714 ± 0.181 (PS) to 1.000 ± 0.500 (ATS, MGT, and CL). Averaged across all populations, h for the D-loop was 0.986 ± 0.002 and for Cyt b was 0.969 ± 0.005. Values of h for both D-loop and Cyt b were very high (>0.9) and the 4 geographical groups showed no obvious differences.

Network analysis and population differentiation
The MJN resolved 5 haplogroups, which were referred to as groups A, B, C, D and E ( Figure 2). Group A included 60 haplotypes, of which 10 were from West (7) or South (3), 46 were from East (5) or North (41). Haplotypes H14 and H37 were shared by West and North, and H28 and H57 were shared by South and North. The 18 haplotypes of B were from West (3), South (13), North (1; H79), and one was shared by West and North (H83). Most of the 29 haplotypes in C were from East (7), North (18), or shared between them (1; H115), but the group also included South (3). The 7 haplotypes in D included South (2), North (2), and East (2), and one shared by North and East (H23). Group E had 12 haplotypes in total from West (9), South (2), and North (1). Most haplotypes from East and North were admixed and fell within groups A and C, and the majority of haplotypes from West and South were admixed and fell within groups B and E. Group D included a small number of haplotypes from South, North and East, but not West, and none of these haplotypes dominated in any region.
For AMOVA, we divided the 20 populations into 2 groups according to geographical associations in the MJN (Figure 2 (Table 3). These analyses required rejection of H 0ran and acceptance of the alternative hypothesis, H 1ran , which stated that genetic diversity was not randomly distributed.

Demographic analysis
Given that the D-loop was more likely to experience substitutional saturation, the mismatch analysis and Fu's F S values were calculated from the Cyt b data alone. The analyses used all haplotypes and the five haplogroups resolved in the MJN (Figure 3). The mismatch distribution for all haplotypes, group A and group C, showed a unimodal distribution, indicating three recent demographic expansions (  (Figure 3(c), (e), (f), respectively). Thus, the H 0eq could not be rejected for these groups.

Genealogical history
Phylogenetic analyses of Cyt b alone and the combined dataset yielded identical tree topologies. The Yarkand hare's haplotypes grouped into five major lineages ( Figure  4), exactly corresponding to the five haplogroups (A, B, C, D and E) shown in the MJN (Figure 2). Phylogenetic inference based on D-loop sequences supported the monophyly of these five lineages, except for lineage C (tree not shown). Cyt b sequence divergence (uncorrected pairwise distance; p-distance) among the 5 clades ranged from 0.7% to 2.3% ( Table 4). As seen in Figure 4, E branched first, followed by D, C, and the terminal sister-grouping of B and A. The associations of these 5 lineages and the relationships among the 5 lineages were strongly supported (BPP = 0.98-1.00, Figure 4). Similar support was obtained from BS values on the NJ tree (> 70%, Figure 4), except for C (BS = 59%, Figure 4).
Most haplotypes in lineages A and C were from North Branches are generally proportional to the number of differences between haplotypes. Terminal locality abbreviations are mapped in Figure 1 and detailed in Table S1.

Haplotypic structure
Populations from the combined northern and eastern areas of the Tarim Basin, and those from the combined southern and western areas, formed genetically and geographically distinct entities. The MJN and genealogical tree detected five distinct lineages of Yarkand hares, and the Cyt b sequence divergence among them was substantial (P-distance = 0.7%-2.3%). However, we did not find exclusive geographical structures corresponding to the 5 maternal lineages. Microevolutionary studies assume the null hypothesis that gene flow is unabated and this hypothesis cannot be rejected. When considering 20 populations, little geographical structure in the mtDNA variation was detected and this result was consistent with a previous study [42]. However, when the samples were combined into four geographical groups, phylogeographic differentiation was observed ( Figure 2), and this structure was supported by pairwise F ST values ( Table 3). The lower pairwise F ST values between North and East, and those between South and West compared with those between other combinations indicated genetic substructuring. This genetic differentiation was congruent with existing geographical barriers to dispersion. North and West are physically isolated by the Aksu and  Tarim rivers, and East and South are isolated by the Taklamakan Desert (Figure 1) [58]. The discovery of three haplotypes (H14, H37 and H83; Figure 2) shared between West and North, yet none shared between East and South, indicated that isolation between East and South was more complete than that between West and North.
Morphological evidence from the Yarkand hare is congruent with the genetic evaluation. An analysis of 24 cranial measurements using a one-way ANOVA analysis found that the southwestern population (PS) is generally separated from northern (KC, SY and YULI) and eastern (QM) populations [59]. The northern and eastern populations cannot be discriminated from each other morphologically. Thus, populations from the southwestern regions of the Tarim Basin and those from the northern plus eastern regions differ morphologically, and this corresponds to the genetic pattern.

Glacial refugia and postglacial expansion
Postglacial colonization can take the form of sequential dispersal into a newly available habitat by a few individuals. Populations in recently deglaciated regions should have relatively low genetic diversity compared with that in stable populations persisting in refugia, and colonizing populations should bear signatures of population expansion following the dispersal [2,[60][61][62]. These predictions can be treated as a null hypothesis (H 0gl ), because glaciated areas will not have resident terrestrial vertebrates.
Western and southern regions of the Tarim Basin might contain one or more refugia used by the Yarkand hare during times of maximum glaciation. If true, then dispersal during interglacial times would occur from southwestern refugia towards the northeast as the glaciers retreated. The demographical history of the Yarkand hare in the northern and eastern Tarim Basin cannot reject the H 0gl . Mismatch distributions and Fu's F S values suggest demographic expansion for the total group, and northeastern lineages A and C (Figure 3(a), (b), (d), respectively). In stark contrast, the H 0gl is rejected for South and West (Figure 3(c), (f), respectively). These populations appear to be stable.
Population expansion time estimates for the total group and groups A and C are consistent with interglacial range expansion. The estimated date of 0.21 MYA for total hap-lotypes predates the Bulakebashi Valley glacier in the west Kunlun Mountains [29] (0.206 MYA). The estimated date of 0.090 MYA for group A and falls within the last interglacial period in the west Kunlun Mountains [29] (0.07-0.13 MYA). In addition, the estimated date of 0.054 MYA for group C falls in the interglacial period between the last glacial maximum [29] (0.016-0.025 MYA) and the Litian І glacier [29] (0.066 MYA) in the western Kunlun Mountains. If true, then the area must have contained several Pleistocene refugia, at least some of which are responsible for the evolution of distinctive maternal lineages.
The genealogical relationship of lineage A is consistent with the H 0gl . The southwestern lineage E roots at the base of the tree, whereas northeastern lineage A is terminal, and this position enjoys strong support (Figure 4; BPP = 0.98). Thus, populations from southwestern portions of the Tarim Basin have a relatively older evolutionary history. Furthermore, the D-loop's π shows no significant differences among the four geographical groups, probably because of substitutional saturation-the D-loop has a higher evolutionary rate than Cyt b. However, Cyt b's π for West (0.011±0.006) and South (0.010±0.005) is almost twice as high as that of the North (0.005±0.003) and East (0.006±0.003). Northeastern populations have relatively less genetic diversity. Therefore, it is likely that the distribution of the Yarkand hare contracted to southwestern parts of the Tarim Basin during glacial maxima, and then recolonized or expanded into the northeast during the interglacial periods. This is similar to the rapid recolonization by many species in temperate Europe and North America [2,8]. Given the relatively complete geographic isolation between East and South, postglacial dispersal to the East likely occurred via the North.
This demographical history is further supported by the evidence from gene flow. The AMOVA results show that only 5.82% of the variation in the D-loop and 11.28% in Cyt b lies between geographical groups. Most of the variation (82.94% in control region and 75.90% in Cyt b) is attributable to intrapopulational diversity ( Table 2). This indicates extensive gene flow. Long distance gene flow is evidenced by the fact that all four geographical groups contain a mix of haplotypes, although some haplotypes dominate in different geographical areas (Figure 2). Wu et al. [42] reported similar results. Although hares are capable of very rapid locomotion, extensive, long distance dispersal is unlikely to account for our findings. The pattern of broadly mixed haplotypes suggests a recent common history of colonization, which is also congruent with our inference for the demographical history of the Yarkand hare.
The H 0gl is also supported by geological evidence. Given that the Tarim Basin probably did not have Quaternary glaciers [27][28][29][30][31][32], why did only the southwestern Tarim Basin contain refugia during glacial events, if the northeast was also not glaciated? The Tarim Basin is located north of the Qinghai-Tibetan Plateau. Uplifting of the Qinghai-Tibetan Bayesian inference tree from 126 mtDNA haplotypes based on combined mtDNA Cyt b and D-loop fragments. The optimal evolutionary model is TIM+I+G and the tree is rooted with Lepus hainanus and Oryctolagus cuniculus. Numbers above the branches are Bayesian posterior probabilities and those below the branches are the bootstrap proportions derived from a neighbor-joining tree. The geographic distributions of the haplotype are included in the sequence names, and these are mapped in Figure 1 and detailed in Table S1.
During interglacial times, the average annual precipitation in the Tarim Basin was very limited (~17.4-33.6 mm). Meltwater rivers from glaciers in the Tianshan and Kunlun mountains in the southwest (Figure 1) served as the dominant sources of water. However, the meltwater greatly decreased at times of maximum glaciation because the climate was colder and drier [76]. Total river flow declined from the southwest towards the northeast to the extent that the middle and lower reaches dried up, and yet the upper reaches near the glaciers were not greatly influenced, as evidenced from the Keriya [76] (Figure 1) and Tarim rivers [25,[77][78][79][80] ( Figure 1). For example, today, the 1321-km long Tarim River is the world's longest intra-continental river in a desert system. It runs eastward across the northern Tarim Basin to the eastern Tarim Basin and waters the biota in its drainage area. However, during glacial times, northern and eastern portions of the river dried up and its associated biota must have either died or retreated to the southwest. The subsequent interglacial periods were relatively warm and wet; therefore, organisms probably recolonized the northeast as glacial meltwater increased and the rivers reoccupied their courses. Certainly, higher species diversity occurred in southwestern Tarim Basin in ancient times, as evidenced by a large number of ancient oases and the occurrence of a large variety of pollens [68,81]. This pattern supports the concept of the southwestern Tarim Basin being a glacial refugium.

Conservation implications
Our study not only provides insights into the evolutionary history of the Yarkand hare, but also helps to identify conservation units for this threatened species. Traditionally, subspecies have been recognized on the basis of discontinuities in the geographical distribution of phenotypic traits [82], and they have been treated as separate conservation units in conservation biology and management plans [83]. However, increasing molecular studies failed to identify traditional subspecies as being phylogenetically diagnosable [82][83][84][85][86][87], and philosophically their recognition can be a problem [88]. A conservation unit might also consider congruence among morphological characters, genetic patterns, ecology, and physiology, for example. From this perspective, the Yarkand hare could be divided into two conservation units: one for the populations from North and East and the other from the South and West. Management plans for these two conservation units would be expected to effec-tively prevent extirpation on both groups and the concomitant loss of genetic diversity.
The extent of genetic and morphological diversity suggests that any consideration of population augmentation via translocations must be made cautiously. Although it may be desirable to manage genetic diversity within areas, a shift of genotypic frequency could be disastrous if haplotypic diversity reflects adaptation. Further, the ecosystems of the northern and eastern Tarim Basin are more vulnerable to habitat degradation by way of water resource utilization than those in the southwest. Special attention should be paid to the northeastern region. Although the Yarkand hare has been listed in the Second Category of State Key Protected Wildlife List (1988) and the 1996 IUCN Red List of Threatened Animals [35], management strategies should focus on maintaining ecosystem stability and not just protecting named taxa.
In the QTP, Europe, and North America, glacial advance drove organisms into refugia. However, in the Tarim Basin, their retreat reflected available water. In the Tarim Basin, refugia may have existed not only during glacial periods, but also other times when the water supply was not sufficient to sustain ecosystems. Recent damming and agricultural exploitation of water in the Tarim River has resulted in the loss of the natural flora in large areas of the river's lower reaches. Although water recharging in the lower reaches has restored the natural vegetation on a large scale as the groundwater level rose [80,81], this strategy may not result in long-term ecosystem stability, especially given mounting economic and agricultural pressures. Therefore, management strategies should also include the return of farmland to natural forests or grasslands, in key areas. Water resources should be carefully managed. Most importantly, new laws must be enacted to guarantee that management strategies are put in practice.

Conclusion
Postglacial colonization of the Tarim Basin by the endemic Yarkand hare was assessed by range-wide sampling. Populations from the northern and eastern Tarim Basin shared a similar history, as did those from the western and southern regions. Refugia likely existed in the southwestern Tarim Basin at times of maximum glaciation. The present distribution of the Yarkand hare probably resulted from at least three postglacial colonization events which occurred about 0.21, 0.090, and 0.054 MYA. Significantly, availability of water, rather than displacement by glaciers, was the dominant driving force for the retreat of the Yarkand hare to refugia. This differed from the dynamic mechanism for refugia occupation in Europe, North America, and the QTP in China, i.e., glacial displacement. Investigations of other species, especially plants, would be essential to determine if this new dynamic mechanism has a general significance for arid species. Nevertheless, this discovery has important implications for conservation. Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Supporting Information
Table S1 Specimens examined in this study. Collection and voucher specimen information for populations of the Yarkand hare (Lepus yarkandensis) The supporting information is available online at csb.scichina.com and www.springerlink.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.