Phylogeography of the White-Toothed Shrews Crocidura suaveolens and Crocidura sibirica: Searching for the Geographical Homeland

We studied the polymorphism of the cytb gene in two forms of the Lesser White-toothed Shrew species complex: Crocidura suaveolens s. stricto and C. sibirica. The haplotypes of C. sibirica are found to be very similar to those of Crocidura suaveolens. They do not belong to a distinct haplogroup. The molecular diversity of the populations in the Asian part of the range is higher than in Eastern Europe. For the combined sample from Asia and Europe together, we revealed a significant signal of population expansion. Analysis of the expansion time showed that the Asian territory was colonized earlier (before the last glacial maximum) than the Eastern Europe (at the very end of the Late Pleistocene and in the early Holocene). The results of the ancestral area reconstruction are consistent with the hypothesis of a Middle Asian origin of the C. suaveolens/C. sibirica group, recent colonization of Inner Asia and later penetration into Eastern Europe.


INTRODUCTION
The Lesser White-toothed Shrew in a broad sense (Crocidura suaveolens sensu lato) is a large species complex, and its composition is still under discussion. Over a vast area stretching across all of Eurasia from the Iberian Peninsula to Tsushima Island through several natural zones, all forms of the Lesser Whitetoothed Shrew, except for С. sibirica, have identical karyotypes (Grafodatsky et al., 1988), and many authors believe that all populations with 2n = 40 on this territory should be considered a single species C. suaveolens (Catzeflis et al., 1985;Vogel et al., 1986;Hutterer, 1993). However, there is also evidence of morphological (Zaitsev, 1991;Tembotova, 1999;Bannikova et al., 2001;Jiang and Hoffmann, 2001) and high molecular (Vogel et al., 2003;Ohdachi et al., 2004;Bannikova et al., 2006;Dubey et al., 2006Dubey et al., , 2007aDubey et al., , 2007b diversity of this group. At present, no fewer than seven forms are recognized within the "Lesser White-toothed Shrew" species group according to morphological data and mitochondrial DNA (mtDNA) sequences; however, the views on the species status of these form vary among different authors. Usually, the Lesser White-toothed Shrew C. suaveolens (Pallas, 1811), the Siberian Shrew C. sibirica (Dukelsky, 1930), the Caucasian Long-tailed Shrew C. gueldenstaedtii (Pallas 1811), the Caspian Shrew C. caspica (Thomas, 1907), the Manchurian Shrew C. shantungensis (Miller 1901), and Zarudnyi's Rock Shrew C. zarudnyi (Ognev, 1928) (Zaitsev, 1991;Bannikova et al., 2006;Zaitsev et al., 2014;Burgin et al., 2018) are considered species. However, the species status is also assigned by some authors to C. mimula (Miller 1901) from Western Europe (Bannikova et al., 2006) or C. aleksandrisi (Vesmanis, 1977) from Cyrenaica on the Mediterranean coast of North Africa (Burgin et al., 2018). According to mitochondrial data, there are some other forms in the "Lesser Whitetoothed Shrew" species group: the samples from the Iberian Peninsula, the south and west of France, and Southern Italy form a clade that includes the subspecies cantabra and iculisma; the shrews of Cyprus (cypria), the samples from Lesbos and Vukarikisilka islands and Western Turkey, the white-toothed shrews from Central and Southeastern Iran, and from Libya (Dubey et al., 2007a) all form distinct phylogroups. The taxonomic status of these forms remains unclear.
The Lesser White-toothed Shrew in a narrow sense (Crocidura suaveolens s.str.) (the type territory is the Crimea, Chersonesus) has the most extensive range among all Palearctic species and has striking diversity of the habitats, occurring mainly in the intrazonal biotopes of the desert, steppe, and forest zones, as well as in anthropogenic habitats. In the deserts of Middle and Inner Asia, it inhabits dry stream beds (saires) and oases; in the forest zone of the European part of the range, it particularly tends to be synanthropic, occurring in various residential buildings (Zaitsev et al., 2014).
The Lesser White-toothed Shrew has a wide range of color and size variability, which explains the recognition of several forms by different authors: C. lignicolor (Miller 1900) from Xinjiang, C. ilensis (Miller 1901) from eastern Kazakhstan, C. lar (Allen 1928) from Mongolia, and C. mordeni (Goodwin 1934) from Central Kazakhstan. In some papers, the name C. gmelini was erroneously attributed to the form suaveolens s.str. (Hoffmann, 1996;Jiang and Hoffmann, 2001).
The phylogeographic structure and demographic history of C. suaveolens s.str. have been poorly studied, due to limited size of the previously analyzed samples. The vast range of the Lesser White-toothed Shrew demands a representative sample covering the entire geographical variation of the species. We should remark here that the Lesser White-toothed Shrew in the strict sense is genetically clearly distinct from other forms of the species complex, except for the Siberian Shrew. At the same time, the Siberian White-toothed Shrew is the most recognisable by morphology (Zaitsev, 1991) and karyotype (Grafodatskiy et al., 1988) among other forms, but simultaneously does not differ by the mtDNA haplotypes from C. suaveolens s.str. (Bannikova et al., 2006). Due to the impossibility of simple genetic testing of C. sibirica, the existence of different views on its range gives rise to significant difficulties (Zaitsev, 1991;Zaitsev et al., 2014). Inter alia, according to I.I. Stogov, only the Siberian Whitetoothed Shrew inhabits the right-bank part of the East Kazakhstan Region and both species occur in the Black Irtysh delta (Bekenov et al., 1985). However, we have not found any morphologically identified specimens of C. sibirica in our sample from the latter locality (the Zaisan Basin). It is also worth noting that the shrews from the Middle and Inner Asia in some studies were erroneously assigned to C. sibirica instead of C. suaveolens, without examination of the data on terra typica of the latter form and without considering the morphological traits (Jiang and Hoffmann, 2001;Ohdachi et al., 2004;Dubey et al., 2006Dubey et al., , 2007a. To address the problem of genetic mtDNA testing, we also included in our analysis the data on the samples with a geographical origin and morphological traits corresponding to the C. sibirica form.
In the present work, we studied for the first time the phylogeographic structure of the Lesser and Siberian White-toothed Shrews on the extensive geographical sample. The aim of our study was to explore intraspecific genetic variation, to identify potential phylo-geographic breaks, and to find the center of genetic diversity and to determine the direction of dispersal.
The genetic diversity estimators (nucleotide and haplotype diversity), the parameters of pairwise distances distribution (mismatch distribution), expansion time (τ), and the F st values between samples and neutrality tests (Tajima, 1989;Fu, 1996) were calculated using Arlequin 3.5.2.2 (Excoffier et al., 2005) for the samples with n ≥ 10. The general pattern of demographic history was described using the Bayesian Skyline Plot (BSP) method , which estimates the changes in the effective population size over time. We used BEAST v. 1.10.4 for demographic analysis. The strict clock model and the Bayesian Skyline model with ten groups for the tree shape were used. The suitability of a simpler demographic model, i.e., the model of exponential growth, was rejected by using the Marginal Likelihood Estimator method.
Estimation of the evolutionary rate of cytb. There is a severe problem while estimating the time of recent divergence events (<1-2 million years). It's the potential distortion of divergence times estimates due to the changes in the apparent rate of substitutions over time (rate decay) (Ho et al., 2005). As a consequence, the phylogenetic rates obtained by the examination of interspecific divergence events may be inapplicable to the dating of intraspecific (population) events.
The expansion time was determined by applying the rate of divergence estimated for all cytb positions, which is 16% per million years. It was obtained as follows: firstly, it was assumed that the rate of substitutions by transversions for the 3rd codon position (tv3) of cytb changes insignificantly over time and tv3 occurred at a rate close to the phylogenetic one. This substitution rate had been obtained for Crocidura earlier (Lavrenchenko et al., 2009): 1.87% per million years; secondly, based on the ratio of different types of cytb substitutions in each species of the C. suaveolens s.lato group, the tv3 rate was estimated to be 23.4% of the rate of all substitutions at all positions. Hence, the substitution rate of all substitutions in cytb of the C. suaveolens s.lato group species suitable for assessing recent times (<100000 years ago) is 8%, and, therefore, the haplotype divergence rate is 16%.
Reconstruction of the center of origin. The potential center of origin of the ancestral mitotype was recon-structed using the principle of maximum parsimony, as was proposed in a similar work on the Common Shrew (Raspopova et al., 2020). The purpose of this analysis was to use the connection between the geographical distribution of mitotypes and their phylogenetic relationships in order to determine the geographic placement of the ancestral mitotype, that makes the dispersal scenario of mitochondrial lineages over the area as parsimonious as possible. We approached this task using the parsimony algorithm adapted for landmark data and implemented in the TNT program version v. 1.5 Sept. 2020 (Catalano et al., 2010;Goloboff and Catalano 2016). Since the geographic coordinates cannot be used directly to describe the position of points in two-dimensional space, they were converted into arbitrary Cartesian coordinates using the linear multidimensional scaling of the matrix of distances between sampling points using the cmdscale package in R. The permitted distribution of ancestral mitotypes was confined to the set of sampling points and the nodes of a 12 × 12 coordinate grid covering the modern range. In addition, it was necessary to address the uncertainty of the phylogeny between haplotypes. For this purpose, the analysis was repeated 100 times using a sample of trees obtained as a result of BEAST (BSP) analysis. Then we analyzed the spatial distribution of reconstructed placements of the ancestral node.
The resultant Cartesian coordinates were converted into true latitude/longitude using the original script provided by one of the authors (I. Artyushin). The distribution density was obtained by the kernel density estimation (KDE) using the von Mises-Fischer distribution with kappa = 200 as a kernel. The calculations were made in the R v3.6.1 environment using the nls {stats} and distGeo {geosphere} functions and the raster package for visualization.

RESULTS
Phylogenetic and phylogeographic analysis. None of the phylogenetic analyses (BI, ML, MP, NJ) yielded a resolved haplotype tree; however, several potential groups can be distinguished. Figure 2 shows the Bayesian tree constructed in the Beast software, which is also a chronogram of the major evolutionary events in the studied species group. The median network (Fig. 3) includes 79 haplotypes: 71 from the Lesser White-toothed Shrew, seven from the Siberian Shrew, one of them is common for both forms. The network has a star-shaped structure; nevertheless, the same groups of haplotypes as on the tree can be found here.
Group I is the most heterogeneous one; it lacks the central, most common haplotype, and it's haplotypes differ from each other in the greater number of substitutions. This group consists of the haplotypes from Middle Asia, Iran, and Kashgaria. Group II includes the bulk siberian haplotypes, the haplotypes from the Balkhash Lake region, the Zaisan Basin, and Dzun-garia, as well as three solitary haplotypes from Middle Asia. The haplotypes of the Siberian White-toothed Shrew within group II form a distinct and compact haplogroup. Group III includes two broadly dispersed haplotypes (h10, h69) and derived haplotypes and can be divided into two subgroups: IIIa and IIIb. Subgroup IIIa is formed mainly by the haplotypes of Eastern Europe and also includes some of the haplotypes from Mongolia, Dzungaria, the Zaisan Basin, and the Balkhash Lake region. Subgroup IIIb contains most of the haplotypes from Mongolia and Dzungaria, the Zaisan and Balkhash haplotypes, and one haplotype from Astrakhan, as well as some part of Siberian haplotypes; moreover, the samples of C. sibirica from Krasnoyarsk and one sample from Novosibirsk carry haplotype h69, which is common among the samples from other localities, including the Gobi.
In general, the samples from Middle Asia are most unlinked to each other and distant from the samples from other geographic regions. The haplotypes in common to the northernmost and southernmost regions of the range are found only in the Balkhash Lake region and the Zaisan Basin. In the sample from Eastern Europe, the haplotypes common with the Asian samples (primarily from the Balkhash Lake region) can be found to the east from the left bank of the Volga River and are absent on the right bank and further in the western part of the range.
As can be seen from the data presented, the haplotypes from the range of C. sibirica are very similar to those of C. suaveolens s.str.; they do not form a single monophyletic group, and the Lesser White-toothed Shrew is paraphyletic relative to the Siberian Shrew with respect to mtDNA. Given this we further analyze and discuss the sample of specimens of the two species (C. suaveolens/C. sibirica) as a single sample.
Times of major divergence events. Figure 2 shows the times of the main divergence events in the C. suaveolens/C. sibirica group. With the accepted divergence rate of ~16% per million years, the latest common ancestor of all C. suaveolens s.str./C. sibirica lines should have existed 42 000 years ago. The C. sibirica line, which is found within group II, was split 19 600 years ago, and its radiation occurred 15 000 years ago.
Genetic diversity of geographic populations. We split our sample into ten main samples (n ≥ 4) (Fig. 4) based on geographical and genetic distances (p-distances, 0.22-0.72%) between the samples from local populations. Seven of them (n ≥ 10) were included in  the analysis of genetic diversity and demography: C. suaveolens (1) Europe 1: to the west of the right bank of the Volga River; (2) Europe 2: from the left bank of the Volga River to the Urals; (3) the Balkhash Lake region; (4) the Zaisan Basin; (5) Southern and Central Mongolia (the Gobi); (6) Dzungaria; (7) C. sibirica Altai, Krasnoyarsk, Kemerovo, and Novosibirsk (Siberia). All available data were used to analyze the total sample for Asia (n = 97). The changes in the effective population size over time were modeled only for samples of more than 25 specimens.
The indices of the molecular diversity and demographic parameters for the samples of at least ten specimens are presented in Tables 1 and 2. Statistical parameters of the cytb diversity in the geographical   (Table 1). Among large samples, the maximum haplotype diversity is apparent in the population of the Zaisan Basin; the Balkhash Lake region has the maximum nucleotide diversity. The high h and π values in Siberia are due to existence of two sympatric haplogroups in C. sibirica (p-distance, 0.76%); the diversity within these groups is low. Despite the small sizes of the samples from Middle Asia (n = 7) and Kashgar (n = 4), let us note that the haplotype and nucleotide diversity in them is very high (π = 0.0046 ± 0.0029 in Middle Asia and π = 0.0050 ± 0.0036 in Kashgar) with h = 1 in both cases.
Demographic analysis. Tajima's and Fu's neutrality tests are significantly negative only for the European population, indicating its recent expansion. There is also a significant increase in the case of the total Asian sample, but it is not significant for individual geographic populations ( Table 1). The Rg and SSD indices do not reject the sudden expansion model for any of the samples ( Table 2).
The modeling of the effective population size dynamic (BSP) for the joint C. suaveolens s.str./ C. sibirica group indicates weak growth. It is difficult to determine the exact time when it happened given the pattern obtained (Fig. 5). The results of the BSP analysis for the Gobi and eastern Kazakhstan samples reveals relatively recent growth for the Gobi population and a noticeably slower and substantially earlier growth in eastern Kazakhstan. We failed to obtain the results of the BSP analysis for the sample from Europe due to extremely low nucleotide and haplotype variability.
However, the twofold difference in tau (the time in nucleotide substitutions, Table 2) in the samples from Middle Asia and Kazakhstan compared to Eastern Europe indirectly indicates their greater age. If we recalculate the tau index into absolute time at a cytb divergence rate of 16% per million years for the two samples with a significant expansion (joint Asian and joint European, Table 2), we can obtain an approximate time of dispersal. On the Asian territory, the expansion occurred earlier: about 24000 years ago; the East European part of the range was colonized no more than 12000 years ago, which corresponds to the end of the Late Pleistocene and the beginning of Holocene. The Mismatch distribution analysis yielded quite similar distributions of pairwise distances in the total sample from Eastern Europe and in the large populations from Asia (Fig. 6). At the same time, the shape of the distribution is certainly unimodal in the former case (i.e., in Europe, it does not deviate from the shape expected from the sudden expansion model), while in the large Asian samples (South/Central Mongolia, eastern Kazakhstan, joint Asian sample) the observed distribution begins to deviate visually from the expected shape. The samples from the Balkhash Lake and Zaisan regions have the least smooth distribution, though their Rg values do not confirm this. Splitting the European sample into sub- Ancestral area. Reconstruction of the center of origin (Fig. 7) showed that the latest common ancestor of modern mitochondrial lines of the Lesser Whitetoothed Shrew could have inhabited Middle Asia (except for the southwest), as well as the territory of Kazakhstan and further to the west, up to the Northern Caspian. Potential points of origin have the highest density of distribution in the area from Semirechye to Western Gissar-Alai mountain ranges. The area of possible distribution of the ancestral mitotype does not include Mongolia, North China, Altai, Iran, Turkmenistan, and the Black Sea region.

DISCUSSION
The chronogram and the structure of group I in the phylogenetic network (probably, it is the most basal branch of the tree; it lacks a central haplotype common to most specimens and has a significant number of substitutions between haplotypes) suggest that this group, which contains the Middle Asian, Iranian, and Kashgar haplotypes predominantly is older than groups II and III; i.e., the dispersal of shrews in the southern part of the range is an earlier event than the colonization of the western and northeastern parts of the modern range. The European part of the range was colonized most recently. This is indicated by the lowest haplotype and nucleotide diversity in the Eastern European sample, the specimens of which belong only to group III which has a simple star-shaped structure with the central, most common haplotype. On the contrary, the haplotypes from eastern Kazakhstan occur in all groups of haplotypes and the nucleotide diversity of the eastern Kazakhstan populations is one of the highest, which indicates that it can be potentially one of if not the oldest one. The results of the ancestral locality reconstruction for the species are also consistent with the hypothesis of the Middle Asian and/or Kazakhstanian origin of the Lesser White-toothed Shrew and reveal the recent origin of the Inner Asian populations (Mongolia, China) and even later colonization of the European part of the range as a result of rapid expansion from the territory of Kazakhstan.
Thus, our data show that the Lesser White-toothed Shrew has colonized its range very quickly and relatively recently. Apparently, this is why phylogeographic breaks had not enough time to emerge over the vast and ecologically diverse territory and thus the variation in color and size reflects only the morphological adaptation to arid or humid landscapes and does not have an initial genetic basis.
The Lesser White-toothed Shrew resembles by its range type and its landscape and habitat preferences, certain other Palearctic species of small mammals. The Grey Dwarf Hamster, Cricetulus migratorius (Pallas 1773), inhabits steppes, forest steppes, deserts, and semi-deserts from Greece to southeastern Mongolia inclusive and from Central Russia to Balochistan (Lebedev, 2012). Thus, its range covers almost entirely the range of the Lesser White-toothed Shrew, though it doesn't reach as far north and does not occur in the forest zone. The Grey Hamster is ecologically flexible and, similarly to the Lesser White-toothed Shrew, inhabits oases in the arid zone. It was shown that the species has three allopatric mitochondrial lineages   Observed Expected distribution  with a phylogeographic break along the Volga River between two broad-range forms (genetic distance of 2.5%). The Lesser White-toothed Shrew, unlike the hamster, has probably just recently colonized the territory to the west of the Volga River.
The territories to the east of the Volga River and up to Mongolia are inhabited by the eastern lineage of the Grey Dwarf Hamster, and its phylogeographic structure resembles that of the Lesser White-toothed Shrew, with its star-shaped pattern and low nucleotide diversity. Like the Lesser White-toothed Shrew, the Grey Dwarf Hamster has relatively recently populated Inner Asia (Mongolia) from Middle Asia and/or Kazakhstan; its expansion in the eastern part of current range actually coincides with that of the Lesser Whitetoothed Shrew and is estimated to be 25000 years ago . We can find the common features in the shape of the recent range and the direction of dispersal between Crocidura suaveolens s.lato and the species complex of the House Mouse (Mus musculus s.lato). In particular, the area of the "musculus" form actually coincides with the area of C. suaveolens s.str., with the exception of the southernmost (Iran) and western (Central Europe) parts, and there are some similarities in the phylogeographic structure (Bonhomme and Searle, 2012). As in the case of the Lesser White-toothed Shrew, Middle Asia was colonized by the House Mouse before China, Mongolia, and Europe: published datings suggest an expansion close to the Pleistocene-Holocene boundary (Suzuki et al., 2013;Hardouin et al., 2015;Li et al., 2020).
The status of the Siberian White-toothed Shrew is still unclear. C. sibirica bears two mitochondrial lineages that are not connected with particular geographic localities and are not monophyletic, i.e., they are not species specific (it is similar to previous studies). One of these lineages (from the common group II) is distributed mainly in the western part of the range (Novosibirsk, Kemerovo) and the second one (the lin-eage from the common group IIIb) inhabits mainly the southeastern part (Krasnoyarsk, Teletskoe Lake).
The western Siberian genetic line, contains more haplotypes, compared to the southeastern line, none of them are found within the range of the Lesser Whitetoothed Shrew. This line branched ~20000 years ago. If we take this date as the time of divergence between the haplotypes of C. sibirica and C. suaveolens s.str., it is hardly reasonable to assume the species status of the Siberian form due to very recent time of its origin. The absence of the mitochondrial geographic structure also suggests recent genesis of the modern range.
In this case, the Siberian Shrew is only a morphotype of the Lesser White-toothed Shrew isolated during the last glacial maximum, in a steppe refugium bordering the forest e.g. in Altai mountains. Fixation of the morphological differences in such population, even those associated with adaptation to living in a forest environment, doesn't necessarily lead to general divergence and reproductive isolation from the sister form. The existence of the second lineage (which is more recent and similar to the Mongolian haplotypes) can be explained by hybridization with the Lesser White-toothed Shrew during the secondary contact. Given the broad range of C. suaveolens s.str. and much smaller one of C. sibirica, which is also located at the northernmost limit of the Lesser White-toothed Shrew's range, it's reasonable to assume the directional introgression of mtDNA from the Lesser Whitetoothed Shrew to the Siberian Shrew.  Another hypothesis suggests that C. sibirica is actually a species, that lost it's native mtDNA, since it was completely replaced by the mtDNA of the Lesser White-toothed Shrew due to hybridization during its active colonization of the current range. The coexistence of two mitochondrial lineages in this case can be explained by two hybridization events with the ancestral form (Lesser White-toothed Shrew) during nonsimultaneous secondary contacts. We expect that a larger number of nuclear loci will help to unveil origin of discrepancy between the genetic and morphological variability of the Siberian Shrew.