Population Genomics of the House Mouse and the Brown Rat

Mice (Mus musculus) and rats (Rattus norvegicus) have long served as model systems for biomedical research. However, they are also excellent models for studying the evolution of populations, subspecies, and species. Within the past million years, they have spread in several waves across large parts of the globe, most recently in the wake of human civilization. They have become commensal species, but have also colonized extreme environments on islands without human settlements. Given the ample genomic and genetic resources available for these species, they have also become ideal mammalian systems for evolutionary studies of adaptation and speciation, particularly in combination with the rapid developments in population genomics. This chapter provides an overview of these systems and their history, as well as of the available resources.


Introduction
Population genomics can address very different biological questions related to speciation, divergence of closely related species, population structure within species, or evolutionary processes within populations that affect adaptation. In the era of next-generation sequencing (NGS) and increasing taxonomic sampling, the crucial factor in applying population genomics is no longer the number of genetic markers (quantity), but the quality and complexity of the massive amount of available information that needs to be integrated and interpreted.
In this chapter, we focus on population genomic studies in rodents, in particular in the Murinae. The Murinae, a subfamily of rodents, comprise more than one hundred genera and form one of the largest mammalian subfamilies, with species native to most continents. The Murinae include the house mouse (Mus musculus) and the brown rat (Rattus norvegicus), laboratory strains of which have been used for decades in biomedical research, as well as to serve that this is most likely due to secondary adaptive introgression, even across large geographic distances [20,21]. The overall phylogenomic analysis suggests that M. m. musculus and M. m. castaneus are sister groups and that M. m. domesticus is more basal [12,19].
The subspecies meet in several zones of secondary contact, where they form hybrid zones [2,18,22]. Fertility of hybrid offspring is impaired in these hybrid zones, which therefore serve as a general model to study the genetic basis of hybrid sterility as part of speciation processes (e.g., [22–26]). Despite genomic resources, including a variant database of 17 laboratory inbred strains [12,40], there was a need to derive laboratory strains that harbor most of the natural variation found in wild-derived populations [41,42]. Genotyping arrays were designed to maximize variant information at low cost [43]. The genotyping arrays still in common use are MegaMUGA, with 77,808 SNP markers, and GigaMUGA, with 143,259 SNP markers [44]; these represent only a fraction of the variants found between any sequenced inbred strain and the reference genome (~4 to 5 million SNPs; [12,45]). However, researchers have started to complement their analyses with NGS-based datasets, and genomic resources for wild house mouse populations are now commonly available for further analysis [14].
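To put "only a fraction" into numbers, the short sketch below divides each array's marker count by the ~4 to 5 million strain-versus-reference SNPs quoted above. It optimistically assumes that every array marker is one of those SNPs, so the resulting percentages are upper bounds; the function name `array_coverage` is purely illustrative.

```python
# Approximate number of SNPs between a sequenced inbred strain and the
# reference genome, as quoted in the text (~4 to 5 million).
REF_SNPS_LOW, REF_SNPS_HIGH = 4_000_000, 5_000_000

# Marker counts of the two genotyping arrays given in the text.
ARRAYS = {"MegaMUGA": 77_808, "GigaMUGA": 143_259}


def array_coverage(n_markers: int) -> tuple[float, float]:
    """Return the (min, max) fraction of strain-vs-reference SNPs an array can type.

    Assumes every array marker is one of those SNPs, so both values are upper bounds.
    """
    return n_markers / REF_SNPS_HIGH, n_markers / REF_SNPS_LOW


for name, n in ARRAYS.items():
    lo, hi = array_coverage(n)
    print(f"{name}: {lo:.1%} to {hi:.1%} of strain-vs-reference SNPs")
```

Even the denser GigaMUGA array covers only a few percent of the variants, which is why sequencing-based datasets are needed to capture the full spectrum of natural variation.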

Brown Rat History
Mice and rats diverged approximately 7–12 million years ago [46]. Similar to house mice, brown rats (Rattus norvegicus) have been used for more than two centuries in biomedical studies to learn about the basis of human diseases and to improve pest management [47,48]. The genome of the brown rat was published in 2004 [49]; it consists of 20 autosomes and 2 sex chromosomes (X and Y) with a total length of 2.8 Gb (currently with 22,250 coding and 8934 noncoding genes annotated). The house mouse and brown rat genomes share a large number of syntenic homologous blocks with different levels of recombination [50]. Approximately 30% of the rat genome aligns only with the mouse genome, which might correspond to rodent-specific repeats [49]. A syntenic view of both genomes is given in Fig. 2 to illustrate the pairwise chromosome assignment obtained from the Synteny Portal (see Table 1 for web page URL link; [51]).
The origin of the laboratory brown rat (Rattus norvegicus) and of the black rat (Rattus rattus) most likely lies in central Asia [52]. Spatial population genomic studies were conducted on brown rats living in New York City [53], and, as in studies of mice, mtDNA haplotype data helped to disentangle the phylogeography of brown rats in the countries surrounding the South Atlantic Ocean [54]. While the phylogeography of black rats, like that of house mice, reflects human colonization and settlement history [53,55,56,57], brown rats did not appear in Europe until the sixteenth century. Their dispersal routes from Asia to Europe are still under debate [57]. For example, one route is thought to lead via northeast China and Siberia, while another route, inferred from whole-genome sequencing data, may represent an expansion via southern East Asia [58]. Figure 3 illustrates the sampling distribution of Rattus norvegicus in publicly available whole-genome data sets.

Population Genomics
As mammal species expand, they are faced with new abiotic and biotic factors, such as different climatic conditions, different food, or new pathogens, prey, and/or predators, which can lead to adaptation and contribute to shaping the genome over time. Evolutionary changes in the genome can result from mutation, gene flow, random genetic drift, recombination, and selection. Genome-wide scans for deviations from modeled neutrality aim to reveal such evolutionary processes. Genome-wide scans can help to identify genotypic and phenotypic variation, and, by taking demographic events into account, they can even detect genes under recent positive selection [59]. Negative selection leads to sequence conservation by removing disadvantageous alleles. Positive selection can yield an excess of nonsynonymous fixed differences or lead to an altered allele-frequency spectrum (AFS). Multiple approaches exist to detect adaptation, each with its own caveats. For example, dN/dS ratios can be used in comparative studies to detect selection on genes. However, this analysis is limited to species that are sufficiently divergent for enough substitutions to have accumulated [60]. When samples are drawn from different populations of the same species, it is necessary to study frequency changes of polymorphisms instead of substitutions. Compared to studies with a limited number of neutral markers, population genomics uses a high marker density to robustly infer genome-wide effects, usually as signals of departure from the expectations of the neutral theory of molecular evolution (see Chapter 5 for a detailed description of how to detect positive selection).
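The AFS is essentially a histogram of how many segregating sites carry i derived alleles in a sample of n sequences. The minimal sketch below assumes that alleles have already been polarized into ancestral and derived (for example, with an outgroup) and compares an observed unfolded AFS with the standard neutral expectation for a constant-size population, in which the expected number of sites in class i is θ/i. The derived-allele counts are hypothetical. A recent sweep typically distorts this shape, producing an excess of rare and of high-frequency derived variants relative to the expectation.

```python
from collections import Counter


def unfolded_sfs(derived_counts: list[int], n: int) -> list[int]:
    """Number of segregating sites carrying i derived copies, for i = 1..n-1.

    derived_counts holds, per site, how many of the n sampled sequences carry
    the derived allele; monomorphic sites (0 or n copies) are ignored.
    """
    counts = Counter(c for c in derived_counts if 0 < c < n)
    return [counts.get(i, 0) for i in range(1, n)]


def neutral_sfs(num_segregating: int, n: int) -> list[float]:
    """Expected SFS under the standard neutral model (constant size): E[xi_i] = theta / i.

    theta is replaced by Watterson's estimate S / a_n, so the expectation sums to S.
    """
    a_n = sum(1.0 / i for i in range(1, n))
    theta_w = num_segregating / a_n
    return [theta_w / i for i in range(1, n)]


# Hypothetical derived-allele counts at seven sites in a sample of n = 10 sequences.
n = 10
observed = unfolded_sfs([1, 1, 2, 9, 0, 10, 5], n)
expected = neutral_sfs(sum(observed), n)
for i, (obs, ex) in enumerate(zip(observed, expected), start=1):
    print(f"i={i}: observed {obs}, neutral expectation {ex:.2f}")
```

Real analyses use thousands of sites and model demography explicitly (as with ∂a∂i), since bottlenecks and expansions also reshape the AFS.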

House Mouse Genetic Variation
Population genetic studies revealed a fairly large effective population size ($N_e$) for wild natural populations of mice, in the order of $N_e = 5 \times 10^5$ to $2 \times 10^6$ [61,62], with two to three generations per year. Based on a genotyping array, the effective population sizes of the subspecies were estimated to range from $N_e = 0.25 \times 10^5$ to $1.2 \times 10^5$ for M. m. musculus, from $N_e = 0.58 \times 10^5$ to $2 \times 10^5$ for M. m. domesticus, and from $N_e = 2 \times 10^5$ to $7 \times 10^5$ for M. m. castaneus [63]. These estimates were recently supported by a population genomic study of nucleotide diversity within M. m. castaneus [64]. In the same study, an excess of adaptive substitutions [58] was observed in protein-coding genes, UTRs, and conserved noncoding elements (CNEs) [64]. A follow-up study based on the same data recently inferred the recombination landscape within the same subspecies and revealed that genetic diversity is positively correlated with the rate of recombination [17] (see ref. 13 for the recombination landscape in the Collaborative Cross [41] and ref. 65 for mouse inbred strains). From a broad-scale map, the frequency-weighted mean population-scaled recombination rate was estimated as $4N_e r = 0.0092$ per bp for autosomes and $4N_e r = 0.0026$ per bp for the X chromosome [17]. One candidate gene known to influence the location of recombination breakpoints in mammals is PRDM9 [66–69]. PRDM9 is highly polymorphic in natural populations of the house mouse [70,71], and it was recently shown that some alleles are preferred over others in hybrid mice [72]. Remarkable in the study of Booker et al. [17] is the high variability of recombination hot spots within one population and between wild-derived and classical inbred strains, which merits further consideration.
For example, phasing approaches depend on an accurate recombination map, and the question arises whether broad-scale estimates of a heterogeneous recombination landscape provide sufficient information for fine-scale phasing inference.
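Because $4N_e r$ is a population-scaled quantity, translating it into a per-generation rate requires an assumed $N_e$. The sketch below does this for the autosomal value quoted above, using $N_e$ values taken from the M. m. castaneus range cited earlier; these values are assumptions for illustration, not the ones used in [17]. It handles autosomes only: X-linked rates need a different scaling, because the X has a different effective size and, outside the pseudoautosomal region, recombines only in females.

```python
# Autosomal population-scaled recombination rate (rho = 4 * Ne * r, per bp) quoted in the text.
RHO_AUTOSOMES = 0.0092


def per_bp_rate(rho: float, ne: float) -> float:
    """Solve rho = 4 * Ne * r for r, the per-bp, per-generation recombination rate."""
    return rho / (4.0 * ne)


def cm_per_mb(r: float) -> float:
    """Convert r (per bp per generation) to centimorgans per megabase.

    1 cM corresponds to a recombination probability of 0.01 per meiosis;
    the linear conversion is adequate while r * 1e6 is small.
    """
    return r * 1e6 * 100.0


# Assumed Ne values spanning the M. m. castaneus range cited in the text.
for ne in (2e5, 5e5, 7e5):
    r = per_bp_rate(RHO_AUTOSOMES, ne)
    print(f"Ne = {ne:.0e}: r = {r:.2e} per bp, ~{cm_per_mb(r):.2f} cM/Mb")
```

Across this $N_e$ range, the same $\rho$ corresponds to roughly 0.3 to 1.2 cM/Mb, which illustrates how strongly the assumed $N_e$ drives any conversion from a population-scaled map to a genetic map.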
Reference-based whole-genome analyses rely on high-quality genome information to retain the variant information of the populations under study. However, in some cases the sequence divergence between the analyzed population and the reference is high and might produce mapping artifacts [73]. To cope with such situations, Sarver et al. [74] used a pseudoreference-based approach with exome data to infer the phylogenetic relationships and gene tree incongruence within the Mus clade. While Sarver et al. [74] used the D-statistic [75] to detect introgression between M. m. musculus and M. m. domesticus, other methods have recently been applied to infer introgression signals [8,20,21,76,77].
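The logic of the D-statistic can be shown in a few lines. For three ingroups P1, P2, P3 and an outgroup carrying the ancestral allele, "ABBA" sites (P2 and P3 share the derived allele) and "BABA" sites (P1 and P3 share it) are expected in equal numbers under incomplete lineage sorting alone; a significant imbalance suggests gene flow between P3 and one of the other two. The sketch below uses the frequency-based ABBA-BABA form and assumes the outgroup is fixed for the ancestral allele at every site; the population labels and allele frequencies are hypothetical.

```python
def d_statistic(freqs: list[tuple[float, float, float]]) -> float:
    """ABBA-BABA D from derived-allele frequencies (p1, p2, p3) at each site.

    Assumes the outgroup carries the ancestral allele at every site.
    D > 0: P2 shares more derived alleles with P3 than P1 does (ABBA excess).
    D < 0: P1 shares more derived alleles with P3 (BABA excess).
    """
    abba = sum((1 - p1) * p2 * p3 for p1, p2, p3 in freqs)
    baba = sum(p1 * (1 - p2) * p3 for p1, p2, p3 in freqs)
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


# Hypothetical sites: P1, P2 = two M. m. domesticus populations, P3 = M. m. musculus.
sites = [(0.0, 1.0, 1.0), (0.0, 1.0, 1.0), (1.0, 0.0, 1.0), (0.2, 0.6, 0.9)]
print(f"D = {d_statistic(sites):.3f}")
```

In practice, D is computed genome-wide and its significance is assessed with a block jackknife over large genomic blocks, which this sketch omits.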
In their genomic comparison, Harr et al. [14] included samples of the other two house mouse subspecies, M. m. domesticus and M. m. musculus, together with the M. m. castaneus samples. By complementing the data with samples from the sister species M. spretus and the recently diverged M. m. helgolandicus, this study covers a divergence time of roughly two million years [14]; see Fig. 1. Combined with the short generation time of mice, this constitutes a substantial molecular divergence, which is, for example, larger than the divergence between humans and other Hominidae over the same time scale. Figure 4 shows the inferred population sizes for the subspecies M. m. domesticus and the diverged M. m. helgolandicus; this data set was analyzed with the smc++ software, setting the mutation rate to $\mu = 5 \times 10^{-9}$ per base pair per generation [78].
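The mutation rate set in smc++ is what converts coalescent-scaled estimates into absolute numbers: inferred population sizes and times in generations both scale inversely with $\mu$. The simplest form of this scaling is the neutral equilibrium relation $\pi \approx \theta = 4N_e\mu$. The sketch below applies it with the $\mu$ quoted above and a hypothetical diversity value, and converts a time in generations to years using the two to three generations per year mentioned for wild mice. It illustrates the scaling only and is not the smc++ model itself.

```python
MU = 5e-9  # per bp per generation, as set for the smc++ analysis described in the text


def ne_from_diversity(pi: float, mu: float = MU) -> float:
    """Solve the neutral equilibrium expectation pi = 4 * Ne * mu for Ne."""
    return pi / (4.0 * mu)


def generations_to_years(generations: float, gens_per_year: float) -> float:
    """Convert a time in generations to years."""
    return generations / gens_per_year


pi_hypothetical = 0.004  # illustrative per-bp nucleotide diversity, not a published value
print(f"Ne ~ {ne_from_diversity(pi_hypothetical):,.0f}")
for g in (2, 3):  # two to three generations per year, as stated for wild mice
    years = generations_to_years(100_000, g)
    print(f"100,000 generations at {g} per year ~ {years:,.0f} years")
```

Halving $\mu$ would double both the inferred $N_e$ and the inferred times, so the choice of mutation rate (and of generation time) should always be reported alongside such plots.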
Population genetic variation in segmental duplications (copy-number variation) was systematically studied by Pezer et al. [79]. Among the most copy-number-variable genes, they found three highly conserved genes that encode the splicing factor CWC22, the spindle protein SFI1, and the Holliday junction recognition protein HJURP. These genes showed population-specific expansion patterns, suggesting an involvement in local adaptation. Other variable genes encode proteins that are relevant for environmental and behavioral interactions, such as vomeronasal and olfactory receptors, as well as major urinary proteins. A follow-up study suggested that duplications in the androgen-binding protein gene region might specifically have contributed to species diversification [80].
Another study also identified the CWC22 region as a region of major segmental duplication in the house mouse. The region was named R2d, and it was shown that the structural mutation rate appears to depend on the diploid configuration at that locus [81]. By reconstructing the origin and history of copy-number variants (CNVs), the study of Morgan et al. [81] is a good example of how important refined analyses are for disentangling complex genome structures. This is particularly true for genomic regions that are duplicated and absent from the reference genome, which the authors termed the "missing genome" [81].
(Fig. 4 caption, partial: … [14] was filtered to retain only intergenic regions without any feature annotation. For each population, a separate smc++ [78] model was created, setting the per-generation mutation rate to $5 \times 10^{-9}$; see Note 1 for a detailed method description.)
The sequence and structural diversity of Y chromosomes in natural populations was studied in [82]. Compared to other mammals, the mouse Y chromosome is larger and harbors more annotated genes. The authors showed that CNV on the long arm of both sex chromosomes is highly variable, but that sequence diversity in nonrepetitive regions is low compared to autosomes.
The autosomal AFS of neutral intergenic regions was used to infer the demography of all subspecies with the software ∂a∂i [83]. All simple models applied predicted effective population sizes within the range mentioned above (M. m. domesticus: $N_e$ = 1.

Brown Rat Genetic Variation
Rats, and in particular the species Rattus norvegicus, have an effective population size comparable to that of the mouse subspecies M. m. domesticus and M. m. musculus. Deinum et al. [84] estimated the effective population size to be $N_e = 1.24 \times 10^5$, based on silent mutations in 12 wild-derived animals. The authors highlight a recent bottleneck in rats (20,000 years ago) based on a PSMC [85] analysis (see Chapter 7 for a discussion of MSMC and MSMC2). This bottleneck might explain the negative estimates of the rate of adaptive evolution in proteins and noncoding elements. Compared to mice, rats show a larger proportion of mildly deleterious mutations and, correspondingly, a lower proportion of strongly deleterious mutations [84]. However, the reduction in diversity around exons is comparable to values obtained for mice [64]. Given the different $N_e$ of mice and rats, Deinum et al. [84] estimated linkage disequilibrium (LD) to decay six to seven times faster in mice than in rats.
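How can a "rate of adaptive evolution" come out negative? A simple McDonald-Kreitman-type estimator makes this visible: $\alpha = 1 - (D_s P_n)/(D_n P_s)$, where $D$ counts fixed differences to an outgroup and $P$ counts polymorphisms, each split into nonsynonymous ($n$) and synonymous ($s$) sites. Whenever the ratio $P_n/P_s$ exceeds $D_n/D_s$, for example because slightly deleterious nonsynonymous variants still segregate in the population, $\alpha$ becomes negative. The sketch below uses hypothetical counts; published estimates such as those in [84] rely on more elaborate methods that model the distribution of fitness effects, so this is only the basic idea.

```python
def alpha_mk(dn: int, ds: int, pn: int, ps: int) -> float:
    """Proportion of adaptive nonsynonymous substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: nonsynonymous / synonymous fixed differences to an outgroup.
    pn, ps: nonsynonymous / synonymous polymorphisms within the population.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for illustration only.
print(alpha_mk(dn=60, ds=100, pn=20, ps=50))  # Pn/Ps < Dn/Ds -> alpha > 0
print(alpha_mk(dn=60, ds=100, pn=40, ps=50))  # excess of Pn (e.g., slightly deleterious) -> alpha < 0
```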
As for mice, researchers have investigated speciation and introgression events using population genomics. Teng et al. [86] used the Himalayan field rat (Rattus nitidus), which is geographically restricted to Southeast Asia, as an outgroup to investigate introgression in brown rats sampled in China. With whole-genome data from 44 individuals, the effective population size was estimated at $N_e = 2.53 \times 10^5$ for brown rats and $N_e = 5.18 \times 10^5$ for Himalayan field rats, a difference of similar order to that between the house mouse subspecies M. m. musculus and M. m. castaneus. According to the PSMC analysis, the sibling species R. norvegicus and R. nitidus diverged ~650,000 years ago, that is, within a time frame in which mouse lineages are thought to have diverged only at the subspecies level. The proportion of admixed fragments was estimated at 1.59%, with admixture block sizes ranging from 100 kbp to 1.42 Mbp [86]. Among the 346 introgressed regions detected, 92 loci were classified as adaptive. The strongest candidate is located on chromosome 1 and overlaps the vomeronasal 1 receptor cluster, which encodes chemosensory receptors involved in chemical communication. As in mice [20], the regions were enriched for biological terms such as "chemosensory perception" and "immune response." In addition to regions showing signals of introgression, 352 regions were identified as having undergone a selective sweep, based on allele frequency differentiation between populations (XP-CLR [87]) and on cross-population extended haplotype homozygosity (XP-EHH [88]); like the introgressed regions, these are enriched in proteins involved in immune response and metabolism.
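The idea behind XP-EHH is that a recent sweep leaves long stretches of identical haplotypes around the selected site in the population where it occurred. The simplified sketch below computes site EHH (the probability that two randomly drawn haplotypes are identical from a core SNP out to a given SNP), integrates it with the trapezoid rule, and takes the log ratio of the integrals between two populations. It assumes phased 0/1 haplotypes and integrates only to the right of the core; the published statistic integrates in both directions, truncates when EHH becomes small, and is standardized genome-wide. All haplotypes and positions are hypothetical.

```python
from collections import Counter
from math import comb, log


def site_ehh(haplotypes: list[str], core: int, end: int) -> float:
    """Probability that two random haplotypes are identical over SNP indices core..end."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = min(core, end), max(core, end)
    groups = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)


def ihh_right(haplotypes: list[str], positions: list[float], core: int) -> float:
    """Integrate site EHH from the core SNP to the last SNP on its right (trapezoid rule)."""
    total = 0.0
    prev = site_ehh(haplotypes, core, core)
    for j in range(core + 1, len(positions)):
        cur = site_ehh(haplotypes, core, j)
        total += 0.5 * (prev + cur) * (positions[j] - positions[j - 1])
        prev = cur
    return total


def xp_ehh_raw(pop_a: list[str], pop_b: list[str], positions: list[float], core: int) -> float:
    """Unstandardized XP-EHH, ln(iHH_A / iHH_B); positive values point to a sweep in pop_a."""
    return log(ihh_right(pop_a, positions, core) / ihh_right(pop_b, positions, core))


# Hypothetical 0/1 haplotypes over four SNPs at the given bp positions.
positions = [0, 1_000, 2_000, 3_000]
swept = ["0000", "0000", "0000", "0001"]
diverse = ["0000", "0011", "0101", "0110"]
print(f"XP-EHH (raw) = {xp_ehh_raw(swept, diverse, positions, core=0):.2f}")
```

If haplotype homozygosity in the reference population drops to zero immediately, the ratio is undefined; real implementations guard against this and work on large windows of SNPs.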
Zeng et al. [58] extended the publicly available whole-genome sample set of brown rats to a worldwide distribution. With more than 100 individuals, the authors investigated the geographic origin and migration routes. In contrast to the previous hypothesis that Rattus norvegicus dispersed from northern Asia to Europe, their data support a southern East Asian dispersal route to Europe [58]. Similar to Teng et al. [86], Zeng et al. [58] identified candidate genes with signatures of positive selection that are associated with the immune response, by comparing European and Chinese populations.

Examples of Genes Under Positive Selection
In this section, we discuss several examples of genes that have been shown to be involved in adaptation in mice and rats. One prominent example is the evolution of resistance to warfarin, an anticoagulant poison used in rodent pest management.

Rodent Resistance to Anticoagulants: Vkorc1
Because rodents are vectors of human diseases, their populations have been controlled with rodenticides for over half a century. Common rodenticides (e.g., warfarin) interfere with blood coagulation by targeting the vitamin K reductase reaction [89]. Several mutations within the Vkorc1 gene that confer resistance to warfarin have been found in house mice and brown rats [90]. Song et al. [76] suggested that an allele introgressed from the Algerian mouse (Mus spretus) into M. m. domesticus led to anticoagulant resistance. Both species today live in sympatry in southwestern Europe. Vkorc1 was subject to adaptive protein evolution in M. spretus since it separated from other Mus lineages, and four introgressed polymorphisms could be linked to a strong resistance phenotype [76,91]. Based on whole-genome data [14], this region shows negative Tajima's D values in western European mouse populations, in contrast to a population from Iran (see Fig. 5a), which is compatible with recent positive selection acting on it.
(Fig. 5 caption, partial: … Table 1 for web page URL link)
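Tajima's D contrasts two estimates of the population mutation rate: the mean number of pairwise differences ($\pi$), which is dominated by intermediate-frequency variants, and Watterson's estimate ($S/a_1$), which counts all segregating sites equally. After a sweep, rare variants are in excess, $\pi$ falls below $S/a_1$, and D becomes negative; note that population expansion produces the same signal, which is why the comparison with another population (here, Iran) matters. The sketch below implements Tajima's (1989) standard formula for a single window, assuming per-site derived-allele counts in a sample of n sequences; the inputs are hypothetical.

```python
from math import sqrt


def tajimas_d(derived_counts: list[int], n: int) -> float:
    """Tajima's D for one window, from per-site derived-allele counts in n sequences."""
    seg = [j for j in derived_counts if 0 < j < n]
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")
    # Mean pairwise differences: each site contributes j * (n - j) / C(n, 2).
    pi = sum(2.0 * j * (n - j) / (n * (n - 1)) for j in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


singletons = [1] * 5    # five sites, each with a single derived copy (sweep-like)
intermediate = [5] * 5  # five sites at 50% frequency
print(round(tajimas_d(singletons, 10), 3), round(tajimas_d(intermediate, 10), 3))
```

Genome scans such as the one behind Fig. 5a compute this value in sliding windows along each chromosome and look for regions that deviate from the genome-wide distribution.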

Pathogen Related Resistance: Xpr1
In addition to human-made selection pressures, there is natural selection caused by pathogens. Hasenkamp et al. [92] studied the gene Xpr1, which encodes the receptor of murine leukemia virus (MLV). They found that the gene had been subject to a recent selective sweep in the population from Iran and that the selected haplotype had adaptively introgressed into a population from France, where it mixed with existing haplotypes and thus created a higher average population diversity than in the nonintrogressed population from Germany (see Fig. 5b). It seems that the Xpr1 gene itself is under frequent positive selection and that alleles coping with new virus variants can spread rather quickly into other subpopulations if these are actively dealing with infection cycles of that virus variant [92].

Segmental Duplications and Selective Sweeps: R2D2
As mentioned above, R2d is a CNV region on chromosome 2 that was found to cause nonrandom segregation [93]. Didion et al. [93] showed that signatures of selective sweeps obtained via genome-wide scans can be mimicked by "selfish" alleles. Within the 127 kbp R2d region there is one annotated gene, Cwc22, which encodes a spliceosomal protein. Based on haplotype sharing, an analysis of almost 400 individuals sampled across Europe revealed that all individuals with an extreme excess of shared identity carried a high copy number of R2d. When only one subpopulation was analyzed, the haplotype sharing methods failed to detect this "selfish" sweep. However, when individuals from different geographic locations were included in the analysis, R2d was identified as a selective sweep. Morgan et al. [81] showed for the same locus that an initial duplication event ~3.5 million years ago gave rise to R2d1 and R2d2; therefore, mouse strains containing a single copy must have lost the second one. The authors identified nonallelic gene conversion events in R2d1, in which sequence was transferred from R2d2, causing the appearance of deep coalescence among R2d1 sequences [81]. Given both the patterns of concerted evolution and the evolutionary dynamics of the selfish alleles, this could be a case of evolution through "molecular drive" [94].

The t-Haplotype as Meiotic Drive Element
Meiotic drive elements, or segregation distorters, are transmitted to more than 50% of the progeny of heterozygous individuals. The mouse t-haplotype, located within several inversions on chromosome 17, is a classic example of such a meiotic drive element [1]. Despite a strong driving capacity, t-haplotypes remain at relatively low frequency in natural populations, since homozygous individuals have strongly reduced viability [95]. The population genomics of the t-haplotype was studied in [96], based on the data provided in [14]. The authors found evidence for an accumulation of nonsynonymous substitutions within the inversions, but also signatures of recombination events that appear to have regenerated coding sequences that had accumulated deleterious mutations.
Based on the corresponding transcriptome data in [14], they showed that individuals carrying a t-haplotype also display altered testis expression of genes outside the t-complex.

Conclusion
The reduction in per-sample sequencing costs has led to an exponential increase in available whole-genome data for model and nonmodel organisms. As among the longest-studied mammals, both the house mouse and the brown rat have proven to be valuable models for studying the processes that shape genome evolution in natural populations, including introgression and positive selection. However, while the public domain is steadily filling with datasets usable for population genomics, there is still a gap between studies that predict candidates and studies that functionally validate them. Consequently, functional studies demonstrating that genes have a direct impact on fitness in a given species should be expanded. The experimental setup to measure fitness will always depend on the species and should be embedded in an environmental context.