Ancestral bias in the Hras1 gene and distal Chromosome 7 among inbred mice
Inbred strains of mice vary in their frequency of liver tumors initiated by a mutation in the Hras1 (H-ras) proto-oncogene. We sequenced 4.5 kb of the Hras1 gene on distal Chr 7 in a diverse set of 12 commonly used laboratory inbred strains of mice and detected no sequence variation to account for strain-specific differences in Hras1 mutation prevalence. Furthermore, the Hras1 sequence is essentially monoallelic for an ancestral gene derived from the M. m. domesticus species. To determine if the monoallelism and associated low rate of polymorphism are unique to Hras1 or representative of the general chromosomal locale, we extended the sequence analysis to 12 genes in the final 8 Mb of distal Chr 7. A region of at least 2.5 Mb that encompasses several genes, including Hras1 and the H19/Igf2 loci, demonstrates virtually no sequence variation. The 12 inbred strains share one dominant haplotype derived from the M. m. domesticus allele. Chromosomal regions flanking the monoallelic segment exhibit a significantly higher rate of variation and multiple haplotypes, a majority of which are attributed to M. m. domesticus or M. m. musculus ancestry.
Keywords: Single Nucleotide Polymorphism; Inbred Strain; Single Nucleotide Polymorphism Data; Ancestral Allele; Imprint Region
Several papers have recently described the first inquiries into the single nucleotide polymorphism (SNP) and haplotype structure of the inbred mouse genome on a global scale (Lindblad-Toh et al. 2000; Pletcher et al. 2004; Tsang et al. 2005; Wade and Daly 2005; Wade et al. 2002; Wiltshire et al. 2003). This research verifies and details a patchwork pattern of variation first posited in 1987, in which pairs of laboratory inbred strains share large blocks derived from one ancestral strain, primarily M. m. domesticus or M. m. musculus (Bonhomme et al. 1987). Within any pairwise inbred strain comparison, there are haplotype blocks defined by a low or high SNP rate, and the average length of these blocks is estimated to be 1.2–1.4 megabases (Mb) (Frazer et al. 2004; Wade et al. 2002; Zhang et al. 2005). Genome-wide studies of a few inbred strains estimate that the average rate of polymorphism between two such strains is 0.5 SNP/10 kb in the low-SNP-rate blocks and 35 SNP/10 kb in SNP-dense blocks (Frazer et al. 2004). Blocks of limited diversity are attributed to a recent coalescence in which inbred strains inherited the same ancestral allele, while SNP-dense blocks reflect inheritance of divergent ancestral alleles.
Laboratory inbred strains of mice vary in their frequency of Hras1 (H-ras) mutational activation during multistage hepatocarcinogenesis (Buchmann et al. 1991; J.C. Drew and N.R. Drinkwater, unpublished). Given its significant role in liver tumorigenesis and its strain-dependent frequency of activation, we tested the hypothesis that strain-specific polymorphisms in the Hras1 gene could account for variable frequencies of Hras1 initiation in liver tumors. We sequenced approximately 4.5 kb of the Hras1 gene from 12 diverse, yet commonly used laboratory inbred mouse strains and two distantly related inbred strains. The set includes representatives from all six families of inbred strains as defined by genome-wide parsimony analysis of simple sequence length polymorphisms (Witmer et al. 2003).
In this article we report on the monoallelic inheritance of Hras1 and its neighbors on distal Chr 7 among 12 classic inbred strains. Sequence analysis revealed a region of remarkably low diversity among the strains. No strain-specific SNPs account for differences in frequency of Hras1 activation. Additional sequence analysis of surrounding genes in the final 8 Mb of distal Chr 7 exposed a unique 2.5-Mb block that is essentially devoid of any sequence variation among the 12 classic inbred strains. This block is flanked by regions with significantly greater diversity. Analysis of wild-derived inbred strains representing ancestral genomes of M. m. domesticus and M. m. musculus indicates that the 12 lab inbred strains have fixed alleles from the M. m. domesticus progenitor strain in the 2.5-Mb region on distal Chr 7 that includes the Hras1 gene. Considering the well-established role of Hras1 in cell signal transduction and of mutant Hras1 in tumor development, these results have important implications for the study of Hras1 in mouse models of neoplasia and contribute to the burgeoning understanding of laboratory inbred mouse genomes.
Materials and methods
Sequence analysis of Hras1 and surrounding genes on Chr 7
The C57BL/6J (B6) Hras1 and additional 5′ and 3′ nucleotide sequence was originally obtained from GenBank (accession No. z50013) and the Trace Archive database (http://www.ncbi.nlm.nih.gov/Traces/trace.cgi). Primer sets for PCR were designed with Primer3 software (http://www.frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi) using the B6 sequence as a template to amplify a set of overlapping 400–500-bp products. Genomic DNA from spleens was prepared according to a previously published protocol (Bilger et al. 2004) from the following strains housed in our colony: B6, C57BR/cdJ, C3H/HeJ, CBA/J, and SM/J. Genomic DNA from 129X1/SvJ, 129P3/J, A/J, AKR/J, BALB/cByJ, DBA/2J, SWR/J, CAST/EiJ, SPRET/EiJ, WSB/EiJ, and CZECHII/Ei was obtained from The Jackson Laboratory (Bar Harbor, ME). Primers for 11 additional genes on distal Chr 7 were designed from the published B6 sequence (Mouse Genome Consortium 2002; http://www.ensembl.org/mus_musculus). For each of the 11 genes, we sequenced a 300–500-bp region. For 8 of 11 genes, the product spanned an intron/exon boundary. All primer sequences and accession numbers are listed in Supplementary Table 1.
PCR reactions included 10–20 ng of DNA, 62.5 μM dNTPs (Amersham, Piscataway, NJ), 15 pmol of forward and reverse primers (Integrated DNA Technologies, Coralville, IA), 0.3 U Taq DNA polymerase, and 1 × PCR buffer (Roche, Indianapolis, IN) in a total reaction volume of 20 μl. The reactions were incubated in thermocyclers at 95°C for 3 min, followed by 40 cycles at 94°C for 15 sec, 60°C for 25 sec, 72°C for 90 sec, and a final step at 72°C for 7 min. Minor modifications to the standard protocol such as optimization of annealing temperatures and initial incubations at 95°C for 15 min with HotStar DNA Polymerase (Qiagen, Valencia, CA) were required for a small number of reactions. All PCR products were purified with a Qiagen PCR Purification Kit according to the recommended protocol, except for the final elution step. Instead of eluting DNA with the suggested buffer, we eluted twice with 50 μl of ddH2O. Approximately 35–50 ng of purified PCR products was used for sequencing in separate reactions with 1 pmol of forward and reverse primers using Big Dye terminator chemistry (Perkin Elmer, Boston, MA) and analyzed on an Applied Biosystems 373 DNA automated sequencer (Foster City, CA). Sequencing reactions were incubated at 94°C for 4 min, followed by 35 cycles at 94°C for 30 sec, 55°C for 30 sec, 60°C for 90 sec, and a final step at 60°C for 10 min. The sequences were processed at the sequencing facility at the McArdle Laboratory for Cancer Research.
Sequence data from seven genes were concatenated for each of the 12 lab inbred strains. The genes included in the analysis were Mgmt, Ebf3, Dpysl4, Hras1, Fgf3, Fgf4, and Ccnd1. The linked strain sequences were aligned with ClustalX (Thompson et al. 1997). Loss-parsimony analysis according to the method of Fitch (1971) was executed with the DNAPARS program in the PHYLIP package. The output data were used to generate a phylogenetic tree with the DRAWGRAM program in the PHYLIP package (Felsenstein 1989).
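The concatenation step described above can be illustrated with a minimal sketch. It assumes the per-gene fragments are already aligned to equal length across strains, and it counts simple pairwise mismatches rather than performing the Fitch parsimony analysis that DNAPARS carries out. The strain names and sequences are hypothetical toy values, not data from this study.

```python
from itertools import combinations

# Gene order used for concatenation (the seven genes named in the text).
GENE_ORDER = ["Mgmt", "Ebf3", "Dpysl4", "Hras1", "Fgf3", "Fgf4", "Ccnd1"]


def concatenate(per_gene):
    """Join one strain's aligned gene fragments in a fixed gene order."""
    return "".join(per_gene[gene] for gene in GENE_ORDER)


def pairwise_differences(seqs):
    """Count mismatched columns for every pair of equal-length sequences."""
    diffs = {}
    for a, b in combinations(sorted(seqs), 2):
        if len(seqs[a]) != len(seqs[b]):
            raise ValueError("sequences must be aligned to the same length")
        diffs[(a, b)] = sum(x != y for x, y in zip(seqs[a], seqs[b]))
    return diffs


# Hypothetical toy fragments: strainC differs at the last base of every gene.
toy = {
    "strainA": {gene: "ACGT" for gene in GENE_ORDER},
    "strainB": {gene: "ACGT" for gene in GENE_ORDER},
    "strainC": {gene: "ACGA" for gene in GENE_ORDER},
}
linked = {strain: concatenate(genes) for strain, genes in toy.items()}
diffs = pairwise_differences(linked)
```

In this toy set, two strains are identical across the linked sequence and the third differs once per gene, the kind of contrast a parsimony program would then resolve into a tree.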
Sequence of the Hras1 gene in 14 inbred strains
When AKR is compared to each of the 11 other laboratory inbred strains, the two AKR SNPs aggregate to a total of 22 SNPs among all inbred pairwise comparisons, yielding an average of 0.3 SNPs between any 2 of the 12 inbred strains. This SNP rate is a much lower level of sequence variation than expected without any prior knowledge of the diversity of the region. Based on recent global estimates of sequence diversity, we expected to observe a total of 344 SNPs, or approximately 5 SNPs for any two-strain comparison (Frazer et al. 2004; Wade et al. 2002). The observed variation is a 17-fold reduction in the expected number of SNPs and is highly significant (the expected SNP rate is outside of the 99.99% confidence interval for the observed SNP rate).
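The pairwise arithmetic in this paragraph can be retraced in a few lines. This is an illustrative sketch only; the variable names are hypothetical, and the totals of 22 observed and 344 expected SNPs are taken directly from the text.

```python
from math import comb

n_strains = 12
pairs = comb(n_strains, 2)  # number of distinct two-strain comparisons

# Two AKR-specific SNPs appear in each of the 11 comparisons involving AKR.
akr_snps = 2
observed_total = akr_snps * (n_strains - 1)

expected_total = 344  # total expected SNPs, as stated in the text

observed_per_pair = observed_total / pairs
expected_per_pair = expected_total / pairs
fold_from_totals = expected_total / observed_total

print(f"{pairs} pairs, observed {observed_per_pair:.2f} and "
      f"expected {expected_per_pair:.2f} SNPs per pair")
```

The unrounded totals give a ratio of about 15.6, whereas the rounded per-pair values quoted in the text (5 versus 0.3) give roughly 17.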
As anticipated, CAST and SPRET genomes had a greater number of SNPs compared with the common laboratory inbred strains with 26 and 51 SNPs each, respectively (versus B6 and 10 other inbred strains); these results are consistent with previous estimations of 1 SNP/200 bp between CAST or SPRET and the laboratory inbred mouse (Frazer et al. 2004; Lindblad-Toh et al. 2000).
Ancestral origin of the common Hras1 allele
To determine the origin of the inbred Hras1 gene, we resequenced an 870-bp region that contains the AKR SNP located in intron 1 using genomic DNA from the WSB/EiJ (WSB) and CZECHII/EiJ (CZECH) strains. WSB and CZECH are wild-derived inbred strains representative of the ancestral species M. m. domesticus and M. m. musculus, respectively (Wade et al. 2002). Three SNPs were detected between the two archetypal strains, WSB and CZECH (SNP rate = 34/10 kb), and this rate is consistent with the upper level of diversity commonly observed between any two laboratory inbred strains (Frazer et al. 2004). All 12 of the classic laboratory inbred strains, including AKR, share the WSB nucleotides at this location (Fig. 1). The AKR SNP in intron 1 is unique because it is not observed in the WSB or CZECH sequence (Supplementary Table 2). Thus, the 12 laboratory inbred strains are effectively monoallelic for the M. m. domesticus-derived allele.
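The SNP rate quoted for the WSB versus CZECH comparison follows from normalizing the SNP count to a 10-kb length. A minimal sketch, with a hypothetical helper name:

```python
def snp_rate_per_10kb(n_snps, length_bp):
    """Scale a SNP count over a sequenced length to SNPs per 10 kb."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return n_snps / length_bp * 10_000


# Three SNPs in the 870-bp resequenced region (values from the text).
wsb_vs_czech = snp_rate_per_10kb(3, 870)
print(round(wsb_vs_czech))
```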
SNP frequencies of 12 genes on distal Chr 7
Two regions of considerably greater diversity border the conserved block. The three sampled genes located in the 136.96–138.95-Mb interval, Mgmt, Ebf3, and Dpysl4, have SNP frequencies of 20–40 SNP/10 kb. The three sampled genes located in the final megabase of sequence on Chr 7, Fgf3, Fgf4, and Ccnd1, have similar, albeit slightly lower, frequencies in the range of 5–30 SNP/10 kb for all pairwise comparisons (Fig. 2, lower panel). For 9 of the 12 genes, the sequenced fragment spans an intron-exon junction. For three genes, Mgmt, Ebf3, and Igf2, the sequenced region consists entirely of intronic sequence. For genes other than Hras1, all sequenced exon regions are coding, and there is approximately twice as much intron sequence as exon sequence (2741 bp vs. 1392 bp). Supplementary Table 1 contains the exon and intron coverage per sequenced gene region. The average SNP rates of exonic and intronic sequence are 6 and 14 SNP/10 kb, respectively (data not shown), and this ratio is consistent with previous observations (Thomas et al. 2002).
The diversity between the laboratory inbred mouse and CAST or SPRET is significantly higher (p < 0.05, Fisher’s exact test) than the level of variation among the laboratory inbred strains, and this SNP rate is constant throughout the 8-Mb region (Fig. 2, upper panel).
Haplotype and phylogenetic analysis of 12 genes in the final 8 Mb of Chr 7
The phylogeny of this Chr 7 region, as represented by the SNP data shown in Fig. 3, was analyzed using the PHYLIP loss-parsimony program (Felsenstein 1989). The resulting tree (Supplementary Fig. 1) reveals relatedness at distal Chr 7 that does not match the strains’ overall relatedness based on genealogic, microsatellite, and SNP data (Beck et al. 2000; Petkov et al. 2004; Tsang et al. 2005; Witmer et al. 2003). For instance, in genome-wide analyses, 129P3/J and C3H/HeJ align to two separate branches although they are identical at the distal Chr 7 region. C57BR/cdJ and C57BL/6J are closely related in genome-wide scans, but these two strains are distinct in the distal Chr 7 region, whereas C57BR/cdJ and SM/J are entirely identical. These results are consistent with previous observations that local and global hereditary relationships often differ (Frazer et al. 2004; Park et al. 2002).
Validation of the haplotype structure
Previous estimates indicate that M. m. domesticus ancestry accounts for two thirds of the laboratory inbred mouse genome (Wade et al. 2002). Given this proportion, the chance of randomly encountering a locus in which 12 laboratory inbred strains have all fixed the M. m. domesticus allele, as for the monoallelic block on distal Chr 7, is (0.67)^12 = 0.008 per block, or about 10 per genome. To determine the frequency of such blocks empirically, and to validate the presence of this large, monoallelic block on distal Chr 7, we queried the Mouse Phenome Database (http://www.aretha.jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home). Although this database has information for all 12 lab inbred strains, the level of strain coverage varies. We restricted our analysis to the eight strains in our set for which at least 155,000 SNPs per genome had been analyzed: 129X1/SvJ, A/J, AKR/J, BALB/cByJ, C3H/HeJ, B6, DBA/2J, and SM/J. We discovered that for 354 nucleotide positions in a 2-Mb (141–143 Mb) window that lies within the monoallelic block on distal Chr 7, 100% were identical for all eight strains. We assessed the level of SNP variation throughout the entire genome to determine how often one would encounter a 2-Mb block with this level of sequence similarity at random among eight laboratory inbred strains. We analyzed all entered SNP information for the eight sampled strains in 2-Mb sliding windows. We identified five hundred 2-Mb intervals that contained information for at least 50 SNPs for the eight strains queried. In this analysis, 1% of these 2-Mb blocks are identical for all nucleotide positions in all eight laboratory strains. However, the block on distal Chr 7 had the highest level of significance as determined by comparing the lower bounds of the 95% confidence intervals (data not shown).
Therefore, although the monoallelic block on distal Chr 7 is not singular, blocks of this size with similar sequence conservation (100% similarity among many strains) are rare in the mouse genome.
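The probability estimate and the window scan described above can be sketched as follows. This is a hedged illustration, not the procedure actually run against the Mouse Phenome Database: the genome size of roughly 2,500 Mb, the window step, the function names, and the toy genotype data are all assumptions introduced here.

```python
def prob_all_domesticus(p_domesticus=0.67, n_strains=12):
    """Chance that every strain independently carries the domesticus allele."""
    return p_domesticus ** n_strains


def expected_blocks(genome_mb, block_mb, p_block):
    """Expected number of such blocks if the genome is tiled into blocks."""
    return genome_mb / block_mb * p_block


def scan_windows(sites, window_bp=2_000_000, step_bp=2_000_000, min_sites=50):
    """Summarize SNP sites per window.

    sites: list of (position_bp, alleles), one allele character per strain.
    Returns (window_start, n_sites, fraction_identical) for each window
    holding at least min_sites sites. step_bp is an assumption; the text
    does not state the step used for its sliding windows.
    """
    if not sites:
        return []
    sites = sorted(sites)
    last_pos = sites[-1][0]
    results = []
    start = 0
    while start <= last_pos:
        alleles = [a for pos, a in sites if start <= pos < start + window_bp]
        if len(alleles) >= min_sites:
            identical = sum(len(set(a)) == 1 for a in alleles)
            results.append((start, len(alleles), identical / len(alleles)))
        start += step_bp
    return results


p_block = prob_all_domesticus()
# Assumed genome size of about 2,500 Mb, tiled into 2-Mb blocks.
n_expected = expected_blocks(2500, 2, p_block)

# Toy data for eight strains: the first window is fully identical,
# the second window contains one variable site.
toy_sites = [(i * 1000, "A" * 8) for i in range(60)]
toy_sites += [(2_000_000 + i * 1000, "AAAAAAAG" if i == 0 else "A" * 8)
              for i in range(60)]
windows = scan_windows(toy_sites)
```

Under these assumptions the per-block probability rounds to 0.008 and the expected count to about 10, in line with the figures in the text. A window is reported only when it holds at least 50 sites, mirroring the coverage threshold used for the five hundred intervals.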
Remarkably, a 2.5-Mb region on distal Chr 7 that encompasses several genes, including Hras1, is essentially monoallelic among many laboratory inbred strains, all of which have fixed the M. m. domesticus allele. Although the AKR/J strain harbors two SNPs in the Hras1 gene, they are both in noncoding regions. The AKR/J strain develops Hras1 mutations in approximately 4% of its chemically induced liver tumors, which is very similar to the phenotype of the C57BL/6J and DBA/2J strains (J.C. Drew and N.R. Drinkwater, unpublished). These observations support the inference that the AKR-specific SNPs are phenotypically silent.
As a result of this high degree of haplotype conservation, we conclude that a 4.5-kb region carrying the Hras1 gene is not responsible for strain-dependent variation in Hras1 mutation rates in liver tumors. Although sequence analysis included the promoter regions and the 3′ untranslated region, regulatory elements beyond the sequenced domain may affect expression and result in variable rates of Hras1-initiated liver tumors. We analyzed Hras1 expression in the livers of five of the classic inbred strains (129X1/SvJ, C57BR/cdJ, C57BL/6J, C3H/HeJ, and SM/J) by microarray and did not observe any significant differences among the strains (A. Bilger, personal communication).
The extremely low rate of variation in the 2.5–5-Mb Hras1 region, which contains up to 100 genes, is highly unusual. The overall SNP rate of the monoallelic region on distal Chr 7 is 0.48 SNPs/10 kb and is consistent with the observed rate in low-SNP regions for a single pair of strains. However, this low SNP rate is very rare for comparisons among many strains. Frazer et al. (2004) recently suggested that haplotype analysis with 12 inbred strains captures greater than 95% of the variation likely to be present among the classic laboratory inbred strains. The set of 12 lab inbred strains presented here is genetically diverse and includes members from the six major families of inbred strains (Witmer et al. 2003). The monoallelic region on distal Chr 7 is the only block reported with such a low rate of variation over 66 pairwise strain comparisons. The two less-conserved regions flanking the monoallelic block are in agreement with the published average rate of 35 SNP/10 kb in a SNP-dense block, and our strain distribution of ancestral alleles is in accordance with the predicted proportion of two thirds with M. m. domesticus ancestry (Frazer et al. 2004; Wade et al. 2002).
The monoallelic interval, or SNP “desert,” on distal Chr 7 affects QTL analysis in two ways. First, this SNP desert is effectively noninformative and precludes linkage analysis in this region for pairs of the aforementioned, frequently used, 12 laboratory inbred strains. Second, the lack of genotypic differences among the 12 common lab strains examined in this study suggests that phenotypic differences are unlikely to be the direct result of genes located in this monoallelic 2-5-Mb interval, which includes Hras1, H19, Igf2, Insulin 2 precursor (Ins2), and Cyclin-dependent kinase inhibitor 1c (Cdkn1c) (Mouse Genome Informatics, www.informatics.jax.org).
Functional selection may account for the shared M. m. domesticus ancestry in the distal Chr 7 monoallelic region. However, the Hras1 gene is unlikely to be the driving force of such selection because Hras1 is not an essential gene and it demonstrates functional redundancy with other RAS family members (Esteban et al. 2001; Ise et al. 2000; Johnson et al. 1997). The monoallelic block also includes the well-studied H19/Igf2 imprinted region, which could drive selection for this haplotype. To our knowledge, this is the first detailed analysis of sequence variation among a large number of inbred strains in any imprinted region. Integrity of human imprinting clusters is essential and may be important for regulating expression. Since imprinted clusters are well conserved between human and mouse, murine imprinted regions may also require strict conservation (Caspary et al. 1998; Reik and Walter 2001). However, mice lacking functional Igf2 are viable, which suggests that Igf2 is not solely driving selection (Constancia et al. 2000). A congenic mouse line carrying the M. m. musculus distal Chr 7 region on an M. m. domesticus or standard inbred background would reveal whether the M. m. musculus Chr 7 allele results in a selective disadvantage.
Reduced variation is often observed at chromosomal regions linked to a beneficial mutation driving positive selection (Kaplan et al. 1989). For example, Schlenke and Begun (2004) identified a chromosomal invariant region in a Drosophila simulans population that was best explained by recent fixation due to directional selection. Candidate gene evaluation of the region led to a locus containing a transposon that is associated with increased resistance to insecticides and thereby potentially improves fitness. As this example shows, monoallelic blocks can be signposts for biologically relevant loci, and conserved regions such as that on distal mouse Chr 7 may be fruitful to explore.
The authors are grateful to Kristin Liss, Kim Leutkehoelter, Andrew Schneider, and Matthew Gigot for expert technical assistance in various aspects of this work. They also thank Andrea Bilger for critical advice on the manuscript. This work was supported by grants CA22484 and CA009135 from the National Cancer Institute of the National Institutes of Health.
- Bonhomme F, Guenet JL, Dod B, Moriwaki K, Bulfield G (1987) The polyphyletic origin of laboratory inbred mice and their rate of evolution. J Linn Soc 30:51–58
- Esteban LM, Vicario-Abejon C, Fernandez-Salguero P, Fernandez-Medarde A, Swaminathan N, et al. (2001) Targeted genomic disruption of H-ras and N-ras, individually or in combination, reveals the dispensability of both loci for mouse growth and development. Mol Cell Biol 21:1444–1452
- Felsenstein J (1989) PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5:164–166