Abstract
This study presents the first global, 1-Mbp-level analysis of patterns of nucleotide substitution along the human lineage. The study is based on the analysis of a large number of repetitive elements deposited into the human genome since the mammalian radiation, yielding a number of results that would have been difficult to obtain using the more conventional comparative method of analysis. This analysis revealed substantial and consistent variability in rates of substitution, with rates differing by up to twofold among regions. The rates of substitution of C or G nucleotides with A or T nucleotides vary much more sharply than the reverse rates, suggesting that much of that variation is due to differences in mutation rates rather than in the probabilities of fixation of C/G vs. A/T nucleotides across the genome. For all types of substitution we observe substantially more hotspots than coldspots, with hotspots showing substantial clustering over tens of Mbp. Our analysis revealed that the GC content of surrounding sequences is the best predictor of the rates of substitution. The pattern of substitution appears very different near telomeres compared to the rest of the genome and cannot be explained by the genome-wide correlations of the substitution rates with GC content or exon density. The telomere pattern of substitution is consistent with natural selection or biased gene conversion acting to increase the GC content of sequences that lie within 10–15 Mbp of the telomere.
Acknowledgments
T.H. is supported by the NSF through Grants 0211308, 0216576, and 0225630. D.P. is supported by NSF Grant DEB-0317171, the Terman Award, and the Alfred P. Sloan Fellowship in Computational Molecular Biology. P.A. and D.P. are grateful for the hospitality of the Center for Theoretical Biological Physics at UCSD, where extensive discussions of this research took place.
Additional information
Reviewing Editor: Dr. Jerzy Jurka
Cite this article
Arndt, P.F., Hwa, T. & Petrov, D.A. Substantial Regional Variation in Substitution Rates in the Human Genome: Importance of GC Content, Gene Density, and Telomere-Specific Effects. J Mol Evol 60, 748–763 (2005). https://doi.org/10.1007/s00239-004-0222-5
DOI: https://doi.org/10.1007/s00239-004-0222-5