Comparative genomics: methods and applications

Review, Naturwissenschaften

Abstract

Interpreting the functional content of a given genomic sequence is one of the central challenges of biology today. Perhaps the most promising approach to this problem is based on the comparative method of classic biology in the modern guise of sequence comparison. For instance, protein-coding regions tend to be conserved between species. Hence, a simple method for distinguishing a functional exon from the chance absence of stop codons is to investigate its homologue from closely related species. Predicting regulatory elements is even more difficult than exon prediction, but again, comparisons pinpointing conserved sequence motifs upstream of translation start sites are helping to unravel gene regulatory networks. In addition to interspecific studies, intraspecific sequence comparison yields insights into the evolutionary forces that have acted on a species in the past. Of particular interest here is the identification of selection events such as selective sweeps. Both intra- and interspecific sequence comparisons are based on a variety of computational methods, including alignment, phylogenetic reconstruction, and coalescent theory. This article surveys the biology and the central computational ideas applied in recent comparative genomics projects. We argue that the most fruitful method of understanding the functional content of genomes is to study them in the context of related genomic sequences. In particular, such a study may reveal selection, a fundamental pointer to biological relevance.
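The exon argument in the abstract can be made concrete with a small sketch. It assumes two homologous sequences that are already aligned without gaps and share a reading frame; the sequences and function names below are hypothetical illustrations, not data or methods from the article. Under the further simplifying assumption of uniformly random, independent bases, 3 of the 64 codons are stop codons, so a frame of n codons is open by chance with probability (61/64)^n. Finding the same frame open in a related species makes the chance explanation less plausible.

```python
# Illustrative sketch only: sequences and helper names are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq, frame=0):
    """Split a DNA string into complete codons, starting at offset `frame`."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def is_open(seq, frame=0):
    """Return True if the chosen reading frame contains no stop codon."""
    return not any(codon in STOP_CODONS for codon in codons(seq, frame))


def chance_open_probability(n_codons):
    """Probability that n random codons contain no stop codon.

    Assumes uniform, independent bases, so 61 of 64 codons are non-stop.
    """
    return (61 / 64) ** n_codons


def conserved_open_frame(seq_a, seq_b, frame=0):
    """Return True if the same frame is open in both homologous sequences."""
    return is_open(seq_a, frame) and is_open(seq_b, frame)


if __name__ == "__main__":
    species_a = "ATGGCTGCAAAATTT"   # hypothetical candidate exon
    species_b = "ATGGCCGCTAAGTTC"   # hypothetical homologue, frame still open
    disrupted = "ATGGCTTGAAAATTT"   # hypothetical homologue with an in-frame TGA

    print(conserved_open_frame(species_a, species_b))
    print(conserved_open_frame(species_a, disrupted))
    print(chance_open_probability(100))
```

For a 100-codon frame the chance probability is already below 1%, which is why short open frames are the hard case and why comparison with a close relative helps most there. Real comparative gene finders also score substitution patterns and splice signals, which this sketch leaves out.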

References

  • Abril JF, Guigó R, Wiehe T (2004) gff2aplot: plotting sequence comparisons. Bioinformatics 19:2477–2479

    Article  Google Scholar 

  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J. Mol Biol 215:403–410

    Article  CAS  Google Scholar 

  • Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman D (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402

    CAS  PubMed  Google Scholar 

  • Andersson SG, Alsmark C, Canback B, Davids W, Frank C, Karlberg O, Klasson L, Antoine-Legault B, Mira A, Tamas I (2002) Comparative genomics of microbial pathogens and symbionts. Bioinformatics 18 [Suppl 2]:S17

  • Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, Gelpke MD, Roach J, Oh T, Ho IY, Wong M, Detter C, Verhoef F, Predki P, Tay A, Lucas S, Richardson P, Smith SF, Clark MS, Edwards YJ, Doggett N, Zharkikh A, Tavtigian SV, Pruss D, Barnstead M, Evans C, Baden H, Powell J, Glusman G, Rowen L, Hood L, Tan YH, Elgar G, Hawkins T, Venkatesh B, Rokhsar D, Brenner S (2002) Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297:1301–1310

    Article  CAS  PubMed  Google Scholar 

  • Baeza-Yates RA, Perleberg CH(1992) Fast and practical approximate string matching. In: Springer (ed) Proc 3rd Symp Combinatorial Pattern Matching. (Springer lecture notes in computer science, vol 644) Springer, Berlin Heidelberg New York, pp 185–192

  • Bennetzen J (2002) Opening the door to comparative plant biology. Science 296:60–63

    Article  PubMed  Google Scholar 

  • Bernal A, Ear U, Kypides N (2001) Genomes online database (GOLD): a monitor of genome projects world-wide. Nucleic Acids Res 29:126–127

    Article  CAS  PubMed  Google Scholar 

  • Boffelli D, McAuliffe J, Ovcharenko D, Lewis KD, Ovcharenko I, Pachter L, Rubin EM (2003) Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299:1391–1394

    Article  CAS  PubMed  Google Scholar 

  • Brachat S, Dietrich FS, Voegeli S, Zhang Z, Stuart L, Lerch A, Gates K, Gaffney T, Philippsen P (2003) Reinvestigation of the Saccharomyces cerevisiae genome annotation by comparison to the genome of a related fungus: Ashbya gossypii. Genome Biol 4:R45

    Article  PubMed  Google Scholar 

  • Bray N, Dubchak I, Pachter L (2003) AVID: a global alignment program. Genome Research 13:97–102

    Article  CAS  PubMed  Google Scholar 

  • Brendel V, Kurtz S, Walbot V (2002) Comparative genomics of Arabidopsis and maize: prospects and limitations. Genome Biol 3:1005

    Article  Google Scholar 

  • Brenner S, Elgar G, Sandford R, Macrae A, Venkatesh B, Aparicio S (1993) Characterization of the pufferfish (Fugu) genome as a compact model vertebrate genome. Nature 366:265–268

    Article  CAS  PubMed  Google Scholar 

  • Brosch R, Pym AS, Gordon SV, Cole ST (2001) The evolution of mycobacterial pathogenicity: clues from comparative genomics. Trends Microbiol 9:452–458

    Article  CAS  PubMed  Google Scholar 

  • Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol 268:78–94

    Article  CAS  PubMed  Google Scholar 

  • Buysse JM (2001) The role of genomics in antibacterial target discovery. Curr Med Chem 8:1713–1726

    CAS  PubMed  Google Scholar 

  • Casjens S (1998) The diverse and dynamic structure of bacterial genomes. Annu Rev Genet 32:339–377

    Article  CAS  PubMed  Google Scholar 

  • Chiaromonte F, Yap VB, Miller W (2002) Scoring pairwise genomic sequence alignments. Pacific Symp Biocomput 2002:115–126

    Google Scholar 

  • Chung HR, Gusfield G (2003) Perfect phylogeny haplotyper: haplotype inferral using a tree model. Bioinformatics 19:780–781

    Article  CAS  PubMed  Google Scholar 

  • Clark AG(1990) Inference of haplotypes from PCR-amplified samples of diploid populations. Mol Biol Evol 7:111–122

    CAS  PubMed  Google Scholar 

  • Clark AG, Gibson G, Kaufman T, Myers E, O’Grady P (2003) Draft proposal for Drosophila as a model system for comparative genomics. http://life.biology.mcmaster.ca/brian/evoldir.html

  • Clark MS (1999) Comparative genomics: the key to understanding the human genome project. Bioessays 21:121–130

    Article  CAS  PubMed  Google Scholar 

  • Cole JR, Chai B, Marsh TL, Farris RJ, Wang Q, Kulam SA, Chandra S, McGarrell DM, Schmidt TM, Garrity GM, Tiedje JM (2003) The ribosomal database project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy. Nucleic Acids Res 31:442–443

    Article  CAS  PubMed  Google Scholar 

  • Cole ST (1998) Comparative mycobacterial genomics. Curr Opin Microbiol 1:567–571

    Article  CAS  PubMed  Google Scholar 

  • Couronne O, Poliakov A, Bray N, Ishkhanov T, Ryaboy D, Rubin E, Pachter L, Dubchak I (2003) Strategies and tools for whole-genome alignments. Genome Res 13:73–80

    Article  CAS  PubMed  Google Scholar 

  • Crollius HR, Jaillon O, Bernot A, Dasilva C, Bouneau L, Fischer C, Fizames C, Wincker P, Brottier P, Quetier F, Saurin W, Weissenbach J (2000) Estimate of human gene number provided by genome-wide analysis using Tetraodon nigroviridis DNA sequence. Nat Genet 25:235–238

    Article  CAS  PubMed  Google Scholar 

  • Darwin C (1859) On the origin of species by means of natural selection or the preservation of favoured races in the struggle for life, 1985 edn. Penguin, London

  • Dayhoff MO, Schwartz RM, Orcutt BC (1978) A model of evolutionary change in proteins. In: Dayhoff MO (ed) Atlas of protein sequence and structure, vol 5. National Biomedical Research Foundation, Washington, D.C., pp 345–352

  • Delcher AL, Kasti S, Fleischmann RD, Peterson J, White W, Salzberg SL (1999) Alignment of whole genomes. Nucleic Acids Res 27:2369–2376

    Article  CAS  PubMed  Google Scholar 

  • Dermitzakis ET, Reymond A, Lyle R, Scamuffa N, Ucla C, Deutsch S, Stevenson BJ, Flegel V, Bucher P, Jongeneel CV, Antonarakis SE (2002) Numerous potentially functional but non-genic conserved sequences on human chromosome 21. Nature 420:578–582

    Article  CAS  PubMed  Google Scholar 

  • Dubchak I, Brudno M, Loots GG, Pachter L, Mayor C, Rubin EM, Frazer KA (2000) Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res 10:1304–1306

    Article  CAS  PubMed  Google Scholar 

  • Edwards AWF, Cavalli-Sforza LL (1964) Reconstruction of evolutionary trees. In: Heywood VH, McNeill J (eds) Phenetic and phylogenetic classification. Systematics Association, London, pp 67–76

  • Eigen M, Winkler-Oswatitsch R, Dress A (1988) Statistical geometry in sequence space: a method of quantitative comparative sequence analysis. Proc Natl Acad Sci USA 85:5913–5917

    CAS  PubMed  Google Scholar 

  • Elgar G, Sandford R, Aparicio S, Macrae A, Venkatesh B, Brenner S (1996) Small is beautiful: comparative genomics with the pufferfish (Fugu rubripes). Trends Genet 12:145–150

    Article  CAS  PubMed  Google Scholar 

  • Fay JC, Wu CI (2000) Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413

    CAS  PubMed  Google Scholar 

  • Felsenstein J (1993) PHYLIP (phylogeny interference package). University of Washington, Seattle

  • Felsenstein J (2004) Inferring phylogenies. Sinauer, Sunderland, Mass.

  • Field D, Hood D, Moxon R (1999) Contribution of genomics to bacterial pathogenesis. Curr Opin Genet Dev 9:700–703

    Article  CAS  PubMed  Google Scholar 

  • Fitzgerald JR, Musser JM (2001) Evolutionary genomics of pathogenic bacteria. Trends Microbiol 9:547–553

    Article  CAS  PubMed  Google Scholar 

  • Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton GG, Kelley JM, Fritchman JL, Weidman JF, Small KV, Sandusky M, Fuhrmann JL, Nguyen DT, Utterback T, Saudek DM, Phillips CA, Merrick JM, Tomb J, Dougherty BA, Bott KF, Hu PC, Lucier TS, Peterson SN, Smith HO, Venter JC (1995) The minimal gene complement of Mycoplasma genitalium. Science 270:397–403

    CAS  PubMed  Google Scholar 

  • Fu YX, Li WH(1993) Statistical tests of neutrality of mutations. Genetics 133:693–709

    CAS  PubMed  Google Scholar 

  • Galagan JE, Nusbaum C, Roy A, Endrizzi MG, Macdonald P, FitzHugh W, Calvo S, Engels R, Smirnov S, Atnoor D, Brown A, Allen N, Naylor J, Stange-Thomann N, DeArellano K, Johnson R, Linton L, McEwan P, McKernan K, Talamas J, Tirrell A, Ye W, Zimmer A, Barber RD, Cann I, Graham DE, Grahame DA, Guss AM, Hedderich R, Ingram-Smith C, Kuettner HC, Krzycki JA, Leigh JA, Li W, Liu J, Mukhopadhyay B, Reeve JN, Smith K, Springer TA, Umayam LA, White O, White RH, Conway de Macario E, Ferry JG, Jarrell KF, Jing H, Macario AJ, Paulsen I, Pritchett M, Sowers KR, Swanson RV, Zinder SH, Lander E, Metcalf WW, Birren B (2002) The genome of M. acetivorans reveals extensive metabolic and physiological diversity. Genome Res 12:532–542

    Article  CAS  PubMed  Google Scholar 

  • Galperin MY, Koonin EV (2003) Frontiers in computational genomics. (Functional genomics, vol 3) Caister, Wymondham

  • Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, Remor M, Hofert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier MA, Copley RR, Edelmann A, Querfurth E, Rybin V, Drewes G, Raida M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G, Superti-Furga G (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415:141–147

    Article  CAS  PubMed  Google Scholar 

  • Gilad Y, Rosenberg S, Przeworski M, Lancet D, Skorecki K (2002) Evidence for positive selection and population structure at the human MAO-A gene. Proc Natl Acad Sci USA 99:862–867

    Article  CAS  PubMed  Google Scholar 

  • Gish W, States D (1993) Identification of protein coding regions by database similarity search. Nat Genet 3:266–272

    CAS  PubMed  Google Scholar 

  • Glinka S, Ometto L, Mousset S, Stephan W, Lorenzo DD (2003) Demography and natural selection have shaped genetic variation in Drosophila melanogaster: a multi-locus approach. Genetics 165:1269–1278

    PubMed  Google Scholar 

  • Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H, Oliver SG(1996) Life with 6000 genes. Science 274:546–563

    Article  CAS  PubMed  Google Scholar 

  • Goldstein DB (2001) Islands of linkage disequilibrium. Nat Genet 29:109–111

    Article  CAS  PubMed  Google Scholar 

  • Griffiths RC, Marjoram P (1997) An ancestral recombination graph. In: Donnelly P, Tavar’e S (eds) Progress in population genetics and human evolution. (The IAM volumes in mathematics and its applications, vol 87) Springer, Berlin Heidelberg New York, pp 257–270

  • Gusfield D (1997) Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press, Cambridge

  • Gusfield D (2001) Inference of haplotypes from samples of diploid populations: complexity and algorithms. J Comput Biol 8:305–323

    Article  CAS  PubMed  Google Scholar 

  • Hardison RC (2000) Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet 16:369–372

    Article  CAS  PubMed  Google Scholar 

  • Hardison RC, Oeltjen J, Miller W (1997) Long human–mouse sequence alignments reveal novel regulatory elements: a reason to sequence the mouse genome. Genome Res 7:959–966

    CAS  PubMed  Google Scholar 

  • Haubold B, Wiehe T (2001) Statistics of divergence times. Mol Biol Evol 18:1157–1160

    CAS  PubMed  Google Scholar 

  • Haubold B, Wiehe T (2002) Calculating the SNP-effective sample size from an alignment. Bioinformatics 18:36–38

    Article  CAS  PubMed  Google Scholar 

  • Haubold B, Kroymann J, Ratzka A, Mitchell-Olds T, Wiehe T (2002) Recombination and gene conversion in Arabidopsis thaliana. Genetics 161:1269–1278

    CAS  PubMed  Google Scholar 

  • Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 89:10915–10919

    CAS  PubMed  Google Scholar 

  • Horikawa Y, Oda N, Cox NJ, Li X, Orho-Melander M, Hara M, Hinokio Y, Lindner TH, Mashima H, Schwarz PEH, Bosque-Plata L del, Horikawa Y, Oda Y, Yoshiuchi I, Colilla S, Polonsky KS, Wei S, Concannon P, Iwasaki N, Schulze J, Baier LJ, Bogardus C, Groop L, Boerwinkle E, Hanis CL, Bell GI (2000) Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nat Genet 26:163–175

    Article  CAS  PubMed  Google Scholar 

  • Hudson RR (1983) Properties of a neutral allele model with intragenic recombination. Theor Popul Biol 23:183–201

    CAS  PubMed  Google Scholar 

  • Hudson RR (1990) Gene genealogies and the coalescent process. Oxford Surv Evol Biol 7:1–44

    Google Scholar 

  • Hudson RR (2002) Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics 18:337–338

    Article  CAS  PubMed  Google Scholar 

  • Hudson RR, Kaplan NL (1987) The coalescent process in models with selection and recombination. Genetics 120:831–840

    Google Scholar 

  • Hudson RR, Slatkin M, Maddison WP (1992) Estimation of levels of gene flow from DNA sequence data. Genetics 132:583–589

    CAS  PubMed  Google Scholar 

  • Huson DH (1998) SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14:68–73

    Article  CAS  PubMed  Google Scholar 

  • Hutchison CA III, Peterson SN, Gill SR, Cline RT, White O, Fraser CM, Smith HO, Venter CJ (1999) Global transposon mutagenesis and a minimal Mycoplasma genome. Science 286:2165

    Article  CAS  PubMed  Google Scholar 

  • International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921

    Article  PubMed  Google Scholar 

  • International SNP Map Working Group (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409:928–933

    Article  PubMed  Google Scholar 

  • Kan Z, Rouchka E, Gish W, States D (2001) Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res 11:889–900

    Article  CAS  PubMed  Google Scholar 

  • Kaneko T, Nakamura Y, Sato S, Minamisawa K, Uchiumi T, Sasamoto S, Watanabe A, Idesawa K, Iriguchi M, Kawashima K, Kohara M, Matsumoto M, Shimpo S, Tsuruoka H, Wada T, Yamada M, Tabata S (2002) Complete genomic sequence of nitrogen-fixing symbiotic bacterium Bradyrhizobium japonicum USDA110. DNA Res 9:189–197

    PubMed  Google Scholar 

  • Kawai J, Shinagawa A, Shibata K, Yoshino M, Itoh M, Ishii Y, Arakawa T, Hara A, Fukunishi Y, Konno H, Adachi J, Fukuda S, Aizawa K, Izawa M, Nishi K, Kiyosawa H, Kondo S, Yamanaka I, Saito T, Okazaki Y, Gojobori T, Bono H, Kasukawa T, Saito R, Kadota K, Matsuda H, Ashburner M, Batalov S, Casavant T, Fleischmann W, Gaasterland T, Gissi C, King B, Kochiwa H, Kuehl P, Lewis S, Matsuo Y, Nikaido I, Pesole G, Quackenbush J, Schriml LM, Staubli F, Suzuki R, Tomita M, Wagner L, Washio T, Sakai K, Okido T, Furuno M, Aono H, Baldarelli R, Barsh G, Blake J, Boffelli D, Bojunga N, Carninci P, De Bonaldo MF, Brownstein MJ, Bult C, Fletcher C, Fujita M, Gariboldi M, Gustincich S, Hill D, Hofmann M, Hume DA, Kamiya M, Lee NH, Lyons P, Marchionni L, Mashima J, Mazzarelli J, Mombaerts P, Nordone P, Ring B, Ringwald M, Rodriguez I, Sakamoto N, Sasaki H, Sato K, Schonbach C, Seya T, Shibata Y, Storch KF, Suzuki H, Toyo-oka K, Wang KH, Weitz C, Whittaker C, Wilming L, Wynshaw-Boris A, Yoshida K, Hasegawa Y, Kawaji H, Kohtsuki S, Hayashizaki Y (2001) Functional annotation of a full-length mouse cDNA collection. Nature 409:685–690

    Article  PubMed  Google Scholar 

  • Kececioglu J, Gusfield D (1998) Reconstructing a history of recombinations from a set of sequences. Discrete Appl Math 88:239–260

    Article  Google Scholar 

  • Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES (2003) Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature 423:241–254

    Article  CAS  PubMed  Google Scholar 

  • Kent WJ (2002) BLAT—the BLAST-like alignment tool. Genome Res 12:656–664

    CAS  PubMed  Google Scholar 

  • Kessler MM, Zeng Q, Hogan S, Cook R, Morales AJ, Cottarel G(2003) Systematic discovery of new genes in the Saccharomyces cerevisiae genome. Genome Res 13:264–271

    Article  CAS  PubMed  Google Scholar 

  • Kingman JFC (1982a) The coalescent. Stochastic Process Appl 13:235–248

    Article  Google Scholar 

  • Kingman JFC (1982b) On the genealogy of large populations. J Appl Probab 19A:27–43

    Google Scholar 

  • Kingman JFC (2000) Origins of the coalescent: 1974–1982. Genetics 154:1461–1463

    Google Scholar 

  • Koch MA, Weisshaar B, Kroymann J, Haubold B, Mitchell-Olds T (2001) Comparative genomics and regulatory evolution: conservation and function of the chs and apetala3 promoters. Mol Biol Evol 18:1882–1891

    CAS  PubMed  Google Scholar 

  • Koonin EV, Mushegian AR (1996) Complete genome sequences of cellular life forms: glimpses of theoretical evolutionary genomics. Curr Opin Genet Dev 6:757–762

    Article  CAS  PubMed  Google Scholar 

  • Korf I, Flicek P, Duan D, Brent MR (2001) Integrating genomic homology into gene structure prediction. Bioinformatics 17:S140–S148

    PubMed  Google Scholar 

  • Kornberg TB, Krasnow MA (2000) The Drosophila genome sequence: implications for biology and medicine. Science 287:2218–2220

    Article  CAS  PubMed  Google Scholar 

  • Li WH (1997) Molecular evolution. Sinauer, Sunderland, Mass.

  • Makarova KS, Koonin EV (2003) Comparative genomics of Archaea: how much have we learned in six years, and what’s next? Genome Biol 4:115

    Article  PubMed  Google Scholar 

  • Mayer K, Murphy G, Tarchini R, Wambutt R, Volckaert G, Pohl T, Dusterhof A, Stiekema W, Entian KD, Terryn N, Lemcke K, Haase D, Hall CR, Dodeweerd AM van, Tingey SV, Mewes HW, Bevan MW, Bancroft I (2001) Conservation of microstructure between a sequenced region of the genome of rice and multiple segments of the genome of Arabidopsis thaliana. Genome Res 11:1167–1174

    Article  CAS  PubMed  Google Scholar 

  • Maynard Smith J, Haigh J (1974) The hitch-hiking effect of a favourable gene. Genet Res 23:23–35

    PubMed  Google Scholar 

  • McVean GAT, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P (2004) The fine-scale structure of recombination rate variation in the human genome. Science 304:581–584

    Article  CAS  PubMed  Google Scholar 

  • Meyer I, Durbin R (2002) Comparative ab initio prediction of gene structures using pair HMMs. Bioinformatics 18:1309–1318

    Article  CAS  PubMed  Google Scholar 

  • Miller W (2001) Comparison of genomic DNA sequences: solved and unsolved problems. Bioinformatics 17:391–397

    Article  CAS  PubMed  Google Scholar 

  • Mouse Genome Sequencing Consortium (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–561

    Article  PubMed  Google Scholar 

  • Mural RJ, Adams MD, Myers EW, Smith HO, Miklos GL, Wides R, Halpern A, Li PW, Sutton GG, Nadeau J, Salzberg SL, Holt RA, Kodira CD, Lu F, Chen L, Deng Z, Evangelista CC, Gan W, Heiman TJ, Li J, Li Z, Merkulov GV, Milshina NV, Naik AK, Qi R, Shue BC, Wang A, Wang J, Wang X, YanX, Ye J, Yooseph S, Zhao Q, Zheng L, Zhu SC, Biddick K, Bolanos R, Delcher AL, Dew IM, Fasulo D, Flanigan MJ, Huson DH, Kravitz SA, Miller JR, Mobarry CM, Reinert K, Remington KA, Zhang Q, Zheng XH, Nusskern DR, Lai Z, Lei Y, Zhong W, Yao A, Guan P, Ji RR, Gu Z, Wang ZY, Zhong F, Xiao C, Chiang CC, Yandell M, Wortman JR, Amanatides PG, Hladun SL, Pratts EC, Johnson JE, Dodson KL, Woodford KJ, Evans CA, Gropman B, Rusch DB, Venter E, Wang M, Smith TJ, Houck JT, Tompkins DE, Haynes C, Jacob D, Chin SH, Allen DR, Dahlke CE, Sanders R, Li K, Liu X, Levitsky AA, Majoros WH, Chen Q, Xia AC, Lopez JR, Donnelly MT, Newman MH, Glodek A, Kraft CL, Nodell M, Ali F, An HJ, Baldwin-Pitts D, Beeson KY, Cai S, Carnes M, Carver A, Caulk PM, Center A, Chen YH, Cheng ML, Coyne MD, Crowder M, Danaher S, Davenport LB, Desilets R, Dietz SM, Doup L, Dullaghan P, Ferriera S, Fosler CR, Gire HC, Gluecksmann A, Gocayne JD, Gray J, Hart B, Haynes J, Hoover J, Howland T, Ibegwam C, Jalali M, Johns D, Kline L, Ma DS, MacCawley S, Magoon A, Mann F, May D, McIntosh TC, Mehta S, Moy L, Moy MC, Murphy BJ, Murphy SD, Nelson KA, Nuri Z, Parker KA, Prudhomme AC, Puri VN, Qureshi H, Raley JC, Reardon MS, Regier MA, Rogers YH, Romblad DL, Schutz J, Scott JL, Scott R, Sitter CD, Smallwood M, Sprague AC, Stewart E, Strong RV, Suh E, Sylvester K, Thomas R, Tint NN, Tsonis C,Wang G, Wang G, Williams MS, Williams SM, Windsor SM, Wolfe K, Wu MM, Zaveri J, Chaturvedi K, Gabrielian AE, Ke Z, Sun J, Subramanian G, Venter JC, Pfannkoch CM, Barnstead M, Stephenson LD (2002) A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science 296:1661–1671

    Article  CAS  PubMed  Google Scholar 

  • Mushegian AR, Koonin EV (1996) A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci USA 93:10268–10273

    Article  CAS  PubMed  Google Scholar 

  • National Institutes of Health and Department of Energy (1990) Understanding our genetic inheritance. (The United States human genome project; the first five years: fiscal years 1991–1995. Technical report) National Institutes of Heals and Department of Energy http://www.genome.gov

  • Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453

    CAS  PubMed  Google Scholar 

  • Nordborg M (2001) Coalescent theory. In: Balding DJ, Bishop M, Cannings C (eds) Handbook of statistical genetics. Wiley, Mannheim, pp 178–212

  • Nurtdinov RN, Artamonova II, Mironov AA, Gelfand MS (2003) Low conservation of alternative splicing patterns in the human and mouse genomes. Hum Mol Genet 12:1313–1320

    Article  CAS  PubMed  Google Scholar 

  • Ohler U, Niemann H (2001) Identification and analysis of eukaryotic promoters: recent computational approaches. Trends Genet 17:56–60

    Article  CAS  PubMed  Google Scholar 

  • Ohler U, Niemann H, Liao GC, Rubin GM (2001) Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Bioinformatics 17 [Suppl]:S199–S206

  • Pachter L, Alexandersson M, Cawley S (2001) Applications of generalized pair hidden Markov models to alignment and gene finding problems. In: Press A (ed) Proceedings of the fifth annual conference on computational molecular biology. RECOMB, New York, pp 241–248

  • Parra G, Agarwal P, Abril J, Wiehe T, Fickett J, Guigó R (2003) Comparative gene prediction in human and mouse. Genome Res 13:108–117

    Article  CAS  PubMed  Google Scholar 

  • Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR, Kautzer CR, Lee DH, Marjoribanks C, McDonough DP, Nguyen BT, Norris MC, Sheehan JB, Shen N, Stern D, Stokowski RP, Thomas DJ, Trulson MO, Vyas KR, Frazer KA, Fodor SP, Cox DR (2001) Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294:1719–1723

    Article  CAS  PubMed  Google Scholar 

  • Pearson WR, Lipman DJ (1988) Improved tools for biological sequence comparison. Proc Natl Acad Sci 85:2444–2448

    CAS  PubMed  Google Scholar 

  • Plotkin JP, Dushoff J, Fraser HB (2004) Detecting selection using a single genome sequence of M. tuberculosis and P. falciparum. Nature 428:942–945

    Article  CAS  PubMed  Google Scholar 

  • Ponomarenko JV, Ponomarenko MP, Frolov AS, Vorobyev DG, Overton GC, Kolchanov NA (1999) Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics 15:654–668

    Article  CAS  PubMed  Google Scholar 

  • Reich DR, Schaffner SF, Daly MJ, McVean G, Mullikin JC, Higgins JM, Richter DJ, Lander ES, Altschuler D (2002) Human genome sequence variation and the influence of gene history, mutation and recombination. Nat Genet 32:135–142

    Article  CAS  PubMed  Google Scholar 

  • Reichwald K (2003) Interspeziesvergleich genomischer DNA-Sequenzen zur Genidentifizierung in 240 kb des humanen und murinen X-Chromosoms. PhD thesis, Friedrich Schiller Universität, Jena

  • Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, Hariharan IK, Fortini ME, Li PW, Apweiler R, Fleischmann W, Cherry JM, Henikoff S, Skupski MP, Misra S, Ashburner M, Birney E, Boguski MS, Brody T, Brokstein P, Celniker SE, Chervitz SA, Coates D, Cravchik A, Gabrielian A, Galle RF, Gelbart WM, George RA, Goldstein LS, Gong F, Guan P, Harris NL, Hay BA, Hoskins RA, Li J, Li Z, Hynes RO, Jones SJ, Kuehl PM, Lemaitre B, Littleton JT, Morrison DK, Mungall C, O’Farrell PH, Pickeral OK, Shue C, Vosshall LB, Zhang J, Zhao Q, Zheng XH, Lewis S (2000) Comparative genomics of the eukaryotes. Science 287:2204–2215

    Article  CAS  PubMed  Google Scholar 

  • Ruepp A, Gram lW, Santos-Martinez ML, Koretke KK, Volker C, Mewes HW, Frishman D, Stocker S, Lupas AN, Baumeister W (2000) The genome sequence of the thermoacidophilic scavenger Thermoplasma acidophilum. Nature 407:508–513

    Article  CAS  PubMed  Google Scholar 

  • Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylgenetic trees. Mol Biol Evol 4:406–425

    CAS  PubMed  Google Scholar 

  • Schlötterer C (2003) Hitchhiking mapping—functional genomics from the population genetics perspective. Trends Genet 19:32–38

    Article  PubMed  Google Scholar 

  • SchoolnikGK(2002) Functional and comparative genomics of pathogenic bacteria. Curr Opin Microbiol 5:20–26

    Article  PubMed  Google Scholar 

  • Schwartz R, Clark AG, Istrail S (2002) Methods for inferring block-wise ancestral history from haploid sequences. In: Guigó R, Gusfield D (eds) Lecture notes in computer science, Springer, Berlin Heidelberg New York, pp 44–59

  • Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W (2003) Human–mouse alignments with BLASTZ. Genome Res 13:103–107

    Article  CAS  PubMed  Google Scholar 

  • Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197

    CAS  PubMed  Google Scholar 

  • Sorrells ME, La Rota M, Bermudez-Kandianis CE, Greene RA, Kantety R, Munkvold JD, Miftahudin A, Mahmoud A, Ma X, Gustafson PJ, Qi LL, Echalier B, Gill BS, Matthews DE, Lazo GR, Chao S, Anderson OD, Edwards H, Linkiewicz AM, Dubcovsky J, Akhunov ED, Dvorak J, Zhang D, Nguyen HT, Peng J, Lapitan NL, Gonzalez-Hernandez JL, Anderson JA, Hossain K, Kalavacharla V, Kianian SF, Choi DW, Close TJ, Dilbirligi M, Gill KS, Steber C, Walker-Simmons MK, McGuire PE, Qualset CO (2003) Comparative DNA sequence analysis of wheat and rice genomes. Genome Res 13:1818–1827

    CAS  PubMed  Google Scholar 

  • Strimmer K, Haeseler A von (1997) Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci USA 94:6815–6819

    Article  CAS  PubMed  Google Scholar 

  • Taher L, Rinner O, Garg S, Sczyrba A, Brudno M, Batzoglou S, Morgenstern B (2003) Agenda: homology-based gene prediction. Bioinformatics 12:1575–1577

    Article  Google Scholar 

  • Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595

    CAS  PubMed  Google Scholar 

  • Thomas JW, Touchman JW (2002)Vertebrate genome sequencing: building a backbone for comparative genomics. Trends Genet 18:104–108

    Article  CAS  PubMed  Google Scholar 

  • Thomas JW, Touchman JW, Blakesley RW, Bouffard GG, Beckstrom-Sternberg SM, Margulies EH, Blanchette M, Siepel AC, Thomas PJ, McDowell JC, Maskeri B, Hansen NF, Schwartz MS, Weber RJ, Kent WJ, Karolchik D, Bruen TC, Bevan R, Cutler DJ, Schwartz S, Elnitski L, Idol JR, Prasad AB, Lee-Lin SQ, Maduro VV, Summers TJ, Portnoy ME, Dietrich NL, Akhter N, Ayele K, Benjamin B, Cariaga K, Brinkley CP, Brooks SY, Granite S, Guan X, Gupta J, Haghighi P, Ho SL, Huang MC, Karlins E, Laric PL, Legaspi R, Lim MJ, Maduro QL, Masiello CA, Mastrian SD, McCloskey JC, Pearson R, Stantripop S, Tiongson EE, Tran JT, Tsurgeon C, Vogt JL, Walker MA, Wetherby KD, Wiggins LS, Young AC, Zhang LH, Osoegawa K, Zhu B, Zhao B, Shu CL, De Jong PJ, Lawrence CE, Smit AF, Chakravarti A, Haussler D, Green P, Miller W, Green ED (2003) Comparative analyses of multi-species sequences from targeted genomic regions. Nature 424:788–793

    Article  CAS  PubMed  Google Scholar 

  • Thompson JD, Higgins DG, Gibson TJ (1994) Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680

    CAS  PubMed  Google Scholar 

  • Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigo R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, Zhu X (2001) The sequence of the human genome. Science 291:1304–1351

  • Watterson GA (1975) On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7:256–276

  • Weiner J (1994) The beak of the finch. Vintage, New York

  • Werner T (2003a) Promoters can contribute to the elucidation of protein function. Trends Biotechnol 21:9–13

  • Werner T (2003b) The state of the art of mammalian promoter recognition. Brief Bioinform 4:22–30

  • Wiehe T, Guigo R, Miller W (2000) Genome sequence comparisons: hurdles in the fast lane to functional genomics. Brief Bioinform 1:381–388

  • Willey JS, Dao-Ung LP, Sluyter R, Shemon AN, Li C, Taper J, Gallo J, Manoharan A (2002) A loss-of-function polymorphic mutation in the cytolytic P2X7 receptor gene and chronic lymphocytic leukaemia: a molecular study. Lancet 359:1114–1119

  • Wiuf C, Hein J (2000) The coalescent with gene conversion. Genetics 155:451–462

  • Wong GKS, Passey DA, Yu J (2001) Most of the human genome is transcribed. Genome Res 11:1975–1977

  • Zhang CT, Zhang R, Ou HY (2003) The Z curve database: a graphic representation of genome sequences. Bioinformatics 19:593–599

  • Zhou W, Goodman SN, Galizia G, Lieto C, Ferraraccio F, Pignatelli C, Purdie CA, Piris J, Morris R, Harrison DJ, Paty PB, Culliford A, Romans KE, Montgomery EA, Choti MA, Kinzler KW, Vogelstein B (2002) Counting alleles to predict recurrence of early-stage colorectal cancers. Lancet 359:219–225

  • Zmasek CM, Eddy SR (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics 17:383–384

Acknowledgements

We are grateful to an anonymous referee for helpful comments. B.H. is financially supported by Dehner Gartencenter GmbH and the Stifterverband für die Deutsche Wissenschaft.

Author information

Corresponding author

Correspondence to Bernhard Haubold.

About this article

Cite this article

Haubold, B., Wiehe, T. Comparative genomics: methods and applications. Naturwissenschaften 91, 405–421 (2004). https://doi.org/10.1007/s00114-004-0542-8

  • DOI: https://doi.org/10.1007/s00114-004-0542-8
