Naturwissenschaften, Volume 91, Issue 9, pp 405–421

Comparative genomics: methods and applications

  • Bernhard Haubold
  • Thomas Wiehe
Review

Abstract

Interpreting the functional content of a given genomic sequence is one of the central challenges of biology today. Perhaps the most promising approach to this problem is based on the comparative method of classic biology in the modern guise of sequence comparison. For instance, protein-coding regions tend to be conserved between species. Hence, a simple method for distinguishing a functional exon from the chance absence of stop codons is to investigate its homologue from closely related species. Predicting regulatory elements is even more difficult than exon prediction, but again, comparisons pinpointing conserved sequence motifs upstream of translation start sites are helping to unravel gene regulatory networks. In addition to interspecific studies, intraspecific sequence comparison yields insights into the evolutionary forces that have acted on a species in the past. Of particular interest here is the identification of selection events such as selective sweeps. Both intra- and interspecific sequence comparisons are based on a variety of computational methods, including alignment, phylogenetic reconstruction, and coalescent theory. This article surveys the biology and the central computational ideas applied in recent comparative genomics projects. We argue that the most fruitful method of understanding the functional content of genomes is to study them in the context of related genomic sequences. In particular, such a study may reveal selection, a fundamental pointer to biological relevance.
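The exon argument in the abstract can be illustrated with a minimal sketch. It assumes two homologous sequences that are already aligned without gaps and share the same reading frame; the function names and the toy sequences in the usage note are hypothetical and not taken from the article. The sketch checks only whether a reading frame is free of stop codons in both species, which is the simplest form of the comparison described above, not a full gene-prediction method.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq, frame=0):
    """Split a DNA string into complete codons, starting at offset `frame`."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def is_open(seq, frame=0):
    """Return True if the given reading frame contains no stop codon."""
    return not any(codon in STOP_CODONS for codon in codons(seq, frame))


def open_in_both(seq_a, seq_b, frame=0):
    """Return True if the frame is free of stop codons in both homologues.

    A frame that stays open in a related species is less likely to be
    open by chance alone than one that is open in a single sequence.
    """
    return is_open(seq_a, frame) and is_open(seq_b, frame)
```

For example, with the made-up sequences `"ATGGCCAAGGTT"` and `"ATGGCTAAAGTT"`, `open_in_both` returns `True`, whereas replacing the second sequence with `"ATGGCCTAGGTT"` (which carries an in-frame `TAG`) makes it return `False`. Real analyses would also need alignment, gap handling, and a statistical model of conservation.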

Keywords

  • Comparative Genomic
  • Selective Sweep
  • Suffix Tree
  • Mycoplasma Genitalium
  • Phylogenetic Footprinting

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

We are grateful to an anonymous referee for helpful comments. B.H. is financially supported by Dehner Gartencenter GmbH and the Stifterverband für die Deutsche Wissenschaft.

Copyright information

© Springer-Verlag 2004

Authors and Affiliations

  1. Fachbereich Biotechnologie & Bioinformatik, Fachhochschule Weihenstephan, Freising, Germany
  2. Institut für Genetik, Universität zu Köln, Cologne, Germany