Phylogenomics pp 171-184

Computational Reconstruction of Ancestral DNA Sequences

  • Mathieu Blanchette
  • Abdoulaye Baniré Diallo
  • Eric D. Green
  • Webb Miller
  • David Haussler
Part of the Methods in Molecular Biology™ book series (MIMB, volume 422)

Abstract

This chapter introduces the problem of ancestral sequence reconstruction: given a set of extant orthologous genomic DNA sequences (or even whole genomes), together with a phylogenetic tree relating these sequences, predict the DNA sequence of every ancestral species in the tree. Blanchette et al. (1) have shown that for certain sets of species (in particular, for eutherian mammals), very accurate reconstruction can be obtained. We explain the main steps involved in this process, including multiple sequence alignment, insertion and deletion inference, substitution inference, and gene arrangement inference. We also describe a simulation-based procedure to assess the accuracy of the reconstructed sequences. The whole reconstruction process is illustrated using a set of mammalian sequences from the CFTR region.
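
As a rough illustration of the substitution inference step only, the sketch below infers ancestral bases for a single alignment column on a rooted binary tree using Fitch parsimony. This is a simple stand-in chosen for brevity, not the procedure described in the chapter; the function name `fitch_ancestral`, the nested-tuple tree encoding, and the example column are all assumptions made for this example.

```python
def fitch_ancestral(tree, leaf_base):
    """Infer ancestral bases for one alignment column by Fitch parsimony.

    tree      -- rooted binary tree as nested 2-tuples of leaf names (hypothetical encoding)
    leaf_base -- dict mapping each leaf name to its observed base
    Returns a nested structure mirroring the tree: a leaf becomes its base,
    an internal node becomes (inferred_base, left_result, right_result).
    """

    def bottom_up(node):
        # A leaf's candidate set is just its observed base.
        if isinstance(node, str):
            return ({leaf_base[node]}, None, None)
        left, right = bottom_up(node[0]), bottom_up(node[1])
        shared = left[0] & right[0]
        # Fitch rule: intersection if non-empty, otherwise union.
        return (shared or (left[0] | right[0]), left, right)

    def top_down(info, parent_base):
        candidates, left, right = info
        # Keep the parent's base when allowed; otherwise pick one candidate
        # deterministically (ties are broken alphabetically here).
        base = parent_base if parent_base in candidates else min(candidates)
        if left is None:
            return base
        return (base, top_down(left, base), top_down(right, base))

    return top_down(bottom_up(tree), None)


# Hypothetical four-species column: only mouse differs from the others.
tree = (("human", "chimp"), ("mouse", "rat"))
column = {"human": "A", "chimp": "A", "mouse": "G", "rat": "A"}
result = fitch_ancestral(tree, column)
```

Here the root and both internal ancestors are assigned `A`, which explains the column with a single substitution on the mouse lineage. Parsimony ignores branch lengths, so a likelihood-based method may give different answers on trees with very uneven branches.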

Key Words

Ancestral DNA sequence reconstruction; multiple sequence alignment; mammalian phylogeny; mammalian evolution; substitution and indel reconstruction; ancestral sequence reconstruction accuracy

References

  1. Blanchette, M., Green, E. D., Miller, W., and Haussler, D. (2004) Reconstructing large regions of an ancestral mammalian genome in silico. Genome Res. 14, 2412–2423.
  2. International Human Genome Sequencing Consortium, Lander, E., et al. (2001) Initial sequencing and analysis of the human genome. Nature 409(6822), 860–921 (PMID: 11237011).
  3. International Mouse Genome Sequencing Consortium, Waterston, R. H., Lindblad-Toh, K., Birney, E., et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420(6915), 520–562 (PMID: 12466850).
  4. Rat Genome Sequencing Project Consortium, Gibbs, R. A., Weinstock, G. M., Metzker, M. L., et al. (2004) Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428, 493–521.
  5. Margulies, E. H., Blanchette, M., NISC Comparative Sequencing Program, Haussler, D., and Green, E. (2003) Identification and characterization of multi-species conserved sequences. Genome Res. 13(12), 2507–2518 (PMID: 14656959).
  6. Cooper, G. M., Brudno, M., Green, E. D., Batzoglou, S., and Sidow, A. (2003) Quantitative estimates of sequence divergence for comparative analyses of mammalian genomes. Genome Res. 13(5), 813–820.
  7. Bejerano, G., Pheasant, M., Makunin, I., et al. (2004) Ultraconserved elements in the human genome. Science 304(5675), 1321–1325.
  8. Goodman, M., Barnabas, J., Matsuda, G., and Moore, G. W. (1971) Molecular evolution in the descent of man. Nature 233, 604–613.
  9. Enard, W., Przeworski, M., Fisher, S. E., et al. (2002) Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418(6900), 869–872.
  10. Eizirik, E., Murphy, W. J., and O’Brien, S. J. (2001) Molecular dating and biogeography of the early placental mammal radiation. J. Hered. 92(2), 212–219 (PMID: 11396581).
  11. Springer, M. S., Murphy, W. J., Eizirik, E., and O’Brien, S. J. (2003) Placental mammal diversification and the Cretaceous-Tertiary boundary. Proc. Natl Acad. Sci. USA 100(3), 1056–1060 (PMID: 12552136).
  12. Thomas, J., Touchman, J. W., Blakesley, R. W., et al. (2003) Comparative analyses of multi-species sequences from targeted genomic regions. Nature 424, 788–793.
  13. Karolchik, D., Baertsch, R., Diekhans, M., et al. (2003) The UCSC genome browser database. Nucleic Acids Res. 31, 51–54.
  14. Maddison, D. R. and Schulz, K.-S. (eds.) (2004) The Tree of Life Web Project. http://tolweb.org
  15. Felsenstein, J. (1989) PHYLIP-Phylogeny inference package (Version 3.2). Cladistics 5, 164–166.
  16. Swofford, D. L. (2003) PAUP: Phylogenetic Analysis Using Parsimony. Sinauer, Sunderland, MA.
  17. Huelsenbeck, J. P. and Ronquist, F. (2001) MrBayes: Bayesian inference of phylogeny. Bioinformatics 17, 754–755.
  18. Bray, N. and Pachter, L. (2004) MAVID: constrained ancestral alignment of multiple sequences. Genome Res. 14, 693–699.
  19. Cooper, G. M., Stone, E. A., Asimenos, G., et al. (2005) Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15(7), 901–913.
  20. Blanchette, M., Kent, W. J., Riemer, C., et al. (2004) Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 14(4), 708–715 (PMID: 15060014).
  21. Schwartz, S., Kent, W. J., Smith, A., et al. (2003) Human-mouse alignments with BLASTZ. Genome Res. 13(1), 103–107.
  22. Chindelevitch, L., Li, Z., Blais, E., and Blanchette, M. (2006) On the inference of parsimonious indel evolutionary scenarios. J. Bioinformatics Comput. Biol. in press.
  23. Fredslund, J., Hein, J., and Scharling, T. (2003) A large version of the small parsimony problem. Lecture Notes in Bioinformatics, Proceedings of WABI’03. 2812, 417–432.
  24. Yang, Z., Kumar, S., and Nei, M. (1995) A new method of inference of ancestral nucleotide and amino acid sequences. Genetics 141, 1641–1650.
  25. Siepel, A. and Haussler, D. (2003) Combining phylogenetic and hidden Markov models in biosequence analysis. Proceedings of the 7th Annual International Conference on Research in Computational Molecular Biology. pp. 277–286.
  26. Bourque, G. and Pevzner, P. (2002) Genome-scale evolution: reconstructing gene orders in the ancestral species. Genome Res. 12(1), 26–36.
  27. Stoye, J., Evers, D., and Meyer, F. (1997) Generating benchmarks for multiple sequence alignments and phylogenetic reconstructions. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5, 303–204 (PMID: 9322053).
  28. Hasegawa, M., Kishino, H., and Yano, T. (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22(2), 160–174.
  29. Kent, J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. (2003) Evolution’s cauldron: duplication, deletion and rearrangement in the mouse and human genomes. Proc. Natl Acad. Sci. USA 100(20), 11,484–11,489.
  30. Jurka, J. (2000) Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16(9), 418–420 (PMID: 10973072).
  31. Smit, A. and Green, P. (1999) RepeatMasker. http://ftp.genome.washington.edu/RM/RepeatMasker.html
  32. Hoeffding, W. (1963) Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–27.
  33. Le Cam, L. (1986) Asymptotic Methods in Statistical Decision Theory. Springer, New York.
  34. Lucena, B. and Haussler, D. (2005) Counterexample to a claim about the reconstruction of ancestral character states. Syst. Biol. 54(4), 693–695.

Copyright information

© Humana Press Inc., Totowa, NJ 2008

Authors and Affiliations

  • Mathieu Blanchette (1)
  • Abdoulaye Baniré Diallo (1)
  • Eric D. Green (2)
  • Webb Miller (3)
  • David Haussler (4)

  1. McGill Centre for Bioinformatics, McGill University, Montreal, Canada
  2. National Human Genome Research Institute, National Institutes of Health, Bethesda
  3. Center for Comparative Genomics and Bioinformatics, Penn State, University Park
  4. Howard Hughes Medical Institute, University of California, Santa Cruz