Whole-Genome Alignment

  • Colin N. Dewey
Part of the Methods in Molecular Biology book series (MIMB, volume 855)


Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.

Key words

Sequence alignment, Whole-genome alignment, Orthology map, Toporthology, Genome evolution, Comparative genomics


Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Biostatistics and Medical Informatics and Computer Sciences, Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, USA
