Whole-Genome Alignment

Dewey, Colin N.

doi:10.1007/978-1-61779-582-4_8

Protocol
First Online: 01 January 2012

Part of the book series: Methods in Molecular Biology (MIMB, volume 855)

Abstract

Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.

Author information

Authors and Affiliations

Biostatistics and Medical Informatics and Computer Sciences, Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, USA
Colin N. Dewey

Corresponding author

Correspondence to Colin N. Dewey.

Editor information

Editors and Affiliations

Department of Computer Science, ETH Zürich, Universitätsstr. 6, Zürich, 8092, Switzerland
Maria Anisimova

Cite this protocol

Dewey, C.N. (2012). Whole-Genome Alignment. In: Anisimova, M. (ed.) Evolutionary Genomics. Methods in Molecular Biology, vol 855. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-61779-582-4_8

DOI: https://doi.org/10.1007/978-1-61779-582-4_8
Published: 07 February 2012
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-61779-581-7
Online ISBN: 978-1-61779-582-4
eBook Packages: Springer Protocols
