Ordering Partially Assembled Genomes Using Gene Arrangements

  • Éric Gaul
  • Mathieu Blanchette
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4205)

Abstract

Several mammalian genomes will be sequenced only at 2X coverage, which makes it impossible to assemble their contigs into chromosomes. We introduce the problem of ordering the contigs of two partially assembled genomes so as to maximize the similarity (measured in terms of genome rearrangements) between the assembled genomes. We give a linear-time algorithm for the Block Ordering Problem (BOP): given two signed permutations (genomes) that have been broken into blocks (contigs), order and orient each set of blocks so that the number of cycles in the breakpoint graph of the resulting permutations is maximized. We illustrate our algorithm on a set of 90 markers from the human and mouse X chromosomes and show how the size of the contigs and the rearrangement distance between the two genomes affect the accuracy of the predicted assemblies. The appendix and an implementation are available at www.mcb.mcgill.ca/~egaul/recomb2006.
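
The objective in the BOP statement can be made concrete with a small sketch. The code below is not the linear-time algorithm of the paper: it counts the cycles of a breakpoint graph for two signed permutations and then finds the best block arrangement by exhaustive search, which is only feasible for a handful of blocks. The function names (`adjacency_matching`, `count_cycles`, `arrangements`, `best_block_ordering`) and the framing of each linear genome with artificial `start` and `end` caps are assumptions made for illustration.

```python
from itertools import permutations, product


def adjacency_matching(perm):
    """Pair up the gene extremities that are adjacent in a signed permutation.

    Assumption: the genome is linear and framed by artificial "start"/"end" caps.
    """
    seq = ["start"]
    for gene in perm:
        tail, head = (abs(gene), "t"), (abs(gene), "h")
        # A positive gene is read tail-to-head, a negative gene head-to-tail.
        seq += [tail, head] if gene > 0 else [head, tail]
    seq.append("end")
    match = {}
    for i in range(0, len(seq), 2):
        u, v = seq[i], seq[i + 1]
        match[u], match[v] = v, u
    return match


def count_cycles(a, b):
    """Count alternating cycles in the breakpoint graph of genomes a and b."""
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("both genomes must contain the same markers")
    black = adjacency_matching(a)  # adjacencies of genome a
    gray = adjacency_matching(b)   # adjacencies of genome b
    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray[w]
    return cycles


def arrangements(blocks):
    """Yield every genome obtainable by ordering and orienting the blocks."""
    for order in permutations(blocks):
        for flips in product([False, True], repeat=len(order)):
            genome = []
            for block, flipped in zip(order, flips):
                genome += [-g for g in reversed(block)] if flipped else list(block)
            yield genome


def best_block_ordering(blocks_a, blocks_b):
    """Exhaustive search (exponential time) for a cycle-maximizing assembly."""
    return max(
        ((count_cycles(a, b), a, b)
         for a in arrangements(blocks_a)
         for b in arrangements(blocks_b)),
        key=lambda t: t[0],
    )


# Two toy genomes over markers 1..4, each broken into contigs (hypothetical data).
score, genome_a, genome_b = best_block_ordering([[1, 2], [3, 4]], [[-2, -1], [4], [3]])
print(score)  # 5, i.e. n + 1 cycles: the two assemblies can be made identical
```

Under this framing, two identical genomes on n markers give n + 1 cycles, so a higher cycle count corresponds to a smaller rearrangement distance between the assembled genomes.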

Keywords

Genome rearrangement, gene order, breakpoint graph, genome assembly

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Éric Gaul (1)
  • Mathieu Blanchette (1)
  1. McGill Centre for Bioinformatics, McGill University
