A Pseudo-boolean Programming Approach for Computing the Breakpoint Distance Between Two Genomes with Duplicate Genes

  • Sébastien Angibaud
  • Guillaume Fertin
  • Irena Rusu
  • Annelyse Thévenin
  • Stéphane Vialette
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4751)

Abstract

Comparing genomes of different species has become a crucial problem in comparative genomics. Recent research has resulted in several genomic distance definitions: number of breakpoints, number of common intervals, number of conserved intervals, Maximum Adjacency Disruption number (MAD), etc. However, classical methods for computing genomic distances between whole genomes (usually based on permutations of gene order) are seriously compromised for genomes in which several copies of the same gene may be scattered across the genome. Most approaches to overcoming this difficulty are based either on the exemplar method (keep exactly one copy of each duplicated gene in each genome) or on the maximum matching method (keep as many copies of each duplicated gene as possible in each genome). Unfortunately, in the presence of duplications most of these problems turn out to be NP-hard, and several heuristics have therefore been proposed recently.
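To make the breakpoint distance and the exemplar method concrete, here is a minimal Python sketch. It assumes that genomes are given as lists of signed integers (the sign encoding the strand), that both genomes contain the same gene families, and it uses the usual signed definition: consecutive genes (a, b) of one genome form an adjacency if the other genome contains (a, b) or (-b, -a) consecutively, and a breakpoint otherwise. The function names are hypothetical and the maximum matching method is not covered. The brute-force search over all exemplarizations takes exponential time, consistent with the NP-hardness mentioned above. It is an illustration only, not the method of this paper.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Genome = Sequence[int]  # signed gene identifiers, e.g. [1, -3, 2]


def breakpoints(g1: Genome, g2: Genome) -> int:
    """Count breakpoints of g1 with respect to g2 (no duplicates assumed).

    Consecutive genes (a, b) in g1 form an adjacency if g2 contains
    (a, b) or (-b, -a) consecutively; otherwise they form a breakpoint.
    """
    adjacencies = set()
    for a, b in zip(g2, g2[1:]):
        adjacencies.add((a, b))
        adjacencies.add((-b, -a))  # same adjacency read on the other strand
    return sum(1 for a, b in zip(g1, g1[1:]) if (a, b) not in adjacencies)


def exemplarizations(g: Genome) -> List[Tuple[int, ...]]:
    """All ways to keep exactly one occurrence of each gene family."""
    positions: Dict[int, List[int]] = {}
    for i, gene in enumerate(g):
        positions.setdefault(abs(gene), []).append(i)
    families = sorted(positions)
    result = []
    for choice in product(*(positions[f] for f in families)):
        kept = sorted(choice)  # preserve the original gene order
        result.append(tuple(g[i] for i in kept))
    return result


def exemplar_breakpoint_distance(g1: Genome, g2: Genome) -> int:
    """Minimum breakpoint count over all exemplar pairs (exponential time)."""
    return min(
        breakpoints(e1, e2)
        for e1 in exemplarizations(g1)
        for e2 in exemplarizations(g2)
    )
```

For instance, exemplar_breakpoint_distance([2, 1, 3, 1], [1, 2, 3]) returns 1: keeping the second copy of gene 1 yields 2 3 1, whose only breakpoint is (3, 1), whereas keeping the first copy yields 2 1 3, which has two breakpoints.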

Extending research initiated in [2], we propose in this paper a novel, generic pseudo-boolean approach for computing the exact breakpoint distance between two genomes in the presence of duplications, under both the exemplar and the maximum matching methods. We illustrate the application of this methodology on a well-known public benchmark dataset of γ-Proteobacteria.
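A pseudo-boolean program works with 0/1 variables, linear constraints and a linear objective. The Python function below is a rough, hypothetical sketch of how duplicates can enter such a model. It is not the authors' formulation, which also needs variables and constraints that count breakpoints. It introduces one variable per gene occurrence and writes one equality constraint per gene family and genome, requiring that exactly one copy be kept, as in the exemplar method. The output uses the OPB text format of the pseudo-boolean solver competitions. The variable names x1, x2, ... and the function name are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def exemplar_constraints_opb(g1: Sequence[int], g2: Sequence[int]) -> str:
    """Emit OPB text: one 0/1 variable per gene occurrence and, for each
    (genome, gene family), a constraint keeping exactly one copy.

    Hypothetical simplification: no objective and no breakpoint variables.
    """
    n_vars = 0
    # (genome index, unsigned gene family) -> variables of its occurrences
    families: Dict[Tuple[int, int], List[str]] = {}
    for k, genome in enumerate((g1, g2)):
        for gene in genome:
            n_vars += 1
            families.setdefault((k, abs(gene)), []).append(f"x{n_vars}")
    # OPB header comment announcing the problem size
    lines = [f"* #variable= {n_vars} #constraint= {len(families)}"]
    for key in sorted(families):
        terms = " ".join(f"+1 {v}" for v in families[key])
        lines.append(f"{terms} = 1 ;")  # exemplar: exactly one copy kept
    return "\n".join(lines)
```

For g1 = [1, 2, 1] and g2 = [2, 1] this yields five variables and four constraints, among them "+1 x1 +1 x3 = 1 ;", which forces exactly one of the two copies of gene 1 in the first genome to be kept. Under the maximum matching method the right-hand side would roughly become the smaller of the two copy counts. In either case an objective line (min: ...) counting breakpoints would still have to be added.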

Keywords

genome rearrangement, duplication, breakpoint distance, heuristic, pseudo-boolean programming

References

  1. Angibaud, S., Fertin, G., Rusu, I., Vialette, S.: How pseudo-boolean programming can help genome rearrangement distance computation. In: Bourque, G., El-Mabrouk, N. (eds.) Comparative Genomics. LNCS (LNBI), vol. 4205, pp. 75–86. Springer, Heidelberg (2006)
  2. Angibaud, S., Fertin, G., Rusu, I., Vialette, S.: A general framework for computing rearrangement distances between genomes with duplicates. Journal of Computational Biology 14(4), 379–393 (2007)
  3. Barth, P.: A Davis-Putnam based enumeration algorithm for linear pseudo-boolean optimization. Technical Report MPI-I-95-2-003, Max Planck Institut Informatik, p. 13 (2005)
  4. Blin, G., Chauve, C., Fertin, G.: The breakpoint distance for signed sequences. In: Proc. 1st Algorithms and Computational Methods for Biochemical and Evolutionary Networks (Comp. Bio. Nets.), pp. 3–16. KCL publications (2004)
  5. Blin, G., Chauve, C., Fertin, G.: Genes order and phylogenetic reconstruction: Application to γ-proteobacteria. In: McLysaght, A., Huson, D.H. (eds.) RECOMB 2005. LNCS (LNBI), vol. 3678, pp. 11–20. Springer, Heidelberg (2005)
  6. Blin, G., Rizzi, R.: Conserved intervals distance computation between non-trivial genomes. In: Wang, L. (ed.) COCOON 2005. LNCS, vol. 3595, pp. 22–31. Springer, Heidelberg (2005)
  7. Bourque, G., Yacef, Y., El-Mabrouk, N.: Maximizing synteny blocks to identify ancestral homologs. In: McLysaght, A., Huson, D.H. (eds.) RECOMB 2005. LNCS (LNBI), vol. 3678, pp. 21–35. Springer, Heidelberg (2005)
  8. Bryant, D.: The complexity of calculating exemplar distances. In: Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment, and the Evolution of Gene Families, pp. 207–212. Kluwer Academic Publishers, Dordrecht (2000)
  9. Chai, D., Kuehlmann, A.: A fast pseudo-boolean constraint solver. In: Proc. 40th ACM IEEE Conference on Design Automation, pp. 830–835. ACM Press, New York (2003)
  10. Chauve, C., Fertin, G., Rizzi, R., Vialette, S.: Genomes containing duplicates are hard to compare. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J.J. (eds.) ICCS 2006. LNCS, vol. 3992, pp. 783–790. Springer, Heidelberg (2006)
  11. Chen, X., Zheng, J., Fu, Z., Nan, P., Zhong, Y., Lonardi, S., Jiang, T.: Assignment of orthologous genes via genome rearrangement. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2(4), 302–315 (2005)
  12. Eén, N., Sörensson, N.: Translating pseudo-boolean constraints into SAT. Journal on Satisfiability, Boolean Modeling and Computation 2, 1–26 (2006)
  13. Lerat, E., Daubin, V., Moran, N.A.: From gene tree to organismal phylogeny in prokaryotes: the case of γ-proteobacteria. PLoS Biology 1(1), 101–109 (2003)
  14. Marron, M., Swenson, K.M., Moret, B.M.E.: Genomic distances under deletions and insertions. Theoretical Computer Science 325(3), 347–360 (2004)
  15. Sankoff, D.: Genome rearrangement with gene families. Bioinformatics 15(11), 909–917 (1999)
  16. Sankoff, D., Haque, L.: Power boosts for cluster tests. In: McLysaght, A., Huson, D.H. (eds.) RECOMB 2005. LNCS (LNBI), vol. 3678, pp. 11–20. Springer, Heidelberg (2005)
  17. Schrijver, A.: Theory of Linear and Integer Programming. John Wiley and Sons, Chichester (1998)
  18. Sheini, H.M., Sakallah, K.A.: Pueblo: A hybrid pseudo-boolean SAT solver. Journal on Satisfiability, Boolean Modeling and Computation 2, 165–189 (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Sébastien Angibaud (1)
  • Guillaume Fertin (1)
  • Irena Rusu (1)
  • Annelyse Thévenin (2)
  • Stéphane Vialette (2)
  1. Laboratoire d’Informatique de Nantes-Atlantique (LINA), FRE CNRS 2729, Université de Nantes, 2 rue de la Houssinière, 44322 Nantes Cedex 3, France
  2. Laboratoire de Recherche en Informatique (LRI), UMR CNRS 8623, Faculté des Sciences d’Orsay, Université Paris-Sud, 91405 Orsay, France