Skip to main content

On the Approximability of Comparing Genomes with Duplicates

  • Conference paper
WALCOM: Algorithms and Computation (WALCOM 2008)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4921))

Included in the following conference series:

Abstract

A central problem in comparative genomics consists in computing a (dis-)similarity measure between two genomes, e.g. in order to construct a phylogenetic tree. A large number of such measures has been proposed in the recent past: number of reversals, number of breakpoints, number of common or conserved intervals, SAD etc. In their initial definitions, all these measures suppose that genomes contain no duplicates. However, we now know that genes can be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a matching) between genes of both genomes, where the correspondence is chosen in order to optimize the studied measure. Then, after a gene relabeling according to this matching and a deletion of the unmatched signed genes, two genomes without duplicates are obtained and the measure can be computed.

In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar model, maximum matching model and non maximum matching model). We prove that, for each model and each measure, computing a matching between two genomes that optimizes the measure is APX-Hard. We show that this result remains true even for two genomes G 1 and G 2 such that G 1 contains no duplicates and no gene of G 2 appears more than twice. Therefore, our results extend those of [5,6,8]. Finally, we propose a 4-approximation algorithm for a measure closely related to the number of breakpoints, the number of adjacencies, under the maximum matching model, in the case where genomes contain the same number of duplications of each gene.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Alimonti, P., Kann, V.: Some APX-completeness results for cubic graphs. Theoretical Computer Science 237(1-2), 123–134 (2000)

    Article  MATH  MathSciNet  Google Scholar 

  2. Angibaud, S., Fertin, G., Rusu, I., Vialette, S.: A general framework for computing rearrangement distances between genomes with duplicates. Journal of Computational Biology 14(4), 379–393 (2007)

    Article  MathSciNet  Google Scholar 

  3. Bafna, V., Pevzner, P.: Sorting by reversals: genome rearrangements in plant organelles and evolutionary history of X chromosome. Molecular Biology and Evolution, 239–246 (1995)

    Google Scholar 

  4. Blin, G., Rizzi, R.: Conserved interval distance computation between non-trivial genomes. In: Wang, L. (ed.) COCOON 2005. LNCS, vol. 3595, pp. 22–31. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  5. Bryant, D.: The complexity of calculating exemplar distances. In: Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignement, and the Evolution of Gene Families, pp. 207–212. Kluwer Academic Publishers, Dordrecht (2000)

    Google Scholar 

  6. Chauve, C., Fertin, G., Rizzi, R., Vialette, S.: Genomes containing duplicates are hard to compare. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J.J. (eds.) ICCS 2006. LNCS, vol. 3992, pp. 783–790. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  7. Chen, Z., Fu, B., Xu, J., Yang, B., Zhao, Z., Zhu, B.: Non-breaking similarity of genomes with gene repetitions. In: CPM 2007. LNCS, vol. 4580, pp. 119–130. Springer, Heidelberg (2007)

    Google Scholar 

  8. Chen, Z., Fu, B., Zhu, B.: The approximability of the exemplar breakpoint distance problem. In: Cheng, S.-W., Poon, C.K. (eds.) AAIM 2006. LNCS, vol. 4041, pp. 291–302. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  9. Crochemore, M., Hermelin, D., Landau, G.M., Vialette, S.: Approximating the 2-interval pattern problem. In: Brodal, G.S., Leonardi, S. (eds.) ESA 2005. LNCS, vol. 3669, pp. 426–437. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  10. Goldstein, A., Kolman, P., Zheng, Z.: Minimum common string partition problem: Hardness and approximations. In: Fleischer, R., Trippen, G. (eds.) ISAAC 2004. LNCS, vol. 3341, pp. 473–484. Springer, Heidelberg (2004)

    Google Scholar 

  11. Kolman, P., Waleń, T.: Reversal distance for strings with duplicates: Linear time approximation using hitting set. In: Erlebach, T., Kaklamanis, C. (eds.) WAOA 2006. LNCS, vol. 4368, pp. 279–289. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  12. Li, W., Gu, Z., Wang, H., Nekrutenko, A.: Evolutionary analysis of the human genome. Nature (409), 847–849 (2001)

    Article  Google Scholar 

  13. Marron, M., Swenson, K.M., Moret, B.M.E.: Genomic distances under deletions and insertions. Theoretical Computer Science 325(3), 347–360 (2004)

    Article  MATH  MathSciNet  Google Scholar 

  14. Papadimitriou, C., Yannakakis, M.: Optimization, approximation, and complexity classes. Journal of Computer and System Sciences 43(3), 425–440 (1991)

    Article  MATH  MathSciNet  Google Scholar 

  15. Sankoff, D.: Genome rearrangement with gene families. Bioinformatics 15(11), 909–917 (1999)

    Article  Google Scholar 

  16. Sankoff, D., Haque, L.: Power boosts for cluster tests. In: McLysaght, A., Huson, D.H. (eds.) RECOMB 2005. LNCS (LNBI), vol. 3678, pp. 121–130. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  17. Tang, J., Moret, B.M.E.: Phylogenetic reconstruction from gene-rearrangement data with unequal gene content. In: Dehne, F., Sack, J.-R., Smid, M. (eds.) WADS 2003. LNCS, vol. 2748, pp. 37–46. Springer, Heidelberg (2003)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Shin-ichi Nakano Md. Saidur Rahman

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Angibaud, S., Fertin, G., Rusu, I. (2008). On the Approximability of Comparing Genomes with Duplicates. In: Nakano, Si., Rahman, M.S. (eds) WALCOM: Algorithms and Computation. WALCOM 2008. Lecture Notes in Computer Science, vol 4921. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77891-2_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-77891-2_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77890-5

  • Online ISBN: 978-3-540-77891-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics