On the Approximability of Comparing Genomes with Duplicates

doi:10.1007/978-3-540-77891-2_4

Sébastien Angibaud¹, Guillaume Fertin¹ & Irena Rusu¹

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4921)

Conference series: International Workshop on Algorithms and Computation

Abstract

A central problem in comparative genomics is computing a (dis-)similarity measure between two genomes, e.g., in order to construct a phylogenetic tree. A large number of such measures have been proposed in the recent past: number of reversals, number of breakpoints, number of common or conserved intervals, SAD, etc. In their initial definitions, all these measures assume that genomes contain no duplicates. However, we now know that genes can be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e., a matching) between genes of both genomes, where the correspondence is chosen so as to optimize the studied measure. Then, after relabeling the genes according to this matching and deleting the unmatched signed genes, two genomes without duplicates are obtained and the measure can be computed.
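To make the relabel-and-delete step concrete, here is a minimal Python sketch. It assumes a genome is a list of nonzero integers (+g for gene g read forward, -g for the same gene reversed) and that a matching is given as (i, j) index pairs linking occurrence g1[i] to occurrence g2[j] of the same gene. The function name `apply_matching` and this encoding are illustrative choices, not the paper's. The sketch takes the matching as given: it does not search for an optimal one, and it does not check which model (exemplar, maximum or non-maximum matching) the matching belongs to.

```python
def apply_matching(g1, g2, matching):
    """Relabel matched genes and drop unmatched ones (hypothetical helper).

    g1, g2   -- signed genomes: lists of nonzero ints, duplicates allowed.
    matching -- list of (i, j) pairs: occurrence g1[i] is matched to g2[j].
    Returns two duplicate-free signed genomes over the labels 1..len(matching).
    """
    used1, used2 = set(), set()
    for i, j in matching:
        if abs(g1[i]) != abs(g2[j]):
            raise ValueError(f"g1[{i}] and g2[{j}] are different genes")
        if i in used1 or j in used2:
            raise ValueError("matching must be one-to-one")
        used1.add(i)
        used2.add(j)

    # Give every matched pair a fresh label; sorting by i numbers g1's survivors 1, 2, ...
    label1, label2 = {}, {}
    for k, (i, j) in enumerate(sorted(matching), start=1):
        label1[i] = k
        label2[j] = k

    def relabel(genome, labels):
        # Keep only matched occurrences; each keeps its own orientation (sign).
        return [labels[p] if x > 0 else -labels[p]
                for p, x in enumerate(genome) if p in labels]

    return relabel(g1, label1), relabel(g2, label2)


# Gene 2 occurs twice in g2; matching it to the second copy deletes the first.
g1 = [1, 2, -3]
g2 = [2, 1, -3, 2]
print(apply_matching(g1, g2, [(0, 1), (1, 3), (2, 2)]))  # ([1, 2, -3], [1, -3, 2])
```

Choosing the other copy of gene 2, i.e. the pair (1, 0), would instead yield ([1, 2, -3], [2, 1, -3]); which choice is best depends on the measure being optimized.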

In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar model, maximum matching model and non-maximum matching model). We prove that, for each model and each measure, computing a matching between two genomes that optimizes the measure is APX-hard. We show that this result remains true even for two genomes G₁ and G₂ such that G₁ contains no duplicates and no gene of G₂ appears more than twice. Therefore, our results extend those of [5,6,8]. Finally, we propose a 4-approximation algorithm for a measure closely related to the number of breakpoints, namely the number of adjacencies, under the maximum matching model, in the case where genomes contain the same number of duplications of each gene.
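The measures above are evaluated on two genomes without duplicates. The Python sketch below gives one plausible reading of adjacencies, breakpoints and common intervals, plus a brute-force exemplar search, under stated assumptions. A genome is a list of nonzero integers (+g for gene g read forward, -g reversed). An adjacency is a consecutive pair (a, b) of G₁ that appears in G₂ as (a, b) or, read on the opposite strand, as (-b, -a). No framing genes are added at the genome ends, and signs are ignored for common intervals. The search covers the restricted case of the hardness result, where G₁ has no duplicates, and tries every choice of one copy per gene in G₂. Its running time is therefore exponential: it only illustrates what is being optimized and is not the paper's 4-approximation algorithm. All function names are hypothetical.

```python
from itertools import product

# Signed genome: list of nonzero ints; +g is gene g read forward, -g reversed.


def adjacencies(g1, g2):
    """Consecutive pairs (a, b) of g1 found in g2 as (a, b) or as (-b, -a).

    Assumes duplicate-free genomes and no framing genes at the ends.
    """
    pairs2 = set(zip(g2, g2[1:]))
    return sum(1 for a, b in zip(g1, g1[1:])
               if (a, b) in pairs2 or (-b, -a) in pairs2)


def breakpoints(g1, g2):
    """Consecutive pairs of g1 that are not adjacencies."""
    return max(len(g1) - 1, 0) - adjacencies(g1, g2)


def common_intervals(g1, g2):
    """Gene sets of size >= 2 that are contiguous in both genomes (signs ignored).

    Brute force over all intervals: fine for small examples only.
    """
    def interval_sets(g):
        u = [abs(x) for x in g]
        return {frozenset(u[i:j])
                for i in range(len(u)) for j in range(i + 2, len(u) + 1)}
    return len(interval_sets(g1) & interval_sets(g2))


def best_exemplar(g1, g2):
    """Keep one copy of each gene of g2 so as to maximize adjacencies with g1.

    g1 is assumed duplicate-free and to contain the same genes as g2.
    Tries every combination of copies, so the running time is exponential.
    """
    copies = {}
    for p, x in enumerate(g2):
        copies.setdefault(abs(x), []).append(p)
    best_score, best_genome = -1, None
    for choice in product(*copies.values()):
        kept = set(choice)
        exemplar = [x for p, x in enumerate(g2) if p in kept]
        score = adjacencies(g1, exemplar)
        if score > best_score:
            best_score, best_genome = score, exemplar
    return best_score, best_genome


print(adjacencies([1, 2, 3], [-3, -2, -1]))     # 2: the same order, read on the other strand
print(common_intervals([1, 2, 3], [1, -3, 2]))  # 2: {2, 3} and {1, 2, 3}
print(best_exemplar([1, 2, 3], [1, -3, 2, 3]))  # (2, [1, 2, 3])
```

In this sketch every exemplar of G₂ has the same length as G₁, so breakpoints equal (length − 1) minus adjacencies, and minimizing breakpoints picks the same optimum as maximizing adjacencies.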

References

[1] Alimonti, P., Kann, V.: Some APX-completeness results for cubic graphs. Theoretical Computer Science 237(1-2), 123–134 (2000)
[2] Angibaud, S., Fertin, G., Rusu, I., Vialette, S.: A general framework for computing rearrangement distances between genomes with duplicates. Journal of Computational Biology 14(4), 379–393 (2007)
[3] Bafna, V., Pevzner, P.: Sorting by reversals: genome rearrangements in plant organelles and evolutionary history of X chromosome. Molecular Biology and Evolution, 239–246 (1995)
[4] Blin, G., Rizzi, R.: Conserved interval distance computation between non-trivial genomes. In: Wang, L. (ed.) COCOON 2005. LNCS, vol. 3595, pp. 22–31. Springer, Heidelberg (2005)
[5] Bryant, D.: The complexity of calculating exemplar distances. In: Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment, and the Evolution of Gene Families, pp. 207–212. Kluwer Academic Publishers, Dordrecht (2000)
[6] Chauve, C., Fertin, G., Rizzi, R., Vialette, S.: Genomes containing duplicates are hard to compare. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J.J. (eds.) ICCS 2006. LNCS, vol. 3992, pp. 783–790. Springer, Heidelberg (2006)
[7] Chen, Z., Fu, B., Xu, J., Yang, B., Zhao, Z., Zhu, B.: Non-breaking similarity of genomes with gene repetitions. In: CPM 2007. LNCS, vol. 4580, pp. 119–130. Springer, Heidelberg (2007)
[8] Chen, Z., Fu, B., Zhu, B.: The approximability of the exemplar breakpoint distance problem. In: Cheng, S.-W., Poon, C.K. (eds.) AAIM 2006. LNCS, vol. 4041, pp. 291–302. Springer, Heidelberg (2006)
[9] Crochemore, M., Hermelin, D., Landau, G.M., Vialette, S.: Approximating the 2-interval pattern problem. In: Brodal, G.S., Leonardi, S. (eds.) ESA 2005. LNCS, vol. 3669, pp. 426–437. Springer, Heidelberg (2005)
[10] Goldstein, A., Kolman, P., Zheng, Z.: Minimum common string partition problem: Hardness and approximations. In: Fleischer, R., Trippen, G. (eds.) ISAAC 2004. LNCS, vol. 3341, pp. 473–484. Springer, Heidelberg (2004)
[11] Kolman, P., Waleń, T.: Reversal distance for strings with duplicates: Linear time approximation using hitting set. In: Erlebach, T., Kaklamanis, C. (eds.) WAOA 2006. LNCS, vol. 4368, pp. 279–289. Springer, Heidelberg (2007)
[12] Li, W., Gu, Z., Wang, H., Nekrutenko, A.: Evolutionary analysis of the human genome. Nature 409, 847–849 (2001)
[13] Marron, M., Swenson, K.M., Moret, B.M.E.: Genomic distances under deletions and insertions. Theoretical Computer Science 325(3), 347–360 (2004)
[14] Papadimitriou, C., Yannakakis, M.: Optimization, approximation, and complexity classes. Journal of Computer and System Sciences 43(3), 425–440 (1991)
[15] Sankoff, D.: Genome rearrangement with gene families. Bioinformatics 15(11), 909–917 (1999)
[16] Sankoff, D., Haque, L.: Power boosts for cluster tests. In: McLysaght, A., Huson, D.H. (eds.) RECOMB 2005. LNCS (LNBI), vol. 3678, pp. 121–130. Springer, Heidelberg (2005)
[17] Tang, J., Moret, B.M.E.: Phylogenetic reconstruction from gene-rearrangement data with unequal gene content. In: Dehne, F., Sack, J.-R., Smid, M. (eds.) WADS 2003. LNCS, vol. 2748, pp. 37–46. Springer, Heidelberg (2003)

Author information

Authors and Affiliations

Laboratoire d’Informatique de Nantes-Atlantique (LINA), FRE CNRS 2729, Université de Nantes, 2 rue de la Houssinière, 44322, Nantes, Cedex 3, France
Sébastien Angibaud, Guillaume Fertin & Irena Rusu

Editor information

Shin-ichi Nakano, Md. Saidur Rahman

About this paper

Cite this paper

Angibaud, S., Fertin, G., Rusu, I. (2008). On the Approximability of Comparing Genomes with Duplicates. In: Nakano, Si., Rahman, M.S. (eds) WALCOM: Algorithms and Computation. WALCOM 2008. Lecture Notes in Computer Science, vol 4921. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77891-2_4

Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77890-5
Online ISBN: 978-3-540-77891-2