Genomes Containing Duplicates Are Hard to Compare

  • Cedric Chauve
  • Guillaume Fertin
  • Romeo Rizzi
  • Stéphane Vialette
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3992)


In this paper, we are interested in the algorithmic complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes. In that case, there are usually two main ways to compute a given (dis)similarity measure M between two genomes G1 and G2. The first model, which we call the matching model, consists in computing a one-to-one correspondence between genes of G1 and genes of G2, in such a way that M is optimized in the resulting permutation. The second model, called the exemplar model, consists in keeping in G1 (resp. G2) exactly one copy of each gene, thus deleting all the other copies, in such a way that M is optimized in the resulting permutation. We present several results on the algorithmic complexity of computing three similarity measures (number of common intervals, MAD number and SAD number) in those two models, showing that the problem becomes NP-complete for each of them as soon as genomes contain duplicates. For MAD and SAD, we further prove that the problems are APX-hard under both models.
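To make the exemplar model concrete, the following Python sketch enumerates every way of keeping exactly one copy of each gene in two small genomes and reports the best number of common intervals. It is an illustration only, not the authors' method. It assumes unsigned genes, assumes both genomes contain the same set of gene families, and counts a common interval as a set of at least two genes that is contiguous in both permutations (conventions differ on whether singletons count). The function names `exemplarizations`, `count_common_intervals` and `best_exemplar_common_intervals` are hypothetical.

```python
from itertools import product


def exemplarizations(genome):
    """Yield every sequence that keeps exactly one copy of each gene."""
    positions = {}
    for index, gene in enumerate(genome):
        positions.setdefault(gene, []).append(index)
    # Choose one position per gene family, then restore genome order.
    for kept in product(*positions.values()):
        yield tuple(genome[i] for i in sorted(kept))


def count_common_intervals(p1, p2):
    """Count gene sets of size >= 2 that are contiguous in both permutations.

    Assumption: p1 and p2 are permutations of the same gene set.
    """
    pos2 = {gene: i for i, gene in enumerate(p2)}
    count = 0
    for start in range(len(p1)):
        lo = hi = pos2[p1[start]]
        for end in range(start + 1, len(p1)):
            lo = min(lo, pos2[p1[end]])
            hi = max(hi, pos2[p1[end]])
            # p1[start..end] is contiguous in p2 exactly when its
            # positions in p2 span as many cells as it has genes.
            if hi - lo == end - start:
                count += 1
    return count


def best_exemplar_common_intervals(g1, g2):
    """Brute-force maximum over all pairs of exemplarizations."""
    return max(
        count_common_intervals(e1, e2)
        for e1 in exemplarizations(g1)
        for e2 in exemplarizations(g2)
    )


if __name__ == "__main__":
    g1 = list("abca")  # gene "a" is duplicated
    g2 = list("acb")
    print(best_exemplar_common_intervals(g1, g2))
```

The enumeration grows as the product of the copy numbers of all gene families, so this brute force is only usable on toy inputs. That exponential blow-up is in line with the hardness results stated in the abstract, although it does not by itself prove them.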


Keywords: Similarity Measure, Duplicate Gene, Algorithmic Complexity, Match Model, Exemplar Model


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Cedric Chauve (1)
  • Guillaume Fertin (2)
  • Romeo Rizzi (3)
  • Stéphane Vialette (4)

  1. LaCIM, CGL, Département d’Informatique, Université du Québec à Montréal, CP 8888, Montréal, Canada
  2. Laboratoire d’Informatique de Nantes-Atlantique (LINA), FRE CNRS 2729, Université de Nantes, Nantes Cedex 3, France
  3. Dipartimento di Matematica e Informatica, Università di Udine, Italy
  4. Laboratoire de Recherche en Informatique (LRI), UMR CNRS 8623, Faculté des Sciences d’Orsay, Université Paris-Sud, Orsay, France
