On the Closest String via Rank Distance

  • Liviu P. Dinu
  • Alexandru Popa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7354)


Given a set S of k strings of maximum length n, the goal of the closest substring problem (CSSP) is to find the smallest integer d (and a corresponding string t of length ℓ ≤ n) such that each string s ∈ S has a substring of length ℓ at “distance” at most d from t. The closest string problem (CSP) is the special case of CSSP where ℓ = n. CSP and CSSP arise in many applications in bioinformatics and have been extensively studied under the Hamming and edit distances. In this paper we consider a recently introduced distance measure, the rank distance. First, we show that CSP and CSSP via rank distance are NP-hard. Then, we present a polynomial-time k-approximation algorithm for CSP. Finally, we give a parameterized algorithm for CSP (the parameter is the number of input strings) when the alphabet is binary and each string has the same number of 0’s and 1’s.
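To make the CSP objective concrete, here is a minimal Python sketch; it is not the authors' method. It assumes the rank distance definition from the earlier rank-distance literature, which the abstract does not restate: each symbol is tagged with its occurrence index (the second 'a' becomes ('a', 2)), tagged symbols present in both strings contribute the absolute difference of their 1-based positions, and tagged symbols present in only one string contribute their position in that string. The names `rank_distance` and `closest_string_brute_force` are hypothetical. The solver enumerates all |Σ|^n candidate strings of length n, so it only illustrates the problem on tiny inputs; it is neither the k-approximation nor the parameterized algorithm described above.

```python
from collections import defaultdict
from itertools import product


def _tagged_positions(s: str) -> dict:
    """Map each (symbol, occurrence index) pair to its 1-based position in s."""
    seen = defaultdict(int)
    pos = {}
    for i, ch in enumerate(s, start=1):
        seen[ch] += 1
        pos[(ch, seen[ch])] = i
    return pos


def rank_distance(u: str, v: str) -> int:
    """Rank distance between u and v (assumed definition, see lead-in)."""
    pu, pv = _tagged_positions(u), _tagged_positions(v)
    total = 0
    for key in pu.keys() | pv.keys():
        if key in pu and key in pv:
            # Tagged symbol occurs in both strings: add the position shift.
            total += abs(pu[key] - pv[key])
        elif key in pu:
            # Tagged symbol occurs only in u: add its position in u.
            total += pu[key]
        else:
            # Tagged symbol occurs only in v: add its position in v.
            total += pv[key]
    return total


def closest_string_brute_force(strings, alphabet):
    """Exhaustive CSP: return (t, d) minimizing max rank distance to the inputs.

    Hypothetical helper for illustration only; runs in O(|alphabet|^n) time.
    """
    n = len(strings[0])
    best_t, best_d = None, None
    for letters in product(alphabet, repeat=n):
        t = "".join(letters)
        d = max(rank_distance(t, s) for s in strings)
        if best_d is None or d < best_d:
            best_t, best_d = t, d
    return best_t, best_d


if __name__ == "__main__":
    print(rank_distance("0011", "1100"))
    print(closest_string_brute_force(["0011", "1100"], "01"))
```

For S = {0011, 1100} over {0, 1}, the two inputs are at rank distance 8, so (assuming rank distance is a metric, as the triangle inequality then applies) no center can have radius below 4; the exhaustive search reaches that bound, for instance with t = 0110. Note that both inputs here are balanced binary strings, the setting of the parameterized result.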


Keywords (added by machine, not by the authors): Close String · Edit Distance · Complete Bipartite Graph · Input String · Rank Aggregation



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Liviu P. Dinu (1)
  • Alexandru Popa (2)

  1. Faculty of Mathematics and Computer Science, University of Bucharest, Bucharest, Romania
  2. Department of Communications & Networking, Aalto University School of Electrical Engineering, Aalto, Finland
