# Algorithms for Closest and Farthest String Problems via Rank Distance

• Liviu P. Dinu
• Bogdan C. Dumitru
• Alexandru Popa
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11436)

## Abstract

A new distance between strings, termed rank distance, was introduced by Dinu (Fundamenta Informaticae, 2003). Since then, the properties of rank distance have been studied in several papers. In this article, we continue the study of rank distance. More precisely, we tackle three problems that concern the distance between strings.

1. The first problem that we study is String with Fixed Rank Distance (SFRD): given a set of strings S and an integer d decide if there exists a string that is at distance d from every string in S. For this problem we provide a polynomial time exact algorithm.

2. The second problem that we study is the Closest String Problem under Rank Distance (CSRD). Given a set of strings S, it asks for the minimum integer d and a string that is at distance at most d from all strings in S. Since this problem is NP-hard (Dinu and Popa, CPM 2012), it is likely that no polynomial time algorithm exists. Thus, we propose three different approaches: a heuristic approach and two integer linear programming formulations, one of them using a geometric interpretation of the problem.

3. Finally, we approach the Farthest String Problem via Rank Distance (FSRD) that asks to find two strings with the same frequency of characters (i.e. the same Parikh vector) that have the largest possible rank distance. We provide a polynomial time exact algorithm for this problem.
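The abstract does not restate the definition of rank distance, so the following is only a minimal sketch under an assumption: it uses the definition commonly given in the rank-distance literature, where each character is indexed by its occurrence number, matched indexed characters contribute the absolute difference of their 1-based positions, and unmatched ones contribute their position. The function names `rank_distance`, `radius`, and `farthest_pair_bruteforce` are hypothetical helpers introduced for illustration. The brute-force search is exponential and is not the polynomial-time algorithm of the paper; it only makes the CSRD and FSRD objectives concrete on tiny inputs.

```python
from collections import defaultdict
from itertools import combinations, permutations


def indexed_positions(s):
    """Map each occurrence-indexed character, e.g. ('a', 2), to its 1-based position."""
    seen = defaultdict(int)
    positions = {}
    for i, ch in enumerate(s, start=1):
        seen[ch] += 1
        positions[(ch, seen[ch])] = i
    return positions


def rank_distance(u, v):
    """Rank distance under the assumed definition described in the lead-in."""
    pu, pv = indexed_positions(u), indexed_positions(v)
    total = 0
    for key in pu.keys() | pv.keys():
        if key in pu and key in pv:
            # Indexed character occurs in both strings: compare positions.
            total += abs(pu[key] - pv[key])
        else:
            # Occurs in only one string: contributes its position there.
            total += pu.get(key, 0) + pv.get(key, 0)
    return total


def radius(candidate, strings):
    """CSRD objective: the largest distance from candidate to any input string."""
    return max(rank_distance(candidate, s) for s in strings)


def farthest_pair_bruteforce(s):
    """FSRD objective by exhaustive search over all arrangements of s.

    Exponential time; for illustration on very short strings only.
    """
    arrangements = sorted({"".join(p) for p in permutations(s)})
    if len(arrangements) < 2:
        return 0
    return max(rank_distance(a, b) for a, b in combinations(arrangements, 2))
```

For instance, `radius("ab", ["ab", "ba"])` evaluates the worst-case distance of the candidate `"ab"` to the two input strings, which is the quantity CSRD minimizes over all candidates.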

## References

1. Arbib, C., Felici, G., Servilio, M., Ventura, P.: Optimum solution of the closest string problem via rank distance. In: Cerulli, R., Fujishige, S., Mahjoub, A.R. (eds.) ISCO 2016. LNCS, vol. 9849, pp. 297–307. Springer, Cham (2016)
2. Babaie, M., Mousavi, S.R.: A memetic algorithm for closest string problem and farthest string problem. In: 2010 18th Iranian Conference on Electrical Engineering. IEEE, May 2010
3. Bādoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, STOC 2002, pp. 250–257. ACM, New York (2002)
4. Ben-Dor, A., Lancia, G., Ravi, R., Perone, J.: Banishing bias from consensus sequences. In: Apostolico, A., Hein, J. (eds.) CPM 1997. LNCS, vol. 1264, pp. 247–261. Springer, Heidelberg (1997)
5. de la Higuera, C., Casacuberta, F.: Topology of strings: median string is NP-complete. Theor. Comput. Sci. 230(1–2), 39–48 (2000)
6. Deng, X., Li, G., Li, Z., Ma, B., Wang, L.: Genetic design of drugs without side-effects. SIAM J. Comput. 32(4), 1073–1090 (2003)
7. Deza, E., Deza, M.: Dictionary of Distances. North-Holland, Amsterdam (2006)
8. Dinu, A., Dinu, L.P.: On the syllabic similarities of romance languages. In: Gelbukh, A. (ed.) CICLing 2005. LNCS, vol. 3406, pp. 785–788. Springer, Heidelberg (2005)
9. Dinu, L.P.: On the classification and aggregation of hierarchies with different constitutive elements. Fundam. Inform. 55(1), 39–50 (2003)
10. Dinu, L.P., Ionescu, R., Tomescu, A.: A rank-based sequence aligner with applications in phylogenetic analysis. PLoS ONE 9(8), e104006 (2014)
11. Dinu, L.P., Manea, F.: An efficient approach for the rank aggregation problem. Theor. Comput. Sci. 359(1–3), 455–461 (2006)
12. Dinu, L.P., Popa, A.: On the closest string via rank distance. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 413–426. Springer, Heidelberg (2012)
13. Dinu, L.P., Sgarro, A.: A low-complexity distance for DNA strings. Fundam. Inform. 73(3), 361–372 (2006)
14. Frances, M., Litman, A.: On covering problems of codes. Theory Comput. Syst. 30(2), 113–119 (1997)
15. Gagolewski, M.: Data Fusion: Theory, Methods, and Applications. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland (2015)
16. Gramm, J., Huffner, F., Niedermeier, R.: Closest strings, primer design, and motif search. In: Currents in Computational Molecular Biology. RECOMB, pp. 74–75 (2002)
17. Greenhill, S.J.: Levenshtein distances fail to identify language relationships accurately. Comput. Linguist. 37(4), 689–698 (2011)
18. Ionescu, R.T., Popescu, M.: Knowledge Transfer between Computer Vision and Text Mining - Similarity-Based Learning Approaches. Advances in Computer Vision and Pattern Recognition. Springer, Cham (2016)
19. Ionescu, R.T., Popescu, M., Cahill, A.: String kernels for native language identification: insights from behind the curtains. Comput. Linguist. 42(3), 491–525 (2016)
20. Kannan, R.: Minkowski’s convex body theorem and integer programming. Math. Oper. Res. 12(3), 415–440 (1987)
21. Koonin, E.V.: The emerging paradigm and open problems in comparative genomics. Bioinformatics 15(4), 265–266 (1999)
22. Lanctot, J.K., Li, M., Ma, B., Wang, S., Zhang, L.: Distinguishing string selection problems. Inf. Comput. 185(1), 41–55 (2003)
23. Lenstra, H.W.: Integer programming with a fixed number of variables. Math. Oper. Res. 8(4), 538–548 (1983)
24. Li, M., Ma, B., Wang, L.: Finding similar regions in many sequences. J. Comput. Syst. Sci. 65(1), 73–96 (2002)
25. Liu, X., He, H., Sýkora, O.: Parallel genetic algorithm and parallel simulated annealing algorithm for the closest string problem. In: Li, X., Wang, S., Dong, Z.Y. (eds.) ADMA 2005. LNCS (LNAI), vol. 3584, pp. 591–597. Springer, Heidelberg (2005)
26. Meneses, C.N., Lu, Z., Oliveira, C.A.S., Pardalos, P.M.: Optimal solutions for the closest-string problem via integer programming. INFORMS J. Comput. 16(4), 419–429 (2004)
27. Nerbonne, J., Hinrichs, E.W.: Linguistic distances. In: Proceedings of the Workshop on Linguistic Distances, Sydney, July 2006, pp. 1–6 (2006)
28. Nicolas, F., Rivals, E.: Complexities of the centre and median string problems. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 315–327. Springer, Heidelberg (2003)
29. Nicolas, F., Rivals, E.: Hardness results for the center and median string problems under the weighted and unweighted edit distances. J. Discrete Algorithms 3(2–4), 390–415 (2005)
30. Popescu, M., Dinu, L.P.: Rank distance as a stylistic similarity. In: 22nd International Conference on Computational Linguistics, Posters Proceedings, COLING 2008, 18–22 August 2008, Manchester, UK, pp. 91–94 (2008)
31. Popov, V.Y.: Multiple genome rearrangement by swaps and by element duplications. Theor. Comput. Sci. 385(1–3), 115–126 (2007)
32. Ritter, J.: An efficient bounding sphere. In: Graphics Gems, pp. 301–303. Elsevier (1990)
33. Sun, Y., et al.: Combining genomic and network characteristics for extended capability in predicting synergistic drugs for cancer. Nat. Commun. 6, 8481 (2015)
34. Wang, L., Dong, L.: Randomized algorithms for motif detection. J. Bioinf. Comput. Biol. 3(5), 1039–1052 (2005)
35. Wooley, J.C.: Trends in computational biology: a summary based on a RECOMB plenary lecture. J. Comput. Biol. 6(3/4), 459–474 (1999)

© Springer Nature Switzerland AG 2019

## Authors and Affiliations

• Liviu P. Dinu (1, 2)
• Bogdan C. Dumitru (1, 2)
• Alexandru Popa (1, 3; email author)

1. Faculty of Mathematics and Computer Science, University of Bucharest, Bucharest, Romania
2. Human Language Technologies Research Center, University of Bucharest, Bucharest, Romania
3. National Institute for Research and Development in Informatics, Bucharest, Romania