COCOON 2017: Computing and Combinatorics pp 396-406

# Approximating Weighted Duo-Preservation in Comparative Genomics

• Saeed Mehrabi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10392)

## Abstract

Motivated by comparative genomics, Chen et al. [9] introduced the Maximum Duo-preservation String Mapping (MDSM) problem, in which we are given two strings $$s_1$$ and $$s_2$$ over the same alphabet and the goal is to find a mapping $$\pi$$ between them that maximizes the number of preserved duos. A duo is any pair of consecutive characters in a string; it is preserved by the mapping if its two consecutive characters in $$s_1$$ are mapped to the same two consecutive characters in $$s_2$$. The MDSM problem is known to be NP-hard, and approximation algorithms exist for it [3, 5], all of which consider only the “unweighted” version of the problem, in the sense that a duo from $$s_1$$ is preserved by mapping it to any identical duo in $$s_2$$, regardless of their positions in the respective strings. However, in comparative genomics it is desirable to find mappings that preserve duos that are “closer” to each other under some distance measure [18].
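The definition of a preserved duo can be made concrete with a minimal Python sketch. It rests on assumptions the abstract does not spell out: $$\pi$$ is represented as a list `pi` in which `pi[i]` is the position in $$s_2$$ assigned to position `i` of $$s_1$$, the mapping is a character-preserving bijection, and both strings have the same length. The function name `count_preserved_duos` is hypothetical and used only for illustration.

```python
def count_preserved_duos(s1, s2, pi):
    """Count the duos of s1 that the position mapping pi preserves.

    Assumption (not stated in the abstract): pi[i] is the position in s2
    assigned to position i of s1, and pi is a character-preserving bijection.
    """
    n = len(s1)
    if len(s2) != n or sorted(pi) != list(range(n)):
        raise ValueError("pi must be a bijection between positions of equal-length strings")
    if any(s1[i] != s2[pi[i]] for i in range(n)):
        raise ValueError("pi must map every character to an identical character")
    # The duo (i, i+1) of s1 is preserved when its two characters land on
    # consecutive positions of s2, in the same order.
    return sum(1 for i in range(n - 1) if pi[i + 1] == pi[i] + 1)


# "abc" in s1 is mapped onto s2[2:5] and "ab" onto s2[0:2].
print(count_preserved_duos("abcab", "ababc", [2, 3, 4, 0, 1]))  # 3
```

In this example the duos `ab` and `bc` at the start of `s1` and the final `ab` are preserved, while the duo `ca` is broken by the mapping.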

In this paper, we introduce a generalized version of the problem, called the Maximum-Weight Duo-preservation String Mapping (MWDSM) problem, which captures both duo preservation and duo distance: mapping a duo from $$s_1$$ to a preserved duo in $$s_2$$ carries a weight indicating the “closeness” of the two duos. The objective of the MWDSM problem is to find a mapping that maximizes the total weight of the preserved duos. We give a polynomial-time 6-approximation algorithm for this problem.
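As a rough illustration of the weighted objective, the sketch below sums a weight over every preserved duo. It evaluates a given mapping and is not the 6-approximation algorithm. The representation of $$\pi$$ as a list `pi` (with `pi[i]` the position in $$s_2$$ assigned to position `i` of $$s_1$$), the function name `total_preserved_weight`, and the example weight `closeness` are all assumptions made for illustration; the abstract does not specify a particular weight function.

```python
def total_preserved_weight(pi, weight):
    """Sum weight(i, j) over all preserved duos.

    Assumption: the duo starting at position i of s1 is preserved when
    pi[i + 1] == pi[i] + 1, and it is mapped to the duo starting at
    position j = pi[i] of s2.
    """
    return sum(
        weight(i, pi[i])
        for i in range(len(pi) - 1)
        if pi[i + 1] == pi[i] + 1
    )


def closeness(i, j):
    # Hypothetical weight: duos at nearby positions score higher.
    return 1.0 / (1 + abs(i - j))


pi = [2, 3, 4, 0, 1]  # e.g. maps "abcab" onto "ababc"
print(total_preserved_weight(pi, lambda i, j: 1))  # unweighted count: 3
print(total_preserved_weight(pi, closeness))       # 1/3 + 1/3 + 1/4
```

With a constant weight of 1 the objective reduces to the unweighted MDSM count, which is the sense in which MWDSM generalizes MDSM.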

## References

1. Bar-Yehuda, R., Even, S.: A local-ratio theorem for approximating the weighted vertex cover problem. In: Ausiello, G., Lucertini, M. (eds.) Analysis and Design of Algorithms for Combinatorial Problems, vol. 109, pp. 27–45. North-Holland (1985)
2. Beretta, S., Castelli, M., Dondi, R.: Parameterized tractability of the maximum-duo preservation string mapping problem. Theor. Comput. Sci. 646, 16–25 (2016)
3. Boria, N., Cabodi, G., Camurati, P., Palena, M., Pasini, P., Quer, S.: A 7/2-approximation algorithm for the maximum duo-preservation string mapping problem. In: Proceedings of the 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), Tel Aviv, Israel, pp. 11:1–11:8 (2016)
4. Boria, N., Kurpisz, A., Leppänen, S., Mastrolilli, M.: Improved approximation for the maximum duo-preservation string mapping problem. In: Brown, D., Morgenstern, B. (eds.) WABI 2014. LNCS, vol. 8701, pp. 14–25. Springer, Heidelberg (2014)
5. Brubach, B.: Further improvement in approximating the maximum duo-preservation string mapping problem. In: Frith, M., Storm Pedersen, C.N. (eds.) WABI 2016. LNCS, vol. 9838, pp. 52–64. Springer, Cham (2016)
6. Bulteau, L., Fertin, G., Komusiewicz, C., Rusu, I.: A fixed-parameter algorithm for minimum common string partition with few duplications. In: Darling, A., Stoye, J. (eds.) WABI 2013. LNCS, vol. 8126, pp. 244–258. Springer, Heidelberg (2013)
7. Bulteau, L., Komusiewicz, C.: Minimum common string partition parameterized by partition size is fixed-parameter tractable. In: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2014), Portland, Oregon, USA, pp. 102–121 (2014)
8. Chan, T.M., Har-Peled, S.: Approximation algorithms for maximum independent set of pseudo-disks. Discrete Comput. Geometry 48(2), 373–392 (2012)
9. Chen, W., Chen, Z., Samatova, N.F., Peng, L., Wang, J., Tang, M.: Solving the maximum duo-preservation string mapping problem with linear programming. Theor. Comput. Sci. 530, 1–11 (2014)
10. Chen, X., Zheng, J., Zheng, F., Nan, P., Zhong, Y., Lonardi, S., Jiang, T.: Assignment of orthologous genes via genome rearrangement. IEEE/ACM Trans. Comput. Biology Bioinform. 2(4), 302–315 (2005)
11. Chrobak, M., Kolman, P., Sgall, J.: The greedy algorithm for the minimum common string partition problem. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds.) APPROX/RANDOM 2004. LNCS, vol. 3122, pp. 84–95. Springer, Heidelberg (2004)
12. Cormode, G., Muthukrishnan, S.: The string edit distance matching problem with moves. ACM Trans. Algorithms 3(1), 2:1–2:19 (2007)
13. Dudek, B., Gawrychowski, P., Ostropolski-Nalewaja, P.: A family of approximation algorithms for the maximum duo-preservation string mapping problem. CoRR, abs/1702.02405 (2017)
14. Goldstein, A., Kolman, P., Zheng, J.: Minimum common string partition problem: hardness and approximations. Electr. J. Comb. 12 (2005)
15. Hardison, R.C.: Comparative genomics. PLoS Biol. 1(2), e58 (2003)
16. Jiang, H., Zhu, B., Zhu, D., Zhu, H.: Minimum common string partition revisited. J. Comb. Optim. 23(4), 519–527 (2012)
17. Kolman, P., Walen, T.: Reversal distance for strings with duplicates: linear time approximation using hitting set. Electr. J. Comb. 14(1) (2007)
18. Mushegian, A.R.: Foundations of Comparative Genomics. Academic Press (AP), Cambridge (2007)
19. Mustafa, N.H., Ray, S.: Improved results on geometric hitting set problems. Discrete Comput. Geometry 44(4), 883–895 (2010)
20. Swenson, K.M., Marron, M., Earnest-DeYoung, J.V., Moret, B.M.E.: Approximating the true evolutionary distance between two genomes. ACM J. Experimental Algorithmics 12, 3.5:1–3.5:17 (2008)