Approximating Weighted Duo-Preservation in Comparative Genomics
Motivated by comparative genomics, Chen et al. introduced the Maximum Duo-preservation String Mapping (MDSM) problem, in which we are given two strings \(s_1\) and \(s_2\) over the same alphabet and the goal is to find a mapping \(\pi \) between them that maximizes the number of preserved duos. A duo is any pair of consecutive characters in a string; it is preserved by the mapping if its two consecutive characters in \(s_1\) are mapped to the same two consecutive characters in \(s_2\). The MDSM problem is known to be NP-hard, and there are approximation algorithms for it [3, 5], all of which consider only the “unweighted” version of the problem, in the sense that a duo from \(s_1\) is preserved by mapping it to any identical duo in \(s_2\), regardless of the positions of the two duos in their respective strings. However, in comparative genomics it is desirable to find mappings that favor preserving duos that are “closer” to each other under some distance measure.
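To make the definition of a preserved duo concrete, the following is a minimal sketch that counts the duos preserved by a given mapping. It assumes the usual formulation in which \(s_1\) and \(s_2\) have equal length and \(\pi \) is a bijection on positions that maps each character to an equal character; the function name `count_preserved_duos` and the list encoding of \(\pi \) are illustrative choices, not notation from the paper.

```python
def count_preserved_duos(s1, s2, pi):
    """Count duos of s1 preserved by the mapping pi.

    pi[i] is the position in s2 that position i of s1 is mapped to
    (an assumed encoding of the mapping, chosen for illustration).
    """
    n = len(s1)
    if len(s2) != n or sorted(pi) != list(range(n)):
        raise ValueError("pi must be a bijection between the positions of s1 and s2")
    if any(s1[i] != s2[pi[i]] for i in range(n)):
        raise ValueError("pi must map every character to an equal character")
    # The duo (i, i+1) is preserved when its two characters land on
    # consecutive positions of s2, in the same order.
    return sum(1 for i in range(n - 1) if pi[i + 1] == pi[i] + 1)


# "abc" is mapped to positions 2..4 of s2 and "ab" to positions 0..1.
preserved = count_preserved_duos("abcab", "ababc", [2, 3, 4, 0, 1])
```

In this example the duos `ab` and `bc` at the start of \(s_1\) and the final `ab` are preserved, so the count is 3. The sketch only evaluates a given mapping; finding a mapping that maximizes the count is the NP-hard part.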
In this paper, we introduce a generalized version of the problem, called the Maximum-Weight Duo-preservation String Mapping (MWDSM) problem, which captures both duo preservation and duo distance: mapping a duo from \(s_1\) to a duo in \(s_2\) that preserves it has a weight indicating the “closeness” of the two duos. The objective of the MWDSM problem is to find a mapping that maximizes the total weight of the preserved duos. We give a polynomial-time 6-approximation algorithm for this problem.
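The weighted objective can be illustrated with a short sketch that evaluates a given mapping. The weight function `closeness` below, which decays with the distance between the start positions of the two duos, is purely a hypothetical choice for illustration; the abstract does not fix a particular weight. The list encoding of \(\pi \) (position `i` of \(s_1\) maps to position `pi[i]` of \(s_2\)) is likewise an assumption.

```python
def total_preserved_weight(s1, s2, pi, weight):
    """Sum weight(i, j) over duos of s1 preserved by pi.

    i is the start position of the duo in s1 and j = pi[i] is the start
    position of the duo it is mapped to in s2.
    """
    total = 0.0
    for i in range(len(s1) - 1):
        same_chars = s1[i] == s2[pi[i]] and s1[i + 1] == s2[pi[i + 1]]
        consecutive = pi[i + 1] == pi[i] + 1
        if same_chars and consecutive:
            total += weight(i, pi[i])
    return total


def closeness(i, j):
    # Hypothetical weight: 1 for duos at the same position, smaller
    # the further apart the two duos are.
    return 1.0 / (1 + abs(i - j))


s1, s2 = "abcab", "ababc"
far = total_preserved_weight(s1, s2, [2, 3, 4, 0, 1], closeness)   # 3 duos preserved
near = total_preserved_weight(s1, s2, [0, 1, 4, 2, 3], closeness)  # 2 duos preserved
```

Under this assumed weight, the first mapping preserves three duos for a total of 1/3 + 1/3 + 1/4 = 11/12, while the second preserves only two duos but scores 1 + 1/2 = 3/2. This shows how the weighted objective can prefer a mapping that the unweighted one would not. Passing a weight that always returns 1 recovers the unweighted count.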
- 1. Bar-Yehuda, R., Even, S.: A local-ratio theorem for approximating the weighted vertex cover problem. In: Ausiello, G., Lucertini, M. (eds.) Analysis and Design of Algorithms for Combinatorial Problems, vol. 109, pp. 27–45. North-Holland (1985)
- 3. Boria, N., Cabodi, G., Camurati, P., Palena, M., Pasini, P., Quer, S.: A 7/2-approximation algorithm for the maximum duo-preservation string mapping problem. In: Proceedings of the 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016), Tel Aviv, Israel, pp. 11:1–11:8 (2016)
- 7. Bulteau, L., Komusiewicz, C.: Minimum common string partition parameterized by partition size is fixed-parameter tractable. In: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2014), Portland, Oregon, USA, pp. 102–121 (2014)
- 13. Dudek, B., Gawrychowski, P., Ostropolski-Nalewaja, P.: A family of approximation algorithms for the maximum duo-preservation string mapping problem. CoRR, abs/1702.02405 (2017)
- 14. Goldstein, A., Kolman, P., Zheng, J.: Minimum common string partition problem: hardness and approximations. Electr. J. Comb. 12 (2005)
- 17. Kolman, P., Walen, T.: Reversal distance for strings with duplicates: linear time approximation using hitting set. Electr. J. Comb. 14(1) (2007)
- 18. Mushegian, A.R.: Foundations of Comparative Genomics. Academic Press (AP), Cambridge (2007)