Abstract
We deal with a variant of the well-known Longest Common Subsequence (LCS) problem for weighted sequences. A (biological) weighted sequence determines the probability for each symbol to occur at a given position of the sequence (such sequences are also called Position Weighted Matrices, PWM). Two versions of the problem were proposed by Amir et al. (2009 and 2010); they are called LCWS and LCWS2 (Longest Common Weighted Subsequence Problems 1 and 2). We solve an open problem, stated in the conclusions of the paper by Amir et al., concerning the tractability of the log-probability version of the LCWS2 problem for bounded alphabets: we show that it is NP-hard already for an alphabet of size 2. We also improve the (1/|Σ|)-approximation algorithm given by Amir et al. (where Σ is the alphabet): we show a polynomial-time approximation scheme (PTAS) for the LCWS2 problem using O(n^5) space. We also give a simpler (1/2)-approximation algorithm for the same problem using only O(n^2) space.
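To make the notion of a weighted sequence concrete, the following is a minimal sketch, not taken from the paper. It assumes that a weighted sequence is stored as a list of per-position symbol distributions, and that the probability of a string occurring at a chosen increasing list of positions is the product of the per-position symbol probabilities. The names WeightedSeq and best_subsequence_probability are hypothetical. The function computes the highest probability with which a string can be embedded as a subsequence of a single weighted sequence; it illustrates only this basic building block, not the LCWS or LCWS2 algorithms themselves.

```python
from typing import Dict, List

# Assumed representation: one dict per position, mapping symbol -> probability.
WeightedSeq = List[Dict[str, float]]


def best_subsequence_probability(x: WeightedSeq, s: str) -> float:
    """Maximum, over all increasing position choices, of the product of
    probabilities of the symbols of s at those positions in x."""
    # best[j] = best probability of embedding s[:j] in the columns seen so far.
    best = [1.0] + [0.0] * len(s)
    for column in x:
        # Go right to left so each column is used for at most one symbol of s.
        for j in range(len(s), 0, -1):
            p = column.get(s[j - 1], 0.0)
            best[j] = max(best[j], best[j - 1] * p)
    return best[len(s)]


# A toy weighted sequence of length 3 over the alphabet {a, b}.
x = [
    {"a": 0.5, "b": 0.5},
    {"a": 1.0},
    {"a": 0.25, "b": 0.75},
]

print(best_subsequence_probability(x, "ab"))
```

In this toy example the best embedding of "ab" takes a at the second position and b at the third. A threshold test then amounts to comparing the returned value with the chosen threshold.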
The first author is supported by grant no. N206 355636 of the Polish Ministry of Science and Higher Education. The third author is supported by grant no. N206 568540 of the National Science Centre. The fourth author is supported by grant no. N206 566740 of the National Science Centre.
References
Amir, A., Chencinski, E., Iliopoulos, C.S., Kopelowitz, T., Zhang, H.: Property matching and weighted matching. Theor. Comput. Sci. 395(2-3), 298–310 (2008)
Amir, A., Gotthilf, Z., Shalom, B.R.: Weighted LCS. J. Discrete Algorithms 8, 273–281 (2010)
Amir, A., Iliopoulos, C.S., Kapah, O., Porat, E.: Approximate matching in weighted sequences. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 365–376. Springer, Heidelberg (2006)
Antoniou, P., Iliopoulos, C.S., Mouchard, L., Pissis, S.P.: Algorithms for mapping short degenerate and weighted sequences to a reference genome. Int. J. Computational Biology and Drug Design 2(4), 385–397 (2009)
Bergroth, L., Hakonen, H., Raita, T.: A survey of longest common subsequence algorithms. In: SPIRE, pp. 39–48 (2000)
Christodoulakis, M., Iliopoulos, C.S., Mouchard, L., Perdikuri, K., Tsakalidis, A.K., Tsichlas, K.: Computation of repetitions and regularities of biologically weighted sequences. Journal of Computational Biology 13(6), 1214–1231 (2006)
Crochemore, M.: An optimal algorithm for computing the repetitions in a word. Inf. Process. Lett. 12(5), 244–250 (1981)
Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific, Singapore (2003)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, New York (1979)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences – Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Iliopoulos, C.S., Makris, C., Panagis, Y., Perdikuri, K., Theodoridis, E., Tsakalidis, A.K.: The weighted suffix tree: An efficient data structure for handling molecular weighted sequences and its applications. Fundam. Inform. 71(2-3), 259–277 (2006)
Iliopoulos, C.S., Miller, M., Pissis, S.P.: Parallel algorithms for degenerate and weighted sequences derived from high throughput sequencing technologies. In: Holub, J., Zdárek, J. (eds.) Stringology, pp. 249–262. Prague Stringology Club, Department of Computer Science and Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague (2009)
Iliopoulos, C.S., Mouchard, L., Perdikuri, K., Tsakalidis, A.K.: Computing the repetitions in a biological weighted sequence. Journal of Automata, Languages and Combinatorics 10(5/6), 687–696 (2005)
Iliopoulos, C.S., Perdikuri, K., Theodoridis, E., Tsakalidis, A., Tsichlas, K.: Motif extraction from weighted sequences. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 286–297. Springer, Heidelberg (2004)
Myers, E.W., Celera Genomics Corporation: A whole-genome assembly of Drosophila. Science 287(5461), 2196–2204 (2000)
Thompson, J.D., Higgins, D.G., Gibson, T.J.: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680 (1994)
Venter, J.C., Celera Genomics Corporation: The sequence of the human genome. Science 291, 1304–1351 (2001)
Zhang, H., Guo, Q., Fan, J., Iliopoulos, C.S.: Loose and strict repeats in weighted sequences of proteins. Protein and Peptide Letters 17(9), 1136–1142 (2010)
Zhang, H., Guo, Q., Iliopoulos, C.S.: String matching with swaps in a weighted sequence. In: Zhang, J., He, J.-H., Fu, Y. (eds.) CIS 2004. LNCS, vol. 3314, pp. 698–704. Springer, Heidelberg (2004)
Zhang, H., Guo, Q., Iliopoulos, C.S.: An algorithmic framework for motif discovery problems in weighted sequences. In: Calamoneri, T., Diaz, J. (eds.) CIAC 2010. LNCS, vol. 6078, pp. 335–346. Springer, Heidelberg (2010)
Zhang, H., Guo, Q., Iliopoulos, C.S.: Varieties of regularities in weighted sequences. In: Chen, B. (ed.) AAIM 2010. LNCS, vol. 6124, pp. 271–280. Springer, Heidelberg (2010)
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cygan, M., Kubica, M., Radoszewski, J., Rytter, W., Waleń, T. (2011). Polynomial-Time Approximation Algorithms for Weighted LCS Problem. In: Giancarlo, R., Manzini, G. (eds) Combinatorial Pattern Matching. CPM 2011. Lecture Notes in Computer Science, vol 6661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21458-5_38
DOI: https://doi.org/10.1007/978-3-642-21458-5_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21457-8
Online ISBN: 978-3-642-21458-5
eBook Packages: Computer Science (R0)