Abstract
Identifying palindromes in sequences has been an interesting line of research in combinatorics on words and also in computational biology, following the discovery of a relation between palindromes in DNA sequences and HIV. Efficient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing gaps in decompositions and errors in palindromes, and also by imposing a lower bound on the length of acceptable palindromes.
We first present an algorithm for obtaining a palindromic decomposition of a string of length n with the minimal total gap length in time \(\mathcal {O}(n \log {n} \cdot g)\) and space \(\mathcal {O}(n \cdot g)\), where g is the number of allowed gaps in the decomposition. We then consider a decomposition of the string into maximal \(\delta \)-palindromes (i.e. palindromes with \(\delta \) errors under the edit or Hamming distance) and g allowed gaps. We present an algorithm to obtain such a decomposition with the minimal total gap length in time \(\mathcal {O}(n \cdot (g+\delta ))\) and space \(\mathcal {O}(n\cdot g)\).
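To make the gapped decomposition problem concrete, here is a naive dynamic program in Python. It is a sketch under our own modelling assumptions, not the \(\mathcal {O}(n \log {n} \cdot g)\) algorithm described above: a decomposition is taken to be a partition of the string into non-empty blocks, each of which is either a palindrome of length at least `min_len` (a hypothetical name for the length lower bound) or a gap; at most g gaps are allowed and palindromes must be exact (no errors). The function name `min_total_gap` is likewise hypothetical. It runs in \(\mathcal {O}(n^2 g)\) time after building an \(\mathcal {O}(n^2)\) palindrome table.

```python
import math


def min_total_gap(S: str, g: int, min_len: int = 1) -> float:
    """Smallest total gap length over decompositions of S into palindromes of
    length >= min_len and at most g (non-empty) gaps; math.inf if none exists.

    Naive O(n^2 * g) dynamic program, not the paper's O(n log n * g) algorithm.
    """
    n = len(S)
    # pal[a][b] is True iff the non-empty factor S[a:b] is a palindrome.
    pal = [[False] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for a in range(n - length + 1):
            b = a + length
            pal[a][b] = S[a] == S[b - 1] and (length <= 2 or pal[a + 1][b - 1])

    # best[k][i]: minimal total gap length for the prefix S[:i] using at most k gaps.
    best = [[math.inf] * (n + 1) for _ in range(g + 1)]
    for k in range(g + 1):
        best[k][0] = 0
        for i in range(1, n + 1):
            for a in range(i):
                if i - a >= min_len and pal[a][i]:
                    # Last block S[a:i] is an acceptable palindrome: no extra cost.
                    best[k][i] = min(best[k][i], best[k][a])
                if k > 0:
                    # Last block S[a:i] is a gap: costs its length and one gap.
                    best[k][i] = min(best[k][i], best[k - 1][a] + (i - a))
    return best[g][n]


print(min_total_gap("abaxcdc", g=1, min_len=3))  # prints 1 (aba | x | cdc)
```

For "abaxcdc" with `min_len = 3`, the best decomposition with one gap is aba | x | cdc, so the sketch returns 1; with no gaps allowed no valid decomposition exists and it returns infinity.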
M. Alzamel is supported by the Saudi Ministry of Higher Education.
P. Charalampopoulos is supported by the Graduate Teaching Scholarship scheme of the Department of Informatics at King’s College London.
J. Radoszewski is a Newton International Fellow and is supported by the Polish Ministry of Science and Higher Education under the ‘Iuventus Plus’ program grant no. 0392/IP3/2015/73.
References
[1] Alatabbi, A., Iliopoulos, C.S., Rahman, M.S.: Maximal palindromic factorization. In: Stringology, pp. 70–77 (2013)
[2] Apostolico, A., Breslauer, D., Galil, Z.: Parallel detection of all palindromes in a string. Theor. Comput. Sci. 141(1), 163–173 (1995). http://dx.doi.org/10.1016/0304-3975(94)00083-U
[3] Breslauer, D., Galil, Z.: Finding all periods and initial palindromes of a string in parallel. Algorithmica 14(4), 355–366 (1995). http://dx.doi.org/10.1007/BF01294132
[4] Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, Cambridge (2007)
[5] Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific, Singapore (2003)
[6] Droubay, X.: Palindromes in the Fibonacci word. Inf. Process. Lett. 55(4), 217–221 (1995). http://dx.doi.org/10.1016/0020-0190(95)00080-V
[7] Droubay, X., Pirillo, G.: Palindromes and Sturmian words. Theor. Comput. Sci. 223(1–2), 73–85 (1999). http://dx.doi.org/10.1016/S0304-3975(97)00188-6
[8] Fici, G., Gagie, T., Kärkkäinen, J., Kempa, D.: A subquadratic algorithm for minimum palindromic factorization. J. Discret. Algorithms 28(C), 41–48 (2014). http://dx.doi.org/10.1016/j.jda.2014.08.001
[9] Frid, A., Puzynina, S., Zamboni, L.: On palindromic factorization of words. Adv. Appl. Math. 50(5), 737–748 (2013). http://dx.doi.org/10.1016/j.aam.2013.01.002
[10] Fujishige, Y., Nakamura, M., Inenaga, S., Bannai, H., Takeda, M.: Finding gapped palindromes online. In: Mäkinen, V., Puglisi, S.J., Salmela, L. (eds.) IWOCA 2016. LNCS, vol. 9843, pp. 191–202. Springer, Cham (2016). doi:10.1007/978-3-319-44543-4_15
[11] Galil, Z.: Real-time algorithms for string-matching and palindrome recognition. In: Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, pp. 161–173. ACM (1976). http://doi.acm.org/10.1145/800113.803644
[12] Galil, Z., Seiferas, J.: A linear-time on-line recognition algorithm for “palstar”. J. ACM 25(1), 102–111 (1978). http://doi.acm.org/10.1145/322047.322056
[13] Gupta, S., Prasad, R., Yadav, S.: Searching gapped palindromes in DNA sequences using dynamic suffix array. Indian J. Sci. Technol. 8(23), 1 (2015)
[14] Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York (1997)
[15] I, T., Sugimoto, S., Inenaga, S., Bannai, H., Takeda, M.: Computing palindromic factorizations and palindromic covers on-line. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 150–161. Springer, Cham (2014). doi:10.1007/978-3-319-07566-2_16
[16] Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
[17] Kolpakov, R., Kucherov, G.: Searching for gapped palindromes. Theor. Comput. Sci. 410(51), 5365–5373 (2009). http://dx.doi.org/10.1016/j.tcs.2009.09.013
[18] Kosolobov, D., Rubinchik, M., Shur, A.M.: Pal\(^k\) is linear recognizable online. In: Italiano, G.F., Margaria-Steffen, T., Pokorný, J., Quisquater, J.-J., Wattenhofer, R. (eds.) SOFSEM 2015. LNCS, vol. 8939, pp. 289–301. Springer, Heidelberg (2015). doi:10.1007/978-3-662-46078-8_24
[19] Manacher, G.: A new linear-time “on-line” algorithm for finding the smallest initial palindrome of a string. J. ACM (JACM) 22(3), 346–351 (1975)
[20] Rubinchik, M., Shur, A.M.: EERTREE: an efficient data structure for processing palindromes in strings. In: Lipták, Z., Smyth, W.F. (eds.) IWOCA 2015. LNCS, vol. 9538, pp. 321–333. Springer, Cham (2016). doi:10.1007/978-3-319-29516-9_27
A Appendix
Generalized Palindromic Factorization
In this section we show that the approach of Fici et al. [8] also works for generalized palindromes with respect to an arbitrary involution f. The following auxiliary lemma extends to generalized palindromes the combinatorial properties of standard palindromes used in [8] (see Lemmas 1–3 therein). Recall that a string y is called a border of a string x if it is both a prefix and a suffix of x. A number p is called a period of x if \(x[i]=x[i+p]\) for all \(i=1,\ldots ,|x|-p\). It is well known that x has a period p iff it has a border of length \(|x|-p\); see [4, 5].
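The definitions above can be checked mechanically. The following Python sketch assumes that the involution f acts on the alphabet and is extended to strings letter by letter, and that it is stored as a dictionary; the DNA complement map `DNA` is only an illustrative choice (with the identity map, generalized palindromes are exactly the ordinary palindromes). All function names are hypothetical. The last function exhaustively confirms, on short binary strings, the period/border equivalence recalled from [4, 5].

```python
from itertools import product

# Illustrative involution (an assumption, not taken from the paper): DNA complement.
DNA = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_generalized_palindrome(s: str, f: dict) -> bool:
    """True iff s == f(s^R), with f applied letter by letter."""
    return s == "".join(f[c] for c in reversed(s))


def has_border(x: str, b: int) -> bool:
    """True iff the prefix and the suffix of x of length b coincide."""
    return x[:b] == x[len(x) - b:]


def has_period(x: str, p: int) -> bool:
    """True iff x[i] == x[i + p] for every valid (0-indexed) position i."""
    return all(x[i] == x[i + p] for i in range(len(x) - p))


def period_iff_border(alphabet: str, max_len: int) -> bool:
    """Check on all strings up to max_len: p is a period of x iff x has a border of length |x| - p."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            x = "".join(letters)
            for p in range(1, n + 1):
                if has_period(x, p) != has_border(x, n - p):
                    return False
    return True


print(is_generalized_palindrome("GAATTC", DNA))  # prints True
```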
Lemma 13
(a) Let y be a suffix of a generalized palindrome x. Then y is a border of x iff y is a generalized palindrome.
(b) Let x be a string with a border y such that \(| x | \le 2 | y |\). Then x is a generalized palindrome iff y is a generalized palindrome.
(c) Let y be a proper suffix of a generalized palindrome x. Then \(| x | - | y |\) is a period of x iff y is a generalized palindrome. In particular, \(| x | - | y |\) is the smallest period of x iff y is the longest generalized palindromic proper suffix of x.
Proof
(a) Let \(y'\) be the prefix of x of length |y|. As x is a generalized palindrome, \(y'=f(y^R)\). \((\Rightarrow )\) If y is a border of x, then \(y = y' = f(y^R)\), so y is a generalized palindrome. \((\Leftarrow )\) If y is a generalized palindrome, then \(y' = f(y^R) = y\), so y is a border of x.
(b) \((\Rightarrow )\) If x is a generalized palindrome, then its border y is a generalized palindrome by (a). \((\Leftarrow )\) If y is a generalized palindrome, then both the prefix and the suffix of \(f(x^R)\) of length |y| are equal to \(f(y^R)=y\), so \(f(x^R)\) has the same border y as x. Since \(| x | \le 2 | y |\), the two occurrences of this border cover the whole string, both in x and in \(f(x^R)\); as these strings have the same length, \(x=f(x^R)\), and x indeed is a generalized palindrome.
(c) This is a consequence of part (a) and the relation between borders and periods of a string. \(\square \)
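As a sanity check, and not as a replacement for the proof, the sketch below tests parts (a)–(c) of Lemma 13 on all short strings over a small alphabet. It assumes a letter-by-letter involution f given as a dictionary; the helper names are hypothetical. Borders in part (b) are enumerated among the suffixes of x, since every border is in particular a suffix.

```python
from itertools import product


def is_gpal(s: str, f: dict) -> bool:
    """True iff s == f(s^R), with f applied letter by letter."""
    return s == "".join(f[c] for c in reversed(s))


def is_border(y: str, x: str) -> bool:
    """True iff y is both a prefix and a suffix of x."""
    return len(y) <= len(x) and x.startswith(y) and x.endswith(y)


def has_period(x: str, p: int) -> bool:
    return all(x[i] == x[i + p] for i in range(len(x) - p))


def check_lemma13(f: dict, max_len: int) -> bool:
    """Brute-force check of Lemma 13 (a)-(c) on all strings of length <= max_len."""
    for n in range(1, max_len + 1):
        for letters in product(sorted(f), repeat=n):
            x = "".join(letters)
            x_gpal = is_gpal(x, f)
            for m in range(n + 1):  # every suffix y of x, with |y| = m
                y = x[n - m:]
                y_gpal = is_gpal(y, f)
                border = is_border(y, x)
                if x_gpal and border != y_gpal:                          # part (a)
                    return False
                if border and n <= 2 * m and x_gpal != y_gpal:           # part (b)
                    return False
                if x_gpal and m < n and has_period(x, n - m) != y_gpal:  # part (c)
                    return False
    return True
```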
The crucial combinatorial property of standard palindromes used in Step 1 of the algorithm in Sect. 3 is that the sequence of consecutive differences in \(P_j\) is non-increasing and contains at most \(\mathcal {O}(\log j)\) distinct values. We show that the same holds for generalized palindromes; this follows from parts (1) and (2) of the next lemma. Part (1) gives monotonicity, and in the notation of the next lemma, part (2) implies that whenever the difference strictly decreases we have \(|x| = |u|+|v|+|z| > 2|z|\), so the length of the palindromic suffix more than halves from x to z. The proof of Lemma 14 closely follows that of the corresponding Lemma 4 in [8]; due to space constraints, we refer the reader to Fig. 3 in [8], which illustrates that proof.
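The property can also be observed experimentally. In the sketch below we assume that \(P_j\) is the sorted list of (1-indexed) starting positions i of the non-empty generalized palindromic suffixes \(S[i..j]\) of \(S[1..j]\), with f a letter-by-letter involution stored as a dictionary; the exact conventions of Sect. 3 may differ, and all function names are hypothetical. The function `check_differences` verifies on all short strings that the consecutive differences never increase and that their number of distinct values stays within \(2\log_2 j + 2\), a crude explicit bound derived from part (2) of Lemma 14.

```python
import math
from itertools import product


def is_gpal(s: str, f: dict) -> bool:
    return s == "".join(f[c] for c in reversed(s))


def suffix_starts(S: str, j: int, f: dict) -> list:
    """P_j (1-indexed): all i such that S[i..j] is a non-empty generalized palindrome."""
    return [i for i in range(1, j + 1) if is_gpal(S[i - 1:j], f)]


def diffs(P: list) -> list:
    """Consecutive differences of the sorted list P."""
    return [b - a for a, b in zip(P, P[1:])]


def check_differences(f: dict, max_len: int) -> bool:
    for n in range(1, max_len + 1):
        for letters in product(sorted(f), repeat=n):
            S = "".join(letters)
            for j in range(1, n + 1):
                d = diffs(suffix_starts(S, j, f))
                if any(a < b for a, b in zip(d, d[1:])):  # must be non-increasing
                    return False
                if len(set(d)) > 2 * math.log2(j) + 2:    # crude O(log j) bound
                    return False
    return True


print(suffix_starts("abaaba", 6, {"a": "a", "b": "b"}))  # prints [1, 4, 6]
```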
Lemma 14
Let x be a generalized palindrome, y the longest generalized palindromic proper suffix of x, and z the longest generalized palindromic proper suffix of y. Let u and v be strings such that \(x = u y\) and \(y = v z\). Then:
(1) \(| u | \ge | v |\);
(2) if \(| u | > | v |\) then \(| u | > | z |\);
(3) if \(| u | = | v |\) then \(u = v\).
Proof
(1) By Lemma 13(c), \(| u | = | x | - | y |\) is the smallest period of x, and \(| v | = | y | - | z |\) is the smallest period of y. Since y is a factor of x, either \(| u | > | y | \ge | v |\), or \(| u | \le | y |\) and then |u| is also a period of y, so it cannot be smaller than the smallest period |v| of y.
(2) By Lemma 13(a), y is a border of x; since v is a prefix of y, it is also a prefix of x. Let w be the string such that \(x = v w\). Writing \(x = y t\) with \(| t | = | u |\), we get \(x = v z t\) and hence \(w = z t\), so z is a border of w and \(| w | = | zu |\). Since we assume \(| u | > | v |\), we have \(| w | = | x | - | v | > | x | - | u | = | y |\). Suppose to the contrary that \(| u | \le | z |\). Then \(| w | = | zu | \le 2 | z |\), and by Lemma 13(b), w is a generalized palindrome. But w is a proper suffix of x that is longer than y, which contradicts y being the longest generalized palindromic proper suffix of x.
(3) In the proof of (2) we saw that v is a prefix of x, and so is u by definition. Thus \(u = v\) if \(| u | = | v |\). \(\square \)
We have thus shown that, also in the case of generalized palindromes, the set \(P_j\) can be compactly represented by a set \(G_j\), as described in Sect. 3. To complete Step 1 of the algorithm, we need to show that \(G_j\) can be computed from \(G_{j-1}\) in \(\mathcal {O}(\log j)\) time. For this, just as in [8], we show that each triple \((i,\varDelta ,k) \in G_{j-1}\) is either eliminated or replaced by \((i-1,\varDelta ,k)\) in \(G_j\). The proof exploits part (3) of Lemma 14.
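Sect. 3 is not part of this excerpt, so the following Python sketch of the compact representation rests on an assumption inferred from the proofs of Lemmas 15 and 16: \(P_{j,\varDelta }\) consists of the elements of \(P_j\) whose predecessor in \(P_j\) lies exactly \(\varDelta \) positions to the left, and \((i,\varDelta ,k) \in G_j\) means \(P_{j,\varDelta } = \{i, i+\varDelta , \ldots , i+(k-1)\varDelta \}\). Because the differences are non-increasing, each \(P_{j,\varDelta }\) is a run of consecutive elements, so one left-to-right pass suffices. Storing the smallest element of \(P_j\), which has no predecessor, separately is our own convention, and `compress`/`expand` are hypothetical names.

```python
def compress(P: list):
    """Turn a sorted list P (with non-increasing consecutive differences) into
    (first, triples), where each triple (i, delta, k) stands for
    {i, i + delta, ..., i + (k - 1) * delta}; first has no predecessor."""
    if not P:
        return None, []
    triples = []
    for prev, cur in zip(P, P[1:]):
        delta = cur - prev
        if triples and triples[-1][1] == delta:
            i, _, k = triples[-1]
            triples[-1] = (i, delta, k + 1)   # extend the current progression
        else:
            triples.append((cur, delta, 1))   # a new difference starts a new triple
    return P[0], triples


def expand(first, triples) -> list:
    """Inverse of compress: rebuild the sorted list."""
    if first is None:
        return []
    P = [first]
    for i, delta, k in triples:
        P.extend(i + t * delta for t in range(k))
    return P
```

For instance, \(P_6 = \{1, 4, 6\}\) for the string abaaba with the identity involution is stored as the first element 1 together with the triples (4, 3, 1) and (6, 2, 1).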
Lemma 15
Let \(p_i\) and \(p_{i + 1}\) be two consecutive elements of \(P_{j - 1 ,\varDelta }\). Then \(p_i - 1 \in P_j\) iff \(p_{i + 1} - 1 \in P_j\).
Proof
By definition, \(p_{i + 1} - p_i = \varDelta \), and the predecessor of \(p_i\) in \(P_{j-1}\) is \(p_{i - 1} = p_i - \varDelta \). The strings \(x=S[p_{i-1} ..j-1]\), \(y=S[p_{i} ..j-1]\), and \(z=S[p_{i+1} ..j-1]\) form the situation of Lemma 14(3) with \(|u| = |v| = \varDelta \), so \(u = v\); comparing the last letters of u and v gives \(S[p_{i}-1]=S[p_{i+1}-1]=c\). Since y and z are generalized palindromes, \(p_i - 1 \in P_j\) iff \(S[j]=f(c)\) iff \(p_{i + 1} - 1 \in P_j\). \(\square \)
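Lemma 15 says that each progression \(P_{j-1,\varDelta }\) is carried over to \(P_j\) (shifted by one position) as a whole or not at all. The brute-force Python sketch below checks this all-or-nothing behaviour on short strings. It assumes that f is a letter-by-letter involution given as a dictionary, that \(P_j\) holds the 1-indexed starting positions of the non-empty generalized palindromic suffixes of \(S[1..j]\), and that \(P_{j,\varDelta }\) collects the elements of \(P_j\) whose predecessor lies exactly \(\varDelta \) positions to the left; all names are hypothetical. It does not implement the \(\mathcal {O}(\log j)\) update itself.

```python
from itertools import product


def is_gpal(s: str, f: dict) -> bool:
    return s == "".join(f[c] for c in reversed(s))


def suffix_starts(S: str, j: int, f: dict) -> list:
    """1-indexed starts i of the non-empty generalized palindromic suffixes S[i..j]."""
    return [i for i in range(1, j + 1) if is_gpal(S[i - 1:j], f)]


def progressions(P: list) -> list:
    """Split sorted P into the sets P_{j,delta}: maximal runs of elements
    whose predecessor in P lies exactly delta positions to the left."""
    runs = []  # list of (delta, members)
    for prev, cur in zip(P, P[1:]):
        if runs and cur - prev == runs[-1][0]:
            runs[-1][1].append(cur)
        else:
            runs.append((cur - prev, [cur]))
    return [members for _, members in runs]


def check_lemma15(f: dict, max_len: int) -> bool:
    for n in range(2, max_len + 1):
        for letters in product(sorted(f), repeat=n):
            S = "".join(letters)
            for j in range(2, n + 1):
                P_j = set(suffix_starts(S, j, f))
                for members in progressions(suffix_starts(S, j - 1, f)):
                    # Either every p - 1 lies in P_j or none of them does.
                    if len({p - 1 in P_j for p in members}) > 1:
                        return False
    return True
```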
After this transformation, one might need to update pairs of adjacent triples in \(G_j\) because the gaps between them might have changed. This simple process is explained in detail in [8] and takes only \(\mathcal {O}(\log j)\) additional time.
As for Step 2 of the algorithm, it suffices to show that the following combinatorial observation holds for generalized palindromes. Again we follow the lines of the proof from [8] (cf. Fig. 5 in that paper).
Lemma 16
If \((i,\varDelta ,k) \in G_j\) and \(k \ge 2\), then \((i,\varDelta ,k-1) \in G_{j-\varDelta }\).
Proof
By definition, \(( i , \varDelta , k ) \in G_j\) is equivalent to saying that \(P_{j ,\varDelta } = \{ i , i + \varDelta , \ldots , i + ( k - 1 )\varDelta \}\), and we need to show that \(P_{j -\varDelta ,\varDelta } = \{ i , i + \varDelta , \ldots , i + ( k - 2 )\varDelta \}\). We will show first that \(P_{j -\varDelta ,\varDelta } \cap [ i - \varDelta + 1 ..j - \varDelta ] = \{ i , i + \varDelta , \ldots , i + ( k - 2 )\varDelta \}\) and then that \(P_{j -\varDelta ,\varDelta } \cap [ 1 ..i - \varDelta ] = \emptyset \).
Since \(y = S [ i ..j ]\) and \(x = S [ i - \varDelta ..j ]\) are generalized palindromes and y is the longest proper border of x (by Lemma 13(a)), \(S [ i - \varDelta ..j - \varDelta ] = y = S [ i ..j ]\). Thus for all \(\ell \in [ i ..j ]\), \(\ell \in P_j\) iff \(\ell - \varDelta \in P_{j -\varDelta }\). In particular, the consecutive differences in both cases are the same and for all \(\ell \in [ i + 1 ..j ]\), \(\ell \in P_{j ,\varDelta }\) iff \(\ell - \varDelta \in P_{j -\varDelta ,\varDelta }\). Thus \(P_{j -\varDelta ,\varDelta } \cap [ i - \varDelta + 1 ..j - \varDelta ] = \{ i , i + \varDelta , \ldots , i + ( k - 2 )\varDelta \}\).
We still need to show that \(P_{j -\varDelta ,\varDelta } \cap [ 1 ..i - \varDelta ] = \emptyset \), which is true if and only if \(i - 2 \varDelta \not \in P_{j -\varDelta }\). Suppose to the contrary that \(S [ i - 2 \varDelta ..j -\varDelta ]\) is a generalized palindrome and let \(w = S [ i - 2 \varDelta ..i - \varDelta - 1 ]\). Then \(S [ j - 2 \varDelta + 1 ..j - \varDelta ] = f(w^R)\). The string \(z = S [ i - \varDelta ..j - \varDelta ]\) is a generalized palindrome, so by Lemma 13(a) it is a border of \(S [ i - 2 \varDelta ..j -\varDelta ]\); hence this string has period \(\varDelta \) and \(S [ i - \varDelta ..i - 1 ] = w\). Since \(S [ i - \varDelta ..j ]\) is a generalized palindrome too, also \(S [ j - \varDelta + 1 ..j ] = f(w^R)\). Finally, since z is a generalized palindrome, \(S [ i - 2 \varDelta ..j ] = w zf(w^R)\) is a generalized palindrome. This implies that \(i - 2 \varDelta \in P_j\) and thus \(i - \varDelta \in P_{j ,\varDelta }\), which is a contradiction. \(\square \)
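Lemma 16 can be spot-checked by computing every \(G_j\) naively and testing that each triple \((i,\varDelta ,k)\) with \(k \ge 2\) reappears as \((i,\varDelta ,k-1)\) in \(G_{j-\varDelta }\). This Python sketch assumes that f is a letter-by-letter involution stored as a dictionary, that \(P_j\) holds the 1-indexed starting positions of the non-empty generalized palindromic suffixes of \(S[1..j]\), and that \(P_{j,\varDelta }\) contains the elements of \(P_j\) whose predecessor is exactly \(\varDelta \) smaller; all names are hypothetical, and the brute force is meant only as an illustration.

```python
from itertools import product


def is_gpal(s: str, f: dict) -> bool:
    return s == "".join(f[c] for c in reversed(s))


def suffix_starts(S: str, j: int, f: dict) -> list:
    """1-indexed starts i of the non-empty generalized palindromic suffixes S[i..j]."""
    return [i for i in range(1, j + 1) if is_gpal(S[i - 1:j], f)]


def compress(P: list) -> set:
    """G_j as a set of triples (i, delta, k) with P_{j,delta} = {i, ..., i + (k - 1) * delta}."""
    out = []
    for prev, cur in zip(P, P[1:]):
        delta = cur - prev
        if out and out[-1][1] == delta:
            i, _, k = out[-1]
            out[-1] = (i, delta, k + 1)
        else:
            out.append((cur, delta, 1))
    return set(out)


def check_lemma16(f: dict, max_len: int) -> bool:
    for n in range(1, max_len + 1):
        for letters in product(sorted(f), repeat=n):
            S = "".join(letters)
            # G[j] for j = 1..n; G[0] is an empty placeholder.
            G = [set()] + [compress(suffix_starts(S, j, f)) for j in range(1, n + 1)]
            for j in range(1, n + 1):
                for i, delta, k in G[j]:
                    if k >= 2 and (i, delta, k - 1) not in G[j - delta]:
                        return False
    return True
```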
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Adamczyk, M., Alzamel, M., Charalampopoulos, P., Iliopoulos, C.S., Radoszewski, J. (2017). Palindromic Decompositions with Gaps and Errors. In: Weil, P. (eds) Computer Science – Theory and Applications. CSR 2017. Lecture Notes in Computer Science, vol 10304. Springer, Cham. https://doi.org/10.1007/978-3-319-58747-9_7
DOI: https://doi.org/10.1007/978-3-319-58747-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58746-2
Online ISBN: 978-3-319-58747-9
eBook Packages: Computer Science, Computer Science (R0)