Palindromic Decompositions with Gaps and Errors

Adamczyk, Michał; Alzamel, Mai; Charalampopoulos, Panagiotis; Iliopoulos, Costas S.; Radoszewski, Jakub

doi:10.1007/978-3-319-58747-9_7

Conference paper
First Online: 06 May 2017

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10304)

Abstract

Identifying palindromes in sequences has been an interesting line of research in combinatorics on words and also in computational biology, following the discovery of a relation between palindromes in the DNA sequence and the HIV virus. Efficient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing gaps in decompositions and errors in palindromes, and also by imposing a lower bound on the length of acceptable palindromes.

We first present an algorithm for obtaining a palindromic decomposition of a string of length n with the minimal total gap length in time \(\mathcal {O}(n \log {n} \cdot g)\) and space \(\mathcal {O}(n \cdot g)\), where g is the number of allowed gaps in the decomposition. We then consider a decomposition of the string into maximal \(\delta \)-palindromes (i.e. palindromes with \(\delta \) errors under the edit or Hamming distance) and g allowed gaps. We present an algorithm to obtain such a decomposition with the minimal total gap length in time \(\mathcal {O}(n \cdot (g+\delta ))\) and space \(\mathcal {O}(n\cdot g)\).

M. Alzamel is supported by the Saudi Ministry of Higher Education.

P. Charalampopoulos is supported by the Graduate Teaching Scholarship scheme of the Department of Informatics at King’s College London.

J. Radoszewski is a Newton International Fellow and is supported by the Polish Ministry of Science and Higher Education under the ‘Iuventus Plus’ program grant no. 0392/IP3/2015/73.

Notes

1. See http://www.cesshiv1.org/disview.php?accession=AB220944.

References

1. Alatabbi, A., Iliopoulos, C.S., Rahman, M.S.: Maximal palindromic factorization. In: Stringology, pp. 70–77 (2013)
2. Apostolico, A., Breslauer, D., Galil, Z.: Parallel detection of all palindromes in a string. Theor. Comput. Sci. 141(1), 163–173 (1995). http://dx.doi.org/10.1016/0304-3975(94)00083-U
3. Breslauer, D., Galil, Z.: Finding all periods and initial palindromes of a string in parallel. Algorithmica 14(4), 355–366 (1995). http://dx.doi.org/10.1007/BF01294132
4. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, Cambridge (2007)
5. Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific, Singapore (2003)
6. Droubay, X.: Palindromes in the Fibonacci word. Inf. Process. Lett. 55(4), 217–221 (1995). http://dx.doi.org/10.1016/0020-0190(95)00080-V
7. Droubay, X., Pirillo, G.: Palindromes and Sturmian words. Theor. Comput. Sci. 223(1–2), 73–85 (1999). http://dx.doi.org/10.1016/S0304-3975(97)00188-6
8. Fici, G., Gagie, T., Kärkkäinen, J., Kempa, D.: A subquadratic algorithm for minimum palindromic factorization. J. Discret. Algorithms 28(C), 41–48 (2014). http://dx.doi.org/10.1016/j.jda.2014.08.001
9. Frid, A., Puzynina, S., Zamboni, L.: On palindromic factorization of words. Adv. Appl. Math. 50(5), 737–748 (2013). http://dx.doi.org/10.1016/j.aam.2013.01.002
10. Fujishige, Y., Nakamura, M., Inenaga, S., Bannai, H., Takeda, M.: Finding gapped palindromes online. In: Mäkinen, V., Puglisi, S.J., Salmela, L. (eds.) IWOCA 2016. LNCS, vol. 9843, pp. 191–202. Springer, Cham (2016). doi:10.1007/978-3-319-44543-4_15
11. Galil, Z.: Real-time algorithms for string-matching and palindrome recognition. In: Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, pp. 161–173. ACM (1976). http://doi.acm.org/10.1145/800113.803644
12. Galil, Z., Seiferas, J.: A linear-time on-line recognition algorithm for “palstar”. J. ACM 25(1), 102–111 (1978). http://doi.acm.org/10.1145/322047.322056
13. Gupta, S., Prasad, R., Yadav, S.: Searching gapped palindromes in DNA sequences using dynamic suffix array. Indian J. Sci. Technol. 8(23), 1 (2015)
14. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York (1997)
15. I, T., Sugimoto, S., Inenaga, S., Bannai, H., Takeda, M.: Computing palindromic factorizations and palindromic covers on-line. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 150–161. Springer, Cham (2014). doi:10.1007/978-3-319-07566-2_16
16. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
17. Kolpakov, R., Kucherov, G.: Searching for gapped palindromes. Theor. Comput. Sci. 410(51), 5365–5373 (2009). http://dx.doi.org/10.1016/j.tcs.2009.09.013
18. Kosolobov, D., Rubinchik, M., Shur, A.M.: Pal^k is linear recognizable online. In: Italiano, G.F., Margaria-Steffen, T., Pokorný, J., Quisquater, J.-J., Wattenhofer, R. (eds.) SOFSEM 2015. LNCS, vol. 8939, pp. 289–301. Springer, Heidelberg (2015). doi:10.1007/978-3-662-46078-8_24
19. Manacher, G.: A new linear-time “on-line” algorithm for finding the smallest initial palindrome of a string. J. ACM (JACM) 22(3), 346–351 (1975)
20. Rubinchik, M., Shur, A.M.: EERTREE: an efficient data structure for processing palindromes in strings. In: Lipták, Z., Smyth, W.F. (eds.) IWOCA 2015. LNCS, vol. 9538, pp. 321–333. Springer, Cham (2016). doi:10.1007/978-3-319-29516-9_27

Author information

Authors and Affiliations

Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
Michał Adamczyk & Jakub Radoszewski
Department of Informatics, King’s College London, London, UK
Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos & Jakub Radoszewski

Corresponding author

Correspondence to Jakub Radoszewski.

Editor information

Editors and Affiliations

University of Bordeaux, Talence, France
Pascal Weil

A Appendix

Generalized Palindromic Factorization

In this section we show that the approach of Fici et al. [8] works for generalized palindromes, that is, strings x with \(x = f(x^R)\), for any involution f. The following auxiliary lemma extends the combinatorial properties of standard palindromes used in [8] (see Lemmas 1–3 therein) to generalized palindromes. Recall that a string y is called a border of a string x if it is both a prefix and a suffix of x. A number p is called a period of x if \(x[i]=x[i+p]\) for all \(i=1,\ldots ,|x|-p\). It is well known that x has a period p iff it has a border of length \(|x|-p\); see [4, 5].
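The border/period correspondence recalled above can be checked by brute force. The following is a minimal Python sketch, not part of the paper: the helper names `periods` and `border_lengths` are illustrative, indices are 0-based, and the sketch assumes the convention that \(p=|x|\) counts as a (trivial) period, matching the empty border.

```python
def periods(x):
    """All p with 1 <= p <= len(x) such that x[i] == x[i + p] for every valid i."""
    n = len(x)
    return [p for p in range(1, n + 1)
            if all(x[i] == x[i + p] for i in range(n - p))]


def border_lengths(x):
    """Lengths b < len(x) such that the prefix and the suffix of length b coincide."""
    n = len(x)
    return [b for b in range(n) if x[:b] == x[n - b:]]


# x has period p iff it has a border of length |x| - p.
x = "abaab"
print(periods(x))         # [3, 5]
print(border_lengths(x))  # [0, 2]
```

For `"abaab"` the periods 3 and 5 correspond to the borders of length 2 (`"ab"`) and 0 (the empty string).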

Lemma 13

(a) Let y be a suffix of a generalized palindrome x. Then y is a border of x iff y is a generalized palindrome.
(b) Let x be a string with a border y such that \(| x | \le 2 | y |\). Then x is a generalized palindrome iff y is a generalized palindrome.
(c) Let y be a proper suffix of a generalized palindrome x. Then \(| x | - | y |\) is a period of x iff y is a generalized palindrome. In particular, \(| x | - | y |\) is the smallest period of x iff y is the longest generalized palindromic proper suffix of x.

Proof

(a) Let \(y'\) be the prefix of x of length |y|. As x is a generalized palindrome, \(y'=f(y^R)\). \((\Rightarrow )\) If y is a border of x, then \(y = y' = f(y^R)\), so y is a generalized palindrome. \((\Leftarrow )\) If y is a generalized palindrome, then \(y' = f(y^R) = y\), so y is a border of x.

(b) \((\Rightarrow )\) From (a), if x is a generalized palindrome and y is its border, then y is a generalized palindrome. \((\Leftarrow )\) If y is a generalized palindrome, \(f(x^R)\) has a border \(f(y^R)=y\). This border covers the whole string \(f(x^R)\) and is the same as the border of x, so \(x=f(x^R)\) and x indeed is a generalized palindrome.

(c) This is a consequence of part (a) and the relation between borders and periods of a string. \(\square \)
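Parts (a) and (b) of Lemma 13 lend themselves to an exhaustive sanity check on short strings. The sketch below is an illustration under stated assumptions rather than anything from the paper: the involution is taken to be the DNA complement map (`COMPLEMENT`), and the helper names `is_gpal`, `is_border`, `lemma13a_holds` and `lemma13b_holds` are hypothetical.

```python
from itertools import product

# Assumed involution for illustration: Watson-Crick complement on DNA letters.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_gpal(x, f):
    """True iff x == f(x^R), i.e. x is a generalized palindrome under f."""
    return x == "".join(f[c] for c in reversed(x))


def is_border(y, x):
    """True iff y is both a prefix and a suffix of x."""
    return x.startswith(y) and x.endswith(y)


def lemma13a_holds(x, f):
    """If x is a generalized palindrome, a non-empty suffix is a border iff it is one too."""
    if not is_gpal(x, f):
        return True  # part (a) only concerns generalized palindromes x
    return all(is_border(x[s:], x) == is_gpal(x[s:], f) for s in range(len(x)))


def lemma13b_holds(x, f):
    """For every border y of x with |x| <= 2|y|: x is a generalized palindrome iff y is."""
    n = len(x)
    return all(is_gpal(x, f) == is_gpal(x[n - b:], f)
               for b in range((n + 1) // 2, n + 1)
               if is_border(x[n - b:], x))


# Exhaustive check over all DNA words of length 1..5.
words = ["".join(t) for n in range(1, 6) for t in product("ACGT", repeat=n)]
print(all(lemma13a_holds(w, COMPLEMENT) and lemma13b_holds(w, COMPLEMENT) for w in words))
```

Under the complement map, `"ATAT"` is a generalized palindrome while `"AA"` is not, since reversing and complementing `"AA"` gives `"TT"`.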

The crucial combinatorial property of standard palindromes used in Step 1 of the algorithm in Sect. 3 is that the sequence of consecutive differences in \(P_j\) is non-increasing and contains at most \(\mathcal {O}(\log j)\) distinct values. We show that the same observation holds for generalized palindromes; this follows from the next lemma, parts (1) and (2). The proof of Lemma 14 follows exactly the lines of the proof of the corresponding Lemma 4 in [8]; due to space constraints, we refer the reader to Fig. 3 illustrating the proof in [8].

Lemma 14

Let x be a generalized palindrome, y the longest generalized palindromic proper suffix of x, and z the longest generalized palindromic proper suffix of y. Let u and v be strings such that \(x = u y\) and \(y = v z\). Then:

(1) \(| u | \ge | v |\);
(2) if \(| u | > | v |\) then \(| u | > | z |\);
(3) if \(| u | = | v |\) then \(u = v\).

Proof

(1) By Lemma 13(c), \(| u | = | x | - | y |\) is the smallest period of x, and \(| v | = | y | - | z |\) is the smallest period of y. Since y is a factor of x, either \(| u |> | y | > | v |\) or |u| is a period of y too, and thus it cannot be smaller than |v|.

(2) By Lemma 13(a), y is a border of x and thus v is a prefix of x. Let w be a string such that \(x = v w\). Then z is a border of w and \(| w | = | zu |\). Since we assume \(| u | > | v |\), we must have \(| w | > | y |\). Suppose to the contrary that \(| u | \le | z |\). Then \(| w | = | zu | \le 2 | z |\), and by Lemma 13(b), w is a generalized palindrome. But this contradicts y being the longest generalized palindromic proper suffix of x.

(3) In the proof of (2) we saw that v is a prefix of x, and so is u by definition. Thus \(u = v\) if \(| u | = | v |\). \(\square \)
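The three claims of Lemma 14 can likewise be tested exhaustively on short binary strings. This is a hedged sketch, not the authors' code: the involutions `IDENTITY` (standard palindromes) and `SWAP` (exchanging the two letters) and all function names are assumptions made for illustration, and the check only considers the case where both y and z are non-empty.

```python
from itertools import product

# Two assumed involutions on the alphabet {a, b}.
IDENTITY = {"a": "a", "b": "b"}
SWAP = {"a": "b", "b": "a"}


def is_gpal(x, f):
    """True iff x == f(x^R)."""
    return x == "".join(f[c] for c in reversed(x))


def longest_gpal_proper_suffix(x, f):
    """Longest non-empty generalized palindromic proper suffix of x, or None."""
    for start in range(1, len(x)):
        if is_gpal(x[start:], f):
            return x[start:]
    return None


def lemma14_holds(x, f):
    """Check claims (1)-(3) for x = u y and y = v z; vacuously true if y or z is missing."""
    if not is_gpal(x, f):
        return True
    y = longest_gpal_proper_suffix(x, f)
    if y is None:
        return True
    z = longest_gpal_proper_suffix(y, f)
    if z is None:
        return True
    u = x[:len(x) - len(y)]
    v = y[:len(y) - len(z)]
    if len(u) < len(v):                         # claim (1)
        return False
    if len(u) > len(v) and len(u) <= len(z):    # claim (2)
        return False
    if len(u) == len(v) and u != v:             # claim (3)
        return False
    return True


words = ["".join(t) for n in range(1, 11) for t in product("ab", repeat=n)]
print(all(lemma14_holds(w, f) for w in words for f in (IDENTITY, SWAP)))
```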

We have thus shown that, in the case of generalized palindromes as well, the set \(P_j\) can be compactly represented by a set \(G_j\), as described in Sect. 3. To complete Step 1 of the algorithm, we need to show that \(G_j\) can be computed from \(G_{j-1}\) in \(\mathcal {O}(\log j)\) time. For this, just as in [8], we show that each triple \((i,\varDelta ,k) \in G_{j-1}\) will be either eliminated or replaced by \((i-1,\varDelta ,k)\) in \(G_j\). The proof exploits part (3) of Lemma 14.

Lemma 15

Let \(p_i\) and \(p_{i + 1}\) be two consecutive elements of \(P_{j - 1 ,\varDelta }\). Then \(p_i - 1 \in P_j\) iff \(p_{i + 1} - 1 \in P_j\).

Proof

By definition, \(p_{i + 1} - p_i = \varDelta \), and the predecessor of \(p_i\) in \(P_{j-1}\) is \(p_{i - 1} = p_i - \varDelta \). The strings \(x=S[p_{i-1} ..j-1]\), \(y=S[p_{i} ..j-1]\), and \(z=S[p_{i+1} ..j-1]\) satisfy the assumptions of Lemma 14(3) with \(|u| = |v| = \varDelta \). Hence, \(S[p_{i}-1]=S[p_{i+1}-1]=c\). Thus, \(p_i - 1 \in P_j\) iff \(S[j]=f(c)\) iff \(p_{i + 1} - 1 \in P_j\). \(\square \)

After this transformation, one might need to update pairs of adjacent triples in \(G_j\) because the gaps between them might have changed. This simple process is explained in detail in [8] and takes only \(\mathcal {O}(\log j)\) additional time.
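To make the objects \(P_j\) and \(G_j\) concrete, the following Python sketch computes them by brute force and checks the statement of Lemma 15 on short strings. It is a quadratic illustration, not the \(\mathcal {O}(\log j)\) update from [8]. The function names, the involutions `IDENTITY` and `SWAP`, and the convention that the smallest element of \(P_j\) (which has no predecessor) gets the gap `None` are assumptions made here.

```python
from itertools import product

IDENTITY = {"a": "a", "b": "b"}   # standard palindromes
SWAP = {"a": "b", "b": "a"}       # an assumed non-trivial involution


def is_gpal(x, f):
    """True iff x == f(x^R)."""
    return x == "".join(f[c] for c in reversed(x))


def gpal_suffix_starts(s, j, f):
    """P_j: 1-based positions i such that S[i..j] is a generalized palindrome."""
    return [i for i in range(1, j + 1) if is_gpal(s[i - 1:j], f)]


def group_by_gap(p):
    """G_j as triples (i, delta, k): k starts beginning at i, each delta after its predecessor."""
    triples, prev = [], None
    for i in p:
        delta = None if prev is None else i - prev
        if triples and triples[-1][1] == delta:
            first, d, k = triples[-1]
            triples[-1] = (first, d, k + 1)   # extend the current run
        else:
            triples.append((i, delta, 1))     # start a new run
        prev = i
    return triples


def lemma15_holds(s, j, f):
    """All members p of one group of G_{j-1} agree on whether p - 1 belongs to P_j."""
    current = set(gpal_suffix_starts(s, j, f))
    for first, delta, k in group_by_gap(gpal_suffix_starts(s, j - 1, f)):
        if delta is None:
            continue
        if len({(first + t * delta - 1) in current for t in range(k)}) > 1:
            return False
    return True


print(group_by_gap(gpal_suffix_starts("aaaa", 4, IDENTITY)))  # [(1, None, 1), (2, 1, 3)]
```

Since every group either shifts by one position as a whole or disappears as a whole, a triple \((i,\varDelta ,k)\) of \(G_{j-1}\) can be processed in constant time, which is what the update described above relies on.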

As for Step 2 of the algorithm, it suffices to show that the following combinatorial observation holds for generalized palindromes. Again we follow the lines of the proof from [8] (cf. Fig. 5 in that paper).

Lemma 16

If \((i,\varDelta ,k) \in G_j\) and \(k \ge 2\), then \((i,\varDelta ,k-1) \in G_{j-\varDelta }\).

Proof

By definition, \(( i , \varDelta , k ) \in G_j\) is equivalent to saying that \(P_{j ,\varDelta } = \{ i , i + \varDelta , \ldots , i + ( k - 1 )\varDelta \}\), and we need to show that \(P_{j -\varDelta ,\varDelta } = \{ i , i + \varDelta , \ldots , i + ( k - 2 )\varDelta \}\). We will show first that \(P_{j -\varDelta ,\varDelta } \cap [ i - \varDelta + 1 ..j - \varDelta ] = \{ i , i + \varDelta , \ldots , i + ( k - 2 )\varDelta \}\) and then that \(P_{j -\varDelta ,\varDelta } \cap [ 1 ..i - \varDelta ] = \emptyset \).

Since \(y = S [ i ..j ]\) and \(x = S [ i - \varDelta ..j ]\) are generalized palindromes and y is the longest proper border of x (by Lemma 13(a)), \(S [ i - \varDelta ..j - \varDelta ] = y = S [ i ..j ]\). Thus for all \(\ell \in [ i ..j ]\), \(\ell \in P_j\) iff \(\ell - \varDelta \in P_{j -\varDelta }\). In particular, the consecutive differences in both cases are the same and for all \(\ell \in [ i + 1 ..j ]\), \(\ell \in P_{j ,\varDelta }\) iff \(\ell - \varDelta \in P_{j -\varDelta ,\varDelta }\). Thus \(P_{j -\varDelta ,\varDelta } \cap [ i - \varDelta + 1 ..j - \varDelta ] = \{ i , i + \varDelta , \ldots , i + ( k - 2 )\varDelta \}\).

We still need to show that \(P_{j -\varDelta ,\varDelta } \cap [ 1 ..i - \varDelta ] = \emptyset \), which is true if and only if \(i - 2 \varDelta \not \in P_{j -\varDelta }\). Suppose to the contrary that \(S [ i - 2 \varDelta ..j -\varDelta ]\) is a generalized palindrome and let \(w = S [ i - 2 \varDelta ..i - \varDelta - 1 ]\). Then \(S [ j - 2 \varDelta + 1 ..j - \varDelta ] = f(w^R)\). Since \(z = S [ i - \varDelta ..j - \varDelta ]\) and \(S [ i - \varDelta ..j ]\) are generalized palindromes too, we have that \(S [ i - \varDelta ..i - 1 ] = w\) and \(S [ j - \varDelta + 1 ..j ] = f(w^R)\). Finally, since z is a generalized palindrome, \(S [ i - 2 \varDelta ..j ] = w zf(w^R)\) is a generalized palindrome. This implies that \(i - 2 \varDelta \in P_j\) and thus \(i - \varDelta \in P_{j ,\varDelta }\), which is a contradiction. \(\square \)
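As a small worked instance of Lemma 16 (an illustration added here, with the involution f assumed to be the complement map exchanging A with T and C with G), take \(S = \mathtt{ATATAT}\) and \(j = 6\). No suffix of odd length is a generalized palindrome, since its middle letter c would have to satisfy \(c = f(c)\). The generalized palindromic suffixes of \(S[1..6]\) are therefore ATATAT, ATAT and AT, so \(P_6 = \{1, 3, 5\}\), \(P_{6,2} = \{3, 5\}\) and \((3, 2, 2) \in G_6\). For \(j - \varDelta = 4\), the generalized palindromic suffixes of \(S[1..4]\) are ATAT and AT, so \(P_4 = \{1, 3\}\), \(P_{4,2} = \{3\}\) and \((3, 2, 1) \in G_4\), in agreement with the lemma.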

Cite this paper

Adamczyk, M., Alzamel, M., Charalampopoulos, P., Iliopoulos, C.S., Radoszewski, J. (2017). Palindromic Decompositions with Gaps and Errors. In: Weil, P. (eds) Computer Science – Theory and Applications. CSR 2017. Lecture Notes in Computer Science, vol 10304. Springer, Cham. https://doi.org/10.1007/978-3-319-58747-9_7

DOI: https://doi.org/10.1007/978-3-319-58747-9_7
Published: 06 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58746-2
Online ISBN: 978-3-319-58747-9
eBook Packages: Computer Science, Computer Science (R0)
