Palindromic Decompositions with Gaps and Errors

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10304)

Abstract

Identifying palindromes in sequences has been an interesting line of research in combinatorics on words and also in computational biology, after the discovery of the relation between palindromes in DNA sequences and HIV. Efficient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing gaps in decompositions and errors in palindromes, and also by imposing a lower bound on the length of acceptable palindromes.

We first present an algorithm for obtaining a palindromic decomposition of a string of length n with the minimal total gap length in time \(\mathcal {O}(n \log {n} \cdot g)\) and space \(\mathcal {O}(n \cdot g)\), where g is the number of allowed gaps in the decomposition. We then consider a decomposition of the string into maximal \(\delta \)-palindromes (i.e. palindromes with \(\delta \) errors under the edit or Hamming distance) and g allowed gaps. We present an algorithm to obtain such a decomposition with the minimal total gap length in time \(\mathcal {O}(n \cdot (g+\delta ))\) and space \(\mathcal {O}(n\cdot g)\).
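To make the first problem concrete, the following Python sketch computes an optimal value by brute force. It is not the algorithm of the paper: it assumes that a decomposition splits the string into consecutive factors, each being either a gap or a palindrome of length at least `min_len` (the lower bound mentioned above), that at most g factors are gaps, and that the cost is the total length of the gaps. The names `palindrome_table` and `min_total_gap` are hypothetical, and the dynamic program runs in O(n^2 · g) time, well above the bound stated above.

```python
def palindrome_table(s: str) -> list[list[bool]]:
    """is_pal[i][j] is True iff the half-open factor s[i:j] is a palindrome."""
    n = len(s)
    is_pal = [[False] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        is_pal[i][i] = True                  # empty factor
    for i in range(n):
        is_pal[i][i + 1] = True              # single letters
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            is_pal[i][j] = s[i] == s[j - 1] and is_pal[i + 1][j - 1]
    return is_pal


def min_total_gap(s: str, g: int, min_len: int = 1) -> float:
    """Minimum total gap length over decompositions of s into palindromes of
    length >= min_len and at most g gaps; float('inf') if none exists."""
    n = len(s)
    is_pal = palindrome_table(s)
    inf = float("inf")
    # best[k][i]: optimum for the prefix s[:i] using at most k gaps
    best = [[inf] * (n + 1) for _ in range(g + 1)]
    for k in range(g + 1):
        best[k][0] = 0
        for i in range(1, n + 1):
            for j in range(i):
                # last factor s[j:i] is a long-enough palindrome: no extra cost
                if i - j >= min_len and is_pal[j][i]:
                    best[k][i] = min(best[k][i], best[k][j])
                # last factor s[j:i] is a gap: uses one gap, costs its length
                if k > 0:
                    best[k][i] = min(best[k][i], best[k - 1][j] + (i - j))
    return best[g][n]
```

For instance, with `min_len = 3` the string `abacab` has no gap-free decomposition, while one gap suffices: leaving the first `a` uncovered and keeping the palindrome `bacab` gives total gap length 1.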

M. Alzamel is supported by the Saudi Ministry of Higher Education.

P. Charalampopoulos is supported by the Graduate Teaching Scholarship scheme of the Department of Informatics at King’s College London.

J. Radoszewski is a Newton International Fellow and is supported by the Polish Ministry of Science and Higher Education under the ‘Iuventus Plus’ program grant no. 0392/IP3/2015/73.

Notes

  1. See http://www.cesshiv1.org/disview.php?accession=AB220944.

References

  1. Alatabbi, A., Iliopoulos, C.S., Rahman, M.S.: Maximal palindromic factorization. In: Stringology, pp. 70–77 (2013)

  2. Apostolico, A., Breslauer, D., Galil, Z.: Parallel detection of all palindromes in a string. Theor. Comput. Sci. 141(1), 163–173 (1995). http://dx.doi.org/10.1016/0304-3975(94)00083-U

  3. Breslauer, D., Galil, Z.: Finding all periods and initial palindromes of a string in parallel. Algorithmica 14(4), 355–366 (1995). http://dx.doi.org/10.1007/BF01294132

  4. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, Cambridge (2007)

  5. Crochemore, M., Rytter, W.: Jewels of Stringology. World Scientific, Singapore (2003)

  6. Droubay, X.: Palindromes in the Fibonacci word. Inf. Process. Lett. 55(4), 217–221 (1995). http://dx.doi.org/10.1016/0020-0190(95)00080-V

  7. Droubay, X., Pirillo, G.: Palindromes and Sturmian words. Theor. Comput. Sci. 223(1–2), 73–85 (1999). http://dx.doi.org/10.1016/S0304-3975(97)00188-6

  8. Fici, G., Gagie, T., Kärkkäinen, J., Kempa, D.: A subquadratic algorithm for minimum palindromic factorization. J. Discret. Algorithms 28(C), 41–48 (2014). http://dx.doi.org/10.1016/j.jda.2014.08.001

  9. Frid, A., Puzynina, S., Zamboni, L.: On palindromic factorization of words. Adv. Appl. Math. 50(5), 737–748 (2013). http://dx.doi.org/10.1016/j.aam.2013.01.002

  10. Fujishige, Y., Nakamura, M., Inenaga, S., Bannai, H., Takeda, M.: Finding gapped palindromes online. In: Mäkinen, V., Puglisi, S.J., Salmela, L. (eds.) IWOCA 2016. LNCS, vol. 9843, pp. 191–202. Springer, Cham (2016). doi:10.1007/978-3-319-44543-4_15

  11. Galil, Z.: Real-time algorithms for string-matching and palindrome recognition. In: Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, pp. 161–173. ACM (1976). http://doi.acm.org/10.1145/800113.803644

  12. Galil, Z., Seiferas, J.: A linear-time on-line recognition algorithm for “palstar”. J. ACM 25(1), 102–111 (1978). http://doi.acm.org/10.1145/322047.322056

  13. Gupta, S., Prasad, R., Yadav, S.: Searching gapped palindromes in DNA sequences using dynamic suffix array. Indian J. Sci. Technol. 8(23), 1 (2015)

  14. Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York (1997)

  15. I, T., Sugimoto, S., Inenaga, S., Bannai, H., Takeda, M.: Computing palindromic factorizations and palindromic covers on-line. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 150–161. Springer, Cham (2014). doi:10.1007/978-3-319-07566-2_16

  16. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)

  17. Kolpakov, R., Kucherov, G.: Searching for gapped palindromes. Theor. Comput. Sci. 410(51), 5365–5373 (2009). http://dx.doi.org/10.1016/j.tcs.2009.09.013

  18. Kosolobov, D., Rubinchik, M., Shur, A.M.: Pal^k is linear recognizable online. In: Italiano, G.F., Margaria-Steffen, T., Pokorný, J., Quisquater, J.-J., Wattenhofer, R. (eds.) SOFSEM 2015. LNCS, vol. 8939, pp. 289–301. Springer, Heidelberg (2015). doi:10.1007/978-3-662-46078-8_24

  19. Manacher, G.: A new linear-time “on-line” algorithm for finding the smallest initial palindrome of a string. J. ACM (JACM) 22(3), 346–351 (1975)

  20. Rubinchik, M., Shur, A.M.: EERTREE: an efficient data structure for processing palindromes in strings. In: Lipták, Z., Smyth, W.F. (eds.) IWOCA 2015. LNCS, vol. 9538, pp. 321–333. Springer, Cham (2016). doi:10.1007/978-3-319-29516-9_27

Author information

Correspondence to Jakub Radoszewski.

A Appendix

Generalized Palindromic Factorization

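In this section we show that the approach of Fici et al. [8] also works for generalized palindromes, for any involution f. Here f is an involution on the alphabet (that is, \(f(f(a))=a\) for every letter a), extended to strings letter by letter, and a string x is a generalized palindrome if \(x=f(x^R)\), where \(x^R\) denotes the reversal of x; when f is the identity, these are the standard palindromes. The following auxiliary lemma extends the combinatorial properties of standard palindromes used in [8] (see Lemmas 1–3 therein) to generalized palindromes. Recall that a string y is called a border of a string x if it is both a prefix and a suffix of x. A number p is called a period of x if \(x[i]=x[i+p]\) for all \(i=1,\ldots ,|x|-p\). It is well known that x has a period p iff it has a border of length \(|x|-p\); see [4, 5].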
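The definitions above can be expressed directly in Python. This is a small sketch, not code from the paper: f is assumed to be given as a letter-to-letter dictionary (the Watson-Crick complement serves as an example involution), and border lengths are read off the failure function of Knuth, Morris and Pratt [16]. All function names are hypothetical.

```python
# Example involution f (an assumption for illustration): Watson-Crick complement.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_generalized_palindrome(x: str, f: dict[str, str]) -> bool:
    """True iff x == f(x^R), applying f letter by letter."""
    return x == "".join(f[c] for c in reversed(x))


def prefix_function(x: str) -> list[int]:
    """pi[i] = length of the longest proper border of x[:i + 1] (KMP failure function)."""
    pi = [0] * len(x)
    for i in range(1, len(x)):
        k = pi[i - 1]
        while k > 0 and x[i] != x[k]:
            k = pi[k - 1]
        if x[i] == x[k]:
            k += 1
        pi[i] = k
    return pi


def border_lengths(x: str) -> list[int]:
    """Lengths of all nonempty proper borders of x, longest first."""
    if not x:
        return []
    pi = prefix_function(x)
    lengths = []
    k = pi[-1]
    while k > 0:
        lengths.append(k)
        k = pi[k - 1]
    return lengths


def periods(x: str) -> list[int]:
    """All periods p of x with 1 <= p <= |x|, via p = |x| - (border length);
    the empty border gives the trivial period |x|."""
    if not x:
        return []
    return sorted({len(x) - b for b in border_lengths(x)} | {len(x)})
```

For example, `GAATTC` is a generalized palindrome under the complement, and `periods("abaab")` returns `[3, 5]`, matching its single nonempty proper border `ab`.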

Lemma 13

  (a) Let y be a suffix of a generalized palindrome x. Then y is a border of x iff y is a generalized palindrome.

  (b) Let x be a string with a border y such that \(| x | \le 2 | y |\). Then x is a generalized palindrome iff y is a generalized palindrome.

  (c) Let y be a proper suffix of a generalized palindrome x. Then \(| x | - | y |\) is a period of x iff y is a generalized palindrome. In particular, \(| x | - | y |\) is the smallest period of x iff y is the longest generalized palindromic proper suffix of x.

Proof

(a) Let \(y'\) be the prefix of x of length |y|. As x is a generalized palindrome, \(y'=f(y^R)\). \((\Rightarrow )\) If y is a border of x, then \(y = y' = f(y^R)\), so y is a generalized palindrome. \((\Leftarrow )\) If y is a generalized palindrome, then \(y' = f(y^R) = y\), so y is a border of x.

(b) \((\Rightarrow )\) From (a), if x is a generalized palindrome and y is its border, then y is a generalized palindrome. \((\Leftarrow )\) If y is a generalized palindrome, then \(f(x^R)\) has the border \(f(y^R)=y\), which is also a border of x. As \(| x | \le 2 | y |\), the occurrences of y as a prefix and as a suffix together cover the whole string, so \(x=f(x^R)\) and x is indeed a generalized palindrome.

(c) This is a consequence of part (a) and the relation between borders and periods of a string.    \(\square \)
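Lemma 13 lends itself to an exhaustive sanity check on short strings. The sketch below (hypothetical helper names, with f again a letter-to-letter dictionary) enumerates every string up to a given length and asserts parts (a), (b) and (c) for every suffix y; it tests the statement on small cases and is not a substitute for the proof.

```python
from itertools import product


def f_rev(x: str, f: dict[str, str]) -> str:
    """f(x^R): reverse x and map each letter through the involution f."""
    return "".join(f[c] for c in reversed(x))


def is_gp(x: str, f: dict[str, str]) -> bool:
    return x == f_rev(x, f)


def is_border(y: str, x: str) -> bool:
    return x.startswith(y) and x.endswith(y)


def has_period(x: str, p: int) -> bool:
    return all(x[i] == x[i + p] for i in range(len(x) - p))


def check_lemma13(alphabet: str, f: dict[str, str], max_len: int) -> bool:
    """Exhaustively assert parts (a)-(c) on all strings of length <= max_len."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            x = "".join(letters)
            x_gp = is_gp(x, f)
            for m in range(n + 1):               # y = suffix of x of length m
                y = x[n - m:]
                if x_gp:
                    assert is_border(y, x) == is_gp(y, f)              # part (a)
                    if m < n:
                        assert has_period(x, n - m) == is_gp(y, f)     # part (c)
                if is_border(y, x) and n <= 2 * m:
                    assert x_gp == is_gp(y, f)                         # part (b)
    return True
```

For example, `check_lemma13("ab", {"a": "b", "b": "a"}, 10)` covers all binary strings of length at most 10 under the involution that swaps the two letters.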

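The crucial combinatorial property of standard palindromes used in Step 1 of the algorithm in Sect. 3 is that the sequence of consecutive differences in \(P_j\) is non-increasing and contains at most \(\mathcal {O}(\log j)\) distinct values. We show that the same observation holds for generalized palindromes; this follows from parts (1) and (2) of the next lemma. Part (1) gives the monotonicity. By part (2), in the notation of the lemma, whenever the difference strictly decreases (\(|u| > |v|\)) we have \(|x| = |u|+|v|+|z| > 2|z|\), so the generalized palindromic suffix two steps further is less than half as long as x; hence there are \(\mathcal {O}(\log j)\) distinct differences. The proof of Lemma 14 follows exactly the lines of the proof of the corresponding Lemma 4 in [8]; due to space constraints, we refer the reader to Fig. 3 illustrating the proof in [8].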
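Both properties can be observed on small inputs. The sketch below assumes, as in the proofs of this appendix, that \(P_j\) is the increasing list of positions i for which S[i..j] is a nonempty generalized palindrome. It computes \(P_j\) naively in \(\mathcal {O}(j^2)\) time and asserts that the differences never increase and that, whenever a difference strictly drops, the palindromic suffix two entries later is shorter than half of the current one. Function names are hypothetical.

```python
def palindromic_suffix_starts(s: str, j: int, f: dict[str, str]) -> list[int]:
    """Assumed P_j: increasing 1-indexed positions i such that the nonempty
    factor S[i..j] = s[i-1:j] satisfies S[i..j] == f(S[i..j]^R)."""
    starts = []
    for i in range(1, j + 1):
        w = s[i - 1:j]
        if w == "".join(f[c] for c in reversed(w)):
            starts.append(i)
    return starts


def check_differences(s: str, j: int, f: dict[str, str]) -> bool:
    """Assert that consecutive differences in P_j never increase and that,
    whenever a difference strictly drops, the palindromic suffix two entries
    later is shorter than half of the current one."""
    p = palindromic_suffix_starts(s, j, f)
    lengths = [j - i + 1 for i in p]            # suffix lengths, decreasing
    diffs = [b - a for a, b in zip(p, p[1:])]   # |u|, |v|, ... in Lemma 14
    for t in range(len(diffs) - 1):
        assert diffs[t] >= diffs[t + 1]                 # Lemma 14(1)
        if diffs[t] > diffs[t + 1]:
            assert 2 * lengths[t + 2] < lengths[t]      # consequence of Lemma 14(2)
    return True
```

With f the identity, `palindromic_suffix_starts("abaaba", 6, {"a": "a", "b": "b"})` returns `[1, 4, 6]`, whose differences 3 and 2 decrease.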

Lemma 14

Let x be a generalized palindrome, y the longest generalized palindromic proper suffix of x, and z the longest generalized palindromic proper suffix of y. Let u and v be strings such that \(x = u y\) and \(y = v z\). Then:

  (1) \(| u | \ge | v |\);

  (2) if \(| u | > | v |\) then \(| u | > | z |\);

  (3) if \(| u | = | v |\) then \(u = v\).

Proof

(1) By Lemma 13(c), \(| u | = | x | - | y |\) is the smallest period of x, and \(| v | = | y | - | z |\) is the smallest period of y. Since y is a factor of x, either \(| u | > | y | \ge | v |\), or \(| u | \le | y |\) and |u| is also a period of y, in which case it cannot be smaller than |v|.

(2) By Lemma 13(a), y is a border of x and thus v is a prefix of x. Let w be a string such that \(x = v w\). Then z is a border of w and \(| w | = | zu |\). Since we assume \(| u | > | v |\), we must have \(| w | > | y |\). Suppose to the contrary that \(| u | \le | z |\). Then \(| w | = | zu | \le 2 | z |\), and by Lemma 13(b), w is a generalized palindrome. But this contradicts y being the longest generalized palindromic proper suffix of x.

(3) In the proof of (2) we saw that v is a prefix of x, and so is u by definition. Thus \(u = v\) if \(| u | = | v |\).    \(\square \)

We have thus shown that, also in the case of generalized palindromes, the set \(P_j\) can be compactly represented by a set \(G_j\), as described in Sect. 3. To complete Step 1 of the algorithm, we need to show that \(G_j\) can be computed from \(G_{j-1}\) in \(\mathcal {O}(\log j)\) time. For this, just as in [8], we show that each triple \((i,\varDelta ,k) \in G_{j-1}\) will be either eliminated or replaced by \((i-1,\varDelta ,k)\) in \(G_j\). The proof exploits part (3) of Lemma 14.
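The compact representation can be sketched as follows. This is not the data structure of [8] verbatim: it assumes that \(P_{j,\varDelta }\) consists of the elements of \(P_j\) whose predecessor in \(P_j\) lies exactly \(\varDelta \) positions earlier (so the smallest element of \(P_j\) belongs to no triple), and it builds \(G_j\) from scratch rather than updating \(G_{j-1}\). Function names are hypothetical.

```python
from itertools import groupby


def group_triples(p: list[int]) -> list[tuple[int, int, int]]:
    """Compress an increasing list P_j with non-increasing consecutive differences
    into triples (i, delta, k) meaning P_{j,delta} = {i, i + delta, ..., i + (k - 1) * delta}."""
    gaps = [(b, b - a) for a, b in zip(p, p[1:])]     # (element, gap to predecessor)
    triples = []
    for delta, run in groupby(gaps, key=lambda item: item[1]):
        members = [pos for pos, _ in run]             # equal gaps are contiguous
        triples.append((members[0], delta, len(members)))
    return triples


def expand_triples(first: int, triples: list[tuple[int, int, int]]) -> list[int]:
    """Inverse of group_triples, given the smallest element of P_j."""
    p = [first]
    for i, delta, k in triples:
        p.extend(i + t * delta for t in range(k))
    return p
```

Here `group_triples([1, 4, 7, 9, 10])` yields `[(4, 3, 2), (9, 2, 1), (10, 1, 1)]`. Because equal differences are contiguous, `itertools.groupby` is enough to collect each \(P_{j,\varDelta }\), and the number of triples equals the number of distinct differences.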

Lemma 15

Let \(p_i\) and \(p_{i + 1}\) be two consecutive elements of \(P_{j - 1 ,\varDelta }\). Then \(p_i - 1 \in P_j\) iff \(p_{i + 1} - 1 \in P_j\).

Proof

By definition, \(p_{i + 1} - p_i = \varDelta \), and the predecessor of \(p_i\) in \(P_{j-1}\) is \(p_{i - 1} = p_i - \varDelta \). The strings \(x=S[p_{i-1} ..j-1]\), \(y=S[p_{i} ..j-1]\), and \(z=S[p_{i+1} ..j-1]\) satisfy the assumptions of Lemma 14 with \(|u| = |v| = \varDelta \), so by part (3), \(u = v\); comparing the last letters of u and v gives \(S[p_{i}-1]=S[p_{i+1}-1]=c\). Since y and z are generalized palindromes, \(p_i - 1 \in P_j\) iff \(S[j]=f(c)\) iff \(p_{i + 1} - 1 \in P_j\).    \(\square \)

After this transformation, one might need to update pairs of adjacent triples in \(G_j\) because the gaps between them might have changed. This simple process is explained in detail in [8] and takes only \(\mathcal {O}(\log j)\) additional time.
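Lemma 15 can likewise be checked exhaustively on short strings. The sketch below uses hypothetical names and the same assumed reading of \(P_{j,\varDelta }\) (elements whose predecessor lies exactly \(\varDelta \) positions earlier). It looks for three consecutive elements of \(P_{j-1}\) with equal gaps, whose last two are therefore consecutive in \(P_{j-1,\varDelta }\), and asserts that moving both one position to the left either keeps both in \(P_j\) or drops both.

```python
from itertools import product


def starts(s: str, j: int, f: dict[str, str]) -> list[int]:
    """P_j: increasing 1-indexed positions i such that the nonempty factor
    S[i..j] satisfies S[i..j] == f(S[i..j]^R)."""
    result = []
    for i in range(1, j + 1):
        w = s[i - 1:j]
        if w == "".join(f[c] for c in reversed(w)):
            result.append(i)
    return result


def check_lemma15(alphabet: str, f: dict[str, str], max_len: int) -> bool:
    """Exhaustively test Lemma 15 on all strings of length <= max_len."""
    for n in range(2, max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            for j in range(2, n + 1):
                prev = starts(s, j - 1, f)       # P_{j-1}
                cur = set(starts(s, j, f))       # P_j
                # a, b, c consecutive in P_{j-1} with equal gaps: b and c are
                # consecutive elements of P_{j-1,delta} for delta = b - a
                for a, b, c in zip(prev, prev[1:], prev[2:]):
                    if b - a == c - b:
                        assert ((b - 1) in cur) == ((c - 1) in cur)
    return True
```

With the involution swapping `a` and `b`, strings such as `abababa` already exercise the lemma non-trivially.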

As for Step 2 of the algorithm, it suffices to show that the following combinatorial observation holds for generalized palindromes. Again we follow the lines of the proof from [8] (cf. Fig. 5 in that paper).

Lemma 16

If \((i,\varDelta ,k) \in G_j\) and \(k \ge 2\), then \((i,\varDelta ,k-1) \in G_{j-\varDelta }\).

Proof

By definition, \(( i , \varDelta , k ) \in G_j\) is equivalent to saying that \(P_{j ,\varDelta } = \{ i , i + \varDelta , \ldots , i + ( k - 1 )\varDelta \}\), and we need to show that \(P_{j -\varDelta ,\varDelta } = \{ i , i + \varDelta , \ldots , i + ( k - 2 )\varDelta \}\). We will show first that \(P_{j -\varDelta ,\varDelta } \cap [ i - \varDelta + 1 ..j - \varDelta ] = \{ i , i + \varDelta , \ldots , i + ( k - 2 )\varDelta \}\) and then that \(P_{j -\varDelta ,\varDelta } \cap [ 1 ..i - \varDelta ] = \emptyset \).

Since \(y = S [ i ..j ]\) and \(x = S [ i - \varDelta ..j ]\) are generalized palindromes and y is the longest proper border of x (by Lemma 13(a)), \(S [ i - \varDelta ..j - \varDelta ] = y = S [ i ..j ]\). Thus for all \(\ell \in [ i ..j ]\), \(\ell \in P_j\) iff \(\ell - \varDelta \in P_{j -\varDelta }\). In particular, the consecutive differences in both cases are the same and for all \(\ell \in [ i + 1 ..j ]\), \(\ell \in P_{j ,\varDelta }\) iff \(\ell - \varDelta \in P_{j -\varDelta ,\varDelta }\). Thus \(P_{j -\varDelta ,\varDelta } \cap [ i - \varDelta + 1 ..j - \varDelta ] = \{ i , i + \varDelta , \ldots , i + ( k - 2 )\varDelta \}\).

We still need to show that \(P_{j -\varDelta ,\varDelta } \cap [ 1 ..i - \varDelta ] = \emptyset \), which is true if and only if \(i - 2 \varDelta \not \in P_{j -\varDelta }\). Suppose to the contrary that \(S [ i - 2 \varDelta ..j -\varDelta ]\) is a generalized palindrome and let \(w = S [ i - 2 \varDelta ..i - \varDelta - 1 ]\). Then \(S [ j - 2 \varDelta + 1 ..j - \varDelta ] = f(w^R)\). Since \(z = S [ i - \varDelta ..j - \varDelta ]\) and \(S [ i - \varDelta ..j ]\) are generalized palindromes too, we have that \(S [ i - \varDelta ..i - 1 ] = w\) and \(S [ j - \varDelta + 1 ..j ] = f(w^R)\). Finally, since z is a generalized palindrome, \(S [ i - 2 \varDelta ..j ] = w\, z\, f(w^R)\) is a generalized palindrome. This implies that \(i - 2 \varDelta \in P_j\). As \(i - \varDelta \) is the predecessor of i in \(P_j\) and the consecutive differences in \(P_j\) are non-increasing, \(i - 2 \varDelta \) is then the predecessor of \(i - \varDelta \), so \(i - \varDelta \in P_{j ,\varDelta }\), which is a contradiction.    \(\square \)
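As with the previous lemmas, Lemma 16 can be tested by brute force. The sketch below recomputes \(G_j\) from scratch, under the assumption that \(P_{j,\varDelta }\) holds the elements of \(P_j\) whose predecessor in \(P_j\) lies exactly \(\varDelta \) positions earlier, and asserts the implication for every triple with \(k \ge 2\). All names are hypothetical.

```python
from itertools import groupby, product


def starts(s: str, j: int, f: dict[str, str]) -> list[int]:
    """P_j: increasing 1-indexed i with S[i..j] a nonempty generalized palindrome."""
    return [i for i in range(1, j + 1)
            if s[i - 1:j] == "".join(f[c] for c in reversed(s[i - 1:j]))]


def triples(s: str, j: int, f: dict[str, str]) -> set[tuple[int, int, int]]:
    """G_j as a set of triples (i, delta, k), grouping elements of P_j by the
    gap to their predecessor (equal gaps form contiguous runs)."""
    p = starts(s, j, f)
    gaps = [(b, b - a) for a, b in zip(p, p[1:])]
    result = set()
    for delta, run in groupby(gaps, key=lambda item: item[1]):
        members = [pos for pos, _ in run]
        result.add((members[0], delta, len(members)))
    return result


def check_lemma16(alphabet: str, f: dict[str, str], max_len: int) -> bool:
    """Exhaustively test: (i, delta, k) in G_j with k >= 2 implies
    (i, delta, k - 1) in G_{j - delta}."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            s = "".join(letters)
            for j in range(1, n + 1):
                for i, delta, k in triples(s, j, f):
                    if k >= 2:
                        assert (i, delta, k - 1) in triples(s, j - delta, f)
    return True
```

For `aaaa` with f the identity, \(G_4\) is `{(2, 1, 3)}` and \(G_3\) is `{(2, 1, 2)}`, as the lemma predicts.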

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Adamczyk, M., Alzamel, M., Charalampopoulos, P., Iliopoulos, C.S., Radoszewski, J. (2017). Palindromic Decompositions with Gaps and Errors. In: Weil, P. (ed.) Computer Science – Theory and Applications. CSR 2017. Lecture Notes in Computer Science, vol. 10304. Springer, Cham. https://doi.org/10.1007/978-3-319-58747-9_7

  • DOI: https://doi.org/10.1007/978-3-319-58747-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58746-2

  • Online ISBN: 978-3-319-58747-9

  • eBook Packages: Computer Science, Computer Science (R0)
