Hunting Redundancies in Strings

  • Golnaz Badkobeh
  • Supaporn Chairungsee
  • Maxime Crochemore
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6795)

Abstract

The notion of redundancies in texts, regarded as sequences of symbols, appears under various concepts in the literature of Combinatorics on words and of Algorithms on strings: repetitions, repeats, runs, covers, seeds, and palindromes, for example.

We explore some of the newest aspects of these redundancies.
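
As a rough illustration of a few of the notions listed above, the following Python sketch checks them by brute force. The definitions used are the usual textbook ones and are an assumption here, since the abstract does not spell them out. The function names are hypothetical, and the code is deliberately naive rather than efficient.

```python
def smallest_period(w: str) -> int:
    """Smallest p >= 1 with w[i] == w[i + p] for every valid i (0 for the empty word)."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return n  # only reached when w is empty


def is_square(w: str) -> bool:
    """True if w = uu for some non-empty word u (a repetition of exponent 2)."""
    half, rem = divmod(len(w), 2)
    return half > 0 and rem == 0 and w[:half] == w[half:]


def is_cover(c: str, w: str) -> bool:
    """True if occurrences of c, possibly overlapping, cover every position of w."""
    if not c or len(c) > len(w):
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    for i in range(len(w) - len(c) + 1):
        if w.startswith(c, i):
            if i > covered_up_to:
                return False  # a gap that no occurrence of c covers
            covered_up_to = i + len(c)
    return covered_up_to == len(w)


def is_palindrome(w: str) -> bool:
    """True if w reads the same forwards and backwards."""
    return w == w[::-1]
```

For instance, under these definitions `is_cover("aba", "ababa")` holds because the two occurrences of `aba` overlap and together span the whole word, while `ab` is not a cover of `aba`.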

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Golnaz Badkobeh (1)
  • Supaporn Chairungsee (1)
  • Maxime Crochemore (1, 2)

  1. King’s College London, London, United Kingdom
  2. Université Paris-Est, France
