Efficient Algorithms for Two Extensions of LPF Table: The Power of Suffix Arrays

  • Maxime Crochemore
  • Costas S. Iliopoulos
  • Marcin Kubica
  • Wojciech Rytter
  • Tomasz Waleń
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5901)

Abstract

Suffix arrays provide a powerful data structure to solve several questions related to the structure of all the factors of a string. We show how they can be used to compute efficiently two new tables storing different types of previous factors (past segments) of a string. The concept of a longest previous factor is inherent to Ziv-Lempel factorization of strings in text compression, as well as in statistics of repetitions and symmetries. The longest previous reverse factor for a given position i is the longest factor starting at i, such that its reverse copy occurs before, while the longest previous non-overlapping factor is the longest factor v starting at i which has an exact copy occurring before. The previous copies of the factors are required to occur in the prefix ending at position i − 1. We design algorithms computing the table of longest previous reverse factors (LPrF table) and the table of longest previous non-overlapping factors (LPnF table). The latter table is useful to compute repetitions while the former is a useful tool for extracting symmetries. These tables are computed, using two previously computed read-only arrays (SUF and LCP) composing the suffix array, in linear time on any integer alphabet. The tables have not been explicitly considered before, but they have several applications and they are natural extensions of the LPF table which has been studied thoroughly before. Our results improve on the previous ones in several ways. The running time of the computation no longer depends on the size of the alphabet, which drops a log factor. Moreover the newly introduced tables store additional information on the structure of the string, helpful to improve, for example, gapped palindrome detection and text compression using reverse factors.
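To make the three definitions concrete, here is a minimal brute-force sketch that computes the LPF, LPnF and LPrF tables directly from their definitions. It is not the linear-time suffix-array method announced in the abstract. It is only a reference for checking small examples, and it runs in quadratic time or worse. The function names, the 0-based positions and the use of Python strings are assumptions made for illustration.

```python
# Naive reference sketch; NOT the linear-time SUF/LCP-based algorithms of the paper.
# Assumptions: 0-based positions, Python str input, hypothetical function names.

def lpf_naive(y):
    """LPF[i]: longest factor starting at i that also starts at some j < i (overlap allowed)."""
    n = len(y)
    table = [0] * n
    for i in range(n):
        k = 0
        # str.find returns the leftmost occurrence; "< i" means an earlier start exists.
        while i + k < n and y.find(y[i:i + k + 1]) < i:
            k += 1
        table[i] = k
    return table


def lpnf_naive(y):
    """LPnF[i]: longest factor starting at i with a copy lying entirely inside y[0..i-1]."""
    n = len(y)
    table = [0] * n
    for i in range(n):
        prefix = y[:i]
        k = 0
        while i + k < n and y[i:i + k + 1] in prefix:
            k += 1
        table[i] = k
    return table


def lprf_naive(y):
    """LPrF[i]: longest factor starting at i whose reverse occurs inside y[0..i-1]."""
    n = len(y)
    table = [0] * n
    for i in range(n):
        prefix = y[:i]
        k = 0
        while i + k < n and y[i:i + k + 1][::-1] in prefix:
            k += 1
        table[i] = k
    return table


if __name__ == "__main__":
    print(lpf_naive("aaaa"))   # [0, 3, 2, 1]  (overlapping copies allowed)
    print(lpnf_naive("aaaa"))  # [0, 1, 2, 1]  (copy must end before position i)
    print(lprf_naive("abab"))  # [0, 0, 1, 1]  ("ab" reversed is "ba", absent from "ab")
```

On "aaaa" the LPF and LPnF tables differ at position 1, which shows the effect of the non-overlapping requirement. Extending the match one character at a time is valid because each condition is monotone: if a factor of length k + 1 qualifies, so does its prefix of length k.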

Keywords

longest previous reverse factor; longest previous non-overlapping factor; longest previous factor; palindrome; runs; suffix array; text compression

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Maxime Crochemore (1, 3)
  • Costas S. Iliopoulos (1, 4)
  • Marcin Kubica (2)
  • Wojciech Rytter (2, 5)
  • Tomasz Waleń (2)

  1. Dept. of Computer Science, King’s College London, London, UK
  2. Institute of Informatics, University of Warsaw, Warsaw, Poland
  3. Université Paris-Est, France
  4. Digital Ecosystems & Business Intelligence Institute, Curtin University of Technology, Perth, Australia
  5. Faculty of Math. and Informatics, Copernicus University, Torun, Poland