Languages with Mismatches and an Application to Approximate Indexing

  • Chiara Epifanio
  • Alessandra Gabriele
  • Filippo Mignosi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3572)

Abstract

In this paper we describe a factorial language, denoted by L(S,k,r), that contains all words that occur in a string S up to k mismatches every r symbols. We then give some combinatorial properties of a parameter, called the repetition index and denoted by R(S,k,r), defined as the smallest integer h ≥ 1 such that every string of length h occurs in at most one position of the text S up to k mismatches every r symbols. We prove that R(S,k,r) is a non-increasing function of r and a non-decreasing function of k, and that the equation r = R(S,k,r) admits a unique solution.
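The two definitions above can be made concrete with a small brute-force sketch. It is not the authors' algorithm. It assumes one reading of "up to k mismatches every r symbols": every window of r consecutive aligned symbols contains at most k mismatches, and a word shorter than r has at most k mismatches in total. The function names `occurs_at`, `occurrences` and `repetition_index` are hypothetical, and the search is exponential in h, so it is only suitable for very short, non-empty strings with r ≥ 1.

```python
from itertools import product

def occurs_at(x, s, pos, k, r):
    """True if x aligns with s[pos:pos+len(x)] with at most k mismatches
    in every window of r consecutive symbols (assumed definition)."""
    if pos < 0 or pos + len(x) > len(s):
        return False
    mism = [int(a != b) for a, b in zip(x, s[pos:pos + len(x)])]
    if len(mism) <= r:
        return sum(mism) <= k
    window = sum(mism[:r])          # mismatches in the first window
    if window > k:
        return False
    for i in range(r, len(mism)):   # slide the window one symbol at a time
        window += mism[i] - mism[i - r]
        if window > k:
            return False
    return True

def occurrences(x, s, k, r):
    """All positions of s where x occurs up to k mismatches every r symbols."""
    return [p for p in range(len(s) - len(x) + 1) if occurs_at(x, s, p, k, r)]

def repetition_index(s, k, r):
    """Smallest h >= 1 such that no word of length h occurs at two positions.
    Words are enumerated over the letters of s only; a letter outside s
    mismatches everywhere, so it cannot create a new repeated word."""
    alphabet = sorted(set(s))
    for h in range(1, len(s) + 1):
        if all(len(occurrences(w, s, k, r)) <= 1
               for w in product(alphabet, repeat=h)):
            return h
    return len(s)

print(occurrences("bb", "abaab", 1, 2))   # [0, 1, 3]
print(repetition_index("abaab", 0, 2))    # 3
print(repetition_index("abaab", 1, 2))    # 5
```

On this toy string the index grows from 3 to 5 when k goes from 0 to 1, which agrees with the monotonicity in k stated in the abstract.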

The repetition index plays an important role in the construction of an indexing data structure based on a trie that represents the set of all factors of L(S,k,r) having length equal to R(S,k,r). For each word x ∈ L(S,k,r), this data structure allows us to find the list occ(x) of all occurrences of x in the text S up to k mismatches every r symbols, in time proportional to |x| + |occ(x)|.
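To illustrate the idea of a trie over approximately occurring words of bounded length, here is a deliberately simplified sketch. It is not the construction from the paper and does not achieve the |x| + |occ(x)| query time: it builds the trie by exhaustive enumeration and answers only queries with |x| ≤ h, where h is a depth parameter (the paper uses R(S,k,r)). The window-based mismatch test, the names `build_trie` and `query`, and the `"$pos"` marker key are all assumptions made for this example.

```python
from itertools import product

def occurs_at(x, s, pos, k, r):
    """At most k mismatches in every window of r aligned symbols (assumed definition)."""
    if pos < 0 or pos + len(x) > len(s):
        return False
    mism = [int(a != b) for a, b in zip(x, s[pos:pos + len(x)])]
    if len(mism) <= r:
        return sum(mism) <= k
    return all(sum(mism[i:i + r]) <= k for i in range(len(mism) - r + 1))

def build_trie(s, k, r, h):
    """Nested-dict trie of the words that approximately occur at each position.
    Words have length h, or shorter when the position is near the end of s."""
    alphabet = sorted(set(s))
    root = {}
    for pos in range(len(s)):
        length = min(h, len(s) - pos)
        for word in product(alphabet, repeat=length):
            if occurs_at(word, s, pos, k, r):
                node = root
                for c in word:
                    node = node.setdefault(c, {})
                # "$pos" cannot clash with the single-character child keys
                node.setdefault("$pos", set()).add(pos)
    return root

def query(trie, x):
    """Positions where x (with len(x) <= h) occurs approximately in the text."""
    node = trie
    for c in x:
        if c not in node:
            return []
        node = node[c]
    found, stack = set(), [node]
    while stack:                      # collect positions stored below x
        cur = stack.pop()
        for key, child in cur.items():
            if key == "$pos":
                found |= child
            else:
                stack.append(child)
    return sorted(found)

trie = build_trie("abaab", k=1, r=2, h=5)
print(query(trie, "bb"))   # [0, 1, 3]
```

The query first spells x from the root and then gathers every position stored in the subtree. A set is used because several stored words extending x can share the same position.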

Keywords

Combinatorics on words, formal languages, approximate string matching, indexing

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Chiara Epifanio (1)
  • Alessandra Gabriele (1)
  • Filippo Mignosi (1)
  1. Dipartimento di Matematica ed Applicazioni, Università degli Studi di Palermo, Palermo, Italy
