New Algorithms for Text Fingerprinting

  • Roman Kolpakov
  • Mathieu Raffinot
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4009)


Let \(s = s_1 \ldots s_n\) be a text (or sequence) on a finite alphabet \(\Sigma\). A fingerprint in \(s\) is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists of computing the set \({\mathcal{F}}\) of all fingerprints of all its substrings and being able to efficiently answer several questions on this set. A given fingerprint \(f \in {\mathcal{F}}\) is represented by a binary array \(F\) of size \(|\Sigma|\), named a fingerprint table. A fingerprint \(f \in {\mathcal{F}}\) admits a number of maximal locations \((i,j)\) in \(s\), that is, the alphabet of \(s_i \ldots s_j\) is \(f\) and \(s_{i-1}\), \(s_{j+1}\), if defined, are not in \(f\). The total number of maximal locations is \({\mathcal{L}} \leq n |\Sigma|+1\). We present new algorithms and a new data structure for the three problems: (1) compute \({\mathcal{F}}\); (2) given \(F\), answer if \(F\) represents a fingerprint in \({\mathcal{F}}\); (3) given \(F\), find all maximal locations of \(F\) in \(s\). These problems are respectively solved in \(O(({\mathcal{L}}+ n) \log |\Sigma|)\), \(\Theta(|\Sigma|)\), and \(\Theta(|\Sigma| + K)\) time, where \(K\) is the number of maximal locations of \(F\).
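To make the definitions concrete, here is a minimal brute-force sketch in Python of the three problems. It is not the algorithm or data structure announced in the abstract: it takes a quadratic number of steps in n and relies on an ordinary hash table keyed by character sets. The function names (`fingerprint_index`, `table_to_set`, `is_fingerprint`, `locations_of`), the 1-based inclusive positions, the bit order of the fingerprint table (sorted alphabet), and the omission of the empty fingerprint are all assumptions made for illustration.

```python
def fingerprint_index(s):
    """Brute force: map each non-empty fingerprint of s (a frozenset of
    characters) to the list of its maximal locations (i, j), 1-based inclusive."""
    n = len(s)
    index = {}
    for i in range(n):
        seen = set()
        j = i
        while j < n:
            seen.add(s[j])
            # Extend to the right while the next character is already in the set.
            while j + 1 < n and s[j + 1] in seen:
                j += 1
            # (i, j) is now right-maximal; it is maximal if it is also left-maximal.
            if i == 0 or s[i - 1] not in seen:
                index.setdefault(frozenset(seen), []).append((i + 1, j + 1))
            j += 1
    return index


def table_to_set(F, alphabet):
    """Convert a binary fingerprint table F (one bit per alphabet symbol)
    into the set of characters it represents."""
    return frozenset(c for c, bit in zip(alphabet, F) if bit)


def is_fingerprint(F, alphabet, index):
    """Problem (2): does F represent a fingerprint of the indexed text?"""
    return table_to_set(F, alphabet) in index


def locations_of(F, alphabet, index):
    """Problem (3): all maximal locations of the fingerprint represented by F."""
    return index.get(table_to_set(F, alphabet), [])


# Hypothetical example text; the alphabet order fixes the meaning of each bit in F.
text = "abac"
alphabet = sorted(set(text))            # ['a', 'b', 'c']
index = fingerprint_index(text)         # problem (1)
print(is_fingerprint([1, 0, 1], alphabet, index))  # {a, c} occurs: True
print(is_fingerprint([0, 1, 1], alphabet, index))  # {b, c} does not: False
print(locations_of([1, 0, 0], alphabet, index))    # [(1, 1), (3, 3)]
```

For the text `abac` this sketch finds 6 non-empty fingerprints and 7 maximal locations, comfortably within the bound \(n|\Sigma|+1 = 13\) stated above.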


Keywords: Maximal Location, Hash Table, Distinct Character, Naming Algorithm, Edge Label





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Roman Kolpakov (1)
  • Mathieu Raffinot (2)
  1. Liapunov French-Russian Institute, Lomonosov Moscow State University, Moscow, Russia
  2. CNRS, Poncelet Laboratory, Independent University of Moscow, Moscow, Russia
