CPM 2006: Combinatorial Pattern Matching pp 342-353

# New Algorithms for Text Fingerprinting

• Roman Kolpakov
• Mathieu Raffinot
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4009)

## Abstract

Let $$s = s_1 \ldots s_n$$ be a text (or sequence) on a finite alphabet Σ. A fingerprint in s is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists of computing the set $${\mathcal{F}}$$ of all fingerprints of all its substrings and being able to efficiently answer several questions on this set. A given fingerprint $$f \in {\mathcal{F}}$$ is represented by a binary array, F, of size |Σ|, named a fingerprint table. A fingerprint $$f \in {\mathcal{F}}$$ admits a number of maximal locations (i,j) in s, that is, the alphabet of $$s_i \ldots s_j$$ is f and $$s_{i-1}$$ and $$s_{j+1}$$, if defined, are not in f. The total number of maximal locations is $${\mathcal{L}} \leq n |\Sigma|+1$$. We present new algorithms and a new data structure for the following three problems: (1) compute $${\mathcal{F}}$$; (2) given F, answer whether F represents a fingerprint in $${\mathcal{F}}$$; (3) given F, find all maximal locations of F in s. These problems are solved in $$O(({\mathcal{L}}+ n) \log |\Sigma|)$$, Θ(|Σ|), and Θ(|Σ| + K) time respectively, where K is the number of maximal locations of F.
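To make the definitions concrete, the following is a minimal brute-force sketch in Python that enumerates the non-empty fingerprints of a short text together with their maximal locations. It is an illustration of the definitions only and is not the algorithm presented in the paper: it takes quadratic time and does not attempt to reach the stated bounds. The function names `maximal_locations` and `fingerprint_table`, and the choice of 1-based inclusive positions, are assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Tuple


def maximal_locations(s: str) -> Dict[FrozenSet[str], List[Tuple[int, int]]]:
    """Map each non-empty fingerprint of s to its maximal locations.

    Positions are 1-based and inclusive, following s = s_1 .. s_n.
    Brute force over all start positions; illustrative only.
    """
    n = len(s)
    result: Dict[FrozenSet[str], List[Tuple[int, int]]] = {}
    for i in range(n):
        seen = set()
        for j in range(i, n):
            seen.add(s[j])
            # Left-maximality: s_{i-1}, if defined, must not be in the fingerprint.
            # Once it is, no longer substring starting at i can be maximal.
            if i > 0 and s[i - 1] in seen:
                break
            # Right-maximality: s_{j+1}, if defined, must not be in the fingerprint.
            if j == n - 1 or s[j + 1] not in seen:
                result.setdefault(frozenset(seen), []).append((i + 1, j + 1))
    return result


def fingerprint_table(f: FrozenSet[str], alphabet: str) -> List[int]:
    """Binary array of size |alphabet| with a 1 for each character in f."""
    return [1 if c in f else 0 for c in alphabet]


locs = maximal_locations("abac")
for f, positions in locs.items():
    print(sorted(f), fingerprint_table(f, "abc"), positions)
```

Every non-empty fingerprint has at least one maximal location, since any occurrence can be extended left and right until a new character would be introduced, so the keys of the returned dictionary are exactly the non-empty fingerprints of the text.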

## Keywords

Maximal Location, Hash Table, Distinct Character, Naming Algorithm, Edge Label
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

© Springer-Verlag Berlin Heidelberg 2006

## Authors and Affiliations

• Roman Kolpakov (1)
• Mathieu Raffinot (2)
1. Liapunov French-Russian Institute, Lomonosov Moscow State University, Moscow, Russia
2. CNRS, Poncelet Laboratory, Independent University of Moscow, Moscow, Russia