CPM 2006: Combinatorial Pattern Matching pp 342-353

# New Algorithms for Text Fingerprinting

• Roman Kolpakov
• Mathieu Raffinot
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4009)

## Abstract

Let $s = s_1 \ldots s_n$ be a text (or sequence) over a finite alphabet $\Sigma$. A fingerprint in $s$ is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists of computing the set $\mathcal{F}$ of the fingerprints of all its substrings and being able to efficiently answer several questions about this set. A given fingerprint $f \in \mathcal{F}$ is represented by a binary array $F$ of size $|\Sigma|$, called a fingerprint table. A fingerprint $f \in \mathcal{F}$ admits a number of maximal locations $(i,j)$ in $s$, that is, locations such that the alphabet of $s_i \ldots s_j$ is $f$ and $s_{i-1}$ and $s_{j+1}$, if defined, are not in $f$. The total number of maximal locations is $\mathcal{L} \leq n|\Sigma| + 1$. We present new algorithms and a new data structure for three problems: (1) compute $\mathcal{F}$; (2) given $F$, decide whether $F$ represents a fingerprint in $\mathcal{F}$; (3) given $F$, find all maximal locations of $F$ in $s$. These problems are solved in $O((\mathcal{L} + n)\log|\Sigma|)$, $\Theta(|\Sigma|)$, and $\Theta(|\Sigma| + K)$ time, respectively, where $K$ is the number of maximal locations of $F$.

## Keywords

Maximal Location · Hash Table · Distinct Character · Naming Algorithm · Edge Label

## Copyright information

© Springer-Verlag Berlin Heidelberg 2006

## Authors and Affiliations

• Roman Kolpakov (1)
• Mathieu Raffinot (2)

1. Liapunov French-Russian Institute, Lomonosov Moscow State University, Moscow, Russia
2. CNRS, Poncelet Laboratory, Independent University of Moscow, Moscow, Russia