Abstract
A classical construction of Aho and Corasick solves the pattern matching problem for a finite set of words X in linear time, where the size of the input X is the sum of the lengths of its elements. It produces an automaton that recognizes \(A^*X\), where A is a finite alphabet, but this automaton is generally not minimal. As an alternative to classical minimization algorithms, which yield an \({\mathcal O}(n\log n)\) solution to the problem, we propose a linear pseudo-minimization algorithm specific to Aho-Corasick automata. It produces an automaton whose size lies between the size of the input automaton and that of its associated minimal automaton. Moreover, this algorithm generically computes the minimal automaton: for a large variety of natural distributions, the probability that the output is the minimal automaton of \(A^*X\) tends to one as the size of X tends to infinity.
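To make concrete the claim that the Aho-Corasick automaton recognizes \(A^*X\) but is generally not minimal, here is a minimal Python sketch. It builds the complete Aho-Corasick DFA (the trie of X, with missing transitions filled in from failure links) and counts the states of the equivalent minimal DFA using Moore's partition refinement. Moore's algorithm is a simple quadratic-time stand-in chosen for brevity: it is neither Hopcroft's \({\mathcal O}(n\log n)\) algorithm nor the linear pseudo-minimization proposed in this paper. The function names `aho_corasick_dfa` and `minimal_size` are hypothetical.

```python
from collections import deque


def aho_corasick_dfa(words, alphabet):
    """Build the complete Aho-Corasick DFA for A*X (X = words).

    States are the nodes of the trie of X (one per prefix of a word of X);
    state 0 is the root. Returns (delta, final), where delta[s][a] is the
    successor of state s on letter a and final[s] marks accepting states.
    """
    # Trie construction: goto[s] holds the tree edges leaving node s.
    goto, final = [{}], [False]
    for w in words:
        s = 0
        for a in w:
            if a not in goto[s]:
                goto.append({})
                final.append(False)
                goto[s][a] = len(goto) - 1
            s = goto[s][a]
        final[s] = True

    # Breadth-first pass: each missing edge is copied from the failure state,
    # which is shallower and therefore already complete when it is read.
    n = len(goto)
    fail = [0] * n
    delta = [{} for _ in range(n)]
    queue = deque()
    for a in alphabet:
        delta[0][a] = goto[0].get(a, 0)
        if a in goto[0]:
            queue.append(goto[0][a])
    while queue:
        s = queue.popleft()
        # A state is accepting if some suffix of its prefix is a word of X.
        final[s] = final[s] or final[fail[s]]
        for a in alphabet:
            if a in goto[s]:
                t = goto[s][a]
                fail[t] = delta[fail[s]][a]
                delta[s][a] = t
                queue.append(t)
            else:
                delta[s][a] = delta[fail[s]][a]
    return delta, final


def minimal_size(delta, final, alphabet):
    """Number of states of the minimal DFA equivalent to (delta, final).

    Moore's partition refinement. Every state of the Aho-Corasick DFA is
    reachable from the root, so no pruning of unreachable states is needed.
    """
    block = [1 if f else 0 for f in final]
    count = len(set(block))
    while True:
        signatures = {}
        refined = []
        for s in range(len(delta)):
            key = (block[s],) + tuple(block[delta[s][a]] for a in alphabet)
            refined.append(signatures.setdefault(key, len(signatures)))
        if len(signatures) == count:  # no block was split: partition is stable
            return count
        block, count = refined, len(signatures)


delta, final = aho_corasick_dfa(["ab", "b"], "ab")
print(len(delta), minimal_size(delta, final, "ab"))  # 4 2
```

Here \(A^*\{ab, b\} = A^*b\), because every word ending in ab also ends in b, so the four trie states collapse to two in the minimal automaton. For X = {ab} alone, both automata have three states.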
This work was completed with the support of the ANR project MAGNUM number 2010-BLAN-0204.
References
Aho, A.V., Corasick, M.J.: Efficient string matching: An aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)
AitMous, O., Bassino, F., Nicaud, C.: Building the Minimal Automaton of A*X in Linear Time, When X Is of Bounded Cardinality. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 275–287. Springer, Heidelberg (2010)
Baker, T.P.: A technique for extending rapid exact-match string matching to arrays of more than one dimension. SIAM J. Comput., 533–541 (1978)
Bassino, F., Giambruno, L., Nicaud, C.: The average state complexity of rational operations on finite languages. Int. J. Found. Comput. Sci. 21(4), 495–516 (2010)
Bird, R.S.: Two dimensional pattern matching. Inf. Process. Lett. 6(5), 168–170 (1977)
Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on strings. Cambridge University Press (2007)
Crochemore, M., Rytter, W.: Text Algorithms. Oxford Univ. Press (1994)
Hopcroft, J.E.: An n log n algorithm for minimizing states in a finite automaton. In: Theory of Machines and Computations, pp. 189–196. Academic Press (1971)
Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages and Computation. Addison-Wesley (1979)
Lothaire, M.: Applied Combinatorics on Words. Cambridge University Press (2005)
Revuz, D.: Dictionnaires et lexiques: methodes et algorithmes. PhD thesis, Institut Blaise Pascal (1991)
Revuz, D.: Minimisation of acyclic deterministic automata in linear time. Theoret. Comput. Sci. 92(1), 181–189 (1992)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
AitMous, O., Bassino, F., Nicaud, C. (2012). An Efficient Linear Pseudo-minimization Algorithm for Aho-Corasick Automata. In: Kärkkäinen, J., Stoye, J. (eds) Combinatorial Pattern Matching. CPM 2012. Lecture Notes in Computer Science, vol 7354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31265-6_9
DOI: https://doi.org/10.1007/978-3-642-31265-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31264-9
Online ISBN: 978-3-642-31265-6
eBook Packages: Computer Science, Computer Science (R0)