Learning Regular Expressions from Noisy Sequences

Galassi, Ugo; Giordana, Attilio

doi:10.1007/11527862_7


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3607)

Included in the following conference series:

International Symposium on Abstraction, Reformulation, and Approximation


Abstract

The presence of long gaps dramatically increases the difficulty of detecting and characterizing complex events hidden in long sequences. To cope with this problem, a learning algorithm based on an abstraction mechanism is proposed: it can infer the general model of complex events from a set of learning sequences. Events are described by means of regular expressions, and the abstraction mechanism is based on the substitution property of regular languages. The induction algorithm proceeds bottom-up, progressively coarsening the sequence granularity so that correlations between subsequences separated by long gaps emerge naturally. Two abstraction operators are defined. The first detects regular expressions that contain no iterative constructs and abstracts them into non-terminal symbols. The second detects and abstracts iterated subsequences. By interleaving the two operators, regular expressions in general form may be inferred. Both operators are based on string alignment algorithms taken from bioinformatics. A restricted form of the algorithm has already been outlined in previous papers, where the emphasis was on applications. Here, an extended version of the algorithm is described and analyzed in detail.
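The two ideas named in the abstract, string alignment and abstraction by substitution, can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not say which alignment method is used, so Smith-Waterman local alignment is assumed here, and the function names, scoring parameters, and exact-match substitution are hypothetical choices made only for illustration.

```python
import re


def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment (assumed here, not taken from the paper).

    Returns (best_score, aligned_part_of_a, aligned_part_of_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the local score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def abstract_motif(seq, motif, symbol):
    """Substitute every exact occurrence of a motif with a non-terminal symbol.

    Exact matching is a simplification; it ignores noise inside the motif.
    """
    return seq.replace(motif, symbol)


def abstract_iteration(seq, symbol):
    """Collapse two or more consecutive copies of a symbol into 'symbol+'."""
    pattern = f"(?:{re.escape(symbol)}){{2,}}"
    return re.sub(pattern, lambda _m: symbol + "+", seq)


# A motif shared by two sequences and surrounded by unrelated symbols.
print(local_alignment("xxxxabcdyyyy", "zzabcdzz"))  # (8, 'abcd', 'abcd')

# Abstract the motif into 'A', then abstract the iteration of 'A'.
coarse = abstract_motif("xxabcdabcdabcdyy", "abcd", "A")
print(coarse)                           # xxAAAyy
print(abstract_iteration(coarse, "A"))  # xxA+yy
```

After the substitution the sequence is shorter and coarser, so a later alignment pass works on non-terminal symbols rather than raw characters, which is the sense in which interleaving the two steps coarsens the granularity.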



Author information

Authors and Affiliations

Dipartimento di Informatica, Università Amedeo Avogadro, Via Bellini 25G, 15100, Alessandria, Italy
Ugo Galassi & Attilio Giordana


Editor information

Editors and Affiliations

UR 079 GEODES, IRD, 32 avenue Henri Varagnat, 93143, Bondy, France
Jean-Daniel Zucker
Dip. di Informatica, Università del Piemonte Orientale, Via Bellini 25/G, 15100, Alessandria, Italy
Lorenza Saitta

About this paper

Cite this paper

Galassi, U., Giordana, A. (2005). Learning Regular Expressions from Noisy Sequences. In: Zucker, J.-D., Saitta, L. (eds) Abstraction, Reformulation and Approximation. SARA 2005. Lecture Notes in Computer Science, vol 3607. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527862_7


DOI: https://doi.org/10.1007/11527862_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27872-6
Online ISBN: 978-3-540-31882-8
eBook Packages: Computer Science, Computer Science (R0)
