Abstract
The presence of long gaps dramatically increases the difficulty of detecting and characterizing complex events hidden in long sequences. To cope with this problem, a learning algorithm based on an abstraction mechanism is proposed: it can infer the general model of complex events from a set of learning sequences. Events are described by means of regular expressions, and the abstraction mechanism is based on the substitution property of regular languages. The induction algorithm proceeds bottom-up, progressively coarsening the sequence granularity and letting correlations between subsequences separated by long gaps emerge naturally. Two abstraction operators are defined. The first detects regular expressions that contain no iterative constructs and abstracts them into non-terminal symbols. The second detects and abstracts iterated subsequences. By interleaving the two operators, regular expressions in general form may be inferred. Both operators are based on string alignment algorithms taken from bioinformatics. A restricted form of the algorithm was outlined in previous papers, where the emphasis was on applications. Here, an extended version of the algorithm is described and analyzed in detail.
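The interplay of the two abstraction operators can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: it assumes noise-free sequences and uses exact string matching, whereas the abstract states that both operators rely on string alignment. The function names `abstract_motif` and `abstract_repeats`, the non-terminal symbol `A`, and the sample sequences are hypothetical and chosen only for illustration.

```python
import re


def abstract_motif(seq: str, motif: str, symbol: str) -> str:
    """First operator (simplified): rewrite every exact occurrence of a
    known iteration-free motif as a single non-terminal symbol."""
    return seq.replace(motif, symbol)


def abstract_repeats(seq: str, min_reps: int = 2) -> str:
    """Second operator (simplified): collapse a unit that occurs at least
    `min_reps` times in a row into the regex-like form (unit)+."""
    # (.+?) captures the shortest candidate unit; \1{n,} requires n more
    # consecutive copies of that same unit right after it.
    pattern = re.compile(r"(.+?)\1{%d,}" % (min_reps - 1))
    return pattern.sub(lambda m: "(" + m.group(1) + ")+", seq)


if __name__ == "__main__":
    # Hypothetical sequence: the motif "abc" occurs once, then twice in a row.
    raw = "abcxyabcabcz"
    coarse = abstract_motif(raw, "abc", "A")   # coarser granularity
    print(coarse)
    print(abstract_repeats(coarse))            # iteration over the new symbol
```

Applying the motif substitution first shortens the sequence, so the repetition of `abc` becomes a repetition of the single symbol `A` that the second step can detect. This is the sense in which interleaving the operators coarsens the granularity. Handling noise and gaps would require replacing the exact matches with approximate alignment.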
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Galassi, U., Giordana, A. (2005). Learning Regular Expressions from Noisy Sequences. In: Zucker, JD., Saitta, L. (eds) Abstraction, Reformulation and Approximation. SARA 2005. Lecture Notes in Computer Science, vol 3607. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527862_7
DOI: https://doi.org/10.1007/11527862_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27872-6
Online ISBN: 978-3-540-31882-8
eBook Packages: Computer Science, Computer Science (R0)