Learning Regular Expressions from Noisy Sequences

  • Conference paper
Abstraction, Reformulation and Approximation (SARA 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3607)

Abstract

The presence of long gaps dramatically increases the difficulty of detecting and characterizing complex events hidden in long sequences. To cope with this problem, a learning algorithm based on an abstraction mechanism is proposed: it can infer the general model of complex events from a set of learning sequences. Events are described by means of regular expressions, and the abstraction mechanism is based on the substitution property of regular languages. The induction algorithm proceeds bottom-up, progressively coarsening the sequence granularity and letting correlations between subsequences separated by long gaps emerge naturally. Two abstraction operators are defined. The first detects regular expressions that contain no iterative constructs and abstracts them into non-terminal symbols. The second detects and abstracts iterated subsequences. By interleaving the two operators, regular expressions in general form may be inferred. Both operators are based on string alignment algorithms taken from bioinformatics. A restricted form of the algorithm has already been outlined in previous papers, where the emphasis was on applications. Here, an extended version of the algorithm is described and analyzed in detail.
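
The two abstraction operators can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not say which alignment method is used, so Smith-Waterman local alignment is assumed here as a stand-in, and the names local_alignment, abstract_motif and abstract_iterations, the scoring values, and the toy sequences are all hypothetical. The sketch only shows the general idea of finding a shared motif in noisy sequences, substituting it with a non-terminal symbol, and then collapsing iterated non-terminals.

```python
from itertools import groupby


def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment (assumed stand-in for the paper's aligner).

    Returns (best_score, (a_start, a_end), (b_start, b_end)) as half-open ranges.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the local score drops to zero.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end_i), (j, end_j)


def abstract_motif(seq, motif, symbol):
    """First operator (sketch): replace a non-iterative motif with a non-terminal."""
    return seq.replace(motif, symbol)


def abstract_iterations(tokens, nonterminals):
    """Second operator (sketch): collapse runs of a repeated non-terminal into X+."""
    out = []
    for sym, run in groupby(tokens):
        n = len(list(run))
        if sym in nonterminals and n > 1:
            out.append(sym + "+")
        else:
            out.extend([sym] * n)
    return out


# Toy learning sequences: the motif "abcd" is surrounded by unrelated noise.
s1, s2 = "xxxabcdyy", "qqabcdzz"
best, (i0, i1), (j0, j1) = local_alignment(s1, s2)
motif = s1[i0:i1]

# Coarsen a longer sequence, then abstract the iterated non-terminal.
coarse = abstract_motif("uabcdabcdabcdv", motif, "A")
general = abstract_iterations(list(coarse), {"A"})
print(motif, coarse, general)
```

On these toy inputs the aligner recovers "abcd", the first operator rewrites the longer sequence as "uAAAv", and the second turns it into the tokens u, A+, v. Interleaving the two steps on progressively coarser sequences is what the abstract describes. The exact substring replacement used here ignores noise inside motif occurrences, which a real implementation would have to tolerate.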

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Galassi, U., Giordana, A. (2005). Learning Regular Expressions from Noisy Sequences. In: Zucker, JD., Saitta, L. (eds) Abstraction, Reformulation and Approximation. SARA 2005. Lecture Notes in Computer Science, vol 3607. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527862_7

  • DOI: https://doi.org/10.1007/11527862_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27872-6

  • Online ISBN: 978-3-540-31882-8

  • eBook Packages: Computer Science, Computer Science (R0)
