LARS: A learning algorithm for rewriting systems

Eyraud, Rémi; de la Higuera, Colin; Janodet, Jean-Christophe

doi:10.1007/s10994-006-9593-8

Published: 04 August 2006

Machine Learning, volume 66, pages 7–31 (2007)

Abstract

Whereas there are a number of methods and algorithms for learning regular languages, moving up the Chomsky hierarchy is proving to be a challenging task. Indeed, several theoretical barriers make the class of context-free languages hard to learn. To tackle these barriers, we choose to change the way we represent these languages. Among the formalisms that allow classes of languages to be defined, string-rewriting systems (SRSs) have outstanding properties. We introduce a new type of SRS, called Delimited SRS (DSRS), that is expressive enough to define, in a uniform way, a noteworthy and nontrivial class of languages that contains all the regular languages, \(\{a^{n}b^{n}: n \geq 0 \}\), \(\{w\in \{a,b\}^{*}:|w|_{a}=|w|_{b}\}\), the parenthesis languages of Dyck, the language of Łukasiewicz, and many others. Moreover, DSRSs constitute an efficient (often linear) parsing device for strings, and are thus promising candidates for forthcoming applications of grammatical inference. In this paper, we pioneer the study of their learnability. We propose a novel and sound algorithm (called LARS) which identifies a large subclass of them in polynomial time (but not from polynomial data). We illustrate the execution of our algorithm through several examples, discuss the position of the class in the Chomsky hierarchy, and finally raise some open questions and research directions.
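To make the idea of a rewriting system as a parsing device concrete, the following minimal Python sketch recognizes \(\{a^{n}b^{n}: n \geq 0 \}\) by rewriting. It is an illustration only, not the LARS algorithm and not the paper's formal definition of a DSRS: the delimiter symbols `$` and `#`, the two rules, the leftmost rewriting strategy, and the names `normalize`, `accepts` and `ANBN_RULES` are all assumptions chosen for this example. A word is accepted when, once wrapped in delimiters, it rewrites to the delimited empty string.

```python
from typing import List, Tuple

Rule = Tuple[str, str]

# Hypothetical delimiter symbols marking the two ends of the string.
LEFT, RIGHT = "$", "#"


def normalize(word: str, rules: List[Rule]) -> str:
    """Rewrite the delimited word until no rule applies, then return it."""
    current = LEFT + word + RIGHT
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            pos = current.find(lhs)
            if pos != -1:
                # Apply the first applicable rule at its leftmost occurrence.
                current = current[:pos] + rhs + current[pos + len(lhs):]
                changed = True
                break
    return current


def accepts(word: str, rules: List[Rule], target: str = "") -> bool:
    """Accept a word when it rewrites to the delimited target string."""
    return normalize(word, rules) == LEFT + target + RIGHT


# Illustrative rules (an assumption, not quoted from the paper):
# shrink the centre "aabb" to "ab", then erase "ab" only when it is the
# whole string, which the delimiters in the second rule enforce.
ANBN_RULES: List[Rule] = [("aabb", "ab"), (LEFT + "ab" + RIGHT, LEFT + RIGHT)]

if __name__ == "__main__":
    for w in ["", "ab", "aaabbb", "abab", "aab"]:
        print(repr(w), accepts(w, ANBN_RULES))
```

Every rule here strictly shortens the string, so the loop always terminates. The delimiters are what keep `abab` out of the language: without them, an unanchored rule erasing `ab` would accept every balanced string rather than only \(a^{n}b^{n}\).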

Author information

Authors and Affiliations

EURISE, Université Jean Monnet de Saint-Etienne, 23 rue Paul Michelon, 42023, Saint-Etienne, France
Rémi Eyraud, Colin de la Higuera & Jean-Christophe Janodet

Corresponding author

Correspondence to Rémi Eyraud.

Additional information

This work was supported in part by the IST Program of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the authors’ views.

Editors: Georgios Paliouras and Yasubumi Sakakibara

About this article

Cite this article

Eyraud, R., de la Higuera, C. & Janodet, JC. LARS: A learning algorithm for rewriting systems. Mach Learn 66, 7–31 (2007). https://doi.org/10.1007/s10994-006-9593-8

Received: 05 July 2005
Revised: 02 May 2006
Accepted: 22 June 2006
Published: 04 August 2006
Issue Date: January 2007
DOI: https://doi.org/10.1007/s10994-006-9593-8
