Abstract
Although a number of methods and algorithms exist for learning regular languages, moving up the Chomsky hierarchy is proving to be a challenging task. Indeed, several theoretical barriers make the class of context-free languages hard to learn. To tackle these barriers, we choose to change the way we represent these languages. Among the formalisms that allow the definition of classes of languages, that of string-rewriting systems (SRSs) has outstanding properties. We introduce a new type of SRS, called Delimited SRS (DSRS), that is expressive enough to define, in a uniform way, a noteworthy and nontrivial class of languages that contains all the regular languages, \(\{a^{n}b^{n}: n \geq 0 \}\), \(\{w\in \{a,b\}^{*}:|w|_{a}=|w|_{b}\}\), the parenthesis languages of Dyck, the language of Lukasiewicz, and many others. Moreover, DSRSs constitute an efficient (often linear) parsing device for strings, and are thus promising candidates for forthcoming applications of grammatical inference. In this paper, we pioneer the problem of their learnability. We propose a novel and sound algorithm (called LARS) that identifies a large subclass of them in polynomial time (but not from polynomial data). We illustrate the execution of our algorithm through several examples, discuss the position of the class in the Chomsky hierarchy, and finally raise some open questions and research directions.
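To make the idea of a rewriting system as a parsing device concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the DSRS formalism or the LARS algorithm from the paper: the function names, the end-marker characters `$` and `£`, and the two rule sets are choices made for this example. A word is wrapped in the markers, length-reducing rules are applied until none matches, and the word is accepted if only the markers remain.

```python
# Illustrative sketch only: names, markers, and rule sets are assumptions
# made for this example, not definitions taken from the paper.

LEFT, RIGHT = "$", "£"  # hypothetical markers for the start and end of a word


def reduce_word(word, rules):
    """Rewrite `word` with (lhs, rhs) rules until no rule applies.

    Each step rewrites the leftmost occurrence of the first matching rule.
    Every rule used below is strictly length-reducing, so the loop terminates.
    """
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            i = word.find(lhs)
            if i != -1:
                word = word[:i] + rhs + word[i + len(lhs):]
                changed = True
                break
    return word


def accepts(word, rules):
    """Accept `word` if the delimited word reduces to the bare markers."""
    return reduce_word(LEFT + word + RIGHT, rules) == LEFT + RIGHT


# One-sided Dyck language over {a, b}: erase any matched pair "ab".
DYCK_RULES = [("ab", "")]

# {a^n b^n : n >= 0}: shrink the centre, then erase "ab" only when it is
# the whole word, which is where the end markers are needed.
ANBN_RULES = [("aabb", "ab"), (LEFT + "ab" + RIGHT, LEFT + RIGHT)]

print(accepts("abab", DYCK_RULES))    # True
print(accepts("abab", ANBN_RULES))    # False
print(accepts("aaabbb", ANBN_RULES))  # True
```

The second rule set shows why marking the ends of a word matters in this sketch: without the markers, the rule erasing `ab` would also accept `abab`, which is a Dyck word but not of the form \(a^{n}b^{n}\).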
Additional information
This work was supported in part by the IST Program of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the authors’ views.
Editors: Georgios Paliouras and Yasubumi Sakakibara
Cite this article
Eyraud, R., de la Higuera, C. & Janodet, JC. LARS: A learning algorithm for rewriting systems. Mach Learn 66, 7–31 (2007). https://doi.org/10.1007/s10994-006-9593-8
DOI: https://doi.org/10.1007/s10994-006-9593-8