## Abstract

The most widely used learning paradigm in Grammatical Inference was introduced in 1967 and is known as *identification in the limit*. An important issue that has been raised with respect to the original definition is the absence of efficiency bounds. Nearly fifty years after its introduction, it remains an open problem how to best incorporate a notion of efficiency and tractability into this framework. This chapter surveys the different refinements that have been developed and studied, and the challenges they face. Main results for each formalization, along with comparisons, are provided.


## Notes

- 1.
We define \(\Vert S\Vert = |S|+\sum _{w \in S}|w|\) so that \(\Vert \{a\}\Vert < \Vert \{\lambda ,a\}\Vert \).

- 2.
This is not strictly necessary: for instance, the substitutable languages [13] have no grammatical characterization.

- 3.
The uniform membership problem is the one where, given a string and a representation, one has to determine whether the string belongs to the represented language.

- 4.
The size of a sample is the sum of the lengths of its elements: it has been shown [35] that cardinality is not a relevant measure when efficiency is considered, as it creates a risk of collusion: a learner can delay an exponential computation on a given sample of data, wait for a sufficient number of additional examples, and then run the computation on the former sample in time polynomial in the size of the latter.

- 5.
An incremental learner is *conservative* if it changes its current hypothesis *H* if and only if the next datum is inconsistent with *H*.

- 6.
Notice that the algorithm was originally presented in an incremental paradigm. However, its study was (mostly) done in a set-based framework and, as is shown in this appendix, the proofs are valid only in this context.

- 7.
This is known as

*behaviorally correct*identification in the limit. - 8.
The algorithm is consistent, which implies that the first two elements of the characteristic sample are in the conjectured language, and we have the rule \(a \rightarrow abc\) that generates the third element of *CS* from the axiom.

## References

1. A. Ambainis, S. Jain, and A. Sharma. Ordinal mind change complexity of language identification. *Theoretical Computer Science*, pages 323–343, 1999.
2. D. Angluin. Finding patterns common to a set of strings. *Journal of Computer and System Sciences*, 21:46–62, 1980.
3. D. Angluin. Queries and concept learning. *Machine Learning*, 2(4):319–342, 1987.
4. D. Angluin, J. Aspnes, and A. Kontorovich. On the learnability of shuffle ideals. In *Proceedings of the Algorithmic Learning Theory Conference*, pages 111–123, 2012.
5. D. Angluin. Inductive inference of formal languages from positive data. *Information and Control*, 45:117–135, 1980.
6. L. Becerra-Bonache, A. Dediu, and C. Tirnăucă. Learning DFA from correction and equivalence queries. In *Proceedings of the International Colloquium on Grammatical Inference*, pages 281–292, 2006.
7. L. E. Blum and M. Blum. Toward a mathematical theory of inductive inference. *Information and Control*, 28(2):125–155, 1975.
8. A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. *Journal of the ACM*, 36(4):929–965, 1989.
9. R. Book and F. Otto. *String-Rewriting Systems*. Springer-Verlag, 1993.
10. J. Case and T. Kötzing. Difficulties in forcing fairness of polynomial time inductive inference. In *Proceedings of the Algorithmic Learning Theory Conference*, pages 263–277, 2009.
11. N. Chomsky. Three models for the description of language. *IRE Transactions on Information Theory*, 2:113–124, 1956.
12. A. Clark. Learning trees from strings: A strong learning algorithm for some context-free grammars. *Journal of Machine Learning Research*, 14:3537–3559, 2014.
13. A. Clark and R. Eyraud. Polynomial identification in the limit of substitutable context-free languages. *Journal of Machine Learning Research*, 8:1725–1745, 2007.
14. A. Clark and S. Lappin. *Linguistic Nativism and the Poverty of the Stimulus*. Wiley-Blackwell, 2011.
15. A. Clark and F. Thollard. PAC-learnability of probabilistic deterministic finite state automata. *Journal of Machine Learning Research*, 5:473–497, 2004.
16. A. Clark and R. Yoshinaka. Distributional learning of parallel multiple context-free grammars. *Machine Learning*, 96:5–31, 2014.
17. H. Comon, M. Dauchet, R. Gilleron, C. Löding, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi. Tree automata techniques and applications. Available on: http://tata.gforge.inria.fr/, 2007.
18. C. de la Higuera. Characteristic sets for polynomial grammatical inference. *Machine Learning*, 27:125–138, 1997.
19. C. de la Higuera. *Grammatical Inference: Learning Automata and Grammars*. Cambridge University Press, 2010.
20. C. de la Higuera and J. Oncina. Learning deterministic linear languages. In *Proceedings of the Conference on Learning Theory*, pages 185–200, 2002.
21. P. Dupont, L. Miclet, and E. Vidal. What is the search space of the regular inference? In *Proceedings of the International Colloquium on Grammatical Inference*, pages 25–37, 1994.
22. R. Eyraud, C. de la Higuera, and J.-C. Janodet. LARS: A learning algorithm for rewriting systems. *Machine Learning*, 66(1):7–31, 2007.
23. F. Girosi. An equivalence between sparse approximation and support vector machines. *Neural Computation*, 10(6):1455–1480, 1998.
24. E. M. Gold. Language identification in the limit. *Information and Control*, 10(5):447–474, 1967.
25. J. Heinz. Computational theories of learning and developmental psycholinguistics. In J. Lidz, W. Snyder, and J. Pater, editors, *The Oxford Handbook of Developmental Linguistics*. Cambridge University Press, in press.
26. D. Hsu, S. M. Kakade, and P. Liang. Identifiability and unmixing of latent parse trees. In *Advances in Neural Information Processing Systems (NIPS)*, pages 1520–1528, 2013.
27. M. Isberner, F. Howar, and B. Steffen. Learning register automata: from languages to program structures. *Machine Learning*, 96:65–98, 2014.
28. Y. Ishigami and S. Tani. VC-dimensions of finite automata and commutative finite automata with \(k\) letters and \(n\) states. *Discrete Applied Mathematics*, 74:123–134, 1997.
29. J. Langford. Tutorial on practical prediction theory for classification. *Journal of Machine Learning Research*, 6:273–306, December 2005.
30. M. Li and P. Vitányi. Learning simple concepts under simple distributions. *SIAM Journal on Computing*, 20:911–935, 1991.
31. E. Moore. Gedanken-experiments on sequential machines. In C. Shannon and J. McCarthy, editors, *Automata Studies*, pages 129–153. Princeton University Press, 1956.
32. T. Oates, D. Desai, and V. Bhat. Learning k-reversible context-free grammars from positive structural examples. In *Proceedings of the International Conference on Machine Learning*, pages 459–465, 2002.
33. J. Oncina and P. García. Identifying regular languages in polynomial time. In *Advances in Structural and Syntactic Pattern Recognition*, volume 5 of *Series in Machine Perception and Artificial Intelligence*, pages 99–108, 1992.
34. T.-W. Pao and J. Carr III. A solution of the syntactical induction-inference problem for regular languages. *Computer Languages*, 3(1):53–64, 1978.
35. L. Pitt. Inductive inference, DFAs, and computational complexity. In *Analogical and Inductive Inference*, number 397 in LNAI, pages 18–44. Springer-Verlag, 1989.
36. D. Ron, Y. Singer, and N. Tishby. On the learnability and usage of acyclic probabilistic finite automata. In *Proceedings of the Conference on Learning Theory*, pages 31–40, 1995.
37. G. Rozenberg, editor. *Handbook of Graph Grammars and Computing by Graph Transformation: Volume I. Foundations*. World Scientific, 1997.
38. Y. Sakakibara. Efficient learning of context-free grammars from positive structural examples. *Information and Computation*, 97:23–60, 1992.
39. H. Seki, T. Matsumura, M. Fujii, and T. Kasami. On multiple context-free grammars. *Theoretical Computer Science*, 88(2):191–229, 1991.
40. J. M. Sempere and P. García. A characterization of even linear languages and its application to the learning problem. In *Proceedings of the International Colloquium on Grammatical Inference*, pages 38–44, 1994.
41. C. Shibata and R. Yoshinaka. PAC-learning of some subclasses of context-free grammars with basic distributional properties from positive data. In *Proceedings of the Algorithmic Learning Theory Conference*, pages 143–157, 2013.
42. Y. Tajima, E. Tomita, M. Wakatsuki, and M. Terada. Polynomial time learning of simple deterministic languages via queries and a representative sample. *Theoretical Computer Science*, 329(1-3):203–221, 2004.
43. L. G. Valiant. A theory of the learnable. *Communications of the ACM*, 27(11):1134–1142, 1984.
44. V. Vapnik. *The Nature of Statistical Learning Theory*. Springer, 1995.
45. M. Wakatsuki and E. Tomita. A fast algorithm for checking the inclusion for very simple deterministic pushdown automata. *IEICE Transactions on Information and Systems*, E76-D(10):1224–1233, 1993.
46. T. Yokomori. On polynomial-time learnability in the limit of strictly deterministic automata. *Machine Learning*, 19:153–179, 1995.
47. T. Yokomori. Polynomial-time identification of very simple grammars from positive data. *Theoretical Computer Science*, 298(1):179–206, 2003.
48. R. Yoshinaka. Identification in the limit of \(k, l\)-substitutable context-free languages. In *Proceedings of the International Colloquium on Grammatical Inference*, pages 266–279, 2008.
49. R. Yoshinaka. Learning efficiency of very simple grammars from positive data. *Theoretical Computer Science*, 410(19):1807–1825, 2009.
50. R. Yoshinaka. Efficient learning of multiple context-free languages with multidimensional substitutability from positive data. *Theoretical Computer Science*, 412:1821–1831, 2011.
51. T. Zeugmann. Can learning in the limit be done efficiently? In *Proceedings of the Algorithmic Learning Theory Conference*, pages 17–38, 2003.
52. T. Zeugmann. From learning in the limit to stochastic finite learning. *Theoretical Computer Science*, 364(1):77–97, 2006.


## Appendix


Here we present an example showing that a learning result in a set-based approach (that of IPTtD) may not yield a learning result in the incremental approach.

A characteristic sample has been exhibited for a set-based polynomial-time learning algorithm^{Footnote 6} for the class of substitutable context-free languages [13]. The size of this characteristic sample is polynomial in the size of the target grammar and its thickness [48]. From any superset of this set, the algorithm returns a representation that generates the target language. Therefore, one can state that the algorithm learns the class of substitutable context-free languages in a set-based approach.

A particularity of this algorithm is that from two different supersets of the characteristic sample it may return two different but equivalent grammars, and the number of such pairs of samples is infinite (this is due to the infinite number of congruence classes that a context-free language defines). Consider the incremental version of the algorithm that computes a new grammar for every new example. It does not fit the conditions of identification in the limit, since there is no point after which the algorithm always returns the same hypothesis, though there is a point after which the generated language is always the target one.^{Footnote 7}

An intuitive solution is then to make the algorithm conservative: the incremental version of the algorithm changes its hypothesis only if the new example is not recognized. However, this does not work, as the following example shows.

Consider the language \(a(\{b,c\}\{b,c\})^*\), which is substitutable. It is also context-free as it can be generated by the grammar whose rules are \(S \rightarrow a | SBB\) and \(B \rightarrow b | c\), with *S* being the only axiom.
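As a quick sanity check (our own sketch, not part of the chapter), one can enumerate the strings this grammar derives, namely *a* followed by an even number of letters from \(\{b,c\}\), and compare them against the regular expression \(a([bc][bc])^*\) up to a small length:

```python
import re
from itertools import product

# CFG: S -> a | S B B,  B -> b | c  (axiom S).
# Every derivation yields "a" followed by 2k letters from {b, c}.
def grammar_strings(max_len=5):
    out = set()
    k = 0
    while 1 + 2 * k <= max_len:
        for tail in product("bc", repeat=2 * k):
            out.add("a" + "".join(tail))
        k += 1
    return out

# Brute-force the language of the regular expression a([bc][bc])*.
def regex_strings(max_len=5):
    pat = re.compile(r"a([bc][bc])*")
    return {w for n in range(1, max_len + 1)
            for w in ("".join(t) for t in product("abc", repeat=n))
            if pat.fullmatch(w)}

print(grammar_strings() == regex_strings())  # True
```

Both enumerations agree, confirming that the grammar indeed generates \(a(\{b,c\}\{b,c\})^*\) on the sampled lengths.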

As defined in the previously cited papers, the characteristic sample is the set \(CS=\{lur \in \varSigma ^* : \exists N\rightarrow \alpha , (l,r)=C(N)\text { and } u=\omega (\alpha )\}\), where *C*(*N*) is the smallest context in which the non-terminal *N* can appear in a derivation, and \(\omega (\alpha )\) is the smallest element of \(\varSigma ^*\) that can be derived from \(\alpha \) in the grammar.

If we assume \(a< b < c\) and \((ab, \lambda ) < (a,b)\), the characteristic sample is then \(CS=\{a, abb, abc\}\).

Suppose the learner receives the examples *a*, *abb*, *abbbc* in this order. As the letter *c* is new, the conjecture has to be updated at this point. The new conjecture is the string-rewriting system \(\{a \rightarrow abb, a \rightarrow abc, b\rightarrow bbc\}\) with *a* as the only axiom. It generates every sentence in the characteristic sample.^{Footnote 8} However, the hypothesis is not correct: for instance, *acc* is in the target language but not in the current one. Therefore, if the next example is the missing string of the characteristic sample, *abc*, the algorithm will not change its hypothesis: although all elements of the characteristic sample are available, the current hypothesis is incorrect. Only once an element of the language that is not generated by the hypothesis is seen will the hypothesis be updated, using a set containing a characteristic sample; the new conjecture will then be a correct representation of the language.
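This argument can be checked mechanically. The following sketch (our own illustration, not the chapter's learning algorithm) explores derivations of the conjectured string-rewriting system by breadth-first search; since every rule strictly lengthens the string, bounding the length makes the search finite. It confirms that *abc* is derivable while *acc*, though in the target language, is not:

```python
import re
from collections import deque

# Conjectured string-rewriting system with axiom "a".
RULES = [("a", "abb"), ("a", "abc"), ("b", "bbc")]
AXIOM = "a"

def generates(target, max_len=8):
    """True iff the system derives `target` from the axiom,
    exploring all derivable strings up to length max_len."""
    seen = {AXIOM}
    queue = deque([AXIOM])
    while queue:
        s = queue.popleft()
        if s == target:
            return True
        for lhs, rhs in RULES:
            # Rewrite every occurrence of lhs in s, one at a time.
            i = s.find(lhs)
            while i != -1:
                t = s[:i] + rhs + s[i + len(lhs):]
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    queue.append(t)
                i = s.find(lhs, i + 1)
    return False

# Membership in the target language a({b,c}{b,c})*.
def in_target(w):
    return re.fullmatch(r"a([bc][bc])*", w) is not None

print(generates("abc"))                    # True: rule a -> abc applies to the axiom
print(in_target("acc"), generates("acc"))  # True False: in the target, yet not derivable
```

The reason *acc* is underivable is visible in the rules: every right-hand side that introduces a *c* also introduces a *b*, and no rule removes a *b*, so no derived string other than the axiom is *b*-free.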


## Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

## About this chapter

### Cite this chapter

Eyraud, R., Heinz, J., Yoshinaka, R. (2016). Efficiency in the Identification in the Limit Learning Paradigm. In: Heinz, J., Sempere, J. (eds) Topics in Grammatical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48395-4_2


Print ISBN: 978-3-662-48393-0

Online ISBN: 978-3-662-48395-4
