Efficiency in the Identification in the Limit Learning Paradigm

Chapter in Topics in Grammatical Inference

Abstract

The most widely used learning paradigm in Grammatical Inference was introduced in 1967 and is known as identification in the limit. An important issue that has been raised with respect to the original definition is the absence of efficiency bounds. Nearly fifty years after its introduction, it remains an open problem how to best incorporate a notion of efficiency and tractability into this framework. This chapter surveys the different refinements that have been developed and studied, and the challenges they face. Main results for each formalization, along with comparisons, are provided.


Notes

  1.

    We define \(\Vert S\Vert = |S|+\sum _{w \in S}|w|\) so that \(\Vert \{a\}\Vert < \Vert \{\lambda ,a\}\Vert \).

  2.

    This is not strictly necessary: for instance, the substitutable languages [13] have no grammatical characterization.

  3.

    The uniform membership problem is that of deciding, given a string and a representation, whether the string belongs to the represented language.

  4.

    The size of a sample is the sum of the lengths of its elements. It has been shown [35] that the cardinality of a sample is not a relevant measure when efficiency is considered, as it creates a risk of collusion: a learner can delay an exponential computation on a given sample and simply wait for enough additional examples so that this computation becomes polynomial in the size of the enlarged sample.

  5.

    An incremental learner is conservative if it changes its current hypothesis H if and only if the next datum is inconsistent with H.

  6.

    Notice that the algorithm was originally presented in an incremental paradigm. However, its study was (mostly) done in a set-based framework and, as shown in this appendix, the proofs are valid only in this context.

  7.

    This is known as behaviorally correct identification in the limit.

  8.

    The algorithm is consistent, which implies that the first two elements of the characteristic sample are in the conjectured language, and the rule \(a \rightarrow abc\) generates the third element of CS from the axiom.

References

  1. A. Ambainis, S. Jain, and A. Sharma. Ordinal mind change complexity of language identification. Theoretical Computer Science, pages 323–343, 1999.

  2. D. Angluin. Finding patterns common to a set of strings. Journal of Computer and System Sciences, 21:46–62, 1980.

  3. D. Angluin. Queries and concept learning. Machine Learning, 2(4):319–342, 1987.

  4. D. Angluin, J. Aspnes, and A. Kontorovich. On the learnability of shuffle ideals. In Proceedings of the Algorithmic Learning Theory Conference, pages 111–123, 2012.

  5. D. Angluin. Inductive inference of formal languages from positive data. Information and Control, 45:117–135, 1980.

  6. L. Becerra-Bonache, A. Dediu, and C. Tirnăucă. Learning DFA from correction and equivalence queries. In Proceedings of the International Colloquium on Grammatical Inference, pages 281–292, 2006.

  7. L. E. Blum and M. Blum. Toward a mathematical theory of inductive inference. Information and Control, 28(2):125–155, 1975.

  8. A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.

  9. R. Book and F. Otto. String-Rewriting Systems. Springer Verlag, 1993.

  10. J. Case and T. Kötzing. Difficulties in forcing fairness of polynomial time inductive inference. In Proceedings of the Algorithmic Learning Theory Conference, pages 263–277, 2009.

  11. N. Chomsky. Three models for the description of language. IRE Transactions on Information Theory, 2:113–124, 1956.

  12. A. Clark. Learning trees from strings: A strong learning algorithm for some context-free grammars. Journal of Machine Learning Research, 14:3537–3559, 2014.

  13. A. Clark and R. Eyraud. Polynomial identification in the limit of substitutable context-free languages. Journal of Machine Learning Research, 8:1725–1745, 2007.

  14. A. Clark and S. Lappin. Linguistic Nativism and the Poverty of the Stimulus. Wiley-Blackwell, 2011.

  15. A. Clark and F. Thollard. PAC-learnability of probabilistic deterministic finite state automata. Journal of Machine Learning Research, 5:473–497, 2004.

  16. A. Clark and R. Yoshinaka. Distributional learning of parallel multiple context-free grammars. Machine Learning, 96:5–31, 2014.

  17. H. Comon, M. Dauchet, R. Gilleron, C. Löding, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi. Tree automata techniques and applications. Available at: http://tata.gforge.inria.fr/, 2007.

  18. C. de la Higuera. Characteristic sets for polynomial grammatical inference. Machine Learning, 27:125–138, 1997.

  19. C. de la Higuera. Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, 2010.

  20. C. de la Higuera and J. Oncina. Learning deterministic linear languages. In Proceedings of the Conference on Learning Theory, pages 185–200, 2002.

  21. P. Dupont, L. Miclet, and E. Vidal. What is the search space of the regular inference? In Proceedings of the International Colloquium on Grammatical Inference, pages 25–37, 1994.

  22. R. Eyraud, C. de la Higuera, and J.-C. Janodet. LARS: A learning algorithm for rewriting systems. Machine Learning, 66(1):7–31, 2007.

  23. F. Girosi. An equivalence between sparse approximation and support vector machines. Neural Computation, 10(6):1455–1480, 1998.

  24. E. M. Gold. Language identification in the limit. Information and Control, 10(5):447–474, 1967.

  25. J. Heinz. Computational theories of learning and developmental psycholinguistics. In J. Lidz, W. Synder, and J. Pater, editors, The Oxford Handbook of Developmental Linguistics. Cambridge University Press, in press.

  26. D. Hsu, S. M. Kakade, and P. Liang. Identifiability and unmixing of latent parse trees. In Advances in Neural Information Processing Systems (NIPS), pages 1520–1528, 2013.

  27. M. Isberner, F. Howar, and B. Steffen. Learning register automata: from languages to program structures. Machine Learning, 96:65–98, 2014.

  28. Y. Ishigami and S. Tani. VC-dimensions of finite automata and commutative finite automata with \(k\) letters and \(n\) states. Discrete Applied Mathematics, 74:123–134, 1997.

  29. J. Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6:273–306, 2005.

  30. M. Li and P. Vitanyi. Learning simple concepts under simple distributions. SIAM Journal on Computing, 20:911–935, 1991.

  31. E. Moore. Gedanken-experiments on sequential machines. In C. Shannon and J. McCarthy, editors, Automata Studies, pages 129–153. Princeton University Press, 1956.

  32. T. Oates, D. Desai, and V. Bhat. Learning k-reversible context-free grammars from positive structural examples. In Proceedings of the International Conference on Machine Learning, pages 459–465, 2002.

  33. J. Oncina and P. García. Identifying regular languages in polynomial time. In Advances in Structural and Syntactic Pattern Recognition, volume 5 of Series in Machine Perception and Artificial Intelligence, pages 99–108, 1992.

  34. T.-W. Pao and J. Carr III. A solution of the syntactical induction-inference problem for regular languages. Computer Languages, 3(1):53–64, 1978.

  35. L. Pitt. Inductive inference, DFA's, and computational complexity. In Analogical and Inductive Inference, number 397 in LNAI, pages 18–44. Springer-Verlag, 1989.

  36. D. Ron, Y. Singer, and N. Tishby. On the learnability and usage of acyclic probabilistic finite automata. In Proceedings of the Conference on Learning Theory, pages 31–40, 1995.

  37. G. Rozenberg, editor. Handbook of Graph Grammars and Computing by Graph Transformation: Volume I. Foundations. World Scientific, 1997.

  38. Y. Sakakibara. Efficient learning of context-free grammars from positive structural examples. Information and Computation, 97:23–60, 1992.

  39. H. Seki, T. Matsumura, M. Fujii, and T. Kasami. On multiple context-free grammars. Theoretical Computer Science, 88(2):191–229, 1991.

  40. J. M. Sempere and P. García. A characterization of even linear languages and its application to the learning problem. In Proceedings of the International Colloquium on Grammatical Inference, pages 38–44, 1994.

  41. C. Shibata and R. Yoshinaka. PAC-learning of some subclasses of context-free grammars with basic distributional properties from positive data. In Proceedings of the Algorithmic Learning Theory Conference, pages 143–157, 2013.

  42. Y. Tajima, E. Tomita, M. Wakatsuki, and M. Terada. Polynomial time learning of simple deterministic languages via queries and a representative sample. Theoretical Computer Science, 329(1–3):203–221, 2004.

  43. L. G. Valiant. A theory of the learnable. Communications of the Association for Computing Machinery, 27(11):1134–1142, 1984.

  44. V. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.

  45. M. Wakatsuki and E. Tomita. A fast algorithm for checking the inclusion for very simple deterministic pushdown automata. IEICE Transactions on Information and Systems, E76-D(10):1224–1233, 1993.

  46. T. Yokomori. On polynomial-time learnability in the limit of strictly deterministic automata. Machine Learning, 19:153–179, 1995.

  47. T. Yokomori. Polynomial-time identification of very simple grammars from positive data. Theoretical Computer Science, 298(1):179–206, 2003.

  48. R. Yoshinaka. Identification in the limit of \(k, l\)-substitutable context-free languages. In Proceedings of the International Colloquium on Grammatical Inference, pages 266–279, 2008.

  49. R. Yoshinaka. Learning efficiency of very simple grammars from positive data. Theoretical Computer Science, 410(19):1807–1825, 2009.

  50. R. Yoshinaka. Efficient learning of multiple context-free languages with multidimensional substitutability from positive data. Theoretical Computer Science, 412:1821–1831, 2011.

  51. T. Zeugmann. Can learning in the limit be done efficiently? In Proceedings of the Algorithmic Learning Theory Conference, pages 17–38, 2003.

  52. T. Zeugmann. From learning in the limit to stochastic finite learning. Theoretical Computer Science, 364(1):77–97, 2006.


Appendix

Here we present an example showing that a learning result in a set-based approach (that of IPTtD) may not yield a learning result in the incremental approach.

A characteristic sample has been exhibited for a set-based polynomial-time learning algorithm (Note 6) for the class of substitutable context-free languages [13]. The size of this characteristic sample is polynomial in the size of the target grammar and its thickness [48]. From any superset of this set, the algorithm returns a representation that generates the target language. Therefore, one can state that the algorithm learns the class of substitutable context-free languages in a set-based approach.

A particularity of this algorithm is that, from two different supersets of the characteristic sample, it may return two different but equivalent grammars, and the number of such pairs of samples is infinite (this is due to the infinite number of congruence classes that a context-free language defines). Consider the incremental version of the algorithm that computes a new grammar for every new example. It does not fit the conditions of identification in the limit, since there is no point after which the algorithm always returns the same hypothesis, though there is a point after which the generated language is always the target one (Note 7).

An intuitive solution is then to make the algorithm conservative: its incremental version changes its hypothesis only if the new example is not recognized by the current one (a minimal sketch of such a wrapper is given below). However, as the following example shows, this does not work.
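To make this notion concrete, here is a minimal sketch (in Python, not taken from the chapter) of a conservative incremental wrapper around a set-based learner. The names set_based_learn and recognizes are hypothetical placeholders for, respectively, a set-based algorithm such as that of [13] and a membership test for its conjectures.

```python
def conservative_learner(stream, set_based_learn, recognizes):
    """Wrap a set-based learner so that the hypothesis is revised
    only when the next datum is not recognized by the current one."""
    sample = []          # all data seen so far
    hypothesis = None    # current conjectured grammar
    for w in stream:
        sample.append(w)
        # Conservative update: keep the hypothesis while it is consistent.
        if hypothesis is None or not recognizes(hypothesis, w):
            hypothesis = set_based_learn(sample)
        yield hypothesis
```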

Consider the language \(a(\{b,c\}\{b,c\})^*\), which is substitutable. It is also context-free as it can be generated by the grammar whose rules are \(S \rightarrow a | SBB\) and \(B \rightarrow b | c\), with S being the only axiom.
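As a quick sanity check on this example (an illustrative sketch, not part of the original text), membership in \(a(\{b,c\}\{b,c\})^*\) can be tested directly with a regular expression, since the language is in fact regular:

```python
import re

# The target language a({b,c}{b,c})^*: an 'a' followed by an even number
# of letters drawn from {b, c}.
TARGET = re.compile(r'a([bc][bc])*')

def in_target(w: str) -> bool:
    return TARGET.fullmatch(w) is not None

# Strings used later in this appendix:
assert all(in_target(w) for w in ['a', 'abb', 'abbbc', 'abc', 'acc'])
assert not in_target('ab') and not in_target('abbb')  # odd number of b/c letters
```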

As defined in the previously cited papers, the characteristic sample is the following set: \(CS=\{lur \in \varSigma ^* : \exists A\rightarrow \alpha , (l,r)=C(A)\text { and } u=\omega (\alpha )\}\), where C(A) is the smallest context in which the non-terminal A can appear in a derivation, and \(\omega (\alpha )\) is the smallest element of \(\varSigma ^*\) that can be derived from \(\alpha \) in the grammar.

If we assume \(a< b < c\) and \((ab, \lambda ) < (a,b)\), the characteristic sample is then \(CS=\{a, abb, abc\}\).
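The following sketch (illustrative only; the contexts C(A) and the values of \(\omega\) are hand-supplied from the ordering assumptions above rather than computed in general) instantiates the CS formula on this grammar:

```python
# Characteristic sample CS = { l u r : A -> alpha a rule, (l, r) = C(A), u = omega(alpha) }
# for the grammar S -> a | SBB, B -> b | c, assuming a < b < c and (ab, λ) < (a, b).

omega = {'S': 'a', 'B': 'b', 'a': 'a', 'b': 'b', 'c': 'c'}  # smallest derivable string

def omega_of(alpha):
    """Smallest terminal string derivable from a sentential form."""
    return ''.join(omega[x] for x in alpha)

# Smallest context in which each non-terminal appears in a derivation.
context = {'S': ('', ''), 'B': ('ab', '')}

rules = [('S', 'a'), ('S', 'SBB'), ('B', 'b'), ('B', 'c')]

CS = set()
for A, alpha in rules:
    l, r = context[A]
    CS.add(l + omega_of(alpha) + r)

print(sorted(CS))  # ['a', 'abb', 'abc']
```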

Suppose the learner receives the examples a, abb, abbbc, in this order. As the letter c is new, the conjecture has to be updated at this point. The new conjecture is the string rewriting system \(\{a \rightarrow abb, a \rightarrow abc, b\rightarrow bbc\}\) with a as the only axiom. It generates every sentence of the characteristic sample (Note 8). However, the hypothesis is not correct: for example, acc is in the target language but not in the current one. Therefore, if the next example is the missing string of the characteristic sample, abc, the algorithm will not change its hypothesis: though all elements of the characteristic sample are now available, the current hypothesis is not correct. Only once an element of the language that is not generated by the hypothesis is seen will the hypothesis be updated, this time from a set containing a characteristic sample, and the new conjecture will then correspond to a correct representation of the language.
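To see why the conservative learner gets stuck here, the conjectured rewriting system can be explored mechanically. The sketch below (illustrative; it bounds the search by string length rather than implementing the chapter's algorithm) confirms that abc is derivable from the axiom while acc is not reachable within the bound:

```python
from collections import deque

# The conjectured string rewriting system {a -> abb, a -> abc, b -> bbc}, axiom 'a'.
RULES = [('a', 'abb'), ('a', 'abc'), ('b', 'bbc')]

def generates(target, axiom='a', max_len=8):
    """Breadth-first search over rewrites, pruning strings longer than max_len."""
    seen, queue = {axiom}, deque([axiom])
    while queue:
        s = queue.popleft()
        if s == target:
            return True
        for lhs, rhs in RULES:
            for i in range(len(s)):
                if s.startswith(lhs, i):
                    t = s[:i] + rhs + s[i + len(lhs):]
                    if len(t) <= max_len and t not in seen:
                        seen.add(t)
                        queue.append(t)
    return False

print(generates('abc'))   # True: so the conservative learner keeps its hypothesis
print(generates('acc'))   # False (up to the length bound): acc is in the target
                          # language a({b,c}{b,c})^* but not in the conjecture
```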


Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Eyraud, R., Heinz, J., Yoshinaka, R. (2016). Efficiency in the Identification in the Limit Learning Paradigm. In: Heinz, J., Sempere, J. (eds) Topics in Grammatical Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48395-4_2

  • DOI: https://doi.org/10.1007/978-3-662-48395-4_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48393-0

  • Online ISBN: 978-3-662-48395-4

  • eBook Packages: Computer Science (R0)
