Machine Learning, Volume 44, Issue 1–2, pp 185–197

Stochastic Inference of Regular Tree Languages

  • Rafael C. Carrasco
  • Jose Oncina
  • Jorge Calera-Rubio


We generalize an earlier algorithm for identifying regular languages from stochastic samples to the case of tree languages. It can also be used to identify context-free languages when structural information about the strings is available. The procedure identifies equivalent subtrees in the sample and outputs the hypothesis in time linear in the number of examples. The results are evaluated with a method that efficiently computes the relative entropy between the target grammar and the inferred one.
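As a rough illustration of the evaluation measure named above, the following minimal sketch computes the relative entropy D(p || q) between two finite probability distributions over trees. It is only the textbook definition applied to toy data, not the efficient grammar-level method the abstract refers to. The function name `relative_entropy`, the nested-tuple tree encoding, and the example probabilities are all assumptions made for illustration.

```python
import math


def relative_entropy(p, q):
    """Return D(p || q) in bits for two finite distributions given as dicts.

    Keys are trees (here: nested tuples, a hypothetical encoding) and
    values are probabilities.
    """
    total = 0.0
    for tree, p_t in p.items():
        if p_t == 0.0:
            continue  # terms with p(t) = 0 contribute nothing
        q_t = q.get(tree, 0.0)
        if q_t == 0.0:
            # q gives zero probability to a tree that p can generate
            return math.inf
        total += p_t * math.log2(p_t / q_t)
    return total


# Toy distributions (assumed values): a leaf "a" and a tree f(a, a).
target = {("a",): 0.5, ("f", ("a",), ("a",)): 0.5}
inferred = {("a",): 0.25, ("f", ("a",), ("a",)): 0.75}

divergence = relative_entropy(target, inferred)
```

The value is zero when the two distributions coincide and grows as the inferred distribution departs from the target, which is why it can serve as a quality measure for an inferred model.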

grammatical inference, stochastic grammars, tree languages



Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Rafael C. Carrasco (1)
  • Jose Oncina (1)
  • Jorge Calera-Rubio (1)

  1. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante
