Probabilistic k-Testable Tree Languages

  • Juan Ramón Rico-Juan
  • Jorge Calera-Rubio
  • Rafael C. Carrasco
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1891)


In this paper, we present a natural generalization of k-gram models to stochastic tree languages, based on the k-testable class. In this class of models, frequencies are estimated for a probabilistic regular tree grammar which is bottom-up deterministic. One advantage of this approach is that the model can be updated incrementally. The method is an alternative to costly learning algorithms (such as inside-outside-based methods) and to algorithms that require larger samples (such as many state merging/splitting methods).
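The idea of estimating rule frequencies for a bottom-up deterministic tree grammar can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a tree is a `(label, children)` tuple, that the state of a subtree is its root truncated to depth k-1, and that probabilities are plain relative frequencies with no smoothing. The names `root`, `KTestableTreeModel`, `update`, and `probability` are hypothetical.

```python
from collections import Counter


def root(tree, depth):
    """Truncate a (label, children) tree to its top `depth` levels."""
    label, children = tree
    if depth <= 1:
        return (label, ())
    return (label, tuple(root(child, depth - 1) for child in children))


class KTestableTreeModel:
    """Relative-frequency model over k-testable tree rules (assumed design)."""

    def __init__(self, k=2):
        self.k = k
        self.n_trees = 0
        self.root_counts = Counter()   # state observed at the root of a sample tree
        self.state_counts = Counter()  # how often each state was reached
        self.rule_counts = Counter()   # (label, child states) expansions

    def _state(self, tree):
        # Assumption: the state of a subtree is its root of depth k-1.
        return root(tree, self.k - 1)

    def _rule(self, node):
        label, children = node
        return (label, tuple(self._state(child) for child in children))

    def update(self, tree):
        """Incremental update: adding a tree only increments counters."""
        self.n_trees += 1
        self.root_counts[self._state(tree)] += 1
        stack = [tree]
        while stack:
            node = stack.pop()
            self.rule_counts[self._rule(node)] += 1
            self.state_counts[self._state(node)] += 1
            stack.extend(node[1])

    def probability(self, tree):
        """Product of the root frequency and the rule frequencies in the tree."""
        if self.n_trees == 0:
            return 0.0
        p = self.root_counts[self._state(tree)] / self.n_trees
        stack = [tree]
        while stack and p > 0.0:
            node = stack.pop()
            seen = self.state_counts[self._state(node)]
            p *= self.rule_counts[self._rule(node)] / seen if seen else 0.0
            stack.extend(node[1])
        return p


# Usage: two sample trees, S(a, b) and S(a), with k = 2.
t1 = ("S", (("a", ()), ("b", ())))
t2 = ("S", (("a", ()),))
model = KTestableTreeModel(k=2)
model.update(t1)
model.update(t2)
```

Because the model is just a set of counters, a new sample tree is absorbed by calling `update` again, with no retraining pass. An unseen rule gets probability zero in this sketch; a practical model would need some form of smoothing.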





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Juan Ramón Rico-Juan (1)
  • Jorge Calera-Rubio (1)
  • Rafael C. Carrasco (1)
  1. Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, Alacant, Spain
