Languages as hyperplanes: grammatical inference with string kernels

Abstract

Using string kernels, languages can be represented as hyperplanes in a high-dimensional feature space. We discuss the language-theoretic properties of this formalism, with particular reference to the implicit feature maps defined by string kernels, considering its expressive power, its closure properties and its relationship to other formalisms. We present a new family of grammatical inference algorithms based on this idea, demonstrate that some mildly context-sensitive languages can be represented in this way, and show that such languages can be learned efficiently using kernel PCA. We experimentally demonstrate the effectiveness of this approach on some standard examples of context-sensitive languages, using small synthetic data sets.
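
As an illustrative sketch (not the paper's algorithm, and using the crudest possible feature map): under the Parikh map, which simply counts symbol occurrences, the image of the mildly context-sensitive language {a^n b^n c^n} lies on the hyperplanes #a = #b and #b = #c, and those hyperplanes can be recovered from positive examples alone as the zero-variance directions of the data, which is what kernel PCA finds implicitly. All names below are our own:

```python
import numpy as np

def parikh(w, alphabet="abc"):
    # Simplest possible string feature map: count each symbol's occurrences.
    return np.array([w.count(s) for s in alphabet], dtype=float)

# Positive examples of the mildly context-sensitive language {a^n b^n c^n}.
train = ["a" * n + "b" * n + "c" * n for n in range(1, 6)]
X = np.stack([parikh(w) for w in train])

# Centre the data; directions of zero variance are the normals of the
# hyperplanes on which the language's image lies (here #a = #b and #b = #c).
mu = X.mean(axis=0)
_, s, Vt = np.linalg.svd(X - mu)
normals = Vt[s < 1e-9]  # null-space directions of the sample covariance

def on_plane(w):
    # Membership test: the feature vector must satisfy every linear constraint.
    return bool(np.all(np.abs(normals @ (parikh(w) - mu)) < 1e-9))
```

Note that a count-based map is blind to symbol order, so e.g. "abcabc" also passes this test; the paper instead works with richer string kernels (such as gap-weighted subsequence kernels), whose feature maps are computed only implicitly via kernel PCA.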

References

  1. Asveld, P. R. J. (2006). Generating all permutations by context-free grammars in Chomsky normal form. Theoretical Computer Science, 354(1), 118–130.

  2. Bach, E. (1981). Discontinuous constituents in generalized categorial grammars. In North east linguistics society (NELS 11) (pp. 1–12).

  3. Becerra-Bonache, L. (2006). On the learnability of mildly context-sensitive languages using positive data and correction queries. Ph.D. thesis, Universitat Rovira i Virgili, Tarragona, Spain.

  4. Becerra-Bonache, L., & Yokomori, T. (2004). Learning mild context-sensitiveness: Toward understanding children’s language learning. In G. Paliouras & Y. Sakakibara (Eds.), Lecture notes in computer science: Vol. 3264. ICGI (pp. 53–64). Berlin: Springer.

  5. Becker, T., Rambow, O., & Niv, M. (1992). The derivational generative power of formal systems or scrambling is beyond LCFRS (Tech. Rep. 92–38). Institute for Research in Cognitive Science, University of Pennsylvania.

  6. Chalup, S., & Blair, A. D. (1999). Hill climbing in recurrent neural networks for learning the a^n b^n c^n language. In Proceedings of the sixth international conference on neural information processing (pp. 508–513).

  7. Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, 2, 113–124.

  8. Clark, A. (2006). PAC-learning unambiguous NTS languages. In Proceedings of the 8th international colloquium on grammatical inference (ICGI) (pp. 59–71).

  9. Clark, A. (2007). Learning deterministic context free grammars: the Omphalos competition. Machine Learning, 66(1), 93–110.

  10. Clark, A., & Eyraud, R. (2007). Polynomial identification in the limit of substitutable context-free languages. Journal of Machine Learning Research, 8, 1725–1745.

  11. Clark, A., & Thollard, F. (2004). PAC-learnability of probabilistic deterministic finite state automata. Journal of Machine Learning Research, 5, 473–497.

  12. Clark, A., & Watkins, C. (2008). Some alternatives to Parikh matrices using string kernels. Fundamenta Informaticae, 84(3–4), 291–303.

  13. Clark, A., Costa Florêncio, C., & Watkins, C. (2006a). Languages as hyperplanes: grammatical inference with string kernels. In ECML, 17th European conference on machine learning (pp. 90–101). Berlin: Springer.

  14. Clark, A., Costa Florêncio, C., Watkins, C., & Serayet, M. (2006b). Planar languages and learnability. In Proceedings of the international conference on grammatical inference (pp. 148–160). Tokyo: Springer.

  15. Cortes, C., Kontorovich, L., & Mohri, M. (2007). Learning languages with rational kernels. In Lecture notes in computer science: Vol. 4539. Proceedings of the 20th annual conference on learning theory (COLT 2007) (pp. 349–364). Heidelberg: Springer.

  16. Crammer, K., & Singer, Y. (2003). Learning algorithms for enclosing points in Bregmanian spheres. In 16th annual conference on learning theory (p. 388). Berlin: Springer.

  17. de la Higuera, C. (1997). Characteristic sets for polynomial grammatical inference. Machine Learning, 27(2), 125–138.

  18. Floyd, S., & Warmuth, M. (1995). Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 21(3), 269–304.

  19. Gentner, T. Q., Fenn, K. M., Margoliash, D., & Nusbaum, H. C. (2006). Recursive syntactic pattern learning by songbirds. Nature, 440, 1204–1207. DOI 10.1038/nature04675. http://www.isrl.uiuc.edu/~amag/langev/paper/gentner06songbirds.html.

  20. Gold, E. M. (1967). Language identification in the limit. Information and Control, 10, 447–474.

  21. Heinz, J. (2010). String extension learning. In Proceedings of the 48th annual meeting of the Association for Computational Linguistics. Uppsala, Sweden.

  22. Huybregts, R. (1984). The weak inadequacy of context-free phrase structure grammars. In G. J. de Haan, M. Trommelen, & W. Zonneveld (Eds.), Van Periferie naar Kern. Dordrecht: Foris.

  23. Kanazawa, M. (1994). A note on language classes with finite elasticity (Tech. Rep. CS-R9471). CWI, Amsterdam.

  24. Kanazawa, M. (1998). Learnable classes of categorial grammars. Stanford: CSLI Publications, Stanford University; distributed by Cambridge University Press.

  25. Kearns, M., & Valiant, L. G. (1989). Cryptographic limitations on learning boolean formulae and finite automata. In 21st annual ACM symposium on theory of computing (pp. 433–444). New York: ACM.

  26. Kearns, M., & Vazirani, U. (1994). An introduction to computational learning theory. Cambridge: MIT Press.

  27. Kontorovich, L., Cortes, C., & Mohri, M. (2006). Learning linearly separable languages. In Algorithmic learning theory, 17th international conference (pp. 288–303).

  28. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., & Watkins, C. (2002). Text classification using string kernels. Journal of Machine Learning Research, 2, 419–444.

  29. Motoki, T., Shinohara, T., & Wright, K. (1991). The correct definition of finite elasticity: Corrigendum to “Identification of unions”. In The fourth workshop on computational learning theory. San Mateo: Morgan Kaufmann.

  30. Oates, T., Armstrong, T., Becerra-Bonache, L., & Atamas, M. (2005). A polynomial time algorithm for inferring grammars for mildly context sensitive languages. In Workshop on grammatical inference applications: successes and future challenges (pp. 61–65). Edinburgh, Scotland.

  31. Parikh, R. J. (1966). On context-free languages. Journal of the ACM, 13(4), 570–581.

  32. Radzinski, D. (1991). Chinese number-names, tree adjoining languages, and mild context-sensitivity. Computational Linguistics, 17(3), 277–299.

  33. Salomaa, A. (2005). On languages defined by numerical parameters (Tech. Rep. 663). Turku Centre for Computer Science.

  34. Schölkopf, B., Smola, A., & Müller, K. R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1299–1319.

  35. Sempere, J. M. (2008). Learning context-sensitive languages from linear structural information. In A. Clark, F. Coste, & L. Miclet (Eds.), LNAI: Vol. 5278. Proceedings of 9th international colloquium on grammatical inference ICGI’08 (pp. 175–186). Berlin: Springer.

  36. Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.

  37. Shieber, S. M. (1985). Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8, 333–343.

  38. Shinohara, T. (1990). Inductive inference of monotonic formal systems from positive data. In S. Arikawa, S. Goto, S. Ohsuga, & T. Yokomori (Eds.), Algorithmic learning theory (pp. 339–351). New York: Springer.

  39. Starkie, B., Coste, F., & van Zaanen, M. (2004). The Omphalos context-free grammar learning competition. In LNAI: Vol. 3264. International colloquium on grammatical inference, Athens, Greece (pp. 16–27). Berlin: Springer.

  40. Uemura, Y., Hasegawa, A., Kobayashi, S., & Yokomori, T. (1999). Tree adjoining grammars for RNA structure prediction. Theoretical Computer Science, 210(2), 277–303.

  41. Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.

  42. Vapnik, V. N., & Chervonenkis, A. Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2), 264–280.

  43. Vijay-Shanker, K., Weir, D. J., & Joshi, A. K. (1987). Characterizing structural descriptions produced by various grammatical formalisms. In Proceedings of the 25th annual meeting on Association for Computational Linguistics (pp. 104–111). Morristown: Association for Computational Linguistics.

  44. Watkins, C. (2000). Dynamic alignment kernels. In A. J. Smola, P. L. Bartlett, B. Schölkopf, & D. Schuurmans (Eds.), Advances in large margin classifiers (pp. 39–50). Cambridge: MIT Press.

  45. Wright, K. (1989). Identification of unions of languages drawn from an identifiable class. In The 1989 workshop on computational learning theory (pp. 328–333). San Mateo: Morgan Kaufmann.

  46. Yokomori, T. (1991). Polynomial-time learning of very simple grammars from positive data. In Proceedings of the fourth annual workshop on computational learning theory, University of California, Santa Cruz (pp. 213–227). New York: ACM Press.

  47. Yokomori, T., & Kobayashi, S. (1998). Learning local languages and their application to DNA sequence analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(10), 1067–1079. DOI:http://dx.doi.org/10.1109/34.722617.

  48. Yoshinaka, R. (2009). Learning mildly context-sensitive languages with multidimensional substitutability from positive data. In R. Gavaldà, G. Lugosi, T. Zeugmann, & S. Zilles (Eds.), Lecture notes in computer science: Vol. 5809. ALT (pp. 278–292). Berlin: Springer.

Author information

Corresponding author

Correspondence to Alexander Clark.

Additional information

Editor: Nicolò Cesa-Bianchi.

Cite this article

Clark, A., Costa Florêncio, C. & Watkins, C. Languages as hyperplanes: grammatical inference with string kernels. Mach Learn 82, 351–373 (2011). https://doi.org/10.1007/s10994-010-5218-3

Keywords

  • Kernel methods
  • Grammatical inference