Languages as hyperplanes: grammatical inference with string kernels
  • Published: 25 September 2010

  • Alexander Clark (1),
  • Christophe Costa Florêncio (2) &
  • Chris Watkins (1)

Machine Learning, volume 82, pages 351–373 (2011)

Abstract

Using string kernels, languages can be represented as hyperplanes in a high-dimensional feature space. We discuss the language-theoretic properties of this formalism, with particular reference to the implicit feature maps defined by string kernels, and consider its expressive power, its closure properties and its relationship to other formalisms. We present a new family of grammatical inference algorithms based on this idea. We demonstrate that some mildly context-sensitive languages can be represented in this way and that they can be learned efficiently using kernel PCA. We experimentally demonstrate the effectiveness of this approach on some standard examples of context-sensitive languages using small synthetic data sets.
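
The idea of a language as a hyperplane can be illustrated with a small sketch. This is not the authors' algorithm: it assumes an explicit feature map that counts contiguous substrings of length 1 and 2 (a spectrum-style kernel chosen here for illustration), and it replaces kernel PCA with exact linear algebra over the explicit features. The names `feature_map`, `affine_basis` and `in_hull` are hypothetical. The sketch computes the affine hull of a few positive examples of the language a^n b^n c^n and accepts a string only if its feature vector lies in that hull.

```python
from fractions import Fraction
from itertools import product

ALPHABET = "abc"
# One feature per substring of length 1 or 2 over the alphabet (assumed feature set).
FEATURES = ["".join(p) for k in (1, 2) for p in product(ALPHABET, repeat=k)]


def feature_map(w):
    """Map a string to exact counts of each contiguous substring of length 1 or 2."""
    return [
        Fraction(sum(1 for i in range(len(w) - len(f) + 1) if w[i:i + len(f)] == f))
        for f in FEATURES
    ]


def reduce_against(vec, basis):
    """Eliminate from vec the pivot coordinate of every basis row, in order."""
    vec = list(vec)
    for pivot, row in basis:
        if vec[pivot] != 0:
            factor = vec[pivot] / row[pivot]
            vec = [v - factor * r for v, r in zip(vec, row)]
    return vec


def affine_basis(positives):
    """Return an origin point and a basis for the directions spanned by the examples."""
    origin = feature_map(positives[0])
    basis = []
    for w in positives[1:]:
        diff = [x - o for x, o in zip(feature_map(w), origin)]
        diff = reduce_against(diff, basis)
        pivot = next((i for i, v in enumerate(diff) if v != 0), None)
        if pivot is not None:  # a genuinely new direction
            basis.append((pivot, diff))
    return origin, basis


def in_hull(w, origin, basis):
    """Accept w if its feature vector lies in the affine hull of the examples."""
    diff = [x - o for x, o in zip(feature_map(w), origin)]
    return all(v == 0 for v in reduce_against(diff, basis))


# Learn from three positive examples of a^n b^n c^n only.
positives = ["abc", "aabbcc", "aaabbbccc"]
origin, basis = affine_basis(positives)

print(in_hull("aaaabbbbcccc", origin, basis))  # unseen member of the language
print(in_hull("aabbc", origin, basis))         # unbalanced counts
print(in_hull("abcabc", origin, basis))        # balanced counts, wrong order
```

With this feature set, the three examples span a single direction, so the hypothesis is a line in feature space. Exact rational arithmetic is used so that membership is a strict equality test; a kernel PCA version would instead threshold the distance from the learned subspace.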

Author information

Authors and Affiliations

  1. Department of Computer Science, Royal Holloway, University of London, Egham, TW20 0EX, UK

    Alexander Clark & Chris Watkins

  2. Department of Computer Science, K.U. Leuven, Arenberg Campus III, Celestijnenlaan 200A, 3001, Heverlee (Leuven), Belgium

    Christophe Costa Florêncio

Corresponding author

Correspondence to Alexander Clark.

Additional information

Editor: Nicolò Cesa-Bianchi.

About this article

Cite this article

Clark, A., Costa Florêncio, C. & Watkins, C. Languages as hyperplanes: grammatical inference with string kernels. Mach Learn 82, 351–373 (2011). https://doi.org/10.1007/s10994-010-5218-3

  • Received: 08 August 2008

  • Revised: 10 July 2010

  • Accepted: 31 August 2010

  • Published: 25 September 2010

  • Issue Date: March 2011

  • DOI: https://doi.org/10.1007/s10994-010-5218-3

Keywords

  • Kernel methods
  • Grammatical inference