Learning Linearly Separable Languages

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4264)


Abstract

This paper presents a novel paradigm for learning languages that consists of mapping strings to an appropriate high-dimensional feature space and learning a separating hyperplane in that space. It initiates the study of the linear separability of automata and languages by examining the rich class of piecewise-testable languages. It introduces a high-dimensional feature map and proves that piecewise-testable languages are linearly separable in that space. The proof makes use of word-combinatorial results relating to subsequences. It also shows that the positive definite kernel associated with this embedding can be computed in quadratic time. It examines the use of support vector machines in combination with this kernel to determine a separating hyperplane, and the corresponding learning guarantees. It also proves that all languages linearly separable under a regular finite cover embedding, a generalization of the embedding used here, are regular.
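The kernel mentioned in the abstract pairs each string with the set of its subsequences, so the inner product counts the distinct strings that are subsequences of both inputs. As a hedged illustration (the function name and memoized recurrence below are one standard way to compute this count, not necessarily the paper's exact algorithm), the count can be obtained by a dynamic program over last-occurrence positions, which runs in time quadratic in the string lengths for a fixed alphabet:

```python
from functools import lru_cache


def common_subsequence_kernel(x: str, y: str) -> int:
    """Count distinct strings that are subsequences of both x and y,
    including the empty string. Illustrative sketch of a subsequence
    kernel; quadratic time for a fixed alphabet."""
    alphabet = set(x) & set(y)

    def last_table(s):
        # tab[c][i] = largest position p <= i (1-indexed) with s[p-1] == c,
        # or 0 if c does not occur in the prefix s[:i].
        tab = {c: [0] * (len(s) + 1) for c in alphabet}
        for i, ch in enumerate(s, 1):
            for c in alphabet:
                tab[c][i] = i if ch == c else tab[c][i - 1]
        return tab

    lx, ly = last_table(x), last_table(y)

    @lru_cache(maxsize=None)
    def f(i, j):
        # Distinct common subsequences of x[:i] and y[:j], incl. the empty one.
        # Every nonempty common subsequence ends in a unique character c and
        # can be matched at the last occurrence of c in each prefix.
        total = 1
        for c in alphabet:
            p, q = lx[c][i], ly[c][j]
            if p and q:  # c occurs in both prefixes
                total += f(p - 1, q - 1)
        return total

    return f(len(x), len(y))
```

The resulting Gram matrix over a training sample can then be handed to any SVM solver that accepts precomputed kernels, in line with the abstract's use of support vector machines with this kernel.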







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  1. Carnegie Mellon University, Pittsburgh, USA
  2. Google Research, New York, USA
  3. Courant Institute of Mathematical Sciences, New York, USA
