Sequences Classification by Least General Generalisations

  • Frédéric Tantini
  • Alain Terlutte
  • Fabien Torre
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6339)


In this paper, we present a general framework for supervised classification. The framework supports methods such as boosting and requires only the definition of a generalisation operator, called lgg (least general generalisation). For sequence classification tasks, lgg is a learner that uses only positive examples. We show that grammatical inference has already defined such learners for classes of automata such as reversible automata and k-TSS automata. We then propose a generalisation algorithm for the class of balls of words. Finally, we show through experiments that our method solves sequence classification tasks efficiently.
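To make the notion of a ball of words concrete, the following minimal sketch assumes that a ball is the set of words within a given edit (Levenshtein) distance of a centre word. The function names `edit_distance`, `in_ball`, and `naive_ball_generalisation` are hypothetical and introduced here for illustration only. The generalisation step is a naive stand-in that picks a centre among the positive examples; it is not the algorithm proposed in the paper, which the abstract does not describe.

```python
from typing import List, Tuple


def edit_distance(u: str, v: str) -> int:
    """Levenshtein distance computed row by row (dynamic programming)."""
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        curr = [i]
        for j, b in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from u
                curr[j - 1] + 1,         # insert b into u
                prev[j - 1] + (a != b),  # substitute a by b, or match
            ))
        prev = curr
    return prev[-1]


def in_ball(word: str, centre: str, radius: int) -> bool:
    """Membership test for the ball of the given centre and radius."""
    return edit_distance(word, centre) <= radius


def naive_ball_generalisation(positives: List[str]) -> Tuple[str, int]:
    """Illustrative assumption, not the paper's algorithm: choose the
    example whose largest distance to the other examples is smallest,
    and use that largest distance as the radius."""
    best = None
    for centre in positives:
        radius = max(edit_distance(centre, w) for w in positives)
        if best is None or radius < best[1]:
            best = (centre, radius)
    return best


centre, radius = naive_ball_generalisation(["ab", "abc", "abcd"])
print(centre, radius)
print(in_ball("abd", centre, radius))
```

By construction the resulting ball covers every positive example, which mirrors the abstract's point that lgg learns from positive examples only. Restricting the centre to the examples themselves keeps the sketch simple; it does not guarantee the smallest possible ball.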


sequence classification least general automata balls of words 





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Frédéric Tantini (1)
  • Alain Terlutte (2)
  • Fabien Torre (2)

  1. Parole, CNRS/LORIA Nancy
  2. Mostrare (INRIA Lille Nord Europe et CNRS LIFL), Université Lille Nord de France
