Learning Languages from Bounded Resources: The Case of the DFA and the Balls of Strings

  • Colin de la Higuera
  • Jean-Christophe Janodet
  • Frédéric Tantini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5278)

Abstract

Comparing the standard language learning paradigms (identification in the limit, query learning, PAC learning) has always been a complex question. Moreover, when computational constraints are added to the question of converging to a target, the picture becomes even less clear: how much do queries or negative examples help? Can we find good algorithms that change their minds very little or that make very few errors? To approach these problems, we concentrate here on two classes of languages, the topological balls of strings (for the edit distance) and the deterministic finite automata (DFA), and (re-)visit the different learning paradigms to sustain our claims.
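
As an illustration of the first class, a ball of strings can be read as the set of all strings within a given edit distance of a centre string. The sketch below assumes the unit-cost Levenshtein distance (insertion, deletion and substitution each cost 1); the function names edit_distance and in_ball are hypothetical and chosen only for this example, not taken from the paper.

```python
def edit_distance(u: str, v: str) -> int:
    """Unit-cost Levenshtein distance between u and v (assumed cost model)."""
    # prev[j] holds the distance between the prefix of u read so far and v[:j].
    prev = list(range(len(v) + 1))
    for i, a in enumerate(u, start=1):
        curr = [i]  # distance from u[:i] to the empty prefix of v
        for j, b in enumerate(v, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from u
                curr[j - 1] + 1,         # insert b into u
                prev[j - 1] + (a != b),  # substitute a by b (free if equal)
            ))
        prev = curr
    return prev[-1]


def in_ball(word: str, centre: str, radius: int) -> bool:
    """Membership test for the ball with the given centre and radius."""
    return edit_distance(word, centre) <= radius
```

For instance, in_ball("ab", "abb", 1) holds because a single insertion turns "ab" into "abb", whereas the empty string is at distance 3 from "abb" and so lies outside the ball of radius 1.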

Keywords

Polynomial learnability, deterministic finite automata, balls of strings, edit distance

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Colin de la Higuera (1)
  • Jean-Christophe Janodet (1)
  • Frédéric Tantini (1)
  1. Universities of Lyon, St-Etienne
