From Theoretical Learnability to Statistical Measures of the Learnable

  • Marc Sebban
  • Gilles Richard
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1642)

Abstract

The main focus of theoretical models for machine learning is to formally describe what it means for a concept to be learnable, what a learning process is, and what the relationship is between a learning agent and a teaching one. However, when we prove from a theoretical point of view that a concept is learnable, we have no a priori idea of how difficult the target concept will be to learn. In this paper, after recalling some theoretical concepts and the main estimation methods, we provide a learning-system-independent measure of the difficulty of learning a concept. It is based on geometrical and statistical notions and on the implicit assumption that distinct classes occupy distinct regions of the feature space. In this context, we identify learnability with the level of class separability in the feature space. Our definition is constructive, is based on a statistical test, and has been applied to problems from the UCI repository. The results are convincing and agree well with theoretical results and intuition. Finally, in order to reduce the computational cost of our approach, we propose a new way of characterizing the geometrical regions using a k-nearest-neighbor graph. We show experimentally that it yields accuracy estimates close to those obtained by leave-one-out cross-validation, with a smaller standard deviation.
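To make the last idea concrete, the following is a minimal sketch, not the authors' exact procedure: it estimates predictive accuracy from a k-nearest-neighbor graph (each point adopting the majority label of its neighbors) and compares it with a leave-one-out cross-validation estimate. The dataset (scikit-learn's Iris), the choice k = 5, and the plain Euclidean metric are illustrative assumptions.

```python
# Sketch only: compare a k-NN-graph accuracy estimate with leave-one-out CV.
# Dataset, k, and the Euclidean metric are assumptions for illustration.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
k = 5

# k-NN-graph estimate: each point receives the majority label of its k neighbors.
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1 because each point is its own nearest neighbor
_, idx = nn.kneighbors(X)
neighbor_labels = y[idx[:, 1:]]                   # drop the self-edge
majority = np.array([np.bincount(row).argmax() for row in neighbor_labels])
graph_estimate = np.mean(majority == y)

# Reference estimate: leave-one-out cross-validation of a k-NN classifier.
loo_estimate = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y,
                               cv=LeaveOneOut()).mean()

print(f"k-NN graph estimate : {graph_estimate:.3f}")
print(f"leave-one-out (LOO) : {loo_estimate:.3f}")
```

On well-separated classes the two numbers are typically close, which is the kind of agreement the paper reports while avoiding the n model fits required by leave-one-out.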

Keywords

Support Vector Machine, Feature Space, Minimum Spanning Tree, Target Concept, Neighborhood Graph

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Marc Sebban (1)
  • Gilles Richard (1)
  1. TRIVIA Research Team, Université des Antilles et de la Guyane, Pointe-à-Pitre, France
