Abstract
Minimax lower bounds for concept learning state, for example, that for each sample size n and each learning rule g_n, there exists a distribution of the observation X and a concept C to be learned such that the expected error of g_n is at least a constant times V/n, where V is the VC dimension of the concept class. However, these bounds say nothing about the rate of decrease of the error for a fixed distribution-concept pair.
In this paper we investigate minimax lower bounds in such a (stronger) sense. We show that for several natural k-parameter concept classes, including the class of linear halfspaces, the class of balls, the class of polyhedra with a certain number of faces, and a class of neural networks, for any sequence of learning rules {g_n}, there exists a fixed distribution of X and a fixed concept C such that the expected error is larger than a constant times k/n for infinitely many n. We also obtain such strong minimax lower bounds for the tail distribution of the probability of error, which extend the corresponding minimax lower bounds.
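The contrast between the two kinds of bounds can be restated schematically as follows. The notation here is not fixed by the abstract: we write mu for the distribution of X, L(g_n) for the probability of error P{g_n(X) ≠ I_C(X)}, and c, c' for universal constants. This is a sketch of the statements above under those notational assumptions, not a formal theorem from the paper.

% Classical minimax lower bound: the hard distribution-concept
% pair (mu, C) is allowed to change with the sample size n.
\[
  \inf_{g_n}\, \sup_{(\mu,\, C)} \mathbb{E}\, L(g_n) \;\ge\; c\,\frac{V}{n}
  \qquad \text{for every } n .
\]

% Strong minimax lower bound (this paper): one fixed pair (mu, C)
% defeats the entire sequence of learning rules infinitely often,
% for k-parameter classes such as halfspaces, balls, and polyhedra.
\[
  \forall \{g_n\}\ \exists (\mu,\, C):\qquad
  \mathbb{E}\, L(g_n) \;\ge\; c'\,\frac{k}{n}
  \quad \text{for infinitely many } n .
\]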
Cite this article
Antos, A., Lugosi, G. Strong Minimax Lower Bounds for Learning. Machine Learning 30, 31–56 (1998). https://doi.org/10.1023/A:1007454427662