Abstract
Classifiers are often constructed iteratively, by introducing changes sequentially to an initial classifier. Langford and Blum (COLT '99: Proceedings of the Twelfth Annual Conference on Computational Learning Theory, 1999, San Mateo, CA: Morgan Kaufmann, pp. 209–214) take advantage of this structure, the microchoice structure, to obtain bounds on the generalization ability of such algorithms. These bounds can be sharper than bounds obtained by more general arguments. This paper extends the applicability of the microchoice approach to the more realistic case where the classifier space is continuous and the sequence of changes is not restricted to a pre-fixed finite set.
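For orientation, the finite-space bound has the following shape (a sketch from the usual Occam-style derivation, paraphrasing Langford and Blum rather than quoting them; the notation \(C_1, \dots, C_d\) for the finite choice sets and m for the sample size is assumed here): with probability at least \(1 - \delta\), the classifier h produced by a sequence of d choices satisfies

\[
\mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h) \;+\; \sqrt{\frac{\sum_{i=1}^{d} \ln|C_i| + \ln(1/\delta)}{2m}},
\]

obtained by assigning confidence \(\delta_h = \delta \prod_{i=1}^{d} |C_i|^{-1}\) to each choice path and applying Hoeffding's inequality. Because the bound charges only for the choices actually made, rather than for the log-cardinality of the full classifier space, it can be sharper.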
Proving the microchoice bound in the continuous case relies on a conditioning technique that is often used in proving VC results. It is shown how this technique can be used to convert any learning algorithm over a continuous space into a family of algorithms over discrete spaces.
The new continuous microchoice result is applied to obtain a bound on the generalization ability of the perceptron algorithm. The greedy nature of the perceptron algorithm, which generates new classifiers by introducing corrections based on misclassified points, is exploited to obtain a generalization bound of asymptotic order \(O(1/\sqrt{n})\), where n is the training-set size.
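To make the greedy structure concrete, the following is a minimal sketch of the classical perceptron update in Python (illustrative only: the function name, epoch cap, and stopping rule are assumptions, not the paper's formulation). On a misclassified point \((x, y)\) with \(y \in \{-1, +1\}\), the rule is \(w \leftarrow w + yx\).

import numpy as np

def perceptron(X, y, max_epochs=100):
    # X: (n, d) array of training points; y: (n,) array of labels in {-1, +1}.
    # Returns the weight vector w of the linear classifier sign(w . x).
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(max_epochs):
        updated = False
        for i in range(n):
            if y[i] * np.dot(w, X[i]) <= 0:  # point i is misclassified (or on the boundary)
                w = w + y[i] * X[i]          # greedy correction based on point i
                updated = True
        if not updated:  # a full pass with no mistakes: stop
            break
    return w

Each correction is a data-driven choice: the next classifier is determined by which misclassified training point triggers the update, so the number of classifiers reachable in one step is at most n. This bounded branching is, in outline, the structure the microchoice argument exploits.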
References
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Belmont, CA: Wadsworth.
Devroye, L., Györfi, L., & Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. New York: Springer-Verlag.
Freund, Y. (1998). Self bounding learning algorithms. In COLT '98: Proceedings of the Eleventh Annual Conference on Computational Learning Theory (pp. 247-258). New York, NY: ACM Press.
Gat, Y. (2001). A learning generalization bound with an application to sparse-representation classifiers. Machine Learning, 43, 233-240.
Langford, J., & Blum, A. (1999). Microchoice bounds and self bounding learning algorithms. In COLT '99: Proceedings of the Twelfth Annual Conference on Computational Learning Theory (pp. 209-214). San Mateo, CA: Morgan Kaufmann.
Minsky, M. L., & Papert, S. A. (1969). Perceptrons. Cambridge, MA: The MIT Press.
Serfling, R. J. (1974). Probability inequalities for the sum in sampling without replacement. Annals of Statistics, 2, 39-48.
Shawe-Taylor, J., & Bartlett, P. L. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44, 1926-1940.
Uhlmann, W. (1963). Ranggrößen als Schätzfunktionen [Rank order statistics as estimators]. Metrika, 7, 23-40.
Vapnik, V. N. (1998). Statistical Learning Theory. New York: John Wiley & Sons, Inc.
Warmuth, M. K., & Floyd, S. (1995). Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 21, 269-304.
Cite this article
Gat, Y. A Microchoice Bound for Continuous-Space Classification Algorithms. Machine Learning 53, 5–21 (2003). https://doi.org/10.1023/A:1025615325644