Rademacher and Gaussian Complexities: Risk Bounds and Structural Results
We investigate the use of certain data-dependent estimates of the complexity of a function class, called Rademacher and Gaussian complexities. In a decision-theoretic setting, we prove general risk bounds in terms of these complexities. We consider function classes that can be expressed as combinations of functions from basis classes, and show how the Rademacher and Gaussian complexities of such a function class can be bounded in terms of the complexities of the basis classes. We give examples of the application of these techniques in finding data-dependent risk bounds for decision trees, neural networks, and support vector machines.
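To make "data-dependent estimate" concrete, the following is a minimal sketch of a Monte Carlo estimate of an empirical Rademacher complexity for a finite function class evaluated on a fixed sample. It is an illustration under stated assumptions, not the paper's procedure: the abstract does not give the definition, so the normalization used here (expected supremum of the absolute value of (1/n) Σ σ_i f(x_i)) is one common convention, and conventions differ by constant factors and in whether the absolute value is taken. The function name `empirical_rademacher` and its parameters are hypothetical.

```python
import random


def empirical_rademacher(outputs, num_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ max_f | (1/n) * sum_i sigma_i * f(x_i) | ].

    outputs: list of rows, one per function f in a finite class; row j holds
             the values f_j(x_1), ..., f_j(x_n) on the fixed sample.
    The 1/n normalization and the absolute value are assumptions (see lead-in).
    """
    rng = random.Random(seed)
    n = len(outputs[0])
    total = 0.0
    for _ in range(num_draws):
        # Independent uniform random signs, one per sample point.
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        # Largest absolute correlation between the signs and any function.
        best = max(
            abs(sum(s * v for s, v in zip(sigma, row))) / n
            for row in outputs
        )
        total += best
    return total / num_draws


if __name__ == "__main__":
    # Two hypothetical classifiers evaluated on a sample of four points.
    small_class = [[1, -1, 1, -1]]
    larger_class = [[1, -1, 1, -1], [1, 1, -1, -1]]
    print(empirical_rademacher(small_class))
    print(empirical_rademacher(larger_class))
```

Because the supremum is taken over the class, enlarging the class can only increase the estimate for the same sign draws, which matches the intuition that richer classes fit random labels better.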