Abstract
We review recent advances on uniform martingale laws of large numbers and the associated sequential complexity measures. These results can be viewed as a non-i.i.d. generalization of Vapnik–Chervonenkis theory. We discuss applications to online learning, provide a recipe for designing online learning algorithms, and illustrate the techniques on the problem of online node classification. We outline connections to statistical learning theory and discuss the inductive principles of stochastic approximation and empirical risk minimization.
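To fix ideas about the online learning setting the chapter studies, the following is a minimal sketch of a classical algorithm in this area, the exponentially weighted average forecaster for prediction with expert advice. It is offered only as an illustration of the sequential (non-i.i.d.) prediction protocol, not as the chapter's own recipe; the function name and interface are our own.

```python
import math

def exponential_weights(expert_losses, eta):
    """Exponentially weighted average forecaster (illustrative sketch).

    expert_losses: a list of rounds, each a list of per-expert losses in [0, 1];
    eta: learning rate. Returns the forecaster's cumulative (expected) loss and
    the cumulative loss of the best expert in hindsight.
    """
    n = len(expert_losses[0])
    weights = [1.0] * n          # uniform prior over experts
    alg_loss = 0.0
    cum = [0.0] * n              # cumulative loss of each expert
    for losses in expert_losses:
        total = sum(weights)
        probs = [w / total for w in weights]
        # the forecaster suffers the weighted-average loss this round
        alg_loss += sum(p * l for p, l in zip(probs, losses))
        # multiplicative update: down-weight experts in proportion to their loss
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        cum = [c + l for c, l in zip(cum, losses)]
    return alg_loss, min(cum)
```

With n experts, losses in [0, 1], and T rounds, the standard regret guarantee is alg_loss − min_i cum_i ≤ (log n)/η + ηT/8, which holds for an arbitrary (even adversarially chosen) loss sequence; it is this worst-case, sequential flavor that the martingale extensions of VC theory make quantitative.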
Acknowledgments
We gratefully acknowledge the support of NSF under grants CAREER DMS-0954737 and CCF-1116928.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this chapter
Rakhlin, A., Sridharan, K. (2015). On Martingale Extensions of Vapnik–Chervonenkis Theory with Applications to Online Learning. In: Vovk, V., Papadopoulos, H., Gammerman, A. (eds) Measures of Complexity. Springer, Cham. https://doi.org/10.1007/978-3-319-21852-6_15
Print ISBN: 978-3-319-21851-9
Online ISBN: 978-3-319-21852-6