Probability Theory and Related Fields, Volume 161, Issue 1–2, pp 111–153

Sequential complexities and uniform martingale laws of large numbers

  • Alexander Rakhlin
  • Karthik Sridharan
  • Ambuj Tewari


We establish necessary and sufficient conditions for a uniform martingale Law of Large Numbers. We extend the technique of symmetrization to the case of dependent random variables and provide “sequential” (non-i.i.d.) analogues of various classical measures of complexity, such as covering numbers and combinatorial dimensions from empirical process theory. We establish relationships between these various sequential complexity measures and show that they provide a tight control on the uniform convergence rates for empirical processes with dependent data. As a direct application of our results, we provide exponential inequalities for sums of martingale differences in Banach spaces.


Empirical processes · Dependent data · Uniform Glivenko–Cantelli classes · Rademacher averages · Sequential prediction

Mathematics Subject Classification (2000)

60E15 60C05 60F15 91A20 



We would like to thank J. Michael Steele, Dean Foster, and Ramon van Handel for helpful discussions. We gratefully acknowledge the support of NSF under grants CAREER DMS-0954737 and CCF-1116928.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Alexander Rakhlin (1)
  • Karthik Sridharan (1)
  • Ambuj Tewari (2)

  1. University of Pennsylvania, Philadelphia, USA
  2. University of Michigan, Ann Arbor, USA
