Statistical learning based on Markovian data: maximal deviation inequalities and learning rates

  • Stephan Clémençon
  • Patrice Bertail
  • Gabriela Ciołek


In statistical learning theory, numerous works establish non-asymptotic bounds on the generalization capacity of empirical risk minimizers under a wide variety of complexity assumptions on the class of decision rules over which optimization is performed. These bounds rest on sharp control of the uniform deviation of i.i.d. averages from their expectations and generally ignore possible dependence across the training data. The purpose of this paper is to show that similar results can be obtained when statistical learning is based on a data sequence drawn from a (Harris positive) Markov chain X, through the running example of estimating minimum volume sets (MV-sets) of X's stationary distribution, an unsupervised statistical learning approach to anomaly/novelty detection. Based on novel maximal deviation inequalities that we establish using the regenerative method, we derive learning rate bounds that depend not only on the complexity of the class of candidate sets but also on the ergodicity rate of the chain X, expressed through tail conditions on the length of its regenerative cycles. In particular, this approach, fully tailored to Markovian data, makes it possible to interpret the rate bounds in frequentist terms, in contrast to alternative coupling techniques based on mixing conditions: the larger the expected number of regeneration cycles over a trajectory of finite length, the more accurate the MV-set estimates. Beyond the theoretical analysis, this phenomenon is supported by illustrative numerical experiments.
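To make the regenerative mechanism behind these rate bounds concrete, the following Python sketch, a toy illustration of our own and not the paper's experimental setup, simulates a reflected random walk with negative drift, a Harris positive chain for which the singleton {0} is an atom. It splits the trajectory into regeneration cycles at successive visits to the atom and forms a crude MV-set estimate from an empirical quantile of the stationary distribution. The chain, the drift parameter, and all function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, drift=-0.5):
    """Reflected random walk X_{t+1} = max(X_t + xi_t, 0) with Gaussian
    increments of negative mean: Harris positive, atom at {0}."""
    x, path = 0.0, np.empty(n)
    for t in range(n):
        x = max(x + rng.normal(drift, 1.0), 0.0)
        path[t] = x
    return path

def regeneration_cycles(path):
    """Split the trajectory into independent blocks delimited by
    successive visits to the atom {0} (the regenerative method)."""
    hits = np.flatnonzero(path == 0.0)
    return [path[hits[i] + 1 : hits[i + 1] + 1] for i in range(len(hits) - 1)]

def mv_set_estimate(path, alpha=0.9):
    """Crude MV-set proxy at mass level alpha: for this nonnegative,
    unimodal chain, the interval [0, q_alpha] with q_alpha the empirical
    alpha-quantile along the trajectory."""
    return 0.0, float(np.quantile(path, alpha))

path = simulate(20_000)
cycles = regeneration_cycles(path)
lo, hi = mv_set_estimate(path, alpha=0.9)
print(f"{len(cycles)} regeneration cycles; MV-set estimate [{lo}, {hi:.2f}]")
```

In the spirit of the abstract, the regeneration cycles play the role of i.i.d. blocks: the more cycles a trajectory of fixed length contains, the more effectively independent observations the empirical MV-set estimate averages over.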


Keywords: Concentration inequality · Empirical process · Generalization bound · Harris positive Markov chain · Minimum volume set · Novelty detection · Regenerative method · Stationary probability distribution · Unsupervised learning

Mathematics Subject Classification (2010)

60J20 60J05 62M05 





This research was supported by a public grant as part of the Investissement d’avenir, project reference ANR-11-LABX-0056-LMH. Gabriela Ciołek was also supported by the Polish National Science Centre NCN (grant No. UMO2016/23/N/ST1/01355) and partly by the Ministry of Science and Higher Education. This research has also been conducted as part of the project Labex MME-DII (ANR11-LBX-0023-01). Part of this research was conducted during a stay of Gabriela Ciołek at the Center for Advanced Intelligence Project (AIP), RIKEN, Tokyo, Japan.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. LTCI, Télécom Paris, Institut Polytechnique de Paris, Paris, France
  2. Modal’X, UPL, Univ Paris Nanterre, Nanterre, France
  3. Faculty of Physics and Applied Computer Science, AGH University of Science and Technology, Krakow, Poland
