
Generalization bounds for learning under graph-dependence: a survey

  • Published in: Machine Learning

Abstract

Traditional statistical learning theory relies on the assumption that data are independently and identically distributed (i.i.d.). However, this assumption does not hold in many real-life applications. In this survey, we explore learning scenarios where examples are dependent and their dependence is described by a dependency graph, a model commonly used in probability and combinatorics. We collect various graph-dependent concentration bounds, which are then used to derive Rademacher complexity and stability generalization bounds for learning from graph-dependent data. We illustrate this paradigm through practical learning tasks and outline directions for future work. To our knowledge, this survey is the first of its kind on this subject.
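As an informal illustration of the paradigm the abstract describes, the sketch below applies Janson's (2004) Hoeffding-type inequality for sums of graph-dependent variables, in which the classical exponent is inflated by the fractional chromatic number of the dependency graph. It uses a greedy proper coloring as a crude upper bound on that quantity; the function names and the example graph are ours, not the survey's.

```python
import math

def greedy_coloring(n, edges):
    """Properly color a dependency graph on vertices 0..n-1; the number
    of colors used upper-bounds the fractional chromatic number chi*(G)."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    color = {}
    for v in range(n):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def janson_hoeffding_bound(n, edges, t, a=0.0, b=1.0):
    """Janson (2004): for a sum S of n graph-dependent variables in [a, b],
    P(S - E[S] >= t) <= exp(-2 t^2 / (chi*(G) * n * (b - a)^2)).
    Here the greedy coloring size stands in for chi*(G) (an upper bound)."""
    chi = len(set(greedy_coloring(n, edges).values()))
    return math.exp(-2 * t**2 / (chi * n * (b - a) ** 2))

# Example: 10 variables whose dependency graph is a 10-cycle
# (each variable depends only on its two neighbors).
n = 10
edges = [(i, (i + 1) % n) for i in range(n)]
p_dep = janson_hoeffding_bound(n, edges, t=5.0)   # exp(-2.5), since chi = 2
p_iid = math.exp(-2 * 5.0**2 / n)                  # classical Hoeffding, chi = 1
print(p_dep, p_iid)
```

The dependent bound is weaker than the i.i.d. one exactly by the coloring factor in the exponent, which is the mechanism the surveyed Rademacher and stability bounds exploit as well.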

[Figures 1–10 are available in the full article.]


Availability of data and materials

Not applicable.

Code availability

Not applicable.



Acknowledgements

R.-R. Z. thanks David Wood for email communications on tree-partitions. The authors are sincerely grateful to the referees for carefully reading the manuscript and providing invaluable comments and suggestions, which led to a substantial improvement in the presentation.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Author information

Authors and Affiliations

Authors

Contributions

R.-R. Z.: the first and final draft, stability bound, and its applications. M.-R. A.: fractional Rademacher complexity bound, and its applications.

Corresponding author

Correspondence to Rui-Ray Zhang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

All authors who participated in this study give the publisher permission to publish this work.

Additional information

Editor: Aryeh Kontorovich.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, RR., Amini, MR. Generalization bounds for learning under graph-dependence: a survey. Mach Learn (2024). https://doi.org/10.1007/s10994-024-06536-9

