Abstract
Traditional statistical learning theory relies on the assumption that data are independently and identically distributed (i.i.d.). However, this assumption fails in many real-life applications. In this survey, we explore learning scenarios where examples are dependent and their dependence is described by a dependency graph, a model commonly used in probability and combinatorics. We collect various graph-dependent concentration bounds, which are then used to derive Rademacher complexity and stability generalization bounds for learning from graph-dependent data. We illustrate this paradigm through practical learning tasks and suggest directions for future research. To our knowledge, this survey is the first of its kind on this subject.
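To give a flavor of the graph-dependent concentration bounds the survey collects, the sketch below evaluates a Hoeffding-type tail bound of the form 2·exp(−2t² / (χ·Σᵢ(bᵢ−aᵢ)²)) for a sum of bounded variables with a given dependency graph, where Janson's bound uses the fractional chromatic number χ*(G); here a greedy proper coloring supplies a (possibly loose) upper bound χ(G) ≥ χ*(G), so the computed bound remains valid. The graph, variable ranges, and function names are illustrative assumptions, not part of the survey's notation.

```python
import math

# Hypothetical dependency graph on 6 variables: an edge joins two
# variables that are dependent (here, a simple path 0-1-2-3-4-5,
# as would arise from 1-dependent observations).
n = 6
edges = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}
adj = [set() for _ in range(n)]
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def greedy_coloring(adj):
    """Proper vertex coloring; the number of colors it uses upper-bounds
    the fractional chromatic number appearing in Janson's bound."""
    color = {}
    for v in range(len(adj)):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

chi = 1 + max(greedy_coloring(adj).values())  # a path is 2-colorable

def dependent_hoeffding_tail(t, ranges, chi):
    """Two-sided tail bound 2*exp(-2 t^2 / (chi * sum (b_i - a_i)^2))
    for a sum of bounded graph-dependent variables (Janson-style)."""
    return 2 * math.exp(-2 * t ** 2 / (chi * sum(r ** 2 for r in ranges)))

ranges = [1.0] * n          # each variable lies in an interval of length 1
bound = dependent_hoeffding_tail(3.0, ranges, chi)
```

With chi = 1 (no edges) this recovers the classical i.i.d. Hoeffding bound, so the coloring number cleanly quantifies the price of dependence.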
Availability of data and materials
Not applicable.
Code availability
Not applicable.
Acknowledgements
R.-R. Z. thanks David Wood for email communications on tree-partitions. The authors are sincerely grateful to the referees for carefully reading the manuscript and providing invaluable comments and suggestions, which led to a substantial improvement in the presentation.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Author information
Authors and Affiliations
Contributions
R.-R. Z.: the first and final draft, stability bound, and its applications. M.-R. A.: fractional Rademacher complexity bound, and its applications.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
All authors who participated in this study give the publisher permission to publish this work.
Additional information
Editor: Aryeh Kontorovich.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, RR., Amini, MR. Generalization bounds for learning under graph-dependence: a survey. Mach Learn (2024). https://doi.org/10.1007/s10994-024-06536-9