Abstract
This article reviews theoretical advances of the last two decades in the research field connecting algebraic geometry and Bayesian statistics. Many statistical models and learning machines that contain hierarchical structures or latent variables are nonidentifiable, because the map from a parameter to a statistical model is not one-to-one. In nonidentifiable models, both the likelihood function and the posterior distribution generally have singularities, which long made it difficult to analyze their statistical properties. Since the end of the 20th century, however, a new theory and methodology based on algebraic geometry have been established that enable us to investigate such models and machines as they arise in the real world. This article reports the following recent advances. First, we explain the framework of Bayesian statistics and introduce a new perspective from birational geometry. Second, two mathematical solutions based on algebraic geometry are derived: an appropriate parameter space can be found by a resolution map, which makes the posterior distribution normal crossing and the log likelihood ratio function well-defined. Third, three applications to statistics are introduced: the posterior distribution is represented in a renormalized form, the asymptotic free energy is derived, and a universal formula relating the generalization loss, the cross validation, and the information criterion is established. The two mathematical solutions and the three statistical applications based on algebraic geometry reported in this article are now being used in many practical fields of data science and artificial intelligence.
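As a brief supplement, the two asymptotic statements summarized above can be restated in the standard notation of singular learning theory, under the usual assumptions (realizable model, prior positive on the optimal parameter set); here F_n is the Bayesian free energy (minus log marginal likelihood), S_n the empirical entropy, λ the real log canonical threshold with multiplicity m, G_n the generalization loss, CV_n the leave-one-out cross validation loss, T_n the training loss, and V_n the functional variance:

```latex
% Asymptotic free energy (realizable case, positive prior):
F_n \;=\; n S_n \;+\; \lambda \log n \;-\; (m-1)\log\log n \;+\; O_p(1).

% Universal relation among generalization loss, cross validation, and WAIC:
\mathrm{WAIC}_n \;=\; T_n + \frac{V_n}{n}, \qquad
\mathbb{E}[G_n] \;=\; \mathbb{E}[\mathrm{CV}_n] + o\!\left(\tfrac{1}{n}\right)
\;=\; \mathbb{E}[\mathrm{WAIC}_n] + o\!\left(\tfrac{1}{n}\right).
```

A minimal numerical sketch of the WAIC formula above, assuming posterior samples have already been drawn and the pointwise log-likelihoods are stored as an array of shape (number of draws, number of observations); the function name `waic` and the array layout are illustrative, not part of the article:

```python
import numpy as np

def waic(log_lik):
    """WAIC = T_n + V_n / n on the per-observation negative-log-density scale.

    log_lik: array of shape (S, n) holding log p(x_i | w_s) for S
    posterior draws w_s and n observations x_i.
    """
    S = log_lik.shape[0]
    # T_n: training loss of the Bayesian predictive distribution,
    # -(1/n) * sum_i log( (1/S) * sum_s p(x_i | w_s) ).
    lppd_i = np.logaddexp.reduce(log_lik, axis=0) - np.log(S)
    T_n = -np.mean(lppd_i)
    # V_n / n: mean of the pointwise posterior variances of log p(x_i | w).
    V_n_over_n = np.mean(np.var(log_lik, axis=0, ddof=1))
    return T_n + V_n_over_n
```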
Data availability statement
Data sharing is not applicable to this article as no data sets were generated or analyzed during the current study.
Ethics declarations
Conflict of interest
The author declares that he has no conflict of interest.
Additional information
Communicated by Noboru Murata.
About this article
Cite this article
Watanabe, S. Recent advances in algebraic geometry and Bayesian statistics. Info. Geo. (2022). https://doi.org/10.1007/s41884-022-00083-9
Keywords
- Birational geometry
- Resolution of singularities
- Bayesian statistics
- Real log canonical threshold