Algebraic Analysis for Singular Statistical Estimation
Abstract
This paper clarifies the learning efficiency of non-regular parametric models, such as neural networks, whose true parameter set is an analytic variety with singular points. Using Sato's b-function, we rigorously prove that the free energy, or Bayesian stochastic complexity, is asymptotically equal to $\lambda_1 \log n - (m_1 - 1) \log\log n + \text{constant}$, where $\lambda_1$ is a rational number, $m_1$ is a natural number, and $n$ is the number of training samples. We also give an algorithm for calculating $\lambda_1$ and $m_1$ based on resolution of singularities. In regular models, $2\lambda_1$ equals the number of parameters and $m_1 = 1$, whereas in non-regular models such as neural networks, $2\lambda_1$ is smaller than the number of parameters and $m_1 \geq 1$.
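To make the roles of $\lambda_1$ and $m_1$ concrete, the following is a minimal sketch, not the paper's algorithm. It assumes that resolution of singularities has already brought the loss function to a local monomial form K(w) = w_1^(2k_1) ... w_d^(2k_d) on [0,1]^d with a uniform prior; this setting is an assumption that the abstract does not state. Under that assumption the zeta function is the product of the factors 1/(2 k_j z + 1), so its largest pole is at z = -min_j 1/(2 k_j), and the order of that pole is the number of indices attaining the minimum. The names `rlct_monomial` and `leading_terms` are hypothetical.

```python
import math
from fractions import Fraction


def rlct_monomial(k):
    """Return (lambda_1, m_1) for K(w) = prod_j w_j^(2*k_j) on [0,1]^d.

    Assumes a uniform prior, so the zeta function is
    prod_j 1 / (2*k_j*z + 1) with poles at z = -1/(2*k_j).
    Coordinates with k_j = 0 do not appear in K and contribute no pole.
    """
    poles = [Fraction(1, 2 * kj) for kj in k if kj > 0]
    if not poles:
        raise ValueError("K(w) is constant: no pole to report")
    lam = min(poles)        # the largest pole sits at z = -lam
    m = poles.count(lam)    # its order: how many factors share that pole
    return lam, m


def leading_terms(n, lam, m):
    """Evaluate lambda_1*log(n) - (m_1 - 1)*log(log(n)); requires n > 1."""
    if n <= 1:
        raise ValueError("n must exceed 1 so that log(log(n)) is defined")
    return float(lam) * math.log(n) - (m - 1) * math.log(math.log(n))


if __name__ == "__main__":
    # One-parameter regular case K(w) = w^2: 2*lambda_1 = 1 = number of parameters.
    print(rlct_monomial([1]))
    # Two-parameter singular case K(w) = w1^2 * w2^2: 2*lambda_1 = 1 < 2, m_1 = 2.
    lam, m = rlct_monomial([1, 1])
    print(lam, m, leading_terms(1000, lam, m))
```

In the two-parameter case the doubled exponent $2\lambda_1 = 1$ falls below the parameter count of 2, and the pole of order 2 produces the $\log\log n$ correction, in line with the behavior the abstract describes for non-regular models.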
Keywords
Generalization Error, Regular Model, Algebraic Analysis, Layered Neural Network, Statistical Estimation Error