Learning and Convergence of the Normalized Radial Basis Functions Networks

  • Adam Krzyżak
  • Marian Partyka
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10841)


In this paper we analyze the convergence and rates of convergence of normalized radial basis function networks by relating their \(L_2\) error to the \(L_2\) error of the Wolverton-Wagner regression estimate. The network parameters are learned by minimizing the empirical risk, and the networks are applied to function learning and classification.
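The objects named in the abstract can be illustrated with a minimal sketch. It assumes one-dimensional inputs, a Gaussian-type kernel, fixed centers and widths, and output weights fitted by plain gradient descent on the empirical \(L_2\) risk. These are simplifications chosen for illustration, not the authors' procedure. The Wolverton-Wagner estimate is written in the form usually given in the literature, with one bandwidth \(h_i\) per sample and kernel terms scaled by \(h_i^{-d}\) (here \(d = 1\)). All function names, the bandwidth schedule, and the toy data are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Gaussian-type radial kernel (an illustrative choice)."""
    return math.exp(-u * u)


def normalized_rbf(x, centers, widths, weights):
    """Normalized RBF network: kernel-weighted average of the output weights."""
    acts = [gaussian_kernel((x - c) / b) for c, b in zip(centers, widths)]
    total = sum(acts)
    if total == 0.0:  # every kernel underflowed; fall back to 0
        return 0.0
    return sum(w * a for w, a in zip(weights, acts)) / total


def wolverton_wagner(x, xs, ys, bandwidths):
    """Recursive kernel regression estimate with one bandwidth h_i per sample (d = 1)."""
    num = den = 0.0
    for xi, yi, hi in zip(xs, ys, bandwidths):
        k = gaussian_kernel((x - xi) / hi) / hi  # h_i^{-d} K((x - X_i) / h_i)
        num += yi * k
        den += k
    return num / den if den > 0.0 else 0.0


def empirical_risk(predict, xs, ys):
    """Empirical L2 risk: mean squared error over the sample."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_weights(xs, ys, centers, widths, steps=500, lr=0.5):
    """Gradient descent on the empirical L2 risk over the output weights only."""
    n, k = len(xs), len(centers)
    # Normalized activations: each row is non-negative and sums to 1
    # (or is all zeros if every kernel underflows).
    phi = []
    for x in xs:
        acts = [gaussian_kernel((x - c) / b) for c, b in zip(centers, widths)]
        total = sum(acts) or 1.0
        phi.append([a / total for a in acts])
    weights = [0.0] * k
    for _ in range(steps):
        residuals = [sum(w * p for w, p in zip(weights, row)) - y
                     for row, y in zip(phi, ys)]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, phi))
                for j in range(k)]
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Toy noise-free data (hypothetical): samples of a sine curve on [0, 1].
xs = [i / 19 for i in range(20)]
ys = [math.sin(2 * math.pi * x) for x in xs]
centers = [0.1, 0.3, 0.5, 0.7, 0.9]
widths = [0.2] * len(centers)

weights = fit_weights(xs, ys, centers, widths)
risk_before = empirical_risk(lambda x: 0.0, xs, ys)  # all-zero weights
risk_after = empirical_risk(
    lambda x: normalized_rbf(x, centers, widths, weights), xs, ys)

# Assumed bandwidth schedule h_i = 0.3 * i^(-1/5), for illustration only.
bandwidths = [0.3 * (i + 1) ** -0.2 for i in range(len(xs))]
ww_at_quarter = wolverton_wagner(0.25, xs, ys, bandwidths)
print(risk_before, risk_after, ww_at_quarter)
```

Because the normalized activations at each point are non-negative and sum to one, the network output is a convex combination of the output weights, and the empirical risk is a convex quadratic in those weights. The fixed step size of 0.5 is therefore safe in this sketch. Minimizing over centers and widths as well would require a different optimizer.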


Nonlinear regression; Classification; Wolverton-Wagner recursive radial basis function networks; MISE convergence; Strong convergence; Rates of convergence



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
  2. Department of Electrical Engineering, Westpomeranian University of Technology, Szczecin, Poland
  3. Department of Knowledge Engineering, Faculty of Production Engineering and Logistics, Opole University of Technology, Opole, Poland
