Optimal Design in Flexible Models, Including Feed-Forward Networks and Nonparametric Regression
Part of the Nonconvex Optimization and Its Applications book series (NOIA, volume 51)
Feed-forward networks, also known as multilayer perceptrons, are the most frequently implemented type of neural network. In statistical terminology, they can be regarded as a class of nonlinear regression or classification models, depending on the context, with the nonlinearity involving both the explanatory variables and the parameters. Optimal design for such models is therefore a nonlinear design problem, and statistical work on nonlinear design can in principle be applied in this context. A major aim of the chapter is to survey the relevant material that has appeared in the neural-computing literature, where it is described under headings such as 'active learning' as well as 'optimal design'. This part of the chapter reinforces the contribution of Haines (1998).
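As an informal illustration of the kind of criterion involved (the specific network, candidate set and parameter values below are assumptions made for this sketch, not material from the chapter), a sequential D-optimal, or 'active learning', step for a one-hidden-layer network might choose the next design point to maximise the determinant of the accumulated Fisher-type information, evaluated at a current parameter estimate:

```python
# Minimal sketch of sequential D-optimal design ('active learning') for a
# one-hidden-layer feed-forward network; illustrative only.
import numpy as np

def net(x, W, b, v, c):
    """Single-hidden-layer perceptron: v' tanh(W x + b) + c."""
    return v @ np.tanh(W @ x + b) + c

def grad_theta(x, W, b, v, c):
    """Gradient of the network output with respect to all parameters,
    flattened into one vector (one row of the design Jacobian)."""
    h = np.tanh(W @ x + b)            # hidden-unit activations
    dh = 1.0 - h ** 2                 # derivative of tanh
    dW = np.outer(v * dh, x)          # d output / d W
    db = v * dh                       # d output / d b
    dv = h                            # d output / d v
    dc = np.array([1.0])              # d output / d c
    return np.concatenate([dW.ravel(), db, dv, dc])

def next_design_point(candidates, support, theta, ridge=1e-6):
    """Pick the candidate x maximising the log-determinant of the updated
    information matrix, given current support points and an estimate theta."""
    W, b, v, c = theta
    dim = W.size + b.size + v.size + 1
    M = ridge * np.eye(dim)           # regularised information matrix
    for x in support:
        g = grad_theta(x, W, b, v, c)
        M += np.outer(g, g)
    best_x, best_logdet = None, -np.inf
    for x in candidates:
        g = grad_theta(x, W, b, v, c)
        logdet = np.linalg.slogdet(M + np.outer(g, g))[1]
        if logdet > best_logdet:
            best_x, best_logdet = x, logdet
    return best_x
```

In a full sequential scheme the network would be refitted after each new response is observed, so that the choice of design point is updated along with the parameter estimate; this dependence of the design on unknown parameters is precisely what makes the problem nonlinear.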
A major attraction of feed-forward networks is that they provide flexible yet parametric regression models. One can go further and consider optimal design in fully nonparametric regression settings. The chapter discusses this issue, and in particular the approach taken by Cheng et al. (1998) in the context of local linear smoothing.
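For concreteness, here is a minimal sketch (assuming a Gaussian kernel and a scalar covariate; not taken from Cheng et al., 1998) of the local linear smoother to which that design problem refers:

```python
# Minimal local linear smoother with a Gaussian kernel; illustrative only.
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate of the regression function at x0 with
    bandwidth h: fit a weighted straight line centred at x0 and report
    its intercept as the fitted value."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)        # kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y)    # (fitted value, local slope)
    return beta[0]
```

Optimal design in this setting concerns the choice of the covariate values at which responses are observed, for example so as to control the integrated mean squared error of estimates of this kind.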
Keywords: active learning, Bayesian design, local linear smoothing, neural networks, nonlinear, nonparametric regression, sequential design
References

Anthony, M.A. and Biggs, N. (1992). Computational Learning Theory. Cambridge Tracts in Theoretical Computer Science, No. 30. Cambridge: Cambridge University Press.

Baum, E.B. (1991). Neural net algorithms that learn in polynomial time from examples and queries. IEEE Trans. Neural Networks, 5–19.

Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: a review. Statist. Science, 273–304.

Chang, Y.-J. and Notz, W.I. (1996). Model robust designs. In Handbook of Statistics, Volume 13, Eds S. Ghosh and C.R. Rao, pp. 1055–1098. Amsterdam: Elsevier.

Cheng, B. and Titterington, D.M. (1994). Neural networks: a review from a statistical perspective (with discussion). Statist. Science, 2–54.

Cheng, M.Y., Hall, P. and Titterington, D.M. (1998). Optimal design for curve estimation by local linear smoothing. Bernoulli, 3–14.

Cohn, D.A. (1996). Neural network exploration using optimal experimental design. Neural Networks, 1071–1083.

Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Ann. Statist., 196–216.

Faraway, J.J. (1990). Sequential design for the nonparametric regression of curves and surfaces. In Proc. 22nd Symposium on the Interface, pp. 104–110. New York: Springer.

Fedorov, V.V. (1972). Theory of Optimal Experiments. New York: Academic Press.

Fedorov, V.V. and Nachtsheim, C.J. (1999). Design of experiments for locally weighted regression. J. Statist. Plan. Infer., 363–382.

Ford, I., Kitsos, C.P. and Titterington, D.M. (1989). Recent advances in nonlinear experimental design. Technometrics, 49–60.

Haines, L.M. (1998). Optimal designs for neural networks. In New Developments and Applications in Experimental Design, Eds N. Flournoy, W.F. Rosenberger and W.K. Wong. IMS Lecture Notes–Monograph Series, Volume 34. Hayward, CA: IMS.

Hastie, T. and Loader, C. (1993). Local regression: automatic kernel carpentry (with discussion). Statist. Science, 120–143.

Hwang, J.-N., Choi, J.J., Oh, S. and Marks II, R.J. (1991). Query-based learning applied to partially trained multilayer perceptrons. IEEE Trans. Neural Networks, 131–136.

Krogh, A. and Vedelsby, J. (1995). Neural network ensembles, cross validation, and active learning. In Advances in Neural Information Processing Systems, Volume 7, Eds G. Tesauro, D.S. Touretzky and T.K. Leen, pp. 231–238. Cambridge, MA: MIT Press.

MacKay, D.J.C. (1992). Information-based objective functions for active data selection. Neural Computation, 590–604.

Marron, J.S. and Wand, M.P. (1992). Exact mean integrated squared error. Ann. Statist., 712–736.

Müller, H.-G. (1984). Optimal designs for nonparametric kernel regression. Statist. Prob. Lett., 285–290.

Müller, W.G. (1996). Optimal design for local fitting. J. Statist. Plan. Infer., 389–397.

Plutowski, M. and White, H. (1993). Selecting concise training sets from clean data. IEEE Trans. Neural Networks, 305–318.

Ripley, B.D. (1994). Neural networks and related methods for classification (with discussion). J. R. Statist. Soc., 409–456.

Sollich, P. (1994). Query construction, entropy, and generalization in neural-network models. Phys. Rev., 4637–4651.

Titterington, D.M. (1999). Neural networks. In Encyclopedia of Statistical Science, Update Volume 3, Eds S. Kotz, C.B. Read and D. Banks, pp. 528–535. New York: Wiley.

Wiens, D.P. (1999). Robust sequential designs for nonlinear regression. Technical Report 99.05, Statistics Centre, Univ. of Alberta.

Wiens, D.P. (1998). Minimax robust designs and weights for approximately specified regression models with heteroscedastic errors. J. Amer. Statist. Assoc., 1440–1450.
© Springer Science+Business Media Dordrecht 2001