Automatic Model Selection in a Hybrid Perceptron/Radial Network

  • Shimon Cohen
  • Nathan Intrator
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2096)


We introduce an algorithm for incrementally constructing a hybrid network of radial and perceptron hidden units. The algorithm determines whether a radial or a perceptron unit is required in a given region of input space. Given an error target, the algorithm also determines the number of hidden units. The resulting architecture is often much smaller than an RBF network or an MLP. We report benchmarks on four classification problems and three regression problems. The most striking performance improvement is achieved on the vowel data set [4].
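The incremental construction described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, since the abstract does not state the selection criterion or the training procedure: the function names (`build_hybrid`, `radial_unit`, `ridge_unit`, `fit_output`), the Gaussian and tanh unit shapes, the width heuristic, and the rule of simply keeping whichever candidate unit gives the lower mean squared error. It is not the authors' method, only one greedy scheme with the same overall shape: add units one at a time, choose a radial or a perceptron unit at each step, and stop once the error target is met.

```python
import numpy as np

def radial_unit(X, center, width):
    """Gaussian radial unit: activation depends on distance to a center."""
    d2 = np.sum((X - center) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))

def ridge_unit(X, w, b):
    """Perceptron unit: squashed linear projection of the input."""
    return np.tanh(X @ w + b)

def fit_output(H, y):
    """Least-squares output layer over the hidden activations plus a bias."""
    A = np.column_stack([H, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    return coef, pred, float(np.mean((y - pred) ** 2))

def build_hybrid(X, y, error_target, max_units=20):
    """Greedily add hidden units until the training MSE reaches error_target."""
    n = X.shape[0]
    H = np.empty((n, 0))          # hidden activations, one column per unit
    units = []                    # list of (kind, params)
    coef, pred, mse = fit_output(H, y)
    while mse > error_target and len(units) < max_units:
        resid = y - pred
        # Candidate radial unit: centered where the residual is largest.
        # The width heuristic is an assumption, not taken from the paper.
        center = X[np.argmax(np.abs(resid))]
        spread = np.sqrt(np.mean(np.sum((X - center) ** 2, axis=1)))
        width = 0.5 * spread + 1e-8
        # Candidate perceptron unit: direction from a linear fit to the residual.
        Xb = np.column_stack([X, np.ones(n)])
        wb, *_ = np.linalg.lstsq(Xb, resid, rcond=None)
        candidates = [
            ("radial", (center, width), radial_unit(X, center, width)),
            ("ridge", (wb[:-1], wb[-1]), ridge_unit(X, wb[:-1], wb[-1])),
        ]
        # Assumed selection rule: keep the candidate with the lower MSE.
        trials = [
            (fit_output(np.column_stack([H, h]), y), kind, params, h)
            for kind, params, h in candidates
        ]
        (new_coef, new_pred, new_mse), kind, params, h = min(
            trials, key=lambda t: t[0][2]
        )
        if new_mse >= mse:        # neither candidate helps; stop early
            break
        H = np.column_stack([H, h])
        units.append((kind, params))
        coef, pred, mse = new_coef, new_pred, new_mse
    return units, coef, mse

if __name__ == "__main__":
    # Toy 1-D regression target mixing a local bump and a global trend.
    X = np.linspace(-3.0, 3.0, 60).reshape(-1, 1)
    y = np.exp(-X[:, 0] ** 2) + 0.5 * np.tanh(2.0 * X[:, 0])
    units, coef, mse = build_hybrid(X, y, error_target=1e-3, max_units=10)
    print([kind for kind, _ in units], mse)
```

The error target controls the final size of the network: a looser target stops the loop earlier and yields fewer hidden units, while `max_units` is only a safety bound added for this sketch.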




  1. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. The Wadsworth Statistics/Probability Series, Belmont, CA, 1984.
  2. J. Buckheit and D. L. Donoho. Improved linear discrimination using time-frequency dictionaries. Technical Report, Stanford University, 1995.
  3. S. Cohen and N. Intrator. A hybrid projection based and radial basis function architecture. In J. Kittler and F. Roli, editors, Proc. Int. Workshop on Multiple Classifier Systems (LNCS 1857), pages 147–156, Sardinia, June 2000. Springer.
  4. D. H. Deterding. Speaker Normalisation for Automatic Speech Recognition. PhD thesis, University of Cambridge, 1989.
  5. D. L. Donoho and I. M. Johnstone. Projection-based approximation and a duality with kernel methods. Annals of Statistics, 17:58–106, 1989.
  6. H. Drucker, R. Schapire, and P. Simard. Improving performance in neural networks using a boosting algorithm. In Steven J. Hanson, Jack D. Cowan, and C. Lee Giles, editors, Advances in Neural Information Processing Systems, volume 5, pages 42–49. Morgan Kaufmann, 1993.
  7. S. E. Fahlman and C. Lebiere. The cascade-correlation learning architecture. CMU-CS-90-100, Carnegie Mellon University, 1990.
  8. G. W. Flake. Square unit augmented, radially extended, multilayer perceptrons. In G. B. Orr and K. Müller, editors, Neural Networks: Tricks of the Trade, pages 145–163. Springer, 1998.
  9. J. H. Friedman. Multivariate adaptive regression splines. The Annals of Statistics, 19:1–141, 1991.
  10. J. H. Friedman and W. Stuetzle. Projection pursuit regression. Journal of the American Statistical Association, 76:817–823, 1981.
  11. T. Hastie and R. Tibshirani. Generalized additive models. Statistical Science, 1:297–318, 1986.
  12. T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman and Hall, London, 1990.
  13. T. Hastie, R. Tibshirani, and A. Buja. Flexible discriminant analysis by optimal scoring. Journal of the American Statistical Association, 89:1255–1270, 1994.
  14. R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991.
  15. M. I. Jordan and R. A. Jacobs. Hierarchies of adaptive experts. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems, volume 4, pages 985–992. Morgan Kaufmann, San Mateo, CA, 1992.
  16. R. E. Kass and A. E. Raftery. Bayes factors. Journal of the American Statistical Association, 90:773–795, 1995.
  17. Y. C. Lee, G. Doolen, H. H. Chen, G. Z. Sun, T. Maxwell, H. Y. Lee, and C. L. Giles. Machine learning using higher order correlation networks. Physica D, 22:276–306, 1986.
  18. D. J. C. MacKay. Bayesian interpolation. Neural Computation, 4(3):415–447, 1992.
  19. J. Moody. Prediction risk and architecture selection for neural networks. In V. Cherkassky, J. H. Friedman, and H. Wechsler, editors, From Statistics to Neural Networks: Theory and Pattern Recognition Applications. Springer, NATO ASI Series F, 1994.
  20. S. J. Nowlan. Soft competitive adaptation: Neural network learning algorithms based on fitting statistical mixtures. PhD thesis, Carnegie Mellon University, 1991.
  21. M. J. Orr, J. Hallman, K. Takezawa, A. Murray, S. Ninomiya, M. Oide, and T. Leonard. Combining regression trees and radial basis functions. Division of Informatics, Edinburgh University, 1999. Submitted to IJNS.
  22. R. P. Gorman and T. J. Sejnowski. Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks, 1:75–89, 1988.
  23. A. J. Robinson. Dynamic Error Propagation Networks. PhD thesis, University of Cambridge, 1989.
  24. D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, volume 1, pages 318–362. MIT Press, Cambridge, MA, 1986.
  25. C. J. Stone. The dimensionality reduction principle for generalized additive models. The Annals of Statistics, 14:590–606, 1986.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Shimon Cohen (1)
  • Nathan Intrator (1)
  1. Computer Science Department, Tel-Aviv University, Israel
