Automatic Model Selection in a Hybrid Perceptron/Radial Network
We introduce an algorithm for incrementally constructing a hybrid network of radial and perceptron hidden units. The algorithm determines whether a radial or a perceptron unit is required in a given region of the input space. Given an error target, the algorithm also determines the number of hidden units. The resulting architecture is often much smaller than an RBF network or an MLP. A benchmark on four classification problems and three regression problems is given. The most striking performance improvement is achieved on the vowel data set.
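The abstract does not spell out the selection criterion, so the following is only a minimal sketch of the general idea: grow the hidden layer one unit at a time, and at each step add whichever candidate, a local Gaussian radial unit or a global sigmoidal perceptron unit, better explains the current residual, stopping once the error target is met. The function name `grow_hybrid_network`, the worst-residual centering heuristic, and the correlation-based score are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def rbf(X, center, width):
    """Gaussian radial unit: local response around `center`."""
    return np.exp(-np.sum((X - center) ** 2, axis=1) / (2.0 * width ** 2))

def perceptron(X, w, b):
    """Sigmoidal perceptron unit: global half-space response."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def grow_hybrid_network(X, y, error_target, max_units=50, seed=None):
    """Incrementally add radial or perceptron hidden units until the
    mean squared error falls below `error_target` (a sketch, not the
    paper's algorithm)."""
    rng = np.random.default_rng(seed)
    H = [np.ones(len(X))]          # bias column
    residual, weights = y.astype(float).copy(), None
    for _ in range(max_units):
        # Candidate radial unit centered on the worst-fit training point
        # (assumed heuristic for picking the region that needs a unit).
        worst = np.argmax(np.abs(residual))
        cand_rbf = rbf(X, X[worst], width=np.std(X))
        # Candidate perceptron with a random hyperplane through that
        # point (placeholder for a proper fit to the residual).
        w = rng.standard_normal(X.shape[1])
        cand_per = perceptron(X, w, b=-(X[worst] @ w))
        # Keep whichever candidate correlates more with the residual.
        score = lambda h: abs(np.corrcoef(h, residual)[0, 1])
        H.append(cand_rbf if score(cand_rbf) >= score(cand_per) else cand_per)
        # Refit the output weights by least squares, update the residual.
        Phi = np.column_stack(H)
        weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        residual = y - Phi @ weights
        if np.mean(residual ** 2) < error_target:
            break
    return H, weights
```

Because each step adds only the unit type that locally pays off, such a constructive procedure can terminate with far fewer hidden units than a homogeneous RBF network or MLP trained to the same error target, which is the size advantage the abstract reports.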