Summary
In the last decade, several techniques have been developed for combining neural networks [48, 49]. Combining a number of trained neural networks to form what is often referred to as a neural network ensemble may yield better model accuracy without requiring extensive effort in training the individual networks or optimising their architecture [21, 48]. However, because the corresponding outputs of the individual networks approximate the same physical quantity (or quantities), they may be highly positively correlated or collinear (linearly dependent). The estimation of the optimal weights for combining such networks may therefore be subject to the harmful effects of collinearity, resulting in a neural network ensemble whose generalisation ability is inferior to that of the individual networks [20, 42, 48].
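The problem can be illustrated with a small numerical sketch (the data, network outputs, and set-up below are hypothetical, not the chapter's experiments). Three simulated "networks" approximate the same target, so their outputs are nearly collinear; the MSE-optimal combination weights are obtained by least squares, and the condition number of the output matrix diagnoses how ill-conditioned that estimation is:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative target function and evaluation points.
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x)

# Simulate three trained networks whose outputs approximate y.
# Because they model the same quantity, their outputs are highly
# positively correlated, i.e. near-collinear.
outputs = np.column_stack([
    y + 0.05 * rng.standard_normal(x.size) + 0.02,
    y + 0.05 * rng.standard_normal(x.size) - 0.01,
    y + 0.05 * rng.standard_normal(x.size) + 0.01,
])

# MSE-optimal combination weights: minimise ||outputs @ w - y||^2.
w, *_ = np.linalg.lstsq(outputs, y, rcond=None)

# A large condition number signals collinearity: small perturbations
# in the data can produce large swings in the estimated weights.
cond = np.linalg.cond(outputs)

ensemble = outputs @ w
mse_ensemble = np.mean((ensemble - y) ** 2)
mse_best_single = min(np.mean((outputs[:, j] - y) ** 2) for j in range(3))
print(f"condition number: {cond:.1f}")
print(f"ensemble MSE {mse_ensemble:.5f} vs best single {mse_best_single:.5f}")
```

On the data used to fit the weights, the optimal combination can never do worse than the best single network, since each single network is itself one admissible weight vector; the danger the chapter addresses is that, under collinearity, the fitted weights generalise poorly to new data.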
In this chapter, we discuss the harmful effects of collinearity on the estimation of the optimal combination-weights for combining the networks. We describe an approach for treating collinearity through the proper selection of the component networks, and test two algorithms for selecting the component networks in order to improve the generalisation ability of the ensemble. We present experimental results to demonstrate the effectiveness of optimal linear combinations, guided by the selection algorithms, in improving model accuracy.
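A generic way to realise selection-based treatment of collinearity is greedy forward selection: fit combination weights on one data set, score each candidate ensemble on a held-out set, and keep a component only while it improves the held-out error. This is a minimal sketch of that idea, not the chapter's two specific algorithms; `select_components` and the synthetic data are hypothetical names introduced here for illustration:

```python
import numpy as np

def select_components(fit_out, fit_y, val_out, val_y):
    """Greedy forward selection of component networks (a generic
    sketch). Weights are estimated on (fit_out, fit_y) and each
    candidate subset is scored on the held-out (val_out, val_y),
    so a component is added only if it improves the estimate of
    generalisation error."""
    def score(idx):
        X = fit_out[:, idx]
        # MSE-optimal weights for the chosen components.
        wts, *_ = np.linalg.lstsq(X, fit_y, rcond=None)
        mse = np.mean((val_out[:, idx] @ wts - val_y) ** 2)
        return mse, wts

    selected, remaining = [], list(range(fit_out.shape[1]))
    best, weights = np.inf, None
    while remaining:
        candidates = []
        for j in remaining:
            mse, wts = score(selected + [j])
            candidates.append((mse, j, wts))
        mse, j, wts = min(candidates, key=lambda t: t[0])
        if mse >= best:
            break  # no candidate improves the held-out MSE: stop
        best, weights = mse, wts
        selected.append(j)
        remaining.remove(j)
    return selected, weights

# Hypothetical demonstration: four simulated networks, one badly biased.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 300)
y = np.sin(2.0 * np.pi * x)
nets = np.column_stack([y + 0.05 * rng.standard_normal(x.size)
                        for _ in range(4)])
nets[:, 3] += 0.5  # a poorly calibrated component
fit, val = slice(0, 150), slice(150, 300)
sel, w = select_components(nets[fit], y[fit], nets[val], y[val])
print("selected components:", sel)
```

Scoring on a set independent of the one used to fit the weights is what allows the procedure to stop: on the fitting set alone, adding a component can never increase the least-squares fit error, so every candidate would appear to help.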
References
E. Alpaydin. Multiple networks for function learning. In Proceedings of the 1993 IEEE International Conference on Neural Networks, volume I, pages 9–14. IEEE Press, Apr. 1993.
R. Battiti and A. M. Colla. Democracy in neural nets: Voting schemes for classification. Neural Networks, 7 (4): 691–707, 1994.
W. G. Baxt. Improving the accuracy of an artificial neural network using multiple differently trained networks. Neural Computation, 4: 772–780, 1992.
D. A. Belsley. Assessing the presence of harmful collinearity and other forms of weak data through a test for signal-to-noise. Journal of Econometrics, 20: 211–253, 1982.
D. A. Belsley. Conditioning Diagnostics: Collinearity and Weak Data in Regression. John Wiley & Sons, New York, 1991.
D. A. Belsley, E. Kuh, and R. E. Welsch. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, New York, 1980.
J. A. Benediktsson, J. R. Sveinsson, O. K. Ersoy, and P. H. Swain. Parallel consensual neural networks. IEEE Transactions on Neural Networks, 8 (1): 54–64, 1997.
L. Breiman. Stacked regressions. Technical Report 367, Department of Statistics, University of California, Berkeley, California 94720, USA, Aug. 1992. Revised June 1994.
D. W. Bunn. Statistical efficiency in the linear combination of forecasts. International Journal of Forecasting, 1: 151–163, 1985.
D. W. Bunn. Forecasting with more than one model. Journal of Forecasting, 8: 161–166, 1989.
V. Cherkassky, D. Gehring, and F. Mulier. Pragmatic comparison of statistical and neural network methods for function estimation. In Proceedings of the 1995 World Congress on Neural Networks, volume II, pages 917–926, 1995.
V. Cherkassky and H. Lari-Najafi. Constrained topological mapping for non-parametric regression analysis. Neural Networks, 4: 27–40, 1991.
R. T. Clemen. Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5: 559–583, 1989.
R. T. Clemen and R. L. Winkler. Combining economic forecasts. Journal of Business & Economic Statistics, 4 (1): 39–46, Jan. 1986.
L. Cooper. Hybrid neural network architectures: Equilibrium systems that pay attention. In R. J. Mammone and Y. Y. Zeevi, editors, Neural Networks: Theory and Applications, pages 81–96. Academic Press, 1991.
C. W. J. Granger. Combining forecasts–twenty years later. Journal of Forecasting, 8: 167–173, 1989.
J. B. Guerard Jr. and R. T. Clemen. Collinearity and the use of latent root regression for combining GNP forecasts. Journal of Forecasting, 8: 231–238, 1989.
L. K. Hansen and P. Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12 (10): 993–1001, 1990.
S. Hashem. Optimal Linear Combinations of Neural Networks. PhD thesis, School of Industrial Engineering, Purdue University, Dec. 1993.
S. Hashem. Effects of collinearity on combining neural networks. Connection Science, 8(3 & 4):315–336, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
S. Hashem. Optimal linear combinations of neural networks. Neural Networks, 10 (4): 599–614, 1997.
S. Hashem and B. Schmeiser. Approximating a function and its derivatives using MSE-optimal linear combinations of trained feedforward neural networks. In Proceedings of the 1993 World Congress on Neural Networks, volume I, pages 617–620, New Jersey, 1993. Lawrence Erlbaum Associates.
S. Hashem and B. Schmeiser. Improving model accuracy using optimal linear combinations of trained neural networks. IEEE Transactions on Neural Networks, 6 (3): 792–794, 1995.
S. Hashem, B. Schmeiser, and Y. Yih. Optimal linear combinations of neural networks: An overview. In Proceedings of the 1994 IEEE International Conference on Neural Networks, volume III, pages 1507–1512. IEEE Press, 1994.
W. W. Hines and D. C. Montgomery. Probability and Statistics in Engineering and Management Science. John Wiley & Sons, 1990.
J.-N. Hwang, S.-R. Lay, M. Maechler, R. D. Martin, and J. Schimert. Regression modeling in back-propagation and projection pursuit learning. IEEE Transactions on Neural Networks, 5 (3): 342–353, May 1994.
R. A. Jacobs. Bias/variance analysis of mixtures-of-experts architectures. Neural Computation, 9: 369–383, 1997.
R. A. Jacobs and M. Jordan. A competitive modular connectionist architecture. In R. Lippmann, J. Moody, and D. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages 767–773. Morgan Kaufman, 1991.
M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6: 181–214, 1994.
A. Krogh and J. Vedelsby. Neural network ensembles, cross validation, and active learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, pages 231–238. MIT Press, 1995.
M. Maechler, D. Martin, J. Schimert, M. Csoppenszky, and J. Hwang. Projection pursuit learning networks for regression. In Proceedings of the 2nd International Conference on Tools for Artificial Intelligence, Washington D.C., pages 350–358. IEEE Press, November 1990.
G. Mani. Lowering variance of decisions by using artificial neural network portfolios. Neural Computation, 3: 484–486, 1991.
L. Menezes and D. Bunn. Specification of predictive distribution from a combination of forecasts. Methods of Operations Research, 64: 397–405, 1991.
H. Moskowitz and G. P. Wright. Statistics for Management and Economics. Charles Merrill Publishing Company, Ohio, 1985.
J. Neter, W. Wasserman, and M. H. Kutner. Applied Linear Statistical Models. Irwin, Homewood, IL, 1990. 3rd Edition.
L. Ohno-Machado and M. A. Musen. Hierarchical neural networks for partial diagnosis in medicine. In Proceedings of the 1994 World Congress on Neural Networks, volume 1, pages 291–296. Lawrence Erlbaum Associates, 1994.
D. W. Opitz and J. W. Shavlik. Actively searching for an effective neural network ensemble. Connection Science, 8(3 & 4):337–353, Dec. 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
B. Parmanto, P. W. Munro, and H. R. Doyle. Reducing variance of committee prediction with resampling techniques. Connection Science, 8(3 & 4):405–425, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
B. Parmanto, P. W. Munro, H. R. Doyle, C. Doria, L. Aldrighetti, I. R. Marino, S. Mitchel, and J. J. Fung. Neural network classifier for hepatoma detection. In Proceedings of the 1994 World Congress on Neural Networks, volume I, pages 285–290, New Jersey, 1994. Lawrence Erlbaum Associates.
B. A. Pearlmutter and R. Rosenfeld. Chaitin-Kolmogorov complexity and generalization in neural networks. In Advances in Neural Information Processing Systems 3, pages 925–931, 1991.
M. P. Perrone. Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization. PhD thesis, Department of Physics, Brown University, May 1993.
M. P. Perrone and L. N. Cooper. When networks disagree: Ensemble methods for hybrid neural networks. In R. J. Mammone, editor, Neural Networks for Speech and Image Processing. Chapman & Hall, 1993.
Y. Raviv and N. Intrator. Bootstrapping with noise: An effective regularization technique. Connection Science, 8(3 & 4):355–372, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
G. Rogova. Combining the results of several neural network classifiers. Neural Networks, 7 (5): 777–781, 1994.
B. E. Rosen. Ensemble learning using decorrelated neural networks. Connection Science, 8(3 & 4):373–383, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
R. L. Scheaffer and J. T. McClave. Probability and Statistics for Engineers. PWS-KENT Publishing Company, Boston, 1990.
D. C. Schmittlein, J. Kim, and D. G. Morrison. Combining forecasts: Operational adjustments to theoretically optimal rules. Management Science, 36 (9): 1044–1056, Sept. 1990.
A. J. Sharkey. On combining artificial neural nets. Connection Science, 8(3 & 4):299–313, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
A. J. Sharkey. Modularity, combining and artificial neural nets. Connection Science, 9(1):3–10, 1997. Special issue on Combining Neural Networks: Modular Approaches.
I. M. Sobol’. The Monte Carlo method. University of Chicago Press, 1974. Translated and adapted from the 2nd Russian edition by R. Messer, J. Stone, and P. Fortini.
K. Tumer and J. Ghosh. Error correction and error reduction in ensemble classifiers. Connection Science, 8(3 & 4):385–404, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
C. T. West. System-based weights versus series-specific weights in the combination of forecasts. Journal of Forecasting, 15: 369–383, 1996.
R. L. Winkler and R. T. Clemen. Sensitivity of weights in combining forecasts. Operations Research, 40(3):609–614, May-June 1992.
D. H. Wolpert. Stacked generalization. Neural Networks, 5: 241–259, 1992.
© 1999 Springer-Verlag London Limited
Sharkey, A.J.C. (1999). Treating Harmful Collinearity in Neural Network Ensembles. In: Sharkey, A.J.C. (eds) Combining Artificial Neural Nets. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0793-4_5
DOI: https://doi.org/10.1007/978-1-4471-0793-4_5
Publisher Name: Springer, London
Print ISBN: 978-1-85233-004-0
Online ISBN: 978-1-4471-0793-4