Combining Artificial Neural Nets, pp. 101–125

# Treating Harmful Collinearity in Neural Network Ensembles

## Summary

In the last decade, several techniques have been developed for combining neural networks [48, 49]. Combining a number of trained neural networks to form what is often referred to as a neural network ensemble may yield better model accuracy without requiring extensive effort in training the individual networks or optimising their architectures [21, 48]. However, because the corresponding outputs of the individual networks approximate the same physical quantity (or quantities), they may be highly positively correlated or collinear (linearly dependent). The estimation of the optimal weights for combining such networks may therefore be subject to the harmful effects of collinearity, resulting in a neural network ensemble whose generalisation ability is inferior to that of the individual networks [20, 42, 48].
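The effect described above can be illustrated with a small NumPy sketch (a hypothetical toy example, not the chapter's experiments): several "networks" approximating the same target produce near-collinear outputs, so the correlation matrix of the outputs is ill-conditioned and the least-squares combination weights become unstable, even though the combined fit itself stays accurate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an ensemble: each "network" output approximates
# the same target, so the outputs are highly positively correlated.
n = 200
y = np.sin(np.linspace(0.0, 3.0, n))          # true target values
outputs = np.column_stack(
    [y + 0.05 * rng.standard_normal(n) for _ in range(5)]
)

# A large condition number of the output correlation matrix signals
# collinearity, which inflates the variance of the estimated weights.
corr = np.corrcoef(outputs, rowvar=False)
cond = np.linalg.cond(corr)

# MSE-optimal (least-squares) combination weights.
w, *_ = np.linalg.lstsq(outputs, y, rcond=None)

print(f"condition number: {cond:.1f}")
print("combination weights:", np.round(w, 3))
```

The individual weights can swing wildly between runs while their sum stays near one: the common direction (the target) is well determined, but the contrasts between the near-identical outputs are not.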

In this chapter, we discuss the harmful effects of collinearity on the estimation of the optimal combination-weights for combining the networks. We describe an approach for treating collinearity through the proper selection of the component networks, and we test two algorithms for selecting the component networks in order to improve the generalisation ability of the ensemble. We present experimental results demonstrating the effectiveness of optimal linear combinations, guided by the selection algorithms, in improving model accuracy.
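The chapter's two selection algorithms are not reproduced here, but the general idea of selecting component networks to avoid harmful collinearity can be sketched as a forward-selection loop: repeatedly add the network that most reduces the validation MSE of the optimally combined ensemble, and stop when no candidate helps. The even/odd data split, the stopping rule, and the toy data below are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def val_mse(X_tr, y_tr, X_va, y_va):
    """Validation MSE of the least-squares-optimal combination
    whose weights are fitted on the training split."""
    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    r = y_va - X_va @ w
    return float(r @ r) / len(y_va)

def greedy_select(outputs, y):
    """Forward selection: repeatedly add the component network that
    most reduces the combined validation MSE; stop when none helps."""
    tr, va = slice(0, None, 2), slice(1, None, 2)   # even/odd split
    remaining = list(range(outputs.shape[1]))
    chosen, best = [], np.inf
    while remaining:
        mse, j = min(
            (val_mse(outputs[tr, chosen + [j]], y[tr],
                     outputs[va, chosen + [j]], y[va]), j)
            for j in remaining
        )
        if mse >= best:
            break                                    # no candidate helps
        chosen, best = chosen + [j], mse
        remaining.remove(j)
    return chosen, best

# Toy demonstration: three independently "trained" networks plus two
# exact duplicates that add collinearity but no new information.
rng = np.random.default_rng(1)
y = np.cos(np.linspace(0.0, 2.0, 200))
base = [y + 0.1 * rng.standard_normal(200) for _ in range(3)]
outputs = np.column_stack(base + [base[0], base[1]])

chosen, best = greedy_select(outputs, y)
print("selected networks:", chosen, "validation MSE:", round(best, 5))
```

Selecting on a held-out split matters: in-sample MSE never increases as components are added, so only an out-of-sample criterion can reject redundant, collinearity-inducing networks.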

## Keywords

Mean Square Error; Generalisation Ability; Simple Average; Component Network; Trained Network

## References

1. E. Alpaydin. Multiple networks for function learning. In *Proceedings of the 1993 IEEE International Conference on Neural Networks*, volume I, pages 914. IEEE Press, Apr. 1993.
2. R. Battiti and A. M. Colla. Democracy in neural nets: Voting schemes for classification. *Neural Networks*, 7(4):691–707, 1994.
3. W. G. Baxt. Improving the accuracy of an artificial neural network using multiple differently trained networks. *Neural Computation*, 4:772–780, 1992.
4. D. A. Belsley. Assessing the presence of harmful collinearity and other forms of weak data through a test for signal-to-noise. *Journal of Econometrics*, 20:211–253, 1982.
5. D. A. Belsley. *Conditioning Diagnostics: Collinearity and Weak Data in Regression*. John Wiley & Sons, New York, 1991.
6. D. A. Belsley, E. Kuh, and R. E. Welsch. *Regression Diagnostics: Identifying Influential Data and Sources of Collinearity*. John Wiley & Sons, New York, 1980.
7. J. A. Benediktsson, J. R. Sveinsson, O. K. Ersoy, and P. H. Swain. Parallel consensual neural networks. *IEEE Transactions on Neural Networks*, 8(1):54–64, 1997.
8. L. Breiman. Stacked regressions. Technical Report 367, Department of Statistics, University of California, Berkeley, California 94720, USA, Aug. 1992. Revised June 1994.
9. D. W. Bunn. Statistical efficiency in the linear combination of forecasts. *International Journal of Forecasting*, 1:151–163, 1985.
10. D. W. Bunn. Forecasting with more than one model. *Journal of Forecasting*, 8:161–166, 1989.
11. V. Cherkassky, D. Gehring, and F. Mulier. Pragmatic comparison of statistical and neural network methods for function estimation. In *Proceedings of the 1995 World Congress on Neural Networks*, volume II, pages 917–926, 1995.
12. V. Cherkassky and H. Lari-Najafi. Constrained topological mapping for non-parametric regression analysis. *Neural Networks*, 4:27–40, 1991.
13. R. T. Clemen. Combining forecasts: A review and annotated bibliography. *International Journal of Forecasting*, 5:559–583, 1989.
14. R. T. Clemen and R. L. Winkler. Combining economic forecasts. *Journal of Business & Economic Statistics*, 4(1):39–46, Jan. 1986.
15. L. Cooper. Hybrid neural network architectures: Equilibrium systems that pay attention. In R. J. Mammone and Y. Y. Zeevi, editors, *Neural Networks: Theory and Applications*, pages 81–96. Academic Press, 1991.
16. C. W. J. Granger. Combining forecasts – twenty years later. *Journal of Forecasting*, 8:167–173, 1989.
17. J. B. Guerard Jr. and R. T. Clemen. Collinearity and the use of latent root regression for combining GNP forecasts. *Journal of Forecasting*, 8:231–238, 1989.
18. L. K. Hansen and P. Salamon. Neural network ensembles. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 12(10):993–1001, 1990.
19. S. Hashem. *Optimal Linear Combinations of Neural Networks*. PhD thesis, School of Industrial Engineering, Purdue University, Dec. 1993.
20. S. Hashem. Effects of collinearity on combining neural networks. *Connection Science*, 8(3 & 4):315–336, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
21. S. Hashem. Optimal linear combinations of neural networks. *Neural Networks*, 10(4):599–614, 1997.
22. S. Hashem and B. Schmeiser. Approximating a function and its derivatives using MSE-optimal linear combinations of trained feedforward neural networks. In *Proceedings of the 1993 World Congress on Neural Networks*, volume I, pages 617–620, New Jersey, 1993. Lawrence Erlbaum Associates.
23. S. Hashem and B. Schmeiser. Improving model accuracy using optimal linear combinations of trained neural networks. *IEEE Transactions on Neural Networks*, 6(3):792–794, 1995.
24. S. Hashem, B. Schmeiser, and Y. Yih. Optimal linear combinations of neural networks: An overview. In *Proceedings of the 1994 IEEE International Conference on Neural Networks*, volume III, pages 1507–1512. IEEE Press, 1994.
25. W. W. Hines and D. C. Montgomery. *Probability and Statistics in Engineering and Management Science*. John Wiley & Sons, 1990.
26. J.-N. Hwang, S.-R. Lay, M. Maechler, R. D. Martin, and J. Schimert. Regression modeling in back-propagation and projection pursuit learning. *IEEE Transactions on Neural Networks*, 5(3):342–353, May 1994.
27. R. A. Jacobs. Bias/variance analysis of mixtures-of-experts architectures. *Neural Computation*, 9:369–383, 1997.
28. R. A. Jacobs and M. Jordan. A competitive modular connectionist architecture. In R. Lippmann, J. Moody, and D. Touretzky, editors, *Advances in Neural Information Processing Systems 3*, pages 767–773. Morgan Kaufmann, 1991.
29. R. A. Jacobs and M. Jordan. Hierarchical mixtures of experts and the EM algorithm. *Neural Computation*, 6:181–214, 1994.
30. A. Krogh and J. Vedelsby. Neural network ensembles, cross validation, and active learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, *Advances in Neural Information Processing Systems 7*, pages 231–238. MIT Press, 1995.
31. M. Maechler, D. Martin, J. Schimert, M. Csoppenszky, and J. Hwang. Projection pursuit learning networks for regression. In *Proceedings of the 2nd International Conference on Tools for Artificial Intelligence*, Washington, D.C., pages 350–358. IEEE Press, Nov. 1990.
32. G. Mani. Lowering variance of decisions by using artificial neural network portfolios. *Neural Computation*, 3:484–486, 1991.
33. L. Menezes and D. Bunn. Specification of predictive distribution from a combination of forecasts. *Methods of Operations Research*, 64:397–405, 1991.
34. H. Moskowitz and G. P. Wright. *Statistics for Management and Economics*. Charles Merrill Publishing Company, Ohio, 1985.
35. J. Neter, W. Wasserman, and M. H. Kutner. *Applied Linear Statistical Models*. 3rd edition, Irwin, Homewood, IL, 1990.
36. L. Ohno-Machado and M. A. Musen. Hierarchical neural networks for partial diagnosis in medicine. In *Proceedings of the 1994 World Congress on Neural Networks*, volume I, pages 291–296. Lawrence Erlbaum Associates, 1994.
37. D. W. Opitz and J. W. Shavlik. Actively searching for an effective neural network ensemble. *Connection Science*, 8(3 & 4):337–353, Dec. 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
38. B. Parmanto, P. W. Munro, and H. R. Doyle. Reducing variance of committee prediction with resampling techniques. *Connection Science*, 8(3 & 4):405–425, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
39. B. Parmanto, P. W. Munro, H. R. Doyle, C. Doria, L. Aldrighetti, I. R. Marino, S. Mitchel, and J. J. Fung. Neural network classifier for hepatoma detection. In *Proceedings of the 1994 World Congress on Neural Networks*, volume I, pages 285–290, New Jersey, 1994. Lawrence Erlbaum Associates.
40. B. A. Pearlmutter and R. Rosenfeld. Chaitin-Kolmogorov complexity and generalization in neural networks. In *Advances in Neural Information Processing Systems 3*, pages 925–931, 1991.
41. M. P. Perrone. *Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization*. PhD thesis, Department of Physics, Brown University, May 1993.
42. M. P. Perrone and L. N. Cooper. When networks disagree: Ensemble methods for hybrid neural networks. In R. J. Mammone, editor, *Neural Networks for Speech and Image Processing*. Chapman & Hall, 1993.
43. Y. Raviv and N. Intrator. Bootstrapping with noise: An effective regularization technique. *Connection Science*, 8(3 & 4):355–372, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
44. G. Rogova. Combining the results of several neural network classifiers. *Neural Networks*, 7(5):777–781, 1994.
45. B. E. Rosen. Ensemble learning using decorrelated neural networks. *Connection Science*, 8(3 & 4):373–383, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
46. R. L. Scheaffer and J. T. McClave. *Probability and Statistics for Engineers*. PWS-KENT Publishing Company, Boston, 1990.
47. D. C. Schmittlein, J. Kim, and D. G. Morrison. Combining forecasts: Operational adjustments to theoretically optimal rules. *Management Science*, 36(9):1044–1056, Sept. 1990.
48. A. J. Sharkey. On combining artificial neural nets. *Connection Science*, 8(3 & 4):299–313, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
49. A. J. Sharkey. Modularity, combining and artificial neural nets. *Connection Science*, 9(1):3–10, 1997. Special issue on Combining Neural Networks: Modular Approaches.
50. I. M. Sobol'. *The Monte Carlo Method*. University of Chicago Press, 1974. Translated and adapted from the 2nd Russian edition by R. Messer, J. Stone, and P. Fortini.
51. K. Tumer and J. Ghosh. Error correlation and error reduction in ensemble classifiers. *Connection Science*, 8(3 & 4):385–404, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
52. C. T. West. System-based weights versus series-specific weights in the combination of forecasts. *Journal of Finance*, 15:369–383, 1996.
53. R. L. Winkler and R. T. Clemen. Sensitivity of weights in combining forecasts. *Operations Research*, 40(3):609–614, May–June 1992.
54. D. H. Wolpert. Stacked generalization. *Neural Networks*, 5:241–259, 1992.