
Treating Harmful Collinearity in Neural Network Ensembles

  • Chapter

Part of the book series: Perspectives in Neural Computing

Summary

In the last decade, several techniques have been developed for combining neural networks [48, 49]. Combining a number of trained neural networks to form what is often referred to as a neural network ensemble may yield better model accuracy without requiring extensive effort in training the individual networks or optimising their architecture [21, 48]. However, because the corresponding outputs of the individual networks approximate the same physical quantity (or quantities), they may be highly positively correlated or collinear (linearly dependent). Thus, the estimation of the optimal weights for combining such networks may be subject to the harmful effects of collinearity, resulting in a neural network ensemble with inferior generalisation ability compared to the individual networks [20, 42, 48].
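As a rough illustration of this point (not taken from the chapter itself), the sketch below builds a matrix whose columns are the validation-set outputs of hypothetical component networks and inspects their correlation matrix and the condition number of the prediction matrix; a large condition number signals near-linear dependence among the members. The names `P`, `y_val`, and the noise model are illustrative assumptions only.

```python
import numpy as np

# Hypothetical setup: y_val holds validation targets, and each column of P
# holds one trained member network's predictions of that same target.
rng = np.random.default_rng(0)
n_points, n_nets = 200, 5
y_val = np.sin(np.linspace(0.0, 3.0, n_points))
P = np.column_stack([y_val + 0.05 * rng.standard_normal(n_points)
                     for _ in range(n_nets)])

# Because all members approximate the same quantity, their outputs are
# typically highly positively correlated (off-diagonal entries close to 1).
print(np.round(np.corrcoef(P, rowvar=False), 3))

# A large condition number of the prediction matrix (with a constant column,
# as used in unconstrained linear combinations) indicates collinearity,
# which makes the estimated combination weights unstable.
X = np.hstack([np.ones((n_points, 1)), P])
print("condition number:", round(float(np.linalg.cond(X)), 1))
```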

In this chapter, we discuss the harmful effects of collinearity on the estimation of the optimal weights for combining the networks. We describe an approach for treating collinearity through the proper selection of the component networks, and test two algorithms for selecting the component networks in order to improve the generalisation ability of the ensemble. We present experimental results demonstrating the effectiveness of optimal linear combinations, guided by the selection algorithms, in improving model accuracy.
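The chapter's own selection algorithms are not reproduced here, but the following minimal sketch illustrates the two ingredients involved: estimating MSE-optimal combination weights by least squares on held-out data, and a simple greedy forward selection that adds a component network only while doing so reduces validation MSE. Function names such as `optimal_weights` and `greedy_select` are hypothetical, not from the source.

```python
import numpy as np

def optimal_weights(P, y):
    """Least-squares combination weights (with a constant term) for member
    predictions P (n_points x n_nets) against validation targets y."""
    X = np.hstack([np.ones((P.shape[0], 1)), P])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def ensemble_mse(P, y, w):
    """Validation MSE of the linear combination defined by weights w."""
    X = np.hstack([np.ones((P.shape[0], 1)), P])
    return float(np.mean((X @ w - y) ** 2))

def greedy_select(P_val, y_val):
    """Forward selection: repeatedly add the member that most reduces
    validation MSE, stopping when no addition helps. One heuristic way to
    avoid combining highly redundant (collinear) members."""
    selected, best_mse = [], np.inf
    remaining = list(range(P_val.shape[1]))
    while remaining:
        trials = [(ensemble_mse(P_val[:, selected + [j]], y_val,
                                optimal_weights(P_val[:, selected + [j]], y_val)), j)
                  for j in remaining]
        mse, j = min(trials)
        if mse >= best_mse:
            break
        selected.append(j)
        remaining.remove(j)
        best_mse = mse
    return selected, best_mse
```

On strongly collinear members, validation-guided selection of this kind typically settles on a small subset of networks; this is in the spirit of, but not identical to, the selection algorithms tested in the chapter.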


References

1. E. Alpaydin. Multiple networks for function learning. In Proceedings of the 1993 IEEE International Conference on Neural Networks, volume I, pages 9–14. IEEE Press, Apr. 1993.
2. R. Battiti and A. M. Colla. Democracy in neural nets: Voting schemes for classification. Neural Networks, 7(4):691–707, 1994.
3. W. G. Baxt. Improving the accuracy of an artificial neural network using multiple differently trained networks. Neural Computation, 4:772–780, 1992.
4. D. A. Belsley. Assessing the presence of harmful collinearity and other forms of weak data through a test for signal-to-noise. Journal of Econometrics, 20:211–253, 1982.
5. D. A. Belsley. Conditioning Diagnostics: Collinearity and Weak Data in Regression. John Wiley & Sons, New York, 1991.
6. D. A. Belsley, E. Kuh, and R. E. Welsch. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, New York, 1980.
7. J. A. Benediktsson, J. R. Sveinsson, O. K. Ersoy, and P. H. Swain. Parallel consensual neural networks. IEEE Transactions on Neural Networks, 8(1):54–64, 1997.
8. L. Breiman. Stacked regressions. Technical Report 367, Department of Statistics, University of California, Berkeley, California 94720, USA, Aug. 1992. Revised June 1994.
9. D. W. Bunn. Statistical efficiency in the linear combination of forecasts. International Journal of Forecasting, 1:151–163, 1985.
10. D. W. Bunn. Forecasting with more than one model. Journal of Forecasting, 8:161–166, 1989.
11. V. Cherkassky, D. Gehring, and F. Mulier. Pragmatic comparison of statistical and neural network methods for function estimation. In Proceedings of the 1995 World Congress on Neural Networks, volume II, pages 917–926, 1995.
12. V. Cherkassky and H. Lari-Najafi. Constrained topological mapping for non-parametric regression analysis. Neural Networks, 4:27–40, 1991.
13. R. T. Clemen. Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5:559–583, 1989.
14. R. T. Clemen and R. L. Winkler. Combining economic forecasts. Journal of Business & Economic Statistics, 4(1):39–46, Jan. 1986.
15. L. Cooper. Hybrid neural network architectures: Equilibrium systems that pay attention. In R. J. Mammone and Y. Y. Zeevi, editors, Neural Networks: Theory and Applications, pages 81–96. Academic Press, 1991.
16. C. W. J. Granger. Combining forecasts: twenty years later. Journal of Forecasting, 8:167–173, 1989.
17. J. B. Guerard Jr. and R. T. Clemen. Collinearity and the use of latent root regression for combining GNP forecasts. Journal of Forecasting, 8:231–238, 1989.
18. L. K. Hansen and P. Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10):993–1001, 1990.
19. S. Hashem. Optimal Linear Combinations of Neural Networks. PhD thesis, School of Industrial Engineering, Purdue University, Dec. 1993.
20. S. Hashem. Effects of collinearity on combining neural networks. Connection Science, 8(3 & 4):315–336, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
21. S. Hashem. Optimal linear combinations of neural networks. Neural Networks, 10(4):599–614, 1997.
22. S. Hashem and B. Schmeiser. Approximating a function and its derivatives using MSE-optimal linear combinations of trained feedforward neural networks. In Proceedings of the 1993 World Congress on Neural Networks, volume I, pages 617–620, New Jersey, 1993. Lawrence Erlbaum Associates.
23. S. Hashem and B. Schmeiser. Improving model accuracy using optimal linear combinations of trained neural networks. IEEE Transactions on Neural Networks, 6(3):792–794, 1995.
24. S. Hashem, B. Schmeiser, and Y. Yih. Optimal linear combinations of neural networks: An overview. In Proceedings of the 1994 IEEE International Conference on Neural Networks, volume III, pages 1507–1512. IEEE Press, 1994.
25. W. W. Hines and D. C. Montgomery. Probability and Statistics in Engineering and Management Science. John Wiley & Sons, 1990.
26. J.-N. Hwang, S.-R. Lay, M. Maechler, R. D. Martin, and J. Schimert. Regression modeling in back-propagation and projection pursuit learning. IEEE Transactions on Neural Networks, 5(3):342–353, May 1994.
27. R. A. Jacobs. Bias/variance analysis of mixtures-of-experts architectures. Neural Computation, 9:369–383, 1997.
28. R. A. Jacobs and M. Jordan. A competitive modular connectionist architecture. In R. Lippmann, J. Moody, and D. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages 767–773. Morgan Kaufmann, 1991.
29. M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6:181–214, 1994.
30. A. Krogh and J. Vedelsby. Neural network ensembles, cross validation, and active learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, pages 231–238. MIT Press, 1995.
31. M. Maechler, D. Martin, J. Schimert, M. Csoppenszky, and J. Hwang. Projection pursuit learning networks for regression. In Proceedings of the 2nd International Conference on Tools for Artificial Intelligence, Washington D.C., pages 350–358. IEEE Press, Nov. 1990.
32. G. Mani. Lowering variance of decisions by using artificial neural network portfolios. Neural Computation, 3:484–486, 1991.
33. L. Menezes and D. Bunn. Specification of predictive distribution from a combination of forecasts. Methods of Operations Research, 64:397–405, 1991.
34. H. Moskowitz and G. P. Wright. Statistics for Management and Economics. Charles Merrill Publishing Company, Ohio, 1985.
35. J. Neter, W. Wasserman, and M. H. Kutner. Applied Linear Statistical Models. 3rd edition. Irwin, Homewood, IL, 1990.
36. L. Ohno-Machado and M. A. Musen. Hierarchical neural networks for partial diagnosis in medicine. In Proceedings of the 1994 World Congress on Neural Networks, volume I, pages 291–296. Lawrence Erlbaum Associates, 1994.
37. D. W. Opitz and J. W. Shavlik. Actively searching for an effective neural network ensemble. Connection Science, 8(3 & 4):337–353, Dec. 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
38. B. Parmanto, P. W. Munro, and H. R. Doyle. Reducing variance of committee prediction with resampling techniques. Connection Science, 8(3 & 4):405–425, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
39. B. Parmanto, P. W. Munro, H. R. Doyle, C. Doria, L. Aldrighetti, I. R. Marino, S. Mitchel, and J. J. Fung. Neural network classifier for hepatoma detection. In Proceedings of the 1994 World Congress on Neural Networks, volume I, pages 285–290, New Jersey, 1994. Lawrence Erlbaum Associates.
40. B. A. Pearlmutter and R. Rosenfeld. Chaitin-Kolmogorov complexity and generalization in neural networks. In Advances in Neural Information Processing Systems 3, pages 925–931, 1991.
41. M. P. Perrone. Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization. PhD thesis, Department of Physics, Brown University, May 1993.
42. M. P. Perrone and L. N. Cooper. When networks disagree: Ensemble methods for hybrid neural networks. In R. J. Mammone, editor, Neural Networks for Speech and Image Processing. Chapman & Hall, 1993.
43. Y. Raviv and N. Intrator. Bootstrapping with noise: An effective regularization technique. Connection Science, 8(3 & 4):355–372, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
44. G. Rogova. Combining the results of several neural network classifiers. Neural Networks, 7(5):777–781, 1994.
45. B. E. Rosen. Ensemble learning using decorrelated neural networks. Connection Science, 8(3 & 4):373–383, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
46. R. L. Scheaffer and J. T. McClave. Probability and Statistics for Engineers. PWS-KENT Publishing Company, Boston, 1990.
47. D. C. Schmittlein, J. Kim, and D. G. Morrison. Combining forecasts: Operational adjustments to theoretically optimal rules. Management Science, 36(9):1044–1056, Sept. 1990.
48. A. J. Sharkey. On combining artificial neural nets. Connection Science, 8(3 & 4):299–313, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
49. A. J. Sharkey. Modularity, combining and artificial neural nets. Connection Science, 9(1):3–10, 1997. Special issue on Combining Neural Networks: Modular Approaches.
50. I. M. Sobol'. The Monte Carlo Method. University of Chicago Press, 1974. Translated and adapted from the 2nd Russian edition by R. Messer, J. Stone, and P. Fortini.
51. K. Tumer and J. Ghosh. Error correlation and error reduction in ensemble classifiers. Connection Science, 8(3 & 4):385–404, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
52. C. T. West. System-based weights versus series-specific weights in the combination of forecasts. Journal of Forecasting, 15:369–383, 1996.
53. R. L. Winkler and R. T. Clemen. Sensitivity of weights in combining forecasts. Operations Research, 40(3):609–614, May–June 1992.
54. D. H. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992.



Copyright information

© 1999 Springer-Verlag London Limited

About this chapter

Cite this chapter

Hashem, S. (1999). Treating Harmful Collinearity in Neural Network Ensembles. In: Sharkey, A.J.C. (ed.) Combining Artificial Neural Nets. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0793-4_5


  • DOI: https://doi.org/10.1007/978-1-4471-0793-4_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-004-0

  • Online ISBN: 978-1-4471-0793-4

  • eBook Packages: Springer Book Archive
