Treating Harmful Collinearity in Neural Network Ensembles

  • Amanda J. C. Sharkey
Part of the Perspectives in Neural Computing book series.

Summary

In the last decade, several techniques have been developed for combining neural networks [48, 49]. Combining a number of trained neural networks to form what is often referred to as a neural network ensemble may yield better model accuracy, without requiring extensive effort in training the individual networks or optimising their architecture [21, 48]. However, because the corresponding outputs of the individual networks approximate the same physical quantity (or quantities), they may be highly positively correlated, or collinear (linearly dependent). The estimation of the optimal weights for combining such networks may therefore suffer from the harmful effects of collinearity, resulting in a neural network ensemble whose generalisation ability is inferior to that of the individual networks [20, 42, 48].
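To make the problem concrete, the following is a minimal NumPy sketch of estimating MSE-optimal combination weights from sample moments, in the spirit of the optimal-linear-combination framework of [19, 21] (the unconstrained form, without a constant term). All data and variable names are illustrative, not taken from the chapter; the point is that near-identical component outputs make the moment matrix ill-conditioned, which is exactly the harmful collinearity at issue.

    import numpy as np

    # Illustrative data: `outputs` holds the predictions of k trained networks
    # on n validation points (hypothetical names, not from the text).
    rng = np.random.default_rng(0)
    n, k = 200, 5
    target = np.sin(np.linspace(0.0, 3.0, n))

    # The component networks approximate the same quantity, so their outputs
    # are highly positively correlated (collinear): target plus small noise.
    outputs = target[:, None] + 0.05 * rng.standard_normal((n, k))

    # Unconstrained MSE-optimal combination weights: solve Omega @ w = c,
    # with Omega = E[y y^T] and c = E[y * target], estimated by sample moments.
    Omega = outputs.T @ outputs / n
    c = outputs.T @ target / n
    w = np.linalg.solve(Omega, c)

    # A large condition number of Omega signals harmful collinearity: small
    # perturbations of the data produce large swings in the estimated weights.
    print("condition number of Omega:", np.linalg.cond(Omega))
    print("combination weights:", w)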

In this chapter, we discuss the harmful effects of collinearity on the estimation of the optimal combination-weights for combining the networks. We describe an approach for treating collinearity through the proper selection of the component networks, and test two algorithms for selecting the component networks in order to improve the generalisation ability of the ensemble. We present experimental results demonstrating the effectiveness of optimal linear combinations, guided by the selection algorithms, in improving model accuracy.
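The chapter's two selection algorithms are given in the body of the text and are not reproduced here. As a generic stand-in, the sketch below assumes a greedy forward search: a candidate network joins the combination only if the combination weights, refitted by least squares on one sample, lower the mean squared error on a held-out sample. All function and variable names are hypothetical.

    import numpy as np

    def greedy_select(fit_out, fit_tgt, val_out, val_tgt):
        """Greedy forward selection of component networks.

        Admits a candidate network only if the combination weights refitted
        on the fitting sample lower the MSE on a held-out sample.
        (A generic stand-in, not the chapter's two algorithms.)
        """
        _, k = fit_out.shape
        selected, best_mse = [], np.inf
        improved = True
        while improved:
            improved = False
            for j in (j for j in range(k) if j not in selected):
                cols = selected + [j]
                # Weights by least squares on the fitting sample ...
                w, *_ = np.linalg.lstsq(fit_out[:, cols], fit_tgt, rcond=None)
                # ... scored on the held-out sample.
                mse = np.mean((val_tgt - val_out[:, cols] @ w) ** 2)
                if mse < best_mse:
                    best_mse, best_j, improved = mse, j, True
            if improved:
                selected.append(best_j)
        return selected, best_mse

With the arrays from the earlier sketch, one might call greedy_select(outputs[:100], target[:100], outputs[100:], target[100:]). Discarding near-duplicate components in this way keeps the moment matrix well conditioned, so the estimated combination weights stay stable.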

References

  1. E. Alpaydin. Multiple networks for function learning. In Proceedings of the 1993 IEEE International Conference on Neural Networks, volume I, pages 9–14. IEEE Press, Apr. 1993.
  2. R. Battiti and A. M. Colla. Democracy in neural nets: Voting schemes for classification. Neural Networks, 7(4): 691–707, 1994.
  3. W. G. Baxt. Improving the accuracy of an artificial neural network using multiple differently trained networks. Neural Computation, 4: 772–780, 1992.
  4. D. A. Belsley. Assessing the presence of harmful collinearity and other forms of weak data through a test for signal-to-noise. Journal of Econometrics, 20: 211–253, 1982.
  5. D. A. Belsley. Conditioning Diagnostics: Collinearity and Weak Data in Regression. John Wiley & Sons, New York, 1991.
  6. D. A. Belsley, E. Kuh, and R. E. Welsch. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons, New York, 1980.
  7. J. A. Benediktsson, J. R. Sveinsson, O. K. Ersoy, and P. H. Swain. Parallel consensual neural networks. IEEE Transactions on Neural Networks, 8(1): 54–64, 1997.
  8. L. Breiman. Stacked regressions. Technical Report 367, Department of Statistics, University of California, Berkeley, California 94720, USA, Aug. 1992. Revised June 1994.
  9. D. W. Bunn. Statistical efficiency in the linear combination of forecasts. International Journal of Forecasting, 1: 151–163, 1985.
  10. D. W. Bunn. Forecasting with more than one model. Journal of Forecasting, 8: 161–166, 1989.
  11. V. Cherkassky, D. Gehring, and F. Mulier. Pragmatic comparison of statistical and neural network methods for function estimation. In Proceedings of the 1995 World Congress on Neural Networks, volume II, pages 917–926, 1995.
  12. V. Cherkassky and H. Lari-Najafi. Constrained topological mapping for non-parametric regression analysis. Neural Networks, 4: 27–40, 1991.
  13. R. T. Clemen. Combining forecasts: A review and annotated bibliography. International Journal of Forecasting, 5: 559–583, 1989.
  14. R. T. Clemen and R. L. Winkler. Combining economic forecasts. Journal of Business & Economic Statistics, 4(1): 39–46, Jan. 1986.
  15. L. Cooper. Hybrid neural network architectures: Equilibrium systems that pay attention. In R. J. Mammone and Y. Y. Zeevi, editors, Neural Networks: Theory and Applications, pages 81–96. Academic Press, 1991.
  16. C. W. J. Granger. Combining forecasts – twenty years later. Journal of Forecasting, 8: 167–173, 1989.
  17. J. B. Guerard Jr. and R. T. Clemen. Collinearity and the use of latent root regression for combining GNP forecasts. Journal of Forecasting, 8: 231–238, 1989.
  18. L. K. Hansen and P. Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10): 993–1001, 1990.
  19. S. Hashem. Optimal Linear Combinations of Neural Networks. PhD thesis, School of Industrial Engineering, Purdue University, Dec. 1993.
  20. S. Hashem. Effects of collinearity on combining neural networks. Connection Science, 8(3 & 4): 315–336, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
  21. S. Hashem. Optimal linear combinations of neural networks. Neural Networks, 10(4): 599–614, 1997.
  22. S. Hashem and B. Schmeiser. Approximating a function and its derivatives using MSE-optimal linear combinations of trained feedforward neural networks. In Proceedings of the 1993 World Congress on Neural Networks, volume I, pages 617–620, New Jersey, 1993. Lawrence Erlbaum Associates.
  23. S. Hashem and B. Schmeiser. Improving model accuracy using optimal linear combinations of trained neural networks. IEEE Transactions on Neural Networks, 6(3): 792–794, 1995.
  24. S. Hashem, B. Schmeiser, and Y. Yih. Optimal linear combinations of neural networks: An overview. In Proceedings of the 1994 IEEE International Conference on Neural Networks, volume III, pages 1507–1512. IEEE Press, 1994.
  25. W. W. Hines and D. C. Montgomery. Probability and Statistics in Engineering and Management Science. John Wiley & Sons, 1990.
  26. J.-N. Hwang, S.-R. Lay, M. Maechler, R. D. Martin, and J. Schimert. Regression modeling in back-propagation and projection pursuit learning. IEEE Transactions on Neural Networks, 5(3): 342–353, May 1994.
  27. R. A. Jacobs. Bias/variance analysis of mixtures-of-experts architectures. Neural Computation, 9: 369–383, 1997.
  28. R. A. Jacobs and M. I. Jordan. A competitive modular connectionist architecture. In R. Lippmann, J. Moody, and D. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages 767–773. Morgan Kaufmann, 1991.
  29. M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6: 181–214, 1994.
  30. A. Krogh and J. Vedelsby. Neural network ensembles, cross validation, and active learning. In G. Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural Information Processing Systems 7, pages 231–238. MIT Press, 1995.
  31. M. Maechler, D. Martin, J. Schimert, M. Csoppenszky, and J. Hwang. Projection pursuit learning networks for regression. In Proceedings of the 2nd International Conference on Tools for Artificial Intelligence, Washington D.C., pages 350–358. IEEE Press, Nov. 1990.
  32. G. Mani. Lowering variance of decisions by using artificial neural network portfolios. Neural Computation, 3: 484–486, 1991.
  33. L. Menezes and D. Bunn. Specification of predictive distribution from a combination of forecasts. Methods of Operations Research, 64: 397–405, 1991.
  34. H. Moskowitz and G. P. Wright. Statistics for Management and Economics. Charles Merrill Publishing Company, Ohio, 1985.
  35. J. Neter, W. Wasserman, and M. H. Kutner. Applied Linear Statistical Models. Irwin, Homewood, IL, 3rd edition, 1990.
  36. L. Ohno-Machado and M. A. Musen. Hierarchical neural networks for partial diagnosis in medicine. In Proceedings of the 1994 World Congress on Neural Networks, volume I, pages 291–296. Lawrence Erlbaum Associates, 1994.
  37. D. W. Opitz and J. W. Shavlik. Actively searching for an effective neural network ensemble. Connection Science, 8(3 & 4): 337–353, Dec. 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
  38. B. Parmanto, P. W. Munro, and H. R. Doyle. Reducing variance of committee prediction with resampling techniques. Connection Science, 8(3 & 4): 405–425, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
  39. B. Parmanto, P. W. Munro, H. R. Doyle, C. Doria, L. Aldrighetti, I. R. Marino, S. Mitchel, and J. J. Fung. Neural network classifier for hepatoma detection. In Proceedings of the 1994 World Congress on Neural Networks, volume I, pages 285–290, New Jersey, 1994. Lawrence Erlbaum Associates.
  40. B. A. Pearlmutter and R. Rosenfeld. Chaitin-Kolmogorov complexity and generalization in neural networks. In Advances in Neural Information Processing Systems 3, pages 925–931, 1991.
  41. M. P. Perrone. Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization. PhD thesis, Department of Physics, Brown University, May 1993.
  42. M. P. Perrone and L. N. Cooper. When networks disagree: Ensemble methods for hybrid neural networks. In R. J. Mammone, editor, Neural Networks for Speech and Image Processing. Chapman & Hall, 1993.
  43. Y. Raviv and N. Intrator. Bootstrapping with noise: An effective regularization technique. Connection Science, 8(3 & 4): 355–372, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
  44. G. Rogova. Combining the results of several neural network classifiers. Neural Networks, 7(5): 777–781, 1994.
  45. B. E. Rosen. Ensemble learning using decorrelated neural networks. Connection Science, 8(3 & 4): 373–383, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
  46. R. L. Scheaffer and J. T. McClave. Probability and Statistics for Engineers. PWS-KENT Publishing Company, Boston, 1990.
  47. D. C. Schmittlein, J. Kim, and D. G. Morrison. Combining forecasts: Operational adjustments to theoretically optimal rules. Management Science, 36(9): 1044–1056, Sept. 1990.
  48. A. J. Sharkey. On combining artificial neural nets. Connection Science, 8(3 & 4): 299–313, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
  49. A. J. Sharkey. Modularity, combining and artificial neural nets. Connection Science, 9(1): 3–10, 1997. Special issue on Combining Neural Networks: Modular Approaches.
  50. I. M. Sobol'. The Monte Carlo Method. University of Chicago Press, 1974. Translated and adapted from the 2nd Russian edition by R. Messer, J. Stone, and P. Fortini.
  51. K. Tumer and J. Ghosh. Error correlation and error reduction in ensemble classifiers. Connection Science, 8(3 & 4): 385–404, 1996. Special issue on Combining Neural Networks: Ensemble Approaches.
  52. C. T. West. System-based weights versus series-specific weights in the combination of forecasts. Journal of Forecasting, 15: 369–383, 1996.
  53. R. L. Winkler and R. T. Clemen. Sensitivity of weights in combining forecasts. Operations Research, 40(3): 609–614, May–June 1992.
  54. D. H. Wolpert. Stacked generalization. Neural Networks, 5: 241–259, 1992.

Copyright information

© Springer-Verlag London Limited 1999

Authors and Affiliations

  • Amanda J. C. Sharkey
    Department of Computer Science, University of Sheffield, Sheffield, UK
