The Dynamics of Negative Correlation Learning

Article

Abstract

In this paper we combine two observations made in previous papers on negative correlation learning (NC) by different authors, each of which has theoretical implications for the optimal setting of λ, a parameter of the method whose correct choice is critical for stability and good performance. We derive an expression for the optimal value λ*, which depends only on the number of classifiers in the ensemble. This result follows from the form of the ambiguity decomposition of the ensemble error and the close link between this decomposition and the error function used in NC. By analyzing the dynamics of the ensemble outputs we find dramatically different behavior for λ < λ*, λ = λ*, and λ > λ*, providing further motivation for our choice of λ and theoretical explanations for empirical observations reported in other papers on NC. These results are illustrated using well-known synthetic and medical datasets.
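
The full derivation appears in the body of the paper; the argument the abstract alludes to can, however, be sketched from the standard NC error function of Liu and Yao. The notation below (member outputs $f_i$, target $d$, ensemble size $M$, ensemble mean $\bar{f}$) and the closed form for $\lambda^{*}$ are reconstructed from that literature and should be read as a sketch, not as the paper's exact derivation. Each member $i$ of an $M$-member ensemble minimizes the penalized error

$$ e_i = \frac{1}{2}(f_i - d)^2 + \lambda\,(f_i - \bar{f})\sum_{j \neq i}(f_j - \bar{f}), \qquad \bar{f} = \frac{1}{M}\sum_{k=1}^{M} f_k . $$

Since $\sum_{j \neq i}(f_j - \bar{f}) = -(f_i - \bar{f})$, the penalty equals $-\lambda (f_i - \bar{f})^2$, and differentiating while retaining the dependence of $\bar{f}$ on $f_i$ gives

$$ \frac{\partial e_i}{\partial f_i} = (f_i - d) - 2\lambda\,\frac{M-1}{M}\,(f_i - \bar{f}) . $$

The ambiguity decomposition expresses the ensemble error as

$$ \frac{1}{2}(\bar{f} - d)^2 = \frac{1}{M}\sum_{i}\frac{1}{2}(f_i - d)^2 - \frac{1}{M}\sum_{i}\frac{1}{2}(f_i - \bar{f})^2 , $$

and the derivative of $M$ times its left-hand side with respect to $f_i$ is simply $(\bar{f} - d)$. Equating this ensemble gradient with the member gradient above, and using $f_i - d = (f_i - \bar{f}) + (\bar{f} - d)$, the $(f_i - \bar{f})$ terms cancel exactly when

$$ \lambda^{*} = \frac{M}{2(M-1)} , $$

which depends only on the ensemble size $M$. For $\lambda < \lambda^{*}$ the individual-error term dominates and members converge largely independently; at $\lambda = \lambda^{*}$ each member descends the gradient of the ensemble error itself; for $\lambda > \lambda^{*}$ the diversity penalty dominates and the outputs can diverge, consistent with the three qualitatively different regimes described above.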

Keywords

negative correlation learning · dynamics · stability · ensemble methods · combination · classification · neural networks

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Computational Intelligence Research Group, School of Design, Engineering and Computing, Bournemouth University, Bournemouth, UK
