Natural Computing, Volume 8, Issue 2, pp 289–320

Negative correlation in incremental learning

Article

Abstract

Negative Correlation Learning (NCL) has been successfully applied to construct neural network ensembles. It encourages the neural networks that compose the ensemble to be different from each other and, at the same time, accurate. Diversity among the ensemble members is a desirable feature for incremental learning, because some of the networks may adapt faster and better to new data than others. NCL is therefore a potentially powerful approach to incremental learning. With this in mind, this paper presents an analysis of NCL that aims to determine its strengths and weaknesses for incremental learning. The analysis shows that NCL can be used to overcome catastrophic forgetting, an important problem in incremental learning. However, when catastrophic forgetting is very low, the approach takes no advantage of using more than one neural network of the ensemble to learn new data, and the test error is high. When all the neural networks are used to learn new data, some of them can indeed adapt better than the others, but catastrophic forgetting is higher. It is therefore important to find a trade-off between overcoming catastrophic forgetting and using the entire ensemble to learn new data. The NCL results are comparable with those of other approaches that were specifically designed for incremental learning. Thus, the study reveals encouraging results and shows that NCL is a promising approach to incremental learning.
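The abstract does not state the NCL objective itself. As a rough illustration of how NCL can push ensemble members to be both accurate and different, the sketch below assumes the commonly cited formulation by Liu and Yao, in which each network's squared error is augmented with a correlation penalty. The function name `ncl_errors`, the parameter name `lam`, and its default value are hypothetical choices made for this example, not taken from the paper.

```python
import numpy as np

def ncl_errors(outputs, target, lam=0.5):
    """Per-network NCL error for a single training pattern (illustrative sketch).

    outputs: outputs of the M ensemble members for this pattern, shape (M,)
    target:  desired output (scalar)
    lam:     penalty strength (assumed name and default); lam = 0 reduces to
             training each network independently on squared error
    """
    outputs = np.asarray(outputs, dtype=float)
    ensemble = outputs.mean()        # simple-average ensemble output
    deviation = outputs - ensemble   # f_i - f_bar for each member
    # Penalty p_i = (f_i - f_bar) * sum over j != i of (f_j - f_bar).
    # Deviations from the mean sum to zero, so p_i equals -(f_i - f_bar)^2.
    penalty = deviation * (deviation.sum() - deviation)
    return 0.5 * (outputs - target) ** 2 + lam * penalty

# Three hypothetical network outputs for a pattern whose target is 1.0
errors = ncl_errors([0.2, 0.6, 1.0], target=1.0, lam=0.5)
```

Under this assumed formulation, a member whose output lies far from the ensemble mean receives a negative penalty, so its error is reduced relative to plain squared error. That is one way to read the abstract's claim that members are encouraged to differ while remaining accurate.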

Keywords

Neural network ensembles; Incremental learning; Negative correlation learning; Multi-layer perceptrons; Self-generating neural tree; Self-organising neural grove; Classification

Abbreviations

NCL: Negative correlation learning

SGNT: Self-generating neural tree

SGNN: Self-generating neural network

ESGNN: Ensemble of self-generating neural networks

SONG: Self-organising neural grove

MLP: Multi-layer perceptron

SOM: Self-organising map

EFuNN: Evolving fuzzy neural network

AdaBoost: Adaptive boosting

ART: Adaptive resonance theory

GL: Generalization loss

Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  1. The Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, The University of Birmingham, Edgbaston, UK
  2. Department of Electrical Engineering and Information Science, Kure National College of Technology, Kure, Japan
