
Incorporation of a Regularization Term to Control Negative Correlation in Mixture of Experts


Abstract

Combining accurate neural networks (NN) whose errors are negatively correlated in an ensemble greatly improves generalization ability. Mixture of experts (ME) is a popular combining method that employs a special error function to train NN experts simultaneously and produce negatively correlated experts. Although ME can produce negatively correlated experts, unlike the negative correlation learning (NCL) method it provides no control parameter to adjust the degree of negative correlation explicitly. In this study, an approach is proposed that introduces this advantage of NCL into the training algorithm of ME, yielding the mixture of negatively correlated experts (MNCE). In the proposed method, the control parameter of NCL is incorporated into the error function of ME, which enables the training algorithm to strike a better balance in the bias-variance-covariance trade-off and thus improves generalization ability. The proposed hybrid ensemble method, MNCE, is compared with its constituent methods, ME and NCL, on several benchmark problems. The experimental results show that the proposed ensemble method significantly outperforms the original ensemble methods.
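The abstract's central idea, augmenting ME's error function with an NCL-style penalty governed by an explicit control parameter, can be sketched in a few lines. The following Python fragment is a rough illustration under our own assumptions, not the paper's exact formulation: the function name, the scalar-output setting, and the use of raw gating weights in place of ME's posterior responsibilities are all simplifications introduced here for brevity.

```python
import numpy as np

def mnce_error_signals(expert_outputs, gates, target, lam):
    """Per-expert error signals for one scalar training example.

    A minimal sketch, not the paper's exact update rules: an ME-style
    term pulls each expert toward the target in proportion to its gate,
    while an NCL-style penalty, weighted by the explicit control
    parameter lam, pushes each expert away from the gated ensemble
    output so that the experts' errors become negatively correlated.
    """
    expert_outputs = np.asarray(expert_outputs, dtype=float)
    gates = np.asarray(gates, dtype=float)       # assumed to sum to 1
    f_bar = gates @ expert_outputs               # gated ensemble output
    me_term = gates * (expert_outputs - target)  # ME fitting pressure
    ncl_term = -lam * (expert_outputs - f_bar)   # decorrelation pressure
    return me_term + ncl_term                    # lam = 0 recovers plain ME

# Example: three experts, one dominant gate, moderate decorrelation strength.
signals = mnce_error_signals([0.9, 1.4, 0.7], [0.6, 0.3, 0.1],
                             target=1.0, lam=0.5)
print(signals)
```

In actual training these signals would be backpropagated through each expert network, and in the ME framework the gates themselves come from a trainable gating network; setting lam to zero removes the penalty and recovers ordinary gated training, which is what makes the correlation explicitly controllable.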




Author information

Correspondence to Reza Ebrahimpour.



Cite this article

Masoudnia, S., Ebrahimpour, R. & Arani, S.A.A.A. Incorporation of a Regularization Term to Control Negative Correlation in Mixture of Experts. Neural Process Lett 36, 31–47 (2012). https://doi.org/10.1007/s11063-012-9221-5
