Trade-Off Between Diversity and Accuracy in Ensemble Generation

  • Arjun Chandra
  • Huanhuan Chen
  • Xin Yao
Part of the Studies in Computational Intelligence book series (SCI, volume 16)

Abstract

Ensembles of learning machines have been shown, both formally and empirically, to outperform (generalise better than) single learners in many cases. Evidence suggests that ensembles generalise better when their members form a diverse and accurate set. Diversity and accuracy are hence two factors that should be accounted for when designing ensembles so that they generalise well; however, there exists a trade-off between the two. Multi-objective evolutionary algorithms can be employed to tackle this trade-off to good effect. This chapter gives a brief overview of ensemble learning in general and presents a critique of the utility of multi-objective evolutionary algorithms for ensemble design. Theoretical aspects of committees of learners, viz. the bias-variance-covariance decomposition and the ambiguity decomposition, are then discussed in order to support the importance of having both diversity and accuracy in ensembles. Finally, some recent work and experimental results based on multi-objective learning of ensembles, considering classification tasks in particular, are presented as we examine ensemble formation using neural networks and kernel machines.
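
For reference, the two decompositions named above take the following standard forms in the ensemble literature; the notation here is ours, a summary of well-known results rather than the chapter's own derivation. For a convex combination f = \sum_i w_i f_i of member outputs f_i (with w_i \geq 0 and \sum_i w_i = 1) and target d, the ambiguity decomposition of Krogh and Vedelsby reads

    (f - d)^2 = \sum_i w_i (f_i - d)^2 - \sum_i w_i (f_i - f)^2 ,

so the ensemble error equals the weighted average member error minus the ambiguity (diversity) term: error can be reduced by increasing diversity, provided the members' individual errors do not rise in step. For a uniformly weighted ensemble of M estimators, the bias-variance-covariance decomposition of the expected squared error reads

    E[(\bar{f} - d)^2] = \overline{bias}^2 + \frac{1}{M}\,\overline{var} + \left(1 - \frac{1}{M}\right)\overline{covar} ,

where the overbars denote averages over the ensemble members; as M grows the covariance term dominates, which is why decorrelated (diverse) members improve generalisation.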

Copyright information

© Springer 2006

Authors and Affiliations

  • Arjun Chandra (1)
  • Huanhuan Chen (1)
  • Xin Yao (1)

  1. The Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, The University of Birmingham, Edgbaston, Birmingham, UK
