Ensembles of Learning Machines

  • Giorgio Valentini
  • Francesco Masulli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2486)

Abstract

Ensembles of learning machines constitute one of the main current directions in machine learning research, and have been applied to a wide range of real problems. Despite the absence of a unified theory of ensembles, there are many theoretical reasons for combining multiple learners, as well as empirical evidence of the effectiveness of this approach. In this paper we present a brief overview of ensemble methods, explaining the main reasons why they can outperform any single classifier within the ensemble, and proposing a taxonomy based on the main ways base classifiers can be generated or combined.
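The claim that a combination of classifiers can outperform any of its members is commonly illustrated with the classic majority-voting (Condorcet-style) argument: if the base classifiers err independently and each is better than random guessing, the probability that the majority is wrong shrinks as voters are added. The sketch below is not taken from the paper; the function name and the example accuracy of 0.7 are illustrative assumptions.

```python
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent base classifiers,
    each correct with probability p, predicts correctly (n odd, binary task)."""
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# A single base classifier that is correct 70% of the time...
print(majority_vote_accuracy(0.7, 1))   # 0.7
# ...versus a majority vote over 21 such (independent) classifiers.
print(majority_vote_accuracy(0.7, 21))  # roughly 0.97
```

The gain depends critically on the independence assumption; correlated errors erode it, which is one reason ensemble methods differ mainly in how they generate diverse base classifiers.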

Keywords

Ensemble methods; Combining multiple learners

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Giorgio Valentini (1, 2)
  • Francesco Masulli (1, 3)
  1. INFM, Istituto Nazionale per la Fisica della Materia, Genova, Italy
  2. DISI, Università di Genova, Genova, Italy
  3. Dipartimento di Informatica, Università di Pisa, Pisa, Italy