Feature Selection for Ensembles Using the Multi-Objective Optimization Approach

  • Luiz S. Oliveira
  • Marisa Morita
  • Robert Sabourin
Part of the Studies in Computational Intelligence book series (SCI, volume 16)


Feature selection for ensembles has been shown to be an effective strategy for ensemble creation because it produces good subsets of features, which make the classifiers of the ensemble disagree on difficult cases. In this paper we present an ensemble feature selection approach based on a hierarchical multi-objective genetic algorithm. The underpinning paradigm is “overproduce and choose”. The algorithm operates at two levels: first it performs feature selection in order to generate a set of classifiers, and then it chooses the best team of classifiers. To show its robustness, the method is evaluated in two different contexts: supervised and unsupervised feature selection. In the former, we consider the problem of handwritten digit recognition and use three different feature sets with multi-layer perceptron neural networks as classifiers. In the latter, we address the problem of handwritten month-word recognition and use three different feature sets with hidden Markov models as classifiers. Experiments and comparisons with classical methods, such as Bagging and Boosting, demonstrate that the proposed methodology brings compelling improvements when classifiers have to work at very low error rates.
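The two-level “overproduce and choose” scheme described above can be illustrated with a minimal sketch. This is not the authors' hierarchical multi-objective genetic algorithm: for brevity, the overproduction level samples random feature subsets and the choice level is an exhaustive search over small teams, both standing in for the two GA levels. The dataset, the lookup-table classifier, and all names below are hypothetical.

```python
# Minimal sketch of "overproduce and choose", assuming random subset sampling
# and exhaustive team search in place of the paper's two-level MOGA.
import random
from itertools import combinations

random.seed(0)

# Toy dataset: 8 binary features; the label is the parity of features 0 and 1.
def make_data(n):
    data = []
    for _ in range(n):
        x = [random.randint(0, 1) for _ in range(8)]
        data.append((x, (x[0] + x[1]) % 2))
    return data

train, valid = make_data(200), make_data(100)

def train_classifier(subset):
    """Trivial lookup-table classifier restricted to the features in `subset`."""
    table = {}
    for x, y in train:
        table.setdefault(tuple(x[i] for i in subset), []).append(y)
    model = {k: max(set(v), key=v.count) for k, v in table.items()}
    return lambda x: model.get(tuple(x[i] for i in subset), 0)

# Level 1 (overproduce): build a pool of classifiers, each trained on a
# candidate feature subset.
pool = [train_classifier(tuple(sorted(random.sample(range(8), 3))))
        for _ in range(12)]

def ensemble_accuracy(team):
    """Validation accuracy of a team under majority voting."""
    correct = 0
    for x, y in valid:
        votes = [clf(x) for clf in team]
        correct += max(set(votes), key=votes.count) == y
    return correct / len(valid)

# Level 2 (choose): select the best team of 3 by validation accuracy.
best = max(combinations(pool, 3), key=ensemble_accuracy)
```

The point of the sketch is the separation of concerns: overproduction only needs to yield diverse candidates, while the choice step evaluates whole teams rather than individual classifiers, so it can favor members that disagree usefully.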






Copyright information

© Springer 2006

Authors and Affiliations

  • Luiz S. Oliveira: Pontifical Catholic University of Paraná, Curitiba, Brazil
  • Marisa Morita: HSBC Bank Brazil, Curitiba, Brazil
  • Robert Sabourin: École de Technologie Supérieure, Montreal, Canada
