Abstract
Selecting reliable predictors has always been crucial in classification. Decision trees in particular are very popular for supervised variable selection and classification. When variable selection must account for acquisition costs, which are incurred whenever the respective variable is extracted for a new observation, balancing the predictive power of the model against its costs becomes a multi-objective optimization problem. Such problems can be solved with meta-heuristics such as evolutionary multi-objective algorithms. In this paper, we present a non-hierarchical evolutionary multi-objective tree learner (NHEMOtree) based on genetic programming with a binary decision tree representation, which handles multi-objective optimization problems with equally ranked optimization criteria. We apply this tree learner to a multi-objective classification problem from medicine and to simulated data. We then evaluate its performance against two wrapper approaches that use either NSGA-II or SMS-EMOA with a bitstring representation and CART as the enclosed classification algorithm. Moreover, we introduce a novel crossover operator based on a multi-objective variable importance measure. Using this crossover operator improves NHEMOtree.
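The abstract treats cost-sensitive variable selection as a bi-objective problem: minimize the misclassification rate and minimize the total acquisition cost of the selected variables. The following is a minimal Python sketch of that idea, not the authors' NHEMOtree code. It assumes a bitstring encoding in which each bit marks whether a variable is selected, in the style of the wrapper baselines. It also shows the Pareto-dominance test on which NSGA-II and SMS-EMOA build. The costs and error rates are hypothetical. In a real wrapper, each subset's error would come from fitting CART on the selected variables.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical acquisition costs for four candidate variables (illustrative only).
COSTS: Sequence[float] = (5.0, 1.0, 3.0, 2.0)

Objectives = Tuple[float, float]  # (misclassification rate, total acquisition cost)


def subset_cost(bits: Sequence[int], costs: Sequence[float] = COSTS) -> float:
    """Sum the acquisition costs of the variables whose bit is set to 1."""
    return sum(c for b, c in zip(bits, costs) if b)


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a Pareto-dominates b when both objectives are minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(objs: Dict[Tuple[int, ...], Objectives]) -> List[Tuple[int, ...]]:
    """Return the bitstrings whose objective vectors no other bitstring dominates."""
    return [
        k for k, v in objs.items()
        if not any(dominates(w, v) for j, w in objs.items() if j != k)
    ]


# Hypothetical error rates; in a wrapper these would come from CART fitted on each subset.
errors = {
    (1, 0, 0, 0): 0.20,
    (0, 1, 0, 0): 0.30,
    (0, 1, 1, 0): 0.22,
    (1, 1, 0, 0): 0.18,
    (0, 0, 0, 1): 0.35,
}
objectives = {bits: (err, subset_cost(bits)) for bits, err in errors.items()}
# Sort the non-dominated subsets by cost so the trade-off reads from cheap to accurate.
front = sorted(pareto_front(objectives), key=lambda k: objectives[k][1])

for bits in front:
    err, cost = objectives[bits]
    print(bits, f"error={err:.2f}", f"cost={cost:.1f}")
```

With these made-up numbers, subset `(0, 0, 0, 1)` drops out because `(0, 1, 0, 0)` is both cheaper and more accurate. The remaining four subsets form the cost/accuracy trade-off from which a user would pick a model, since neither objective is ranked above the other.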
Acknowledgments
This work was supported by the Federal Office for Radiation Protection, Neuherberg, Germany [StSch 4528]; Deutsche Forschungsgemeinschaft (DFG) [SCHW1508/3-1 to H.S.]; and DFG within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”, project C4, to K.I.
Cite this article
Casjens, S., Schwender, H., Brüning, T. et al. A novel crossover operator based on variable importance for evolutionary multi-objective optimization with tree representation. J Heuristics 21, 1–24 (2015). https://doi.org/10.1007/s10732-014-9269-7