Abstract
Several data-based methods from the field of artificial intelligence are nowadays frequently used for analyzing classification problems in medical applications. As we show in this paper, applying enhanced evolutionary computation techniques to classification problems has the potential to evolve classifiers of even higher quality than those trained by standard machine learning methods. On the basis of five medical benchmark classification problems taken from the UCI repository as well as the Melanoma data set (prepared by members of the Department of Dermatology of the Medical University Vienna), we document that the enhanced genetic programming approach presented here is able to produce comparable or even better results than linear modeling methods, artificial neural networks, kNN classification, support vector machines, and also various genetic programming approaches.
Notes
In contrast to the GP procedure described here, grammar-driven GP [58, 59] has also frequently been used for solving classification tasks. Grammar-based GP is an extension of GP that uses a grammar to define the structure of the evolved solutions. An example of the application of grammar-based GP to data mining in the context of medical knowledge retrieval is given in [36].
The abbreviation SASEGASA stands for Self Adaptive Segregative Genetic Algorithm with aspects of Simulated Annealing.
An even more detailed listing of test results for this data set can be found in [23].
References
M. Affenzeller, Segregative genetic algorithms (SEGA): a hybrid superstructure upwards compatible to genetic algorithms for retarding premature convergence. IJCSS 2(1), 18–32 (2001)
M. Affenzeller, Population Genetics and Evolutionary Computation: Theoretical and Practical Aspects. Schriften der Johannes Kepler Universität Linz. Universitätsverlag Rudolf Trauner (2005)
M. Affenzeller, S. Wagner, SASEGASA: a new generic parallel evolutionary algorithm for achieving highest quality results. J. Heuristics - Special Issue on New Advances on Parallel Meta-Heuristics for Complex Problems 10, 239–263 (2004)
M. Affenzeller, S. Wagner, Offspring selection: a new self-adaptive selection scheme for genetic algorithms, in Adaptive and Natural Computing Algorithms, ed. by B. Ribeiro, R.F. Albrecht, A. Dobnikar, D.W. Pearson, N.C. Steele (Springer Computer Science, Springer, 2005), pp. 218–221
D. Alberer, L. del Re, S. Winkler, P. Langthaler, Virtual sensor design of particulate and nitric oxide emissions in a DI diesel engine. in Proceedings of the 7th International Conference on Engines for Automobile ICE 2005, 2005-24-063, Capri, Italy, 2005
W. Banzhaf, C. Lasarczyk, Genetic programming of an algorithmic chemistry. in Genetic Programming Theory and Practice II. ed. by U. O’Reilly, T. Yu, R. Riolo, B. Worzel (University of Michigan, Ann Arbor, 2004), pp. 175–190
H. Beyer, The Theory of Evolution Strategies. Springer, New York (2001)
C. Bojarczuk, H. Lopes, A. Freitas, Discovering comprehensible classification rules using genetic programming: a case study in a medical domain. in Proceedings of the Genetic and Evolutionary Computation Conference, vol. 2, Orlando, Florida, USA, 1999, pp. 953–958
C. Bojarczuk, H. Lopes, A. Freitas, E. Michalkiewicz, A constrained-syntax genetic programming system for discovering classification rules: application to medical data sets. Artif. Intell. Med. 30(1), 27–48 (2004)
A. Bradley, The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30, 1145–1159 (1997)
M. Brameier, W. Banzhaf, A comparison of linear genetic programming and neural networks in medical data mining. IEEE Trans. Evol. Comput. 5(1), 17–26 (2001)
C. Brodley, P. Utgoff, Multivariate decision trees. Mach. Learn. 19(1), 45–77 (1995)
G. Brown, Diversity in neural network ensembles. Ph.D. thesis, School of Computer Science, University of Birmingham, 2003
I. De Falco, A. Della Cioppa, E. Tarantino, Discovering interesting classification rules with genetic programming. Appl. Soft Comput. J. 1(4), 257–269 (2002)
S. Dreiseitl, L. Ohno-Machado, H. Kittler, S. Vinterbo, H. Billhardt, M. Binder, A comparison of machine learning methods for the diagnosis of pigmented skin lesions. J. Biomed. Inform. 34, 28–36 (2001)
W. Duch, R. Adamczak, K. Grabczewski, A new methodology of extraction, optimization and application of crisp and fuzzy logical rules. IEEE Trans. Neural Netw. 12, 277–306 (2001)
R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd edn. (Wiley Interscience, 2000)
J. Eggermont, J.N. Kok, W.A. Kosters, Genetic programming for data classification: partitioning the search space. in Proceedings of the 2004 Symposium on applied computing ACM SAC’04, Nicosia, Cyprus, ACM, 2004, pp. 1001–1005
C. Gathercole, P. Ross, Dynamic training subset selection for supervised learning in genetic programming. in Parallel Problem Solving from Nature III, LNCS, vol. 866 ed. by Y. Davidor, H.P. Schwefel, R. Männer, (Springer-Verlag, 1994), pp. 312–321
P. Gill, W. Murray, M. Wright, Practical Optimization. (Academic Press, 1982)
H. Hamilton, N. Shan, N. Cercone, RIAC: a rule induction algorithm based on approximate classification. Tech. Rep. CS 96-06, (Regina University, 1996)
P.L. Hammer, A. Kogan, B. Simeone, S. Szedmak, Pareto-optimal patterns in logical analysis of data. Discrete Appl. Math. 144, 102 (2004)
I. Jonyer, L.B. Holder, D.J. Cook, Attribute-value selection based on minimum description length. in Proceedings of the International Conference on Artificial Intelligence, Las Vegas, Nevada, USA, 2004, pp. 1154–1159
S. Keerthi, S. Shevade, C. Bhattacharyya, K. Murthy, Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Comput. 13(3), 637–649 (2001)
J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. The MIT Press (1992)
K. Levenberg, A method for the solution of certain non-linear problems in least squares. Q. Appl. Math. 2, 164–168 (1944)
D.P.X. Li, V. Ciesielski, Multi-objective techniques in genetic programming for evolving classifiers. in Proceedings of the 2005 Congress on Evolutionary Computation (CEC ’05), Munich, Germany, 2005, pp. 183–190
P. Lichodzijewski, M. Heywood, Pareto-coevolutionary genetic programming for problem decomposition in multi-class classification. in Proceedings of the Genetic and Evolutionary Computation Conference GECCO’07, London, England, 2007, pp. 464–471
L. Ljung, System Identification – Theory For the User, 2nd edn. (PTR Prentice Hall, Upper Saddle River, NJ, 1999)
T. Loveard, V. Ciesielski, Representing classification problems in genetic programming. in Proceedings of the Congress on Evolutionary Computation, vol. 2 (IEEE Press, COEX, World Trade Center, 159 Samseong-dong, Gangnam-gu, Seoul, Korea, 2001), pp. 1070–1077
D.W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11, 431–441 (1963)
T.M. Mitchell, Machine Learning. (McGraw-Hill, New York, 2000)
B. Moghaddam, G. Shakhnarovich, Boosted dyadic kernel discriminants. in Advances in Neural Information Processing Systems (NIPS), vol. 15 ed. by S. Becker, S. Thrun, K. Obermayer (2002)
K. Morik, M. Imhoff, P. Brockhausen, T. Joachims, U. Gather, Knowledge discovery and knowledge validation in intensive care. Artif. Intell. Med. 19, 225–249 (2000)
O. Nelles, Nonlinear System Identification. (Springer Verlag, Berlin Heidelberg, New York, 2001)
P. Ngan, M. Wong, K. Leung, J. Cheng, Using grammar based genetic programming for data mining of medical knowledge. (Genetic Programming, 1998), pp. 254–259
M. Nørgaard, Neural network based system identification toolbox. Tech. Rep. 00-E-891, Technical University of Denmark (2000)
J. Platt, Fast training of support vector machines using sequential minimal optimization. in Advances in Kernel Methods-Support Vector Learning, ed. by B. Schoelkopf, C. Burges, A. Smola, (MIT Press, 1999). pp. 185–208
L. Prechelt, Proben1 - a set of neural network benchmark problems and benchmarking rules. Tech. rep., Fakultät für Informatik, Universität Karlsruhe (1994)
M.L. Raymer, L.A. Kuhn, W.F. Punch, Knowledge discovery in biological datasets using a hybrid Bayes classifier/evolutionary algorithm. in BIBE ’01: Proceedings of the 2nd IEEE International Symposium on Bioinformatics and Bioengineering. (IEEE Computer Society, Washington, DC, USA, 2001), pp. 236–244
S.J. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 2nd edn. (Prentice Hall, 2003)
W. Schiffmann, M. Joost, R. Werner, Optimization of the backpropagation algorithm for training multilayer perceptrons. Tech. Rep. 15, University of Koblenz, Institute of Physics (1992)
W. Schiffmann, M. Joost, R. Werner, Comparison of optimized backpropagation algorithms. in Proceedings of the European Symposium on Artificial Neural Networks ESANN ’93, Brussels, Belgium, 1993, pp. 97–104
I. Taha, J. Ghosh, Evaluation and ordering of rules extracted from feedforward networks. in Proceedings of the IEEE International Conference on Neural Networks, Houston, Texas, USA, 1997, pp. 221–226
M.K. Titsias, A.C. Likas, Shared kernel models for class conditional density estimation. IEEE Trans. Neural Netw. 12, 987–997 (2001)
V. Vapnik, Statistical Learning Theory. (Wiley, New York, 1998)
S. Wagner, M. Affenzeller, HeuristicLab: a generic and extensible optimization environment. in Adaptive and Natural Computing Algorithms, ed. by B. Ribeiro, R.F. Albrecht, A. Dobnikar, D.W. Pearson, N.C. Steele (Springer Computer Science, Springer, 2005a), pp. 538–541
S. Wagner, M. Affenzeller, Sexual GA: gender-specific selection for genetic algorithms. in Proceedings of the 9th World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI) 2005, vol. 4, Orlando, Florida, USA, ed. by N. Callaos, W. Lesso, E. Hansen. (International Institute of Informatics and Systemics, 2005b), pp. 76–81
S. Weiss, I. Kapouleas, An empirical comparison of pattern recognition, neural nets, and machine learning classification methods. in Readings in Machine Learning, ed. by J.W. Shavlik, T.G. Dietterich (Kaufmann, San Mateo, CA, 1990), pp. 177–183
J. Wen-Hua, D. Madigan, S.L. Scott, On Bayesian learning of sparse classifiers. Tech. Rep. 2003-08, Avaya Labs Research (2003)
S. Winkler, Evolutionary system identification—modern concepts and practical applications. Ph.D. thesis, Institute for Formal Models and Verification, Johannes Kepler University Linz, 2008
S. Winkler, M. Affenzeller, S. Wagner, Automatic data based patient classification using genetic programming. in Cybernetics and Systems 2006, vol. 1, ed. by R. Trappl, R. Brachman, R. Brooks, H. Kitano, D. Lenat, O. Stock, W. Wahlster, M. Wooldridge. (Austrian Society for Cybernetic Studies, 2006a), pp. 251–256
S. Winkler, M. Affenzeller, S. Wagner, Using enhanced genetic programming techniques for evolving classifiers in the context of medical diagnosis—an empirical study. in Proceedings of the GECCO 2006 Workshop on Medical Applications of Genetic and Evolutionary Computation (MedGEC 2006), Seattle, Washington, USA. Association for Computing Machinery (ACM), 2006b
S. Winkler, M. Affenzeller, S. Wagner, Advanced genetic programming based machine learning. J. Math. Model. Algorithms 6(3), 455–480 (2007a)
S. Winkler, M. Affenzeller, S. Wagner, Selection pressure driven sliding window genetic programming. Lecture Notes in Computer Science 4739: Computer Aided Systems Theory - EuroCAST 2007, pp. 789–795 (2007b)
S. Winkler, M. Affenzeller, S. Wagner, Offspring selection and its effects on genetic propagation in genetic programming based system identification. in Cybernetics and Systems 2008, vol. 2, ed. by R. Trappl. (Austrian Society for Cybernetic Studies, 2008), pp. 549–554
I. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques. (Morgan Kaufmann, San Francisco, 2005)
M.L. Wong, K.S. Leung, Inducing logic programs with genetic algorithms: the genetic logic programming system genetic logic programming and applications. IEEE Expert 10(5), 68–76 (1995)
M.L. Wong, K.S. Leung, Evolutionary program induction directed by logic grammars. Evol. Comput. 5(2), 143–180 (1997)
Z.H. Zhou, Y. Jiang, NeC4.5: neural ensemble based C4.5. IEEE Trans. Knowl. Data Eng. 16(6), 770–773 (2004)
M.H. Zweig, Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin. Chem. 39, 561–577 (1993)
Additional information
The work described in this paper was done within the Translational Research Project L284-N04 “GP-Based Techniques for the Design of Virtual Sensors” sponsored by the Austrian Science Fund (FWF).
Cite this article
Winkler, S.M., Affenzeller, M. & Wagner, S. Using enhanced genetic programming techniques for evolving classifiers in the context of medical diagnosis. Genet Program Evolvable Mach 10, 111–140 (2009). https://doi.org/10.1007/s10710-008-9076-8
DOI: https://doi.org/10.1007/s10710-008-9076-8