Abstract
Feature extraction transforms high-dimensional data into a new subspace of lower dimensionality while preserving classification accuracy. Traditional algorithms do not consider the multi-objective nature of this task. Data transformations should improve classification performance in the new subspace and also facilitate data visualization, which has attracted increasing attention in recent years. Moreover, new challenges arising in data mining, such as the need to deal with imbalanced data sets, call for new algorithms capable of handling this type of data. This paper presents a Pareto-based multi-objective genetic programming algorithm for feature extraction and data visualization. The algorithm is designed to obtain data transformations that optimize classification and visualization performance on both balanced and imbalanced data. Six classification and visualization measures are identified as objectives to be optimized by the multi-objective algorithm. The algorithm is evaluated against 11 well-known feature extraction methods and against the performance on the original high-dimensional data. Experimental results on 22 balanced and 20 imbalanced data sets show that it performs very well on both types of data, which is a significant advantage over existing feature extraction algorithms.
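To make the term "Pareto-based" concrete, the following is a minimal sketch of Pareto dominance and non-dominated filtering over objective vectors. It is not the authors' algorithm: the function names, the assumption that every objective is maximized, and the two-objective score pairs are all hypothetical and chosen only for illustration (the abstract states that six measures are used but does not list them).

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b`, assuming every objective is maximized.

    `a` dominates `b` when it is at least as good on all objectives
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (classification score, visualization score) pairs,
# one per candidate data transformation. These values are made up.
candidates = [(0.90, 0.40), (0.85, 0.70), (0.80, 0.60), (0.60, 0.90)]
front = pareto_front(candidates)
print(front)
```

In this toy example the third candidate is dominated by the second, so it is excluded; the remaining three represent different trade-offs between the two objectives, and none of them is better than another on both.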
Acknowledgments
This work has been supported by the National Institutes of Health Grant 1R01HD056235-01A1 (KJC), the Spanish Ministry of Economy and Competitiveness Project TIN2014-55252-P and FEDER funds (SV).
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Cano, A., Ventura, S. & Cios, K.J. Multi-objective genetic programming for feature extraction and data visualization. Soft Comput 21, 2069–2089 (2017). https://doi.org/10.1007/s00500-015-1907-y
DOI: https://doi.org/10.1007/s00500-015-1907-y