Soft Computing, Volume 21, Issue 8, pp 2069–2089

Multi-objective genetic programming for feature extraction and data visualization

  • Alberto Cano
  • Sebastián Ventura
  • Krzysztof J. Cios
Methodologies and Application

Abstract

Feature extraction transforms high-dimensional data into a new subspace of lower dimensionality while preserving classification accuracy. Traditional algorithms do not consider the multi-objective nature of this task. Data transformations should improve classification performance on the new subspace and also facilitate data visualization, which has attracted increasing attention in recent years. Moreover, new challenges arising in data mining, such as the need to deal with imbalanced data sets, call for new algorithms capable of handling this type of data. This paper presents a Pareto-based multi-objective genetic programming algorithm for feature extraction and data visualization. The algorithm is designed to obtain data transformations that optimize classification and visualization performance on both balanced and imbalanced data. Six classification and visualization measures are identified as objectives to be optimized by the multi-objective algorithm. The algorithm is evaluated against 11 well-known feature extraction methods and against the performance on the original high-dimensional data. Experimental results on 22 balanced and 20 imbalanced data sets show that it performs very well on both types of data, which is a significant advantage over existing feature extraction algorithms.
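
As a rough illustration of what "Pareto-based" means here, the sketch below filters a set of candidate transformations down to the non-dominated ones. It is not the authors' algorithm: the abstract does not name the six measures, so the two objectives used (a classification score and a visualization score, both to be maximized) and all the numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives are maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores for four candidate data transformations:
# (classification score, visualization score). Illustrative values only.
candidates = [(0.90, 0.40), (0.85, 0.70), (0.80, 0.60), (0.70, 0.90)]

front = pareto_front(candidates)
print(front)
```

In this toy example the third candidate is dropped because the second is better on both objectives; the remaining three represent different trade-offs, and none can be called best without weighting the objectives.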

Keywords

Classification, Feature extraction, Visualization, Genetic programming

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Alberto Cano (1, 2)
  • Sebastián Ventura (1, 3)
  • Krzysztof J. Cios (2, 4)

  1. Department of Computer Science, Virginia Commonwealth University, Richmond, USA
  2. Department of Computer Science and Numerical Analysis, University of Córdoba, Córdoba, Spain
  3. Computer Sciences Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
  4. IITiS Polish Academy of Sciences, Gliwice, Poland
