Abstract
In healthcare and its related research fields, the dimensionality of high-dimensional data is a major challenge, because such data contain a huge number of variables that form complex data matrices. Demand for dimension reduction of complex data is growing rapidly in order to improve data prediction, analysis and visualization. In general, dimension reduction is defined as the compression of a dataset from a higher-dimensional matrix to a lower-dimensional one. Several computational techniques have been implemented for dimension reduction, and they fall into two categories: feature extraction and feature selection. This review investigates various feature extraction and feature selection methods in detail and systematically compares several dimension reduction techniques for analysing high-dimensional data and for overcoming the problem of data loss. Case studies are also cited to identify the better approach to dimension reduction in light of a few advances described in the technical literature. This review may guide researchers in choosing the most effective method for satisfactory analysis of high-dimensional data.
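The two categories named above can be contrasted on a toy matrix. The following is only a minimal sketch under stated assumptions, not a method taken from the review: the data values, the function names `select_by_variance` and `first_principal_component`, the variance-ranking criterion, and the use of power iteration are all illustrative choices. Feature selection keeps a subset of the original columns, whereas feature extraction builds a new variable as a combination of all columns.

```python
from statistics import pvariance

# Hypothetical toy data: 5 samples (rows) x 3 features (columns).
data = [
    [2.0, 4.1, 0.5],
    [3.0, 6.0, 0.4],
    [4.0, 7.9, 0.6],
    [5.0, 10.2, 0.5],
    [6.0, 11.8, 0.4],
]

def select_by_variance(rows, k):
    """Feature selection: keep the k original columns with the largest variance."""
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)),
                    key=lambda j: pvariance(columns[j]), reverse=True)
    keep = sorted(ranked[:k])  # preserve the original column order
    return keep, [[row[j] for j in keep] for row in rows]

def first_principal_component(rows, iterations=200):
    """Feature extraction: project centered rows onto the leading principal axis.

    The axis is approximated by power iteration on the covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Population covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    axis = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        axis = [x / norm for x in w]
    # One new variable per sample: a linear combination of all original features.
    scores = [sum(a * b for a, b in zip(row, axis)) for row in centered]
    return axis, scores

kept, selected = select_by_variance(data, 2)   # 5 x 3 -> 5 x 2, original columns
axis, scores = first_principal_component(data)  # 5 x 3 -> 5 x 1, new variable
print("selected columns:", kept)
print("principal axis:", [round(a, 3) for a in axis])
```

In this sketch the selected matrix still contains measured variables, which keeps it directly interpretable, while the extracted scores mix all three features into one derived variable.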
Cite this article
Ray, P., Reddy, S.S. & Banerjee, T. Various dimension reduction techniques for high dimensional data analysis: a review. Artif Intell Rev 54, 3473–3515 (2021). https://doi.org/10.1007/s10462-020-09928-0
DOI: https://doi.org/10.1007/s10462-020-09928-0