Various dimension reduction techniques for high dimensional data analysis: a review

Artificial Intelligence Review

Abstract

In healthcare and its related research fields, the dimensionality of high-dimensional data is a major challenge, because such data contain a very large number of variables that form complex data matrices. Demand for dimension reduction of complex data is growing rapidly as a way to improve data prediction, analysis and visualization. In general, dimension reduction compresses a dataset from a higher-dimensional matrix to a lower-dimensional one. The computational techniques implemented for dimension reduction fall into two categories: feature extraction and feature selection. This review investigates various feature extraction and feature selection methods in detail and systematically compares several dimension reduction techniques for the analysis of high-dimensional data, with the aim of limiting data loss. Case studies are also cited to identify the better approach to dimension reduction, taking into account a few advances described in the technical literature. This review may guide researchers in choosing the most effective method for satisfactory analysis of high-dimensional data.
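The distinction between the two categories can be illustrated with a minimal sketch. It is not taken from the review: it assumes NumPy, uses synthetic random data, and picks principal component analysis and a variance ranking purely as simple representatives of feature extraction and feature selection. The function names `pca_extract` and `variance_select` are hypothetical.

```python
import numpy as np


def pca_extract(X, k):
    """Feature extraction: build k NEW variables as linear combinations
    of all original columns (projection onto the top-k principal axes)."""
    Xc = X - X.mean(axis=0)  # centre each variable
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def variance_select(X, k):
    """Feature selection: keep k of the ORIGINAL columns unchanged
    (here, simply the k columns with the largest variance)."""
    idx = np.sort(np.argsort(X.var(axis=0))[-k:])
    return X[:, idx], idx


# Synthetic stand-in for a high-dimensional dataset (assumption):
# 50 samples described by 200 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))

Z = pca_extract(X, 5)            # 50 x 5 matrix of derived features
S, kept = variance_select(X, 5)  # 50 x 5 matrix of original features
print(Z.shape, S.shape)
```

Both calls compress the 50 x 200 matrix to 50 x 5, but the extracted features no longer correspond to individual measured variables, whereas the selected ones do, which matters when the reduced variables must remain interpretable.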

Author information

Corresponding author

Correspondence to S. Surender Reddy.

Cite this article

Ray, P., Reddy, S.S. & Banerjee, T. Various dimension reduction techniques for high dimensional data analysis: a review. Artif Intell Rev 54, 3473–3515 (2021). https://doi.org/10.1007/s10462-020-09928-0

  • DOI: https://doi.org/10.1007/s10462-020-09928-0
