A Review of Dimensionality Reduction in High-Dimensional Data Using Multi-core and Many-core Architecture
Data is growing in two ways: in size and in dimensionality. To deal with such huge data, "the big data", researchers and data analysts rely on machine learning and data mining techniques. However, the performance of these techniques degrades under this twofold growth, which further adds to the complexity of the data. The need of the hour is to keep up with the complexity of such datasets while improving both the accuracy of data mining and machine learning techniques and the performance of the underlying algorithms. The accuracy of mining algorithms can be enhanced by reducing the dimensionality of the data: not all the information that contributes to the dimensionality of a dataset is relevant to these analysis techniques, so the dimensionality can be reduced. Contemporary research focuses on techniques for removing unwanted, unnecessary, and redundant information, in particular the data that makes a dataset high dimensional. Algorithm performance can be further improved through parallel computing on high-performance computing (HPC) infrastructure. Parallel computing on multi-core and many-core architectures, such as the low-cost general-purpose graphics processing unit (GPGPU), is a boon for data analysts and researchers seeking high-performance solutions. GPGPUs have gained popularity due to their cost benefits and very high data-processing power, and parallel processing techniques achieve better speedup and scaleup. The objective of this paper is to give researchers and data analysts insight into how the high dimensionality of data can be handled so that neither the accuracy nor the computational complexity of machine learning and data mining techniques is compromised. To that end, this work discusses various parallel computing approaches on multi-core (CPU) and many-core (GPGPU) architectures for improving time complexity.
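The speedup achievable by the parallel approaches discussed here is bounded by the serial fraction of the workload. As a minimal illustration (not taken from the paper, the parallel fraction and core counts below are assumed values), Amdahl's law can be sketched as:

```python
def amdahl_speedup(p, n):
    """Theoretical speedup of a workload with parallel fraction p on n cores (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Example: a workload that is 90% parallelizable.
print(round(amdahl_speedup(0.9, 8), 2))      # 4.71 on 8 cores
print(round(amdahl_speedup(0.9, 1024), 2))   # approaches the 1/(1-p) = 10x ceiling
```

This is why "better speedup and scaleup" depends on keeping the serial portion of a mining algorithm small, regardless of how many CPU or GPGPU cores are available.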
Moreover, contemporary dimensionality reduction methods are reviewed.
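As a minimal sketch of the kind of dimensionality reduction the review surveys, principal component analysis (PCA) via the singular value decomposition projects high-dimensional samples onto their top-k directions of variance. The data shape and target dimensionality below are assumed values for illustration only:

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are principal directions
    return Xc @ Vt[:k].T                           # n_samples x k

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))   # 200 samples, 100 features (synthetic)
Z = pca_reduce(X, 10)             # reduce 100 dimensions to 10
print(Z.shape)                    # (200, 10)
```

Feature extraction methods of this family (PCA, random projections, autoencoders) transform the feature space, whereas feature selection methods such as Markov blanket filtering retain a subset of the original features.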
Keywords: High-performance computing · Parallel computing · Dimensionality reduction · Classification · High-dimensional data · General purpose graphics processing unit