A Review of Dimensionality Reduction in High-Dimensional Data Using Multi-core and Many-core Architecture

  • Siddheshwar V. Patil
  • Dinesh B. Kulkarni
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 964)

Abstract

Data is growing in two ways: size and dimensionality. To deal with such huge data, "big data", researchers and data analysts rely on machine learning and data mining techniques. However, the performance of these techniques degrades because this twofold growth adds to the complexity of the data. The need of the hour is to keep up with the complexity of such datasets while improving the accuracy of data mining and machine learning techniques and enhancing the performance of the algorithms. The accuracy of mining algorithms can be improved by reducing the dimensionality of the data: not all information that contributes to the dimensionality of a dataset is important for analysis, so the dimensionality can be reduced. Contemporary research focuses on techniques for removing unwanted, unnecessary, and redundant information, in particular the information that makes a dataset high dimensional. The performance of the algorithms can be further improved through parallel computing on high-performance computing (HPC) infrastructure. Parallel computing on multi-core and many-core architectures, such as the low-cost general-purpose graphics processing unit (GPGPU), is a boon for data analysts and researchers seeking high-performance solutions. GPGPUs have gained popularity due to their cost benefits and very high data processing power, and parallel processing techniques achieve better speedup and scaleup. The objective of this paper is to give researchers and data analysts insight into how the high dimensionality of data can be handled so that neither the accuracy nor the computational complexity of machine learning and data mining techniques is compromised. To that end, this work discusses various parallel computing approaches on multi-core (CPU) and many-core (GPGPU) architectures for reducing time complexity, and reviews contemporary dimensionality reduction methods.
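As a concrete illustration of the dimensionality reduction the abstract refers to, the following minimal sketch applies principal component analysis (PCA), one of the classical methods reviewed in this line of work, via a singular value decomposition in NumPy. The data, sizes, and function name here are illustrative assumptions, not taken from the paper itself.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                # center each feature
    # SVD of the centered data; rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                   # reduced representation, n_samples x k

# Hypothetical high-dimensional dataset: 100 samples, 50 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
Z = pca_reduce(X, 5)                       # keep only 5 dimensions
print(Z.shape)                             # (100, 5)
```

A downstream classifier would then be trained on `Z` instead of `X`; the reviewed GPU approaches parallelize exactly this kind of matrix decomposition to handle much larger sample and feature counts.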

Keywords

High-performance computing · Parallel computing · Dimensionality reduction · Classification · High-dimensionality data · General purpose graphics processing unit

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Walchand College of Engineering, Sangli, India