Hash-tree PCA: accelerating PCA with hash-based grouping

Battulga, Lkhagvadorj; Lee, Sang-Hyun; Nasridinov, Aziz; Yoo, Kwan-Hee

doi:10.1007/s11227-019-02947-x

Published: 11 July 2019

Volume 76, pages 8248–8264, (2020)

The Journal of Supercomputing

Lkhagvadorj Battulga¹, Sang-Hyun Lee², Aziz Nasridinov³ & Kwan-Hee Yoo³

Abstract

In data mining or machine learning, one of the most commonly used feature extraction techniques is principal component analysis (PCA). However, it performs poorly on a large dataset. In this paper, we propose a new method of accelerating conventional PCA, named hash-tree PCA. It samples the objects that are similar to each other without losing the original data distribution. First, it explores similar objects and stores them in hash tables. Afterward, it samples a certain number of the objects from each hash table and creates a new dataset with a reduced number of objects. Finally, it executes PCA on the sampled dataset. Experimental results show that our method outperforms the PCA and fast PCA methods.
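The three steps described in the abstract (group similar objects, sample from each group, run PCA on the sample) can be illustrated with a minimal sketch. The abstract does not specify the hash function or the sampling rule, so everything below is an assumption made for illustration: `hash_group_sample`, the grid-quantization hash, and the `n_bins` and `frac` parameters are hypothetical names and choices, not the authors' method.

```python
import numpy as np

def hash_group_sample(X, n_bins=4, frac=0.2, seed=0):
    """Group rows into hash buckets, then sample a fraction of each bucket.

    Assumption: the "hash" is a coarse grid quantization of every feature.
    The paper's actual hash function is not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    # Map each feature value to a cell index in [0, n_bins - 1].
    cells = np.minimum((n_bins * (X - lo) / span).astype(int), n_bins - 1)

    # Rows with the same tuple of cell indices share one bucket.
    buckets = {}
    for i, key in enumerate(map(tuple, cells)):
        buckets.setdefault(key, []).append(i)

    # Keep roughly `frac` of each bucket, and at least one row per bucket.
    picked = []
    for idx in buckets.values():
        idx = np.asarray(idx)
        k = max(1, int(np.ceil(frac * len(idx))))
        picked.extend(rng.choice(idx, size=k, replace=False))
    return X[np.sort(picked)]

def pca(X, n_components):
    """Plain PCA via SVD of the mean-centered data."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained_var = (s[:n_components] ** 2) / (len(X) - 1)
    return vt[:n_components], explained_var
```

Calling `pca(hash_group_sample(X), n_components)` then runs PCA on the reduced dataset. How closely the result matches PCA on the full data depends on the hash and on the sampling rule; this sketch makes no claim about the accuracy or speed-up reported in the paper.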

Acknowledgements

This research was supported by the Ministry of Trade, Industry & Energy (MOTIE, Korea) under Industrial Technology Innovation Program (No. 10082578, Development of intelligent operation system based on big data for production process efficiency and quality optimization in non-ferrous metal industry).

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, The Netherlands
Lkhagvadorj Battulga
YURA Co., Ltd, Seongnam, Gyeunggi, South Korea
Sang-Hyun Lee
Department of Computer Science, Chungbuk National University, Cheongju, South Korea
Aziz Nasridinov & Kwan-Hee Yoo

Corresponding author

Correspondence to Kwan-Hee Yoo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Battulga, L., Lee, SH., Nasridinov, A. et al. Hash-tree PCA: accelerating PCA with hash-based grouping. J Supercomput 76, 8248–8264 (2020). https://doi.org/10.1007/s11227-019-02947-x

Issue Date: October 2020
