MapReduce-based distributed tensor clustering algorithm

  • S.I. : Applications and Techniques in Cyber Intelligence (ATCI2022)
Neural Computing and Applications

Abstract

Cluster analysis is one of the most fundamental methods in data mining and is widely used in economics, the social sciences, and computer science. However, with the rapid development of Internet technology, the volume of data handled by web applications has grown rapidly, posing technical challenges for traditional clustering methods. Extracting useful information from large amounts of data quickly and efficiently is an urgent problem in many industrial fields. With the continuous development of cloud computing technology, large amounts of data can now be processed quickly and efficiently. Hadoop is an open-source distributed cloud computing platform with HDFS (Hadoop Distributed File System) and MapReduce at its core. HDFS provides massive data storage, while MapReduce provides a programming model for parallel processing. Compared with traditional parallel programming models, MapReduce has built-in functions such as data partitioning, task scheduling, and parallel processing, so users can develop distributed applications without understanding the underlying details of distributed systems, which simplifies the design of parallel programs. The K-means algorithm is a typical clustering method that is widely used in industry, but the number of iterations increases significantly as the data volume grows, which reduces computational efficiency. To better support cluster analysis of large-scale data, this paper first implements a parallel K-means algorithm based on MapReduce on the Hadoop platform, and then improves the K-means algorithm to address the blindness of randomly selecting the initial cluster centers and the resulting tendency to fall into a local optimum.
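
To make the map and reduce roles in K-means concrete, the following is a minimal single-process sketch of the standard MapReduce formulation of one K-means iteration. It is not the improved algorithm proposed in the paper, whose details are not given in the abstract. The function names `map_phase`, `reduce_phase` and `kmeans`, the tolerance `tol`, and the choice to keep the old centroid for an empty cluster are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for key, p in pairs:  # stands in for the shuffle / group-by-key step
        groups[key].append(p)
    new_centroids = list(centroids)  # assumption: an empty cluster keeps its old centroid
    for key, members in groups.items():
        new_centroids[key] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat map + reduce until no centroid moves by more than tol."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        moved = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if moved <= tol:
            break
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

On a Hadoop cluster, each pass of the loop would typically run as one MapReduce job, with the current centroids made available to every mapper. The initial centroids are passed in explicitly here because their selection is exactly the step the abstract says the paper improves.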

Funding

This work was sponsored by the National Natural Science Foundation of P. R. China (No. 61872196, No. 61872194, No. 61902196, No. 62102194 and No. 62102196), the Six Talent Peaks Project of Jiangsu Province (No. RJFW-111), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (No. KYCX19_0909, No. KYCX19_0911, No. KYCX20_0759, No. KYCX21_0787, No. KYCX21_0788, No. KYCX21_0799, No. KYCX22_1019 and No. KYCX22_1027).

Author information

Corresponding author

Correspondence to Peng Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, H., Li, P., Meng, F. et al. MapReduce-based distributed tensor clustering algorithm. Neural Comput & Applic 35, 24633–24649 (2023). https://doi.org/10.1007/s00521-023-08415-1

  • DOI: https://doi.org/10.1007/s00521-023-08415-1
