An efficient incremental clustering based improved K-Medoids for IoT multivariate data cluster analysis

Peer-to-Peer Networking and Applications

Abstract

Clustering is an efficient technique for data analysis. Most clustering techniques fail to uncover the underlying hidden patterns, because the algorithms must hold all of the data from the repository at once before analysis. This is infeasible, since dynamic Internet of Things (IoT) data is too large to store and analyze in a single pass. Traditional clustering techniques were implemented on batch processing systems with static data. With the rise of IoT, big data, and sensor technologies, multivariate data has become too large to analyze with these traditional approaches. Clustering multivariate data efficiently is therefore a challenging problem, and existing approaches yield insignificant clustering results. To overcome these limitations, this paper proposes Efficient Incremental Clustering by Fast Search driven Improved K-Medoids (EICFS-IKM) for IoT data integration and cluster analysis. EICFS-IKM uses cluster creating and cluster merging techniques to integrate newly arriving dynamic multivariate data into the existing pattern data and produce the final clustering. An improved k-medoids algorithm dynamically updates the cluster centers as new instances arrive. EICFS-IKM was implemented and evaluated on four datasets from the UCI machine learning repository, two dynamic industrial datasets, and two linked stream datasets, and compared with the leading approaches IAPNA, IMMFC, ICFSKM, and E-ICFSMR. It yields encouraging results in computational time, NMI, purity, and clustering accuracy.
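The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it describes: arriving points either join the nearest existing cluster (whose medoid is then recomputed) or start a new cluster, nearby clusters are merged, and the result is scored by purity. It is not the authors' EICFS-IKM. The function names and the `create_radius` and `merge_radius` thresholds are assumptions introduced for illustration.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def medoid(points):
    """Member point with the smallest total distance to all other members."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))

def add_point(clusters, point, create_radius):
    """Assign `point` to the nearest cluster or create a new cluster.

    `clusters` is a list of dicts {"medoid": ..., "members": [...]}.
    `create_radius` is a hypothetical threshold, not taken from the paper.
    """
    if clusters:
        nearest = min(clusters, key=lambda c: dist(c["medoid"], point))
        if dist(nearest["medoid"], point) <= create_radius:
            nearest["members"].append(point)
            nearest["medoid"] = medoid(nearest["members"])  # update center
            return clusters
    clusters.append({"medoid": point, "members": [point]})  # cluster creation
    return clusters

def merge_close(clusters, merge_radius):
    """Repeatedly merge any two clusters whose medoids are within `merge_radius`."""
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if dist(clusters[i]["medoid"], clusters[j]["medoid"]) <= merge_radius:
                    members = clusters[i]["members"] + clusters[j]["members"]
                    clusters[i] = {"medoid": medoid(members), "members": members}
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

def purity(cluster_labels):
    """Purity: share of points carrying the majority true label of their cluster.

    `cluster_labels` holds one list of ground-truth labels per cluster.
    """
    total = sum(len(labels) for labels in cluster_labels)
    majority = sum(max(labels.count(l) for l in set(labels)) for labels in cluster_labels)
    return majority / total
```

To use the sketch, feed a stream of tuples through `add_point` one at a time, call `merge_close` after each batch, and pass the ground-truth labels grouped by cluster to `purity`. Recomputing the medoid from all members is quadratic in cluster size; an incremental method would need a cheaper update, which this sketch does not attempt.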

Author information

Correspondence to Sivadi Balakrishna.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection: Special Issue on Future Networking Applications Plethora for Smart Cities

Guest Editors: Mohamed Elhoseny, Xiaohui Yuan, and Saru Kumari

About this article

Cite this article

Balakrishna, S., Thirumaran, M., Padmanaban, R. et al. An efficient incremental clustering based improved K-Medoids for IoT multivariate data cluster analysis. Peer-to-Peer Netw. Appl. 13, 1152–1175 (2020). https://doi.org/10.1007/s12083-019-00852-x

  • DOI: https://doi.org/10.1007/s12083-019-00852-x
