Abstract
Clustering is an efficient technique in data analysis. Most clustering techniques are unable to uncover the underlying hidden patterns, because the algorithms must load all of the data from the repository at once for analysis. This is infeasible for Internet of Things (IoT) data, which is dynamic and too large to process and analyze this way. Traditional clustering techniques were implemented on batch processing systems with static data. With IoT, Big Data, and sensor technologies, the multivariate data is now so large that traditional approaches cannot analyze it. Clustering multivariate data efficiently is therefore a challenging problem, and existing approaches yield insignificant clustering results. To overcome these limitations, this paper proposes an Efficient Incremental Clustering by Fast Search driven Improved K-Medoids (EICFS-IKM) for IoT data integration and cluster analysis. The proposed EICFS-IKM contains cluster creating and cluster merging techniques that integrate the current dynamic multivariate data into the existing pattern data to produce the final clustering. The improved k-medoids is employed to dynamically update and modify the cluster centers as new instances arrive. The proposed EICFS-IKM was implemented and evaluated on four datasets from the UCI machine learning repository, two dynamic industrial datasets, and two linked stream datasets, and compared with the leading approaches IAPNA, IMMFC, ICFSKM, and E-ICFSMR. It yields encouraging results in terms of computational time, NMI, purity, and clustering accuracy.
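The general idea of folding newly arriving instances into existing clusters and refreshing the cluster centers can be illustrated with a minimal sketch. This is not the EICFS-IKM algorithm itself: the function names `medoid_index` and `integrate`, the distance threshold `radius`, and the rule "join the nearest medoid if it is close enough, otherwise create a new cluster" are assumptions made purely for illustration, and the cluster merging step is left out.

```python
import numpy as np


def medoid_index(points):
    """Return the index of the point with the smallest total distance to the rest."""
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return int(np.argmin(dists.sum(axis=1)))


def integrate(clusters, new_points, radius):
    """Fold new_points into clusters (hypothetical helper, illustration only).

    clusters   : list of clusters, each a list of equal-length numeric vectors
    new_points : iterable of vectors arriving after the initial clustering
    radius     : assumed threshold; a point farther than this from every
                 medoid starts a new cluster
    """
    clusters = [list(c) for c in clusters]  # copy so the input is not mutated
    medoids = []
    for c in clusters:
        pts = np.asarray(c, dtype=float)
        medoids.append(pts[medoid_index(pts)])

    for p in new_points:
        p = np.asarray(p, dtype=float)
        nearest, best = None, np.inf
        for j, m in enumerate(medoids):
            d = np.linalg.norm(p - m)
            if d < best:
                nearest, best = j, d
        if nearest is not None and best <= radius:
            # Join the nearest cluster and recompute only that cluster's medoid.
            clusters[nearest].append(p)
            pts = np.asarray(clusters[nearest], dtype=float)
            medoids[nearest] = pts[medoid_index(pts)]
        else:
            # No medoid is close enough: the point seeds a new cluster.
            clusters.append([p])
            medoids.append(p)
    return clusters, medoids


if __name__ == "__main__":
    existing = [[[0, 0], [0, 1], [1, 0]], [[10, 10], [10, 11]]]
    arriving = [[0.5, 0.5], [20, 20]]
    updated, centers = integrate(existing, arriving, radius=3.0)
    print(len(updated), [len(c) for c in updated])
    print(centers)
```

Only the cluster that receives a point has its medoid recomputed, so the cost of each arrival depends on the size of one cluster rather than on the whole dataset. That is the property an incremental scheme is after.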
This article is part of the Topical Collection: Special Issue on Future Networking Applications Plethora for Smart Cities
Guest Editors: Mohamed Elhoseny, Xiaohui Yuan, and Saru Kumari
Cite this article
Balakrishna, S., Thirumaran, M., Padmanaban, R. et al. An efficient incremental clustering based improved K-Medoids for IoT multivariate data cluster analysis. Peer-to-Peer Netw. Appl. 13, 1152–1175 (2020). https://doi.org/10.1007/s12083-019-00852-x
DOI: https://doi.org/10.1007/s12083-019-00852-x