Abstract

In the era of the Internet of Things (IoT), the proliferation of interconnected devices and sensors has produced an unprecedented volume of data. Data analysis, and clustering in particular, has become pivotal in handling IoT data at this scale, and clustering evaluation plays a critical role in determining the quality of clustering results. However, traditional cluster validity metrics are ill-suited to the distributed nature of IoT data. To address this gap, we introduce C4Y, a novel distributed clustering evaluation metric. C4Y is rooted in sampling theory and is designed to evaluate the performance of clustering algorithms in distributed IoT environments. It rests on two key principles: (1) The dataset held by each distributed IoT node is treated as a sample of the entire dataset, and each sample is expected to exhibit a data distribution, including its category distribution, similar to that of the overall dataset. (2) The centers of each category across all samples are assumed to conform to a Gaussian distribution. The metric quantifies the extent to which category centers in different samples adhere to Gaussian distributions and measures the dissimilarity between categories. Empirical results on various public datasets of diverse sizes and dimensions demonstrate that C4Y effectively assesses the performance of distributed clustering algorithms. This approach promises to advance data analytics for distributed IoT data and to support the development of sophisticated IoT systems.
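The two principles can be illustrated with a small sketch. This is not the published C4Y formula, which the abstract does not give; it is a toy stand-in under stated assumptions. The function names (`node_centers`, `spread_and_separation`, `make_nodes`), the use of Euclidean distance, and the two summary quantities are all hypothetical choices made for illustration. The sketch computes each category's center on every node, measures how tightly those per-node centers gather around their common mean (loosely, how plausible a single narrow Gaussian is), and measures how far apart different categories lie.

```python
import numpy as np


def node_centers(points, labels, n_categories):
    """Per-category mean on one node; a row stays NaN if the category is absent."""
    centers = np.full((n_categories, points.shape[1]), np.nan)
    for k in range(n_categories):
        mask = labels == k
        if mask.any():
            centers[k] = points[mask].mean(axis=0)
    return centers


def spread_and_separation(nodes, n_categories):
    """nodes: list of (points, labels) pairs, one pair per node.

    Returns (spread, separation). Both are illustrative quantities,
    not the published C4Y formula.
    """
    # Shape: (n_nodes, n_categories, n_features)
    stacked = np.stack([node_centers(p, l, n_categories) for p, l in nodes])
    # Center of each category over all nodes (mean of the assumed Gaussian).
    grand = np.nanmean(stacked, axis=0)
    # Average distance of the node-level centers from that mean.
    spread = float(np.nanmean(np.linalg.norm(stacked - grand, axis=2)))
    # Smallest distance between the centers of two different categories.
    pairwise = np.linalg.norm(grand[:, None, :] - grand[None, :, :], axis=2)
    upper = np.triu_indices(n_categories, k=1)
    separation = float(pairwise[upper].min())
    return spread, separation


def make_nodes(rng, n_nodes=3, per_blob=100, random_labels=False):
    """Toy data: every node samples the same two Gaussian blobs."""
    nodes = []
    for _ in range(n_nodes):
        a = rng.normal(0.0, 1.0, size=(per_blob, 2))
        b = rng.normal(10.0, 1.0, size=(per_blob, 2))
        points = np.vstack([a, b])
        labels = np.repeat([0, 1], per_blob)
        if random_labels:
            # Simulate a poor clustering by assigning categories at random.
            labels = rng.integers(0, 2, size=labels.size)
        nodes.append((points, labels))
    return nodes


rng = np.random.default_rng(0)
good = spread_and_separation(make_nodes(rng), 2)
bad = spread_and_separation(make_nodes(rng, random_labels=True), 2)
print("consistent labels:", good)
print("random labels:    ", bad)
```

In this toy setting, a clustering that recovers the two blobs consistently on every node should give a small spread and a large separation, while random labels pull every category center toward the overall mean and shrink the separation. How C4Y actually combines the Gaussian fit and the dissimilarity term is defined in the full article.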

Data availability

The source code and dataset are available at https://github.com/XFastDataLab/C4Y.

Notes

  1. https://archive.ics.uci.edu/ml/index.php.

  2. http://sci2s.ugr.es/keel/datasets.php.

  3. http://cs.joensuu.fi/sipu/datasets/.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61673186, 61972010); the Natural Science Foundation of Fujian Province, China (Nos. 2021J01317, 2020J05059); the Scientific Research Funds of Huaqiao University (No. 19BS307); the Open Project of China Food Flavor and Nutrition Health Innovation Center (No. CFC2023B-029).

Author information

Correspondence to Yuanyuan Yang or Yi Chen.

About this article

Cite this article

Chen, Y., Yang, Y. & Chen, Y. C4y: a metric for distributed IoT clustering. CCF Trans. Pervasive Comp. Interact. (2024). https://doi.org/10.1007/s42486-024-00148-x
