Abstract
Due to the explosion in the number of autonomous data sources, there is a growing need for effective approaches to distributed clustering. This paper compares the performance of two distributed clustering algorithms, namely the Improved Distributed Combining Algorithm and the Distributed K-Means algorithm, against the traditional Centralized Clustering Algorithm. Both algorithms use cluster centroids to form a cluster ensemble, which is required to perform global clustering. The centroid-based partitional clustering algorithms K-Means, Fuzzy K-Means and Rough K-Means are used with each distributed clustering algorithm in order to analyze the performance of both hard and soft clusters in a distributed environment. The experiments are carried out on an artificial dataset and four benchmark datasets from the UCI Machine Learning Repository.
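The shared idea behind both distributed algorithms is that each site clusters only its own data, the resulting local centroids are collected into a cluster ensemble, and a global clustering is then run on that ensemble. The sketch below illustrates this general pattern with hard K-Means only. It is not the chapter's Improved Distributed Combining Algorithm or its Distributed K-Means as specified by the authors; the function names, the choice to weight each centroid by its local cluster size, and the toy data are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centroids: List[Point]) -> int:
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def weighted_kmeans(points, weights, k, iters=100, seed=0):
    """Lloyd's K-Means in which every point carries a weight (all 1.0 = plain K-Means)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    dim = len(points[0])
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]  # assignment step
        new = []
        for j in range(k):  # update step: weighted mean of each cluster
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == j]
            total = sum(w for _, w in members)
            if total == 0:
                new.append(centroids[j])  # leave an empty cluster where it was
            else:
                new.append(tuple(sum(p[d] * w for p, w in members) / total for d in range(dim)))
        if new == centroids:
            break
        centroids = new
    # Final labels are computed against the returned centroids.
    return centroids, [nearest(p, centroids) for p in points]


def distributed_kmeans(sites, k_local, k_global, seed=0):
    """Hypothetical helper: cluster each site locally, then cluster the centroid ensemble."""
    ensemble: List[Point] = []
    sizes: List[float] = []
    local = []  # (offset of this site's centroids in the ensemble, local labels)
    for s, data in enumerate(sites):
        cents, labels = weighted_kmeans(data, [1.0] * len(data), k_local, seed=seed + s)
        local.append((len(ensemble), labels))
        ensemble.extend(cents)
        # Only centroids and cluster sizes leave the site, never the raw points.
        sizes.extend(float(labels.count(j)) for j in range(k_local))
    # Global step: weighted K-Means over the centroid ensemble (weights = cluster sizes).
    global_cents, ens_labels = weighted_kmeans(ensemble, sizes, k_global, seed=seed)
    # Each point inherits the global label of the local centroid it belonged to.
    global_labels = [[ens_labels[off + lab] for lab in labels] for off, labels in local]
    return global_cents, global_labels


# Toy data for two sites, each holding points from the same two well-separated groups.
site_a = [(0.0, 0.1), (0.2, 0.0), (10.0, 10.1), (10.2, 9.9)]
site_b = [(0.1, 0.2), (9.9, 10.0), (10.1, 10.2), (0.3, 0.1)]
cents, labels = distributed_kmeans([site_a, site_b], k_local=2, k_global=2)
print(cents, labels)
```

Weighting each centroid by its cluster size keeps a small local cluster from pulling a global centroid as strongly as a large one. The Fuzzy K-Means and Rough K-Means variants named in the abstract would replace the hard assignment step with graded memberships or with lower and upper approximations, respectively.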
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Visalakshi, N.K., Thangavel, K. (2009). Distributed Data Clustering: A Comparative Analysis. In: Abraham, A., Hassanien, AE., de Leon F. de Carvalho, A.P., Snášel, V. (eds) Foundations of Computational Intelligence Volume 6. Studies in Computational Intelligence, vol 206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01091-0_16
DOI: https://doi.org/10.1007/978-3-642-01091-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01090-3
Online ISBN: 978-3-642-01091-0
eBook Packages: Engineering, Engineering (R0)