A Framework for Data Clustering of Large Datasets in a Distributed Environment

  • Ch. Swetha Swapna
  • V. Vijaya Kumar
  • J. V. R. Murthy
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 379)


The chief motivation of this work is to develop a framework for clustering large datasets in a distributed manner. The proposal addresses both numerical and categorical data and includes an effective approach for handling noisy information. Two basic models, known as the primary model and the connected model, are developed to design the distributed approach. After clusters are formed separately on the numerical and categorical features, an evolutionary approach is suggested to merge the clusters for optimization. A modification of the multiple kernel-based fuzzy c-means (FCM) algorithm (MKFCM) of Chen et al. (A multiple-kernel fuzzy C-means algorithm for image segmentation, IEEE Trans. Syst. Man Cybern. Part B 41(5):1263–1274, 2011) is used to implement the proposal. This paper presents a comprehensive view of the designed method and algorithm. A comparison of results on a few sample datasets shows the effectiveness of the proposed approach over the existing one.
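For readers unfamiliar with kernel-based fuzzy c-means, the following is a minimal sketch of the general idea on numerical features only. It is not the modified MKFCM described in the paper: it assumes a single Gaussian kernel instead of a weighted combination of kernels, uses caller-supplied initial centers, and leaves out categorical data, noise handling, the distributed models, and the evolutionary merge step. The names `gaussian_kernel`, `kernel_fcm`, `sigma`, and `m` are illustrative choices, not taken from the paper.

```python
import math


def gaussian_kernel(x, v, sigma):
    """Gaussian kernel between a point x and a center v (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_fcm(points, centers, m=2.0, sigma=1.0, iters=50, eps=1e-12):
    """Single-kernel fuzzy c-means with centers kept in the input space.

    points  : list of numeric tuples
    centers : initial centers, one per cluster (hypothetical starting guess)
    m       : fuzzifier, must be > 1
    Returns (memberships, centers); memberships[k][i] is the degree to which
    point k belongs to cluster i, computed before the final center update.
    """
    centers = [list(c) for c in centers]
    memberships = []
    for _ in range(iters):
        # Membership update: the kernel-induced distance is 1 - K(x, v).
        memberships = []
        for x in points:
            inv = [
                max(1.0 - gaussian_kernel(x, v, sigma), eps) ** (-1.0 / (m - 1.0))
                for v in centers
            ]
            total = sum(inv)
            memberships.append([w / total for w in inv])
        # Center update: mean of the points weighted by u^m * K(x, v).
        for i, v in enumerate(centers):
            weights = [
                (u[i] ** m) * gaussian_kernel(x, v, sigma)
                for u, x in zip(memberships, points)
            ]
            total = sum(weights)
            if total > 0.0:
                centers[i] = [
                    sum(w * x[d] for w, x in zip(weights, points)) / total
                    for d in range(len(v))
                ]
    return memberships, centers


# Toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
memberships, centers = kernel_fcm(points, centers=[(0.0, 0.0), (5.0, 5.0)])
labels = [max(range(len(u)), key=lambda i: u[i]) for u in memberships]
```

Each row of `memberships` sums to 1, and `labels` assigns every point to its highest-membership cluster. A multiple-kernel variant would replace `gaussian_kernel` with a weighted combination of several kernels and also update those weights, which this sketch does not attempt.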


Clustering; Categorical and numerical data; Large dataset


  1. Ji, J., Pang, W., Zhou, C., Han, X., Wang, Z.: A fuzzy k-prototype clustering algorithm for mixed numeric and categorical data. J. Knowl. Based Syst. 30, 129–135 (2012)
  2. Chen, L., Chen, C.L., Lu, M.: A multiple-kernel fuzzy C-means algorithm for image segmentation. IEEE Trans. Syst. Man Cybern. Part B 41(5), 1263–1274 (2011)
  3. Dhillon, I.S., Modha, D.S.: A data-clustering algorithm on distributed memory multiprocessors. In: Proceedings of KDD Workshop High Performance Knowledge Discovery, pp. 245–260 (1999)
  4. Jin, R., Goswami, A., Agrawal, G.: Fast and exact out-of-core and distributed K-Means clustering. J. Knowl. Inf. Syst. 10(1), 17–40 (2006)
  5. Ji, G., Ling, X.: Ensemble learning based distributed clustering. Emerg. Technol. Knowl. Discov. Data Min. 4819, 312–321 (2007)
  6. Beaumont, O., Bonichon, N., Duchon, P., Eyraud-Dubois, L., Larcheveque, H.: A distributed algorithm for resource clustering in large scale platforms. Principles Distrib. Syst. 5401, 564–567 (2008)
  7. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning, 1st edn. Addison Wesley (1989)

Copyright information

© Springer India 2016

Authors and Affiliations

  • Ch. Swetha Swapna (1)
  • V. Vijaya Kumar (2)
  • J. V. R. Murthy (1)
  1. Department of Computer Science and Engineering, JNTU, Kakinada, India
  2. Department of Computer Science and Engineering, Anurag Group of Institutions, JNTU, Hyderabad, India
