A Study of Various Varieties of Distributed Data Mining Architectures

  • Sukriti Paul
  • Nisha P. Shetty
  • Balachandra
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 701)


Owing to the explosion of data in today’s world, datasets are enormous, geographically distributed, and heterogeneous. Data mining aims at extracting useful information from the voluminous repositories where data is stored. Predictive analysis of hidden patterns in massive datasets poses a challenge. The problems faced while using the data warehousing model for such datasets are privacy, centralization of data present at multiple independent sites, bandwidth limitations, complexity of integration, and analysis of the data at a global level. Distributed algorithms have been designed to address these problems. Distributed data mining (DDM) techniques regard the distributed datasets as one virtual table and assume the existence of a global model that could be built if the data were combined centrally. This paper presents distributed data mining systems and frameworks for analyzing data and mining the required knowledge from it. Emphasis is laid on the architectures of such models. Factors such as computation resources, communication, hardware, and the usage of distributed data resources are considered while analyzing or designing distributed algorithms. Such algorithms primarily aim at reducing memory expense and distributing the workload evenly. Distributed data mining finds application in e-commerce, e-business, intrusion detection systems, and sensor networks.
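The "one virtual table" assumption in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes horizontally partitioned numeric data and a hypothetical coordinator, and the names `LocalSummary`, `summarize`, and `merge` are invented for illustration. Each site shares only three sufficient statistics, yet the merged result matches the mean and population variance that a centrally combined table would give.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LocalSummary:
    """Sufficient statistics a site shares instead of its raw rows (hypothetical type)."""
    count: int
    total: float
    total_sq: float


def summarize(values: Iterable[float]) -> LocalSummary:
    """Runs at each site: compress the local partition into three numbers."""
    count, total, total_sq = 0, 0.0, 0.0
    for v in values:
        count += 1
        total += v
        total_sq += v * v
    return LocalSummary(count, total, total_sq)


def merge(summaries: List[LocalSummary]) -> Tuple[float, float]:
    """Runs at the coordinator: global mean and population variance."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no data at any site")
    mean = sum(s.total for s in summaries) / n
    variance = sum(s.total_sq for s in summaries) / n - mean * mean
    return mean, variance


# Three hypothetical sites, each holding some rows of the same virtual table.
sites = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
global_mean, global_var = merge([summarize(rows) for rows in sites])
```

Only the summaries cross the network here, which is one simple way to ease the privacy and bandwidth concerns listed above. Richer global models generally need more elaborate local summaries or several rounds of communication.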


Keywords: Data mining, Distributed computing, Grid computing, P2P, Similarity model



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Manipal Institute of Technology, Manipal University, Manipal, India
