Clusters and Grids for Distributed and Parallel Knowledge Discovery

  • Mario Cannataro
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1823)


Parallel and Distributed Knowledge Discovery (PDKD) is emerging as a possible killer application for clusters and grids of computers. The need to process large volumes of data, together with the availability of parallel data mining algorithms, makes it possible to exploit the increasing computational power of clusters at low cost. At the same time, grid computing is an emerging “standard” for developing and deploying distributed, high-performance applications over geographic networks in different domains, and in particular for data-intensive applications. This paper proposes an approach for integrating clusters of computers within a grid infrastructure so that, enriched by specific data mining services, they can serve as the deployment platform for high-performance distributed data mining and knowledge discovery.


Data Mining, Service Level Agreement, Cluster Computing, Grid Service, Data Mining Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Mario Cannataro
    • ISI-CNR, Rende, Italy
