Clusters and Grids for Distributed and Parallel Knowledge Discovery
Parallel and Distributed Knowledge Discovery (PDKD) is emerging as a possible killer application for clusters and grids of computers. The need to process large volumes of data and the availability of parallel data mining algorithms make it possible to exploit the increasing computational power of clusters at low cost. On the other hand, grid computing is an emerging “standard” for developing and deploying distributed, high-performance applications over geographic networks, in different domains, and in particular for data-intensive applications. This paper proposes an approach to integrating clusters of computers within a grid infrastructure so that, enriched by specific data mining services, they can serve as the deployment platform for high-performance distributed data mining and knowledge discovery.
Keywords: Data Mining, Service Level Agreement, Cluster Computing, Grid Service, Data Mining Algorithm