KNOWLEDGE GRID: High Performance Knowledge Discovery Services on the Grid
Knowledge discovery tools and techniques are used in an increasing number of scientific and commercial areas for the analysis of large data sets. When large data repositories are coupled with the geographic distribution of data, users, and systems, different technologies must be combined to implement high-performance distributed knowledge discovery systems. At the same time, the computational grid is emerging as a very promising infrastructure for high-performance distributed computing. In this paper we introduce a software architecture for parallel and distributed knowledge discovery (PDKD) systems that is built on top of computational grid services providing dependable, consistent, and pervasive access to high-end computational resources. The proposed architecture uses the grid services and defines a set of additional layers that implement the services of the distributed knowledge discovery process on grid-connected sequential or parallel computers.
Keywords: Grid Service; Execution Plan; Grid Infrastructure; Data Mining Tool; Distributed Data Mining
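The layering described in the abstract, where knowledge discovery services sit on top of basic grid services, can be illustrated with a minimal Python sketch. All names here (`GridNode`, `BasicGridServices`, `KnowledgeDiscoveryLayer`, `plan`, `run`) are hypothetical and are not the Knowledge Grid's actual interfaces. The grid layer is reduced to two in-memory operations (resource discovery and job submission). The "execution plan" is an assumed placement rule: each mining task runs on the node with the most CPUs among those that already hold the data. This is for illustration only and is not the paper's scheduling policy.

```python
"""Hypothetical sketch: a knowledge-discovery layer built on top of basic grid services."""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GridNode:
    """A grid-connected computer: sequential (1 CPU) or parallel (several CPUs)."""
    name: str
    cpus: int
    datasets: List[str] = field(default_factory=list)


class BasicGridServices:
    """Stand-in for the underlying grid layer (assumed: discovery + job submission)."""

    def __init__(self, nodes: List[GridNode]):
        self.nodes = nodes

    def find_nodes_with(self, dataset: str) -> List[GridNode]:
        # Resource discovery: which nodes hold a copy of the dataset?
        return [n for n in self.nodes if dataset in n.datasets]

    def submit(self, node: GridNode, task: str) -> str:
        # Job execution is simulated: we only record where the task would run.
        return f"{task}@{node.name}"


class KnowledgeDiscoveryLayer:
    """Additional layer that maps a distributed mining request onto grid services."""

    def __init__(self, grid: BasicGridServices):
        self.grid = grid

    def plan(self, datasets: List[str]) -> Dict[str, GridNode]:
        # Build a simple execution plan: for each dataset, choose the holder with
        # the most CPUs, so mining runs where the data already lives.
        placement: Dict[str, GridNode] = {}
        for ds in datasets:
            holders = self.grid.find_nodes_with(ds)
            if not holders:
                raise ValueError(f"no grid node holds dataset {ds!r}")
            placement[ds] = max(holders, key=lambda n: n.cpus)
        return placement

    def run(self, algorithm: str, datasets: List[str]) -> List[str]:
        # Execute the plan by submitting one task per dataset through the grid layer.
        placement = self.plan(datasets)
        return [self.grid.submit(node, f"{algorithm}({ds})") for ds, node in placement.items()]


if __name__ == "__main__":
    nodes = [GridNode("seq-a", 1, ["sales"]), GridNode("cluster-b", 16, ["sales", "logs"])]
    kd = KnowledgeDiscoveryLayer(BasicGridServices(nodes))
    print(kd.run("clustering", ["sales", "logs"]))
    # prints ['clustering(sales)@cluster-b', 'clustering(logs)@cluster-b']
```

Keeping the knowledge discovery logic in its own class means the planning rule can change without touching the grid-facing code, which mirrors the idea of adding layers above the grid services rather than modifying them.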