Metadata for Managing Grid Resources in Data Mining Applications

Journal of Grid Computing

Abstract

The Grid is an infrastructure for resource sharing and coordinated use of those resources in dynamic heterogeneous distributed environments. The effective use of a Grid requires the definition of metadata for managing the heterogeneity of involved resources that include computers, data, network facilities, and software tools provided by different organizations. Metadata management becomes a key issue when complex applications, such as data-intensive simulations and data mining applications, are executed on a Grid. This paper discusses metadata models for heterogeneous resource management in Grid-based data mining applications. In particular, it discusses how resources are represented and managed in the Knowledge Grid, a framework for Grid-enabled distributed data mining. The paper illustrates how XML-based metadata is used to describe data mining tools, data sources, mining models, and execution plans, and how metadata is used for the design and execution of distributed knowledge discovery applications on Grids.

Author information

Corresponding author

Correspondence to Carlo Mastroianni.

About this article

Cite this article

Mastroianni, C., Talia, D. & Trunfio, P. Metadata for Managing Grid Resources in Data Mining Applications. J Grid Computing 2, 85–102 (2004). https://doi.org/10.1007/s10723-004-2809-x

  • DOI: https://doi.org/10.1007/s10723-004-2809-x
