Journal of Grid Computing, Volume 2, Issue 1, pp 85–102

Metadata for Managing Grid Resources in Data Mining Applications

  • Carlo Mastroianni
  • Domenico Talia
  • Paolo Trunfio


The Grid is an infrastructure for resource sharing and coordinated use of those resources in dynamic heterogeneous distributed environments. The effective use of a Grid requires the definition of metadata for managing the heterogeneity of involved resources that include computers, data, network facilities, and software tools provided by different organizations. Metadata management becomes a key issue when complex applications, such as data-intensive simulations and data mining applications, are executed on a Grid. This paper discusses metadata models for heterogeneous resource management in Grid-based data mining applications. In particular, it discusses how resources are represented and managed in the Knowledge Grid, a framework for Grid-enabled distributed data mining. The paper illustrates how XML-based metadata is used to describe data mining tools, data sources, mining models, and execution plans, and how metadata is used for the design and execution of distributed knowledge discovery applications on Grids.
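As a rough illustration of the idea of describing a data mining tool with XML metadata and then querying that description, the following minimal sketch may help. It is not the Knowledge Grid schema: the element names (`DataMiningSoftware`, `KindOfKnowledge`, `Technique`, `Host`, `Executable`), the tool name, the host, and the helper functions are all hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document. Element names, tool name, host and path
# are assumptions made for illustration, not the paper's actual schema.
TOOL_METADATA = """
<DataMiningSoftware name="example-clusterer">
  <Description>
    <KindOfData>flat file</KindOfData>
    <KindOfKnowledge>clusters</KindOfKnowledge>
    <Technique>statistics</Technique>
  </Description>
  <Usage>
    <Host>node1.example.org</Host>
    <Executable>/opt/example/bin/clusterer</Executable>
  </Usage>
</DataMiningSoftware>
"""


def summarize_tool(xml_text):
    """Parse one metadata document and return the fields used for selection."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.get("name"),
        "knowledge": root.findtext("Description/KindOfKnowledge"),
        "technique": root.findtext("Description/Technique"),
        "host": root.findtext("Usage/Host"),
    }


def find_tools(documents, knowledge):
    """Return the names of tools whose metadata declares the given kind of knowledge."""
    summaries = [summarize_tool(doc) for doc in documents]
    return [s["name"] for s in summaries if s["knowledge"] == knowledge]


if __name__ == "__main__":
    print(summarize_tool(TOOL_METADATA))
    print(find_tools([TOOL_METADATA], "clusters"))
```

Separating descriptive fields (what the tool produces) from usage fields (where and how it runs) is one plausible way to let an application designer select tools by function before binding them to a specific Grid node.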


data mining, discovery service, dynamic scheduling, knowledge grid, metadata management, peer-to-peer, resource categorization, semantic grid





Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Carlo Mastroianni (1)
  • Domenico Talia (1, 2)
  • Paolo Trunfio (2)

  1. ICAR-CNR, Rende (CS), Italy
  2. DEIS, Università della Calabria, Rende (CS), Italy
