A Framework for Composing Knowledge Discovery Workflows in Grids

  • Marco Lackovic
  • Domenico Talia
  • Paolo Trunfio
Part of the Studies in Computational Intelligence book series (SCI, volume 206)


Grid computing platforms provide middleware and services for coordinating the use of data and computational resources distributed across a network. Grids are used to implement a wide range of distributed applications and systems, including frameworks for distributed data mining and knowledge discovery. This chapter presents a framework we developed to support knowledge discovery workflows in Grid environments by executing data mining and computational intelligence algorithms on a set of Grid nodes. The framework extends Weka, an open-source toolkit for data mining and knowledge discovery, and uses Web Service technologies to access Grid resources and distribute the computation. We describe the implementation of the framework and show, through example applications, how it supports the design of knowledge discovery workflows and their execution on a Grid.


  • Execution Time
  • Grid Node
  • Computing Node
  • Knowledge Flow
  • Total Execution Time
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Marco Lackovic (1)
  • Domenico Talia (1)
  • Paolo Trunfio (1)
  1. University of Calabria, Italy
