Weka4WS: A WSRF-Enabled Weka Toolkit for Distributed Data Mining on Grids

  • Domenico Talia
  • Paolo Trunfio
  • Oreste Verta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3721)

Abstract

This paper presents Weka4WS, a framework that extends the Weka toolkit to support distributed data mining in Grid environments. Weka4WS adopts the emerging Web Services Resource Framework (WSRF) to access remote data mining algorithms and manage distributed computations. The Weka4WS user interface is a modified Weka Explorer environment that supports the execution of both local and remote data mining tasks. On every computing node, a WSRF-compliant Web Service exposes all the data mining algorithms provided by the Weka library. The paper describes the design and implementation of Weka4WS using a first release of the WSRF library. To evaluate the efficiency of the proposed system, it also presents a performance analysis of Weka4WS executing distributed data mining tasks in different network scenarios.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Domenico Talia (1)
  • Paolo Trunfio (1)
  • Oreste Verta (1)

  1. DEIS, University of Calabria, Rende, Italy
