From Parallel Data Mining to Grid-Enabled Distributed Knowledge Discovery

  • Eugenio Cesario
  • Domenico Talia
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4482)


Data mining is often a compute-intensive and time-consuming process. For this reason, several data mining systems have been implemented on parallel computing platforms to achieve high performance in the analysis of large data sets. Moreover, when large data repositories are coupled with the geographical distribution of data, users, and systems, more sophisticated technologies are needed to implement high-performance distributed knowledge discovery in databases (KDD) systems. Recently, computational Grids have emerged as privileged platforms for distributed computing, and a growing number of Grid-based KDD systems have been designed. In this paper, we first outline different ways to exploit parallelism in the main data mining techniques and algorithms, and then discuss Grid-based KDD systems.


Rough Set, Parallel Data Mining, Distributed Data Mining, Grid





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Eugenio Cesario (1)
  • Domenico Talia (1, 2)

  1. ICAR-CNR, Italy
  2. DEIS-University of Calabria, Italy
