Using Mining@Home for Distributed Ensemble Learning

  • Eugenio Cesario
  • Carlo Mastroianni
  • Domenico Talia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7450)

Abstract

Mining@Home was recently designed as a distributed architecture for running data mining applications according to the “volunteer computing” paradigm. Mining@Home has already proved its efficiency and scalability when used for the discovery of frequent itemsets from a transactional database. However, it can also be adopted in several other scenarios, particularly those in which the overall application can be partitioned into distinct jobs that can be executed in parallel and the input data can be reused, which naturally leads to the use of data cachers. This paper describes the architecture and implementation of the Mining@Home system and evaluates its performance for the execution of ensemble learning applications. In this scenario, multiple learners compute models from the same input data, so as to extract a final model with stronger statistical accuracy. A performance evaluation on a real network, reported in the paper, confirms the efficiency and scalability of the framework.
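
The minimal sketch below (not part of the paper; all names are hypothetical) illustrates the ensemble-learning scenario described in the abstract: independent jobs train learners on bootstrap samples of the same shared input data, and their models are combined by majority vote, as in bagging.

# Hypothetical illustration only, not the Mining@Home implementation.
# Independent jobs train learners on bootstrap samples of the same shared
# input data; their models are combined by majority vote (bagging-style).
import random
from collections import Counter

def train_stump(sample):
    """Toy learner: always predict the majority class of its training sample."""
    majority = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda x: majority

def run_job(dataset, seed):
    """One independent job (could run on a distinct volunteer node):
    bootstrap-resample the shared input data and train a local model."""
    rng = random.Random(seed)
    sample = [rng.choice(dataset) for _ in range(len(dataset))]
    return train_stump(sample)

def ensemble_predict(models, x):
    """Combine the per-job models into a final prediction by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Shared (feature, label) input data: every job reuses the same dataset,
    # which is why data cachers pay off in a distributed deployment.
    data = [(x, x % 3 == 0) for x in range(300)]
    models = [run_job(data, seed=s) for s in range(10)]  # parallelizable jobs
    print(ensemble_predict(models, 9))

In the framework described by the paper, such independent jobs would be dispatched to volunteer nodes and the shared input data would be served through data cachers rather than held in a single process.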



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Eugenio Cesario (1)
  • Carlo Mastroianni (1)
  • Domenico Talia (1, 2)

  1. ICAR-CNR, Italy
  2. University of Calabria, Italy
