Using Mining@Home for Distributed Ensemble Learning
Mining@Home was recently designed as a distributed architecture for running data mining applications according to the “volunteer computing” paradigm. It has already proved efficient and scalable for the discovery of frequent itemsets from a transactional database. However, it can also be adopted in other scenarios, especially those in which the overall application can be divided into distinct jobs that are executed in parallel and in which input data can be reused, which naturally leads to the use of data cachers. This paper describes the architecture and implementation of the Mining@Home system and evaluates its performance for the execution of ensemble learning applications. In this scenario, multiple learners compute models from the same input data, and the models are combined into a final model with higher statistical accuracy. A performance evaluation on a real network, reported in the paper, confirms the efficiency and scalability of the framework.
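The ensemble learning scenario described in the abstract can be illustrated with a minimal Python sketch. It is not the Mining@Home implementation or API: the names `DATA`, `train_stump`, `train_ensemble`, and `predict` are hypothetical, local threads stand in for volunteer nodes, and the choice of bootstrap-sampled threshold classifiers combined by majority vote is an assumption made only for illustration. What it does show is the property the abstract relies on: every job reads the same input data and is independent of the others, so the jobs can be executed in parallel.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy labelled data (assumption): (feature, label), label is 1 when feature >= 5.
DATA = [(x, int(x >= 5)) for x in range(10)]


def train_stump(seed, data=DATA):
    """One independent job: fit a threshold classifier on a bootstrap sample."""
    rng = random.Random(seed)  # seeded so that each job is reproducible
    sample = [rng.choice(data) for _ in data]

    def errors(threshold):
        # Count sample points misclassified by the rule "x >= threshold".
        return sum(int(x >= threshold) != y for x, y in sample)

    # Try every feature value seen in the sample and keep the best threshold.
    candidates = sorted({x for x, _ in sample})
    return min(candidates, key=errors)


def train_ensemble(n_models=7):
    """Run the jobs in parallel; all of them reuse the same input data."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(train_stump, range(n_models)))


def predict(thresholds, x):
    """Combine the base models into a final answer by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    models = train_ensemble()
    print("thresholds:", models)
    print("prediction for 0:", predict(models, 0))
    print("prediction for 9:", predict(models, 9))
```

In a volunteer computing setting, each call to `train_stump` would correspond to a job assigned to a remote node, and the shared `DATA` would be the input that a data cacher serves repeatedly rather than shipping it again for every job.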