Abstract
Several kinds of scientific and commercial applications require the execution of a large number of independent tasks. One highly successful, low-cost mechanism for acquiring the compute power these applications need is the “public-resource computing” or “desktop grid” paradigm, which exploits the computational power of private computers. So far, this paradigm has not been applied to data mining applications, for two main reasons. First, it is not trivial to decompose a data mining algorithm into truly independent sub-tasks. Second, the large volume of data involved makes it difficult to handle the communication costs of a parallel paradigm. In this paper, we focus on one of the main data mining problems: the extraction of closed frequent itemsets from transactional databases. We show that it is possible to decompose this problem into independent tasks, which, however, need to share a large volume of data. We therefore introduce a data-intensive computing network, which adopts a P2P topology based on super peers with caching capabilities, aiming to support the dissemination of large amounts of information. Finally, we evaluate the execution of our data mining job on such a network.
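The abstract does not spell out how the decomposition works. As a hedged illustration only (not the authors' algorithm), one simple way to split closed frequent itemset mining into independent tasks is to give each item i a task that returns the closed itemsets whose smallest item is i. An itemset is closed when no proper superset has the same support. Every closed itemset has exactly one smallest item, so task outputs are disjoint and need no merging, but the task for i must read every transaction containing i. The helper names `project`, `mine_task` and `mine_closed` are hypothetical, items are assumed to be totally ordered, and the brute-force enumeration of transaction intersections is chosen for brevity, not speed.

```python
from itertools import chain


def project(db, item):
    """Hypothetical task input for `item`: every transaction that contains it."""
    return [t for t in db if item in t]


def mine_task(db_i, item, min_sup):
    """Return {closed itemset: support} for the closed frequent itemsets whose
    smallest item is `item`, reading only the projected database `db_i`."""
    # A closed itemset equals the intersection of its supporting transactions,
    # so enumerating intersections of non-empty subsets of db_i yields every
    # closed itemset that contains `item` (all of db_i contains `item`).
    closed = set()
    for t in db_i:
        closed |= {c & t for c in closed}
        closed.add(t)
    result = {}
    for c in closed:
        if min(c) != item:  # owned by the task of a smaller item
            continue
        support = sum(1 for t in db_i if c <= t)
        if support >= min_sup:
            result[c] = support
    return result


def mine_closed(db, min_sup):
    """Run one independent task per frequent item and union the outputs."""
    items = sorted(set(chain.from_iterable(db)))
    result = {}
    for item in items:
        db_i = project(db, item)
        if len(db_i) < min_sup:  # no frequent itemset can contain this item
            continue
        # Tasks never read each other's output, so each iteration could run
        # on a different volunteer machine; their outputs are disjoint.
        result.update(mine_task(db_i, item, min_sup))
    return result


if __name__ == "__main__":
    db = [frozenset("abc"), frozenset("abd"), frozenset("bc"), frozenset("acd")]
    ordered = sorted(mine_closed(db, 2).items(), key=lambda kv: sorted(kv[0]))
    for itemset, support in ordered:
        print("".join(sorted(itemset)), support)
```

On this four-transaction example with a minimum support of 2, the tasks for a, b, c and d return {a}, {a,b}, {a,c}, {a,d}; then {b}, {b,c}; then {c}; and nothing, respectively. The projected databases overlap: the four transactions are shipped eleven times in total (once per item they contain), a small-scale version of the shared-data traffic that a caching super-peer network is meant to support.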
Copyright information
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Barbalace, D., Lucchese, C., Mastroianni, C., Orlando, S., Talia, D. (2008). Mining@Home: Public Resource Computing for Distributed Data Mining. In: Priol, T., Vanneschi, M. (eds) From Grids to Service and Pervasive Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09455-7_16
DOI: https://doi.org/10.1007/978-0-387-09455-7_16
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09454-0
Online ISBN: 978-0-387-09455-7
eBook Packages: Computer Science, Computer Science (R0)