Abstract
In this paper we propose a novel parallel algorithm for frequent itemset mining. The algorithm is based on the filter-stream programming model, in which the mining process is represented as a data flow through a series of producer and consumer components (called filters), and communication between filters takes place via streams. A high degree of asynchrony is achieved when the production rate matches the consumption rate and the communication overhead between producer and consumer filters is minimized. Following this strategy, our algorithm employs asynchronous candidate generation and minimizes communication between filters by transferring only the necessary aggregated information. The algorithm also uses a look-forward approach, which accelerates the determination of frequent itemsets. An extensive evaluation demonstrates the parallel performance and scalability of our algorithm.
This work has been partially supported by CNPq-Brazil and by CNPq / CT-INFO / PTACS.
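To make the filter-stream model described in the abstract more concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes one producer filter per data partition, a single consumer filter, and a thread-safe queue standing in for a stream; the names `counter_filter`, `aggregator_filter`, `mine_pairs` and the `SENTINEL` end-of-stream marker are hypothetical. It only illustrates producers sending aggregated counts instead of raw transactions, and does not model the asynchronous candidate generation or the look-forward approach.

```python
import queue
import threading
from collections import Counter
from itertools import combinations

SENTINEL = None  # hypothetical end-of-stream marker used by this sketch


def counter_filter(partition, k, out_stream):
    """Producer filter: count k-itemsets in one partition and send only
    the aggregated local counts downstream, never the transactions."""
    local = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(transaction), k):
            local[itemset] += 1
    out_stream.put(local)
    out_stream.put(SENTINEL)


def aggregator_filter(in_stream, n_producers, min_support, result):
    """Consumer filter: merge partial counts in whatever order they arrive
    and keep the itemsets whose global count reaches min_support."""
    total = Counter()
    finished = 0
    while finished < n_producers:
        message = in_stream.get()
        if message is SENTINEL:
            finished += 1
        else:
            total.update(message)
    result.update({s: c for s, c in total.items() if c >= min_support})


def mine_pairs(partitions, min_support):
    """Wire the filters together with one shared stream and run them."""
    stream = queue.Queue()
    result = {}
    consumer = threading.Thread(
        target=aggregator_filter,
        args=(stream, len(partitions), min_support, result),
    )
    producers = [
        threading.Thread(target=counter_filter, args=(p, 2, stream))
        for p in partitions
    ]
    consumer.start()
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    consumer.join()
    return result


if __name__ == "__main__":
    partitions = [
        [{"a", "b", "c"}, {"a", "b"}],
        [{"a", "c"}, {"a", "b", "d"}],
    ]
    print(mine_pairs(partitions, min_support=3))
```

Because the consumer merges partial counts as soon as any producer delivers them, no filter waits on a global barrier. This is the kind of asynchrony the abstract refers to, although here it is limited to a single counting pass over 2-itemsets.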
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Veloso, A., Meira, W., Ferreira, R., Neto, D.G., Parthasarathy, S. (2004). Asynchronous and Anticipatory Filter-Stream Based Parallel Algorithm for Frequent Itemset Mining. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds) Knowledge Discovery in Databases: PKDD 2004. Lecture Notes in Computer Science, vol 3202. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30116-5_39
DOI: https://doi.org/10.1007/978-3-540-30116-5_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23108-0
Online ISBN: 978-3-540-30116-5
eBook Packages: Springer Book Archive