Asynchronous and Anticipatory Filter-Stream Based Parallel Algorithm for Frequent Itemset Mining

  • Adriano Veloso
  • Wagner Meira Jr.
  • Renato Ferreira
  • Dorgival Guedes Neto
  • Srinivasan Parthasarathy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3202)

Abstract

In this paper we propose a novel parallel algorithm for frequent itemset mining. The algorithm is based on the filter-stream programming model, in which the frequent itemset mining process is represented as a data flow controlled by a series of producer and consumer components (called filters), and communication between filters takes place via streams. When the production rate matches the consumption rate and the communication overhead between producer and consumer filters is minimized, a high degree of asynchrony is achieved. Following this strategy, our algorithm employs asynchronous candidate generation and minimizes communication between filters by transferring only the necessary aggregated information. The algorithm also uses a look-forward approach that accelerates frequent itemset determination. Extensive evaluation shows the parallel performance and scalability of our algorithm.
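
To make the filter-stream idea concrete, here is a minimal Python sketch. It is not the authors' algorithm: it only illustrates producer filters that count local support on their own data partition and send aggregated counts over a stream to a consumer filter. The names `counter_filter`, `aggregator_filter`, and `frequent_itemsets` are hypothetical, the stream is assumed to be an in-process `queue.Queue`, and the asynchronous candidate generation and look-forward steps are not reproduced.

```python
import queue
import threading
from collections import Counter
from itertools import combinations


def counter_filter(partition, size, out_stream):
    """Producer filter (hypothetical): count local support on one partition."""
    local = Counter()
    for transaction in partition:
        # Enumerate every itemset of the requested size in this transaction.
        for itemset in combinations(sorted(transaction), size):
            local[itemset] += 1
    out_stream.put(local)  # only the aggregated counts travel on the stream
    out_stream.put(None)   # end-of-stream marker for this producer


def aggregator_filter(in_stream, n_producers, min_support, result):
    """Consumer filter (hypothetical): merge local supports as they arrive."""
    total = Counter()
    finished = 0
    while finished < n_producers:
        message = in_stream.get()
        if message is None:
            finished += 1
        else:
            total.update(message)  # Counter.update adds the counts
    result.update({s: c for s, c in total.items() if c >= min_support})


def frequent_itemsets(partitions, size, min_support):
    """Wire one producer per partition to a single consumer via one stream."""
    stream = queue.Queue()
    result = {}
    consumer = threading.Thread(
        target=aggregator_filter,
        args=(stream, len(partitions), min_support, result),
    )
    producers = [
        threading.Thread(target=counter_filter, args=(p, size, stream))
        for p in partitions
    ]
    consumer.start()
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    consumer.join()
    return result


if __name__ == "__main__":
    partitions = [
        [{"a", "b", "c"}, {"a", "b"}],
        [{"a", "c"}, {"a", "b"}],
    ]
    print(frequent_itemsets(partitions, size=2, min_support=2))
```

Because the consumer merges counts in whatever order they arrive, producers never wait on one another; each partition contributes a single aggregated message rather than its raw transactions.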

Keywords

Association Rule, Parallel Algorithm, Frequent Itemset, Local Support, Data Skewness
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Adriano Veloso (1)
  • Wagner Meira Jr. (1)
  • Renato Ferreira (1)
  • Dorgival Guedes Neto (1)
  • Srinivasan Parthasarathy (2)

  1. Computer Science Department, Universidade Federal de Minas Gerais, Brazil
  2. Department of Computer and Information Science, The Ohio State University, USA
