An Abstraction Based Communication Efficient Distributed Association Rule Mining

  • P. Santhi Thilagam
  • V. S. Ananthanarayana
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4904)

Abstract

Association rule mining is one of the most researched areas because of its applicability in various fields. We propose a novel data structure, the Sequence Pattern Count (SPC) tree, which stores the database compactly and completely and requires only one scan of the database for its construction. Because the SPC tree is complete with respect to the database, it is well suited to mining association rules under changing data and changing support thresholds without rebuilding the tree. A performance study shows that the SPC tree is efficient and scalable. We also propose the Doubly Logarithmic-depth Tree (DLT) algorithm, which uses the SPC tree to mine large, geographically distributed datasets efficiently while minimizing communication and computation costs. DLT requires only O(n) messages for support count exchange, and the exchange takes only O(log log n) time.
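
The message and time bounds quoted for DLT can be illustrated with a minimal sketch of count aggregation over a generic doubly logarithmic-depth tree. This is not the authors' DLT algorithm, whose details are not given in the abstract. It assumes that n is the number of participating sites, that each site holds one local support count for a given itemset, that every group of about sqrt(n) sites reports to a hypothetical coordinator, and that siblings send in parallel, so one tree level costs one round. The function name `aggregate_support` is invented for this example.

```python
import math


def aggregate_support(local_counts):
    """Sum per-site support counts up a doubly logarithmic-depth tree.

    Returns (global_count, messages, rounds). Illustrative only: the
    grouping rule below is an assumption, not the paper's DLT procedure.
    """
    n = len(local_counts)
    if n == 0:
        raise ValueError("at least one site is required")
    if n == 1:
        # A single site is a leaf: nothing to exchange below it.
        return local_counts[0], 0, 0
    if n == 2:
        # Two leaves report directly to their parent in one round.
        return sum(local_counts), 2, 1

    # Split n sites into groups of ceil(sqrt(n)) sites each.
    group_size = math.isqrt(n - 1) + 1
    total = messages = deepest = 0
    for start in range(0, n, group_size):
        group = local_counts[start:start + group_size]
        sub_total, sub_messages, sub_rounds = aggregate_support(group)
        total += sub_total
        # One extra message carries the group's result to this level.
        messages += sub_messages + 1
        deepest = max(deepest, sub_rounds)
    return total, messages, deepest + 1


if __name__ == "__main__":
    for sites in (16, 256, 65536):
        total, messages, rounds = aggregate_support([1] * sites)
        print(sites, total, messages, rounds)
```

In this sketch every coordinator has at least two children, so the number of messages stays below 2n, and the group size shrinks from n to sqrt(n) at each level, so the number of rounds grows as O(log log n). For 16 sites the sketch uses 28 messages in 3 rounds.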

Keywords

Association rule mining; Distributed databases; Sequence Pattern Count Tree; Incremental mining; Doubly Logarithmic-depth Tree

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • P. Santhi Thilagam (1)
  • V. S. Ananthanarayana (2)
  1. Sr. Lecturer, Dept. of Computer Engineering, India
  2. Professor, Dept. of Information Technology, India
