An Abstraction Based Communication Efficient Distributed Association Rule Mining
Association rule mining is one of the most researched areas because of its applicability in various fields. We propose a novel data structure, the Sequence Pattern Count (SPC) tree, which stores the database compactly and completely and requires only one database scan to construct. Because the SPC tree is complete with respect to the database, it is well suited to mining association rules when the data or the minimum support changes, since the tree does not have to be rebuilt. A performance study shows that the SPC tree is efficient and scalable. We also propose a Doubly Logarithmic-depth Tree (DLT) algorithm, which uses the SPC tree to mine large, geographically distributed datasets efficiently while minimizing communication and computation costs. DLT requires only O(n) messages for support count exchange and only O(log log n) time to exchange them.
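The abstract does not describe the SPC tree's node layout, so the following is only a minimal sketch under stated assumptions: a prefix tree in which each transaction is inserted with its items in a fixed sorted order and each node keeps a transaction count. The names `SPCNode`, `SPCTree`, `insert`, `support`, and `is_frequent` are hypothetical. The sketch illustrates the properties claimed above: the tree is built in one scan, no item is discarded (so the tree stays complete), and supports for any threshold, or after new transactions arrive, are read from the same tree without rebuilding it.

```python
class SPCNode:
    """One node of the sketched tree: an item and the number of
    transactions whose sorted item list passes through this node."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # item -> SPCNode


class SPCTree:
    """Hypothetical SPC-style prefix tree. Every item of every transaction
    is kept (no support-based pruning), so the tree stays complete with
    respect to the database and can answer any support query."""

    def __init__(self):
        self.root = SPCNode()
        self.n_transactions = 0

    @classmethod
    def build(cls, database):
        tree = cls()
        for transaction in database:  # a single scan of the database
            tree.insert(transaction)
        return tree

    def insert(self, transaction):
        """Add one transaction; the same call handles incremental updates."""
        node = self.root
        for item in sorted(set(transaction)):  # assumed canonical item order
            child = node.children.get(item)
            if child is None:
                child = SPCNode(item)
                node.children[item] = child
            child.count += 1
            node = child
        self.n_transactions += 1

    def support(self, itemset):
        """Number of transactions that contain every item in itemset."""
        target = sorted(set(itemset))
        if not target:
            return self.n_transactions
        return self._count(self.root, target, 0)

    def _count(self, node, target, i):
        total = 0
        for item, child in node.children.items():
            if item == target[i]:
                if i + 1 == len(target):
                    total += child.count  # all of target lies on this path
                else:
                    total += self._count(child, target, i + 1)
            elif item < target[i]:
                total += self._count(child, target, i)  # search deeper
            # item > target[i]: items grow along a path, so target[i] cannot
            # appear below this child and the branch is skipped.
        return total

    def is_frequent(self, itemset, min_support):
        """The threshold is a query argument, so changing it needs no rebuild."""
        return self.support(itemset) >= min_support


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["b", "d"], ["a", "b", "c", "d"]]
    tree = SPCTree.build(db)
    print(tree.support({"a", "c"}))  # 3
    print(tree.is_frequent({"b", "d"}, 2), tree.is_frequent({"b", "d"}, 3))  # True False
    tree.insert(["c", "d"])  # new data is added in place, no rebuild
    print(tree.support({"c", "d"}))  # 2
```

Because the support threshold is only an argument to `is_frequent`, raising or lowering it costs a query rather than a rebuild. The price of this assumed design is that keeping every item makes the tree larger than a prefix tree restricted to frequent items.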
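The sketch below illustrates where bounds of O(n) messages and O(log log n) time can come from. It assumes, rather than reproduces, the paper's DLT layout: the first site is the root and the remaining m sites are split into about √m groups of about √m sites, each arranged recursively, so the depth satisfies d(n) ≈ 1 + d(√n) = O(log log n). Local support counts are summed up the tree and the global counts are sent back down, so every non-root site sends one message and receives one: 2(n - 1) messages in total. `Site`, `build_dlt`, and `exchange_support_counts` are hypothetical names, and time is counted in rounds, one per tree level in each direction, assuming a parent can exchange messages with all of its children within one round.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Site:
    """A data site holding local support counts (hypothetical model)."""
    name: str
    local_counts: Counter
    children: List["Site"] = field(default_factory=list)
    global_counts: Optional[Counter] = None


def build_dlt(sites: List[Site]) -> Optional[Site]:
    """Assumed doubly-logarithmic-depth layout: the first site is the root,
    the remaining m sites form about sqrt(m) groups of about sqrt(m) sites,
    and each group is built the same way, giving depth O(log log n)."""
    if not sites:
        return None
    root, rest = sites[0], sites[1:]
    root.children = []
    m = len(rest)
    if m:
        groups = math.isqrt(m)
        if groups * groups < m:
            groups += 1  # ceil(sqrt(m)) groups
        size = -(-m // groups)  # ceil(m / groups) sites per group
        for start in range(0, m, size):
            root.children.append(build_dlt(rest[start:start + size]))
    return root


def depth(node: Site) -> int:
    """Number of edges on the longest root-to-leaf path."""
    return 1 + max((depth(c) for c in node.children), default=-1)


def exchange_support_counts(sites: List[Site]) -> dict:
    """Sum local counts up the tree, then broadcast the totals back down."""
    root = build_dlt(sites)
    if root is None:
        return {"messages": 0, "rounds": 0}
    stats = {"messages": 0}

    def up(node: Site) -> Counter:
        total = Counter(node.local_counts)
        for child in node.children:
            total.update(up(child))  # update() adds counts
            stats["messages"] += 1  # one message: child -> parent
        return total

    def down(node: Site, counts: Counter) -> None:
        node.global_counts = counts
        for child in node.children:
            stats["messages"] += 1  # one message: parent -> child
            down(child, counts)

    down(root, up(root))
    stats["rounds"] = 2 * depth(root)  # one round per level, up and down
    return stats


if __name__ == "__main__":
    sites = [Site(f"S{i}", Counter({("a",): 2, ("a", "b"): 1})) for i in range(16)]
    print(exchange_support_counts(sites))  # {'messages': 30, 'rounds': 6}
    print(sites[5].global_counts[("a",)])  # 32
```

A star topology would finish in a single round but concentrates all n - 1 incoming messages on one coordinator, while a binary tree bounds the fan-in at two at the cost of depth about log n. The doubly logarithmic layout sits between them, with a root fan-in of roughly √n.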
Keywords: Association rule mining; Distributed databases; Sequence Pattern Count Tree; Incremental mining; Doubly Logarithmic-depth Tree