Tree Structure Based Parallel Frequent Pattern Mining on PC Cluster

Pramudiono, Iko; Kitsuregawa, Masaru

doi:10.1007/978-3-540-45227-0_53


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2736)

Included in the following conference series:

International Conference on Database and Expert Systems Applications


Abstract

Frequent pattern mining has become a fundamental technique for many data mining tasks. Many modern frequent pattern mining algorithms, such as FP-growth, use a tree structure to compress the database into a compact in-memory data structure. Recent studies show that this tree structure can be mined efficiently using the frequent pattern growth methodology. Further performance improvement can be expected from parallel execution. In particular, the PC cluster is gaining popularity as a cost-effective parallel platform for data-intensive tasks such as data mining. However, to mine frequent patterns efficiently from a tree structure in a shared-nothing environment, many issues must be addressed, such as space distribution on each node and skew handling. We develop a framework that addresses these issues using a novel granularity control mechanism and tree remerging. The common framework can be enhanced with temporal constraints to mine web access patterns. We devise an improved support counting procedure to reduce the additional communication overhead. A real implementation on up to 32 nodes confirms that a good speedup ratio can be achieved even in a skewed environment.
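
To make the tree compression mentioned in the abstract concrete, the following is a minimal single-machine sketch in Python of building an FP-tree in the style of FP-growth. It is not the authors' parallel implementation and does not cover granularity control, tree remerging, or support counting across nodes. The names FPNode, build_fp_tree, and count_nodes are hypothetical and chosen only for this illustration, as is the tie-breaking rule that orders equally frequent items by name.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item plus the number of transactions sharing this prefix."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from a list of transactions (illustrative sketch)."""
    # First pass: count in how many transactions each item occurs.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Second pass: insert each transaction as a path of its frequent items.
    for t in transactions:
        # Order by descending support; ties broken by item name (an assumption).
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # shared prefixes only increase a counter
            node = child
    return root, frequent


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]
root, frequent = build_fp_tree(transactions, min_support=2)
print(count_nodes(root))  # 6 nodes hold the 12 frequent item occurrences
```

In this toy input the infrequent item "d" is dropped and transactions with a common prefix share nodes, which is the compression effect the abstract refers to. How such a tree is split across shared-nothing nodes is specific to the paper and is not shown here.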



Author information

Authors and Affiliations

Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo, 153-8505, Japan
Iko Pramudiono & Masaru Kitsuregawa


Editor information

Editors and Affiliations

Gerstner Laboratory, Czech Technical University in Prague, Technická 2, 166 27, Prague 6, Czech Republic
Vladimír Mařík
Johannes Kepler University Linz, Altenberger Str. 69, 4040, Linz, Austria
Werner Retschitzegger
Faculty of Electrical Engineering, The Gerstner Laboratory, Czech Technical University in Prague, Technická 2, 166 27, Prague 6, Czech Republic
Olga Štěpánková


About this paper

Cite this paper

Pramudiono, I., Kitsuregawa, M. (2003). Tree Structure Based Parallel Frequent Pattern Mining on PC Cluster. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds) Database and Expert Systems Applications. DEXA 2003. Lecture Notes in Computer Science, vol 2736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45227-0_53


DOI: https://doi.org/10.1007/978-3-540-45227-0_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40806-2
Online ISBN: 978-3-540-45227-0
eBook Packages: Springer Book Archive
