Mining frequent patterns with the pattern tree

Huang, Hao; Wu, Xindong; Relue, Richard

doi:10.1007/BF03037636

Regular Papers
Published: December 2005

Volume 23, pages 315–337, (2005)
New Generation Computing

Hao Huang¹,
Xindong Wu² &
Richard Relue³

Abstract

Mining frequent patterns with a frequent pattern tree (FP-tree for short) avoids costly candidate generation and repeated checking of occurrence frequencies against the support threshold. It therefore achieves much better performance and efficiency than Apriori-like algorithms. However, the database still needs to be scanned twice to build the FP-tree. This can be very time-consuming when new data is added to an existing database, because two scans may be needed not only for the new data but also for the existing data. In this research we propose a new data structure, the pattern tree (P-tree for short), and a new technique that builds the P-tree in a single scan of the database and then derives the corresponding FP-tree for a specified support threshold. Updating a P-tree with new data requires one scan of the new data only; the existing data does not need to be re-scanned. Our experiments show that the P-tree method outperforms the FP-tree method by a factor of up to an order of magnitude on large datasets.
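The one-scan idea in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm, whose details are not given on this page. It assumes a plain prefix tree that keeps every item in a fixed lexicographic order, so no support threshold is needed while scanning. The names `Node`, `PatternTree`, `add_transactions`, and `to_fp_tree` are hypothetical and chosen only for illustration.

```python
from collections import Counter


class Node:
    """A prefix-tree node: an item label, a count, and child nodes keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def insert(root, items, count=1):
    """Add `count` occurrences of the item sequence `items` below `root`."""
    node = root
    for item in items:
        node = node.children.setdefault(item, Node(item))
        node.count += count


class PatternTree:
    """Hypothetical stand-in for a P-tree: it stores all items, frequent or not."""

    def __init__(self):
        self.root = Node()
        self.item_counts = Counter()

    def add_transactions(self, transactions):
        """Single pass over `transactions`; the same call handles later updates."""
        for transaction in transactions:
            items = sorted(set(transaction))  # assumption: fixed lexicographic order
            self.item_counts.update(items)
            insert(self.root, items)

    def _paths(self, node, prefix):
        """Yield (items, n) for the n transactions that end exactly at each node."""
        ending_here = node.count - sum(c.count for c in node.children.values())
        if prefix and ending_here > 0:
            yield prefix, ending_here
        for child in node.children.values():
            yield from self._paths(child, prefix + [child.item])

    def to_fp_tree(self, min_support):
        """Derive an FP-tree-like prefix tree from the stored tree, not the raw data."""
        frequent = {i: c for i, c in self.item_counts.items() if c >= min_support}
        fp_root = Node()
        for items, n in self._paths(self.root, []):
            kept = [i for i in items if i in frequent]
            # FP-tree convention: most frequent items first (ties broken by name).
            kept.sort(key=lambda i: (-frequent[i], i))
            insert(fp_root, kept, n)
        return fp_root
```

In this sketch, changing the support threshold or appending new transactions only touches the stored tree and the new data. A fresh FP-tree is obtained by calling `to_fp_tree` again, without rescanning the transactions that were already inserted.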


Author information

Authors and Affiliations

Department of Computer Science, University of Virginia, 22904, Charlottesville, Virginia, USA
Hao Huang
Department of Computer Science, University of Vermont, 05405, Burlington, Vermont, USA
Xindong Wu
Department of Mathematical and Computer Sciences, Colorado School of Mines, 80401, Golden, Colorado, USA
Richard Relue

Additional information

A preliminary version of this paper has been published in the Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM ’02), 629–632.

Hao Huang: He is pursuing his Ph.D. degree in the Department of Computer Science at the University of Virginia. His research interests are Grid Computing, Data Mining and their applications in Bioinformatics. He received his M.S. in Computer Science from Colorado School of Mines in 2001.

Xindong Wu, Ph.D.: He is Professor and Chair of the Department of Computer Science at the University of Vermont, USA. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, AAAI, ICML, KDD, ICDM, and WWW. Dr. Wu is the Executive Editor (January 1, 1999-December 31, 2004) and an Honorary Editor-in-Chief (starting January 1, 2005) of Knowledge and Information Systems (a peer-reviewed archival journal published by Springer), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP), and the Chair of the IEEE Computer Society Technical Committee on Computational Intelligence (TCCI). He served as an Associate Editor for the IEEE Transactions on Knowledge and Data Engineering (TKDE) between January 1, 2000 and December 31, 2003, and has been the Editor-in-Chief of TKDE since January 1, 2005. He is the winner of the 2004 ACM SIGKDD Service Award.

Richard Relue, Ph.D.: He received his Ph.D. in Computer Science from the Colorado School of Mines in 2003. His research interests include association rules in data mining, neural networks for automated classification, and artificial intelligence for robot navigation. He has been an Information Technology consultant since 1992, working with Ball Aerospace and Technology, Rational Software, Natural Fuels Corporation, and Western Interstate Commission for Higher Education (WICHE).

About this article

Cite this article

Huang, H., Wu, X. & Relue, R. Mining frequent patterns with the pattern tree. New Gener Comput 23, 315–337 (2005). https://doi.org/10.1007/BF03037636

Received: 17 February 2004
Revised: 27 October 2004
Issue Date: December 2005
DOI: https://doi.org/10.1007/BF03037636
