
Mining frequent patterns with the pattern tree

  • Regular Papers
New Generation Computing

Abstract

Mining frequent patterns with a frequent pattern tree (FP-tree for short) avoids costly candidate generation and the repeated checking of occurrence frequencies against the support threshold. It therefore performs much better than Apriori-like algorithms. However, the database still needs to be scanned twice to build the FP-tree. This can be very time-consuming when new data is added to an existing database, because two scans may be needed not only for the new data but also for the existing data. In this research we propose a new data structure, the pattern tree (P-tree for short), and a new technique that builds the P-tree in a single scan of the database and can then derive the corresponding FP-tree for a specified support threshold. Updating a P-tree with new data requires only one scan of the new data; the existing data does not need to be re-scanned. Our experiments show that the P-tree method outperforms the FP-tree method by up to an order of magnitude on large datasets.
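The one-scan idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes a plain prefix tree that stores every item (frequent or not) in lexicographic order together with global item counts, and it derives a frequency-ordered tree for a given threshold by replaying the stored paths instead of rescanning the transactions. The names `PatternTree`, `add_transactions`, `weighted_paths`, and `to_fp_tree` are hypothetical.

```python
from collections import Counter


class Node:
    """One prefix-tree node: an item, a count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class PatternTree:
    """Hypothetical one-scan tree that keeps every item, frequent or not."""

    def __init__(self):
        self.root = Node()
        self.item_counts = Counter()

    def insert(self, items, weight=1):
        """Add one path; the caller supplies items in the desired order."""
        node = self.root
        for item in items:
            node = node.children.setdefault(item, Node(item))
            node.count += weight

    def add_transactions(self, transactions):
        """Single pass over the data; the same call handles later updates."""
        for t in transactions:
            items = sorted(set(t))  # assumed canonical order: lexicographic
            self.item_counts.update(items)
            self.insert(items)

    def weighted_paths(self):
        """Yield (path, n): n stored transactions had exactly these items."""
        stack = [(self.root, [])]
        while stack:
            node, prefix = stack.pop()
            passing_on = sum(c.count for c in node.children.values())
            ending_here = node.count - passing_on
            if node.item is not None and ending_here > 0:
                yield prefix, ending_here
            for child in node.children.values():
                stack.append((child, prefix + [child.item]))

    def to_fp_tree(self, min_support):
        """Build a frequency-ordered tree of frequent items from stored paths."""
        frequent = {i: c for i, c in self.item_counts.items() if c >= min_support}
        fp = PatternTree()
        fp.item_counts = Counter(frequent)
        for path, n in self.weighted_paths():
            kept = [i for i in path if i in frequent]
            # Descending support, ties broken by item name (an assumption).
            kept.sort(key=lambda i: (-frequent[i], i))
            if kept:
                fp.insert(kept, n)
        return fp


# Usage: build once, append new data later, then pick a threshold.
tree = PatternTree()
tree.add_transactions([["a", "b", "c"], ["a", "b"], ["b", "d"], ["a", "c"]])
tree.add_transactions([["b", "c"]])  # only the new data is scanned
fp = tree.to_fp_tree(min_support=2)  # item "d" (support 1) is dropped
```

Because the full tree retains infrequent items, the threshold can be chosen or changed after the scan. The cost in this sketch is a larger tree than an FP-tree built for one fixed threshold.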




Additional information

A preliminary version of this paper was published in the Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM ’02), pp. 629–632.

Hao Huang: He is pursuing his Ph.D. degree in the Department of Computer Science at the University of Virginia. His research interests are Grid Computing, Data Mining, and their applications in Bioinformatics. He received his M.S. in Computer Science from Colorado School of Mines in 2001.

Xindong Wu, Ph.D.: He is Professor and Chair of the Department of Computer Science at the University of Vermont, USA. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, AAAI, ICML, KDD, ICDM, and WWW. Dr. Wu is the Executive Editor (January 1, 1999 to December 31, 2004) and an Honorary Editor-in-Chief (starting January 1, 2005) of Knowledge and Information Systems (a peer-reviewed archival journal published by Springer), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP), and the Chair of the IEEE Computer Society Technical Committee on Computational Intelligence (TCCI). He served as an Associate Editor for the IEEE Transactions on Knowledge and Data Engineering (TKDE) between January 1, 2000 and December 31, 2003, and has been the Editor-in-Chief of TKDE since January 1, 2005. He is the winner of the 2004 ACM SIGKDD Service Award.

Richard Relue, Ph.D.: He received his Ph.D. in Computer Science from the Colorado School of Mines in 2003. His research interests include association rules in data mining, neural networks for automated classification, and artificial intelligence for robot navigation. He has been an Information Technology consultant since 1992, working with Ball Aerospace and Technology, Rational Software, Natural Fuels Corporation, and Western Interstate Commission for Higher Education (WICHE).

About this article

Cite this article

Huang, H., Wu, X. & Relue, R. Mining frequent patterns with the pattern tree. New Gener Comput 23, 315–337 (2005). https://doi.org/10.1007/BF03037636


  • DOI: https://doi.org/10.1007/BF03037636
