New Generation Computing, Volume 23, Issue 4, pp 315–337

Mining frequent patterns with the pattern tree

  • Hao Huang
  • Xindong Wu
  • Richard Relue
Regular Papers

Abstract

Mining frequent patterns with a frequent pattern tree (FP-tree for short) avoids costly candidate generation and repeated checking of occurrence frequencies against the support threshold. It therefore achieves much better performance and efficiency than Apriori-like algorithms. However, the database still needs to be scanned twice to build the FP-tree. This can be very time-consuming when new data is added to an existing database, because two scans may be needed not only for the new data but also for the existing data. In this research we propose a new data structure, the pattern tree (P-tree for short), and a new technique that builds the P-tree in a single scan of the database and can then obtain the corresponding FP-tree for a specified support threshold. Updating a P-tree with new data requires only one scan of the new data; the existing data does not need to be re-scanned. Our experiments show that the P-tree method outperforms the FP-tree method by a factor of up to an order of magnitude on large datasets.
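
The abstract does not spell out how the P-tree is built, so the following Python sketch is only one plausible reading of the idea, not the authors' algorithm. It assumes that a threshold-free prefix tree over all items, kept in a fixed lexicographic order, can stand in for the P-tree, and that an FP-tree-style tree can be derived from it by re-inserting the stored paths with infrequent items dropped and the rest sorted by descending frequency. The names `Node`, `PatternTree`, `insert_path`, `add_transactions`, `paths`, and `to_fp_tree` are hypothetical.

```python
from collections import Counter


class Node:
    """Prefix-tree node: an item label, a count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def insert_path(root, items, count):
    """Add `count` occurrences of the item sequence `items` below `root`."""
    node = root
    node.count += count
    for item in items:
        node = node.children.setdefault(item, Node(item))
        node.count += count


class PatternTree:
    """Illustrative threshold-free prefix tree (an assumption, not the paper's P-tree)."""

    def __init__(self):
        self.root = Node()
        self.item_counts = Counter()

    def add_transactions(self, transactions):
        # One pass over the supplied data; previously added data is never re-read.
        for transaction in transactions:
            items = sorted(set(transaction))  # fixed canonical order (assumption)
            self.item_counts.update(items)
            insert_path(self.root, items, 1)

    def paths(self):
        """Yield (items, count) for each group of transactions ending at a node."""
        stack = [(self.root, [])]
        while stack:
            node, prefix = stack.pop()
            # Transactions ending exactly here: own count minus children's counts.
            ending = node.count - sum(c.count for c in node.children.values())
            if node.item is not None and ending > 0:
                yield prefix, ending
            for child in node.children.values():
                stack.append((child, prefix + [child.item]))

    def to_fp_tree(self, min_support):
        """Derive an FP-tree-style tree for `min_support` from the stored paths."""
        frequent = {i for i, c in self.item_counts.items() if c >= min_support}

        def rank(item):
            # Descending frequency, ties broken by item name for determinism.
            return (-self.item_counts[item], item)

        fp_root = Node()
        for items, count in self.paths():
            kept = sorted((i for i in items if i in frequent), key=rank)
            insert_path(fp_root, kept, count)
        return fp_root


tree = PatternTree()
tree.add_transactions([["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]])
fp = tree.to_fp_tree(min_support=2)   # built without re-reading the transactions

tree.add_transactions([["c", "d"], ["c", "d"], ["c"]])  # scan only the new data
fp_updated = tree.to_fp_tree(min_support=3)
```

In this sketch the support threshold is applied only in `to_fp_tree`, so the same stored tree can serve several thresholds, and new data is absorbed by a single call to `add_transactions`.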

Keywords

Data Mining, Association Rules, Frequent Patterns

Copyright information

© Ohmsha, Ltd. and Springer 2005

Authors and Affiliations

  • Hao Huang (1)
  • Xindong Wu (2)
  • Richard Relue (3)

  1. Department of Computer Science, University of Virginia, Charlottesville, USA
  2. Department of Computer Science, University of Vermont, Burlington, USA
  3. Department of Mathematical and Computer Sciences, Colorado School of Mines, Golden, USA
