Skip to main content
Log in

A new algorithm for fast mining frequent itemsets using N-lists

  • Research Paper
  • Published:
Science China Information Sciences Aims and scope Submit manuscript

Abstract

Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. Efficiency of PrePost is achieved by the following three reasons. First, N-list is compact since transactions with common prefixes share the same nodes of the PPC-tree. Second, the counting of itemsets’ supports is transformed into the intersection of N-lists and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists respectively. Third, PrePost can directly find frequent itemsets without generating candidate itemsets in some cases by making use of the single path property of N-list. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that the PrePost algorithm is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Han J W, Pei J, Yin Y W. Mining frequent itemsets without candidate generation. In: The 2000 ACM SIGMOD International Conference on Management of data (SIGMOD’00), New York, 2000. 1–12

  2. Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In: The 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD’93), Washington, 1993. 207–216

  3. Han J, Cheng H, Xin D, et al. Frequent itemset mining: current status and future directions. Data Min Knowl Discov, 2007, 15: 55–86

    Article  MathSciNet  Google Scholar 

  4. Baralis E, Cerquitelli T, Chiusano S. IMine: index support for item set mining. IEEE TKDE J, 2009, 21: 493–506

    Google Scholar 

  5. Zaki M J, Gouda K. Fast vertical mining using diffsets, In: The 9th ACM SIGKDD International Conference on. Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 326–335

  6. Deng Z H, Wang Z H. A new fast vertical method for mining frequent itemsets. Int J Comput Intell Syst, 2010, 3: 733–744

    Article  Google Scholar 

  7. Agrawal R, Srikant R. Fast algorithm for mining Association rules. In: The 20th International Conference on Very Large Data Bases (VLDB’94), Santiago de Chile, 1994. 487–499

  8. Savasere A, Omiecinski E, Navathe S. An efficient algorithm for mining association rules in large databases. In: The 21th International Conference on Very Large Data Bases (VLDB’95), Zurich, 1995. 432–443

  9. Shenoy P, Haritsa J R, Sundarshan S, et al. Turbo-charging vertical mining of large databases. In: ACM International Conference on Management of Data and Symposium on Principles of Database Systems (SIGMOD’00), Dallas, 2000. 22–33

  10. Zaki M J. Scalable algorithms for association mining. IEEE TKDE J, 2000, 12: 372–390

    Google Scholar 

  11. Pei J, Han J, Lu H, et al. H-mine: Hyper-structure mining of frequent itemsets in large databases. In: IEEE International Conference on Data Mining (ICDM’01), San Jose, 2001. 441–448

  12. Liu G, Lu H, Lou W, et al. Efficient mining of frequent itemsets using ascending frequency ordered prefix-tree. DMKD J, 2004, 9: 249–274

    MathSciNet  Google Scholar 

  13. Grahne G, Zhu J. Fast algorithms for frequent itemset mining using FP-Trees. IEEE TKDE J, 2005, 17: 1347–1362

    Google Scholar 

  14. Woon Y K, Ng W K, Lim E P. A support-ordered trie for fast frequent itemset discovery. IEEE TKDE J, 2004, 16: 875–879

    Google Scholar 

  15. Ananthanarayana V S, Murty N M, Subramanian D K. An incremental data mining algorithm for compact realization of prototypes. Pattern Recognit, 2001, 34: 2249–2251

    Article  MATH  Google Scholar 

  16. Grust T. Accelerating xpath location steps, In: The 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD’02), Madison, 2002. 109–120

  17. Kleinberg J, Tardos E. Algorithm design. Boston: Addison Wesley, 2005

    Google Scholar 

  18. Bayardo Jr R J. Efficiently mining long itemsets from databases. In: ACM SIGMOD International Conference on Management of Data (SIGMOD’98), Seattle, 1998. 85–93

  19. Burdick D, Calimlim M, Flannick J, et al. Mafia: A maximal frequent itemset algorithm. IEEE TKDE J, 2005, 17: 1490–1504

    Google Scholar 

  20. Wang J Y, Han J, Pei J. CLOSET+: Searching for the Best Strate-gies for Mining frequent closed itemsets. In: The 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 236–245

  21. Lee A J T, Wang C S, Weng W Y, et al. An efficient algorithm for mining closed inter-transaction itemsets. Data Knowle Eng, 2008, 66: 68–91

    Article  Google Scholar 

  22. Xin D, Han J, Yan X, et al. On compressing frequent itemsets. Data Knowl Eng, 2007, 60: 5–29

    Article  Google Scholar 

  23. Li H, Chen H. Mining non-derivable frequent itemsets over data stream. Data Knowl Eng, 2009, 68: 481–498

    Article  Google Scholar 

  24. Yao H, Hamilton H J, Butz C J. A foundational approach to mining itemset utilities from databases. In: The SIAM Data Mining (SDM’04), Florida, 2004. 482–486

  25. Yao H, Hamilton H J. Mining itemset utilities from transaction databases. Data Knowl Eng, 2006, 59: 603–626

    Article  Google Scholar 

  26. Cao L B, Zhao Y C, Zhang C Q. Mining impact-targeted activity itemsets in imbalanced data. IEEE TKDE, 2008, 20: 1053–1066

    Google Scholar 

  27. Chang J H, Lee W S. Finding recent frequent itemsets adaptively over online data streams. In: The 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’03), Washington, 2003. 487–492

  28. Li X, Deng Z H. Mining frequent itemsets from network flows for monitoring network. Expert Syst Appl, 2010, 37: 8850–8860

    Article  Google Scholar 

  29. Agrawal R, Srikant R. Mining sequential itemsets. In: The 11th International Conference on Data Engineering (ICDE’95), Taiwan, 2003. 3–14

  30. Yan X, Han J. gSpan: Graph-based substructure pattern mining. In: The 2002 IEEE International Conference on Data Mining (ICDM’02), Maebashi, 2002. 721–724

  31. Cao L B, Zhang H F, Zhao Y C, et al. Combined mining: Discovering informative knowledge in complex data. IEEE Trans Syst Man Cybern Part B-Cybern, 2011, 41: 699–712

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to ZhiHong Deng.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Deng, Z., Wang, Z. & Jiang, J. A new algorithm for fast mining frequent itemsets using N-lists. Sci. China Inf. Sci. 55, 2008–2030 (2012). https://doi.org/10.1007/s11432-012-4638-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11432-012-4638-z

Keywords

Navigation