Advertisement

Applied Intelligence

, Volume 40, Issue 1, pp 29–43 | Cite as

Mining high utility itemsets by dynamically pruning the tree structure

  • Wei Song
  • Yu Liu
  • Jinhong Li
Article

Abstract

Mining high utility itemsets is one of the most important research issues in data mining owing to its ability to consider nonbinary frequency values of items in transactions and different profit values for each item. Mining such itemsets from a transaction database involves finding those itemsets with utility above a user-specified threshold. In this paper, we propose an efficient concurrent algorithm, called CHUI-Mine (Concurrent High Utility Itemsets Mine), for mining high utility itemsets by dynamically pruning the tree structure. A tree structure, called the CHUI-Tree, is introduced to capture the important utility information of the candidate itemsets. By recording changes in support counts of candidate high utility items during the tree construction process, we implement dynamic CHUI-Tree pruning, and discuss the rationality thereof. The CHUI-Mine algorithm makes use of a concurrent strategy, enabling the simultaneous construction of a CHUI-Tree and the discovery of high utility itemsets. Our algorithm reduces the problem of huge memory usage for tree construction and traversal in tree-based algorithms for mining high utility itemsets. Extensive experimental results show that the CHUI-Mine algorithm is both efficient and scalable.

Keywords

Data mining High utility itemset CHUI-Tree Dynamically pruning Concurrency 

Notes

Acknowledgements

We would like to express our deep gratitude to the anonymous reviewers of this paper. The work is partly supported by the National Natural Science Foundation of China (61105045), Funding Project for Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality (PHR201108057), and North China University of Technology (CCXZ201303).

References

  1. 1.
    Adnan M, Alhajj R (2009) DRFP-tree: disk-resident frequent pattern tree. Appl Intell 30(2):84–97 CrossRefGoogle Scholar
  2. 2.
    Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings 20th international conference very large data bases (VLDB’94), pp 487–499 Google Scholar
  3. 3.
    Ahmed CF, Tanbeer SK, Jeong B-S, Lee Y-K (2009) Efficient tree structures for high utility pattern. IEEE Trans Knowl Data Eng 21(12):1708–1721 CrossRefGoogle Scholar
  4. 4.
    Ahmed CF, Tanbeer SK, Jeong B-S, Lee Y-K (2011) HUC-Prune: an efficient candidate pruning technique to mine high utility patterns. Appl Intell 34(2):181–198 CrossRefGoogle Scholar
  5. 5.
    Barber B, Hamilton HJ (2003) Extracting share frequent itemsets with infrequent subsets. Data Min Knowl Discov 7(2):153–185 CrossRefMathSciNetGoogle Scholar
  6. 6.
    Chan R, Yang Q, Shen Y-D (2003) Mining high utility itemsets. In: Proceedings of the 3rd IEEE international conference on data mining (ICDM’03), pp 19–26 CrossRefGoogle Scholar
  7. 7.
    Erwin A, Gopalan RP, Achuthan NR (2007) CTU-Mine: an efficient high utility itemset mining algorithm using the pattern growth approach. In: Proceedings of the 7th IEEE international conference on computer and information technology (CIT’07), pp 71–76 CrossRefGoogle Scholar
  8. 8.
    Erwin A, Gopalan RP, Achuthan NR (2007) A bottom-up projection based algorithm for mining high utility itemsets. In: Proceedings of the 2nd international workshop on integrating artificial intelligence and data mining (AIDM’07), pp 3–10 Google Scholar
  9. 9.
    Erwin A, Gopalan RP, Achuthan NR (2008) Efficient mining of high utility itemsets from large datasets. In: Proceedings of the 12th Pacific-Asia conference on advances in knowledge discovery and data mining (PAKDD’08), pp 554–561 CrossRefGoogle Scholar
  10. 10.
    Grahne G, Zhu J (2005) Fast algorithms for frequent itemset mining using FP-trees. IEEE Trans Knowl Data Eng 17(10):1347–1362 CrossRefGoogle Scholar
  11. 11.
    Han J, Cheng H, Xin D, Yan X (2007) Frequent pattern mining: current status and future directions. Data Min Knowl Discov 15(1):55–86 CrossRefMathSciNetGoogle Scholar
  12. 12.
    Han J, Kamber M (2006) Data mining: concepts and techniques, 2nd edn. Morgan Kaufmann, San Francisco Google Scholar
  13. 13.
    Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87 CrossRefMathSciNetGoogle Scholar
  14. 14.
    Hu J, Mojsilovic A (2007) High-utility pattern mining: a method for discovery of high-utility item sets. Pattern Recognit 40(11):3317–3324 CrossRefzbMATHGoogle Scholar
  15. 15.
    Kaya M, Alhajj R (2008) Online mining of fuzzy multidimensional weighted association rules. Appl Intell 29(1):13–34 CrossRefGoogle Scholar
  16. 16.
    Lee C-H (2007) IMSP: an information theoretic approach for multi-dimensional sequential pattern mining. Appl Intell 26(3):231–242 CrossRefzbMATHGoogle Scholar
  17. 17.
    Lan G-C, Hong T-P, Tseng VS (2012) A projection-based approach for discovering high average-utility itemsets. J Inf Sci Eng 28(1):193–209 Google Scholar
  18. 18.
    Li H-F, Huang H-Y, Lee S-Y (2011) Fast and memory efficient mining of high-utility itemsets from data streams: with and without negative item profits. Knowl Inf Syst 28(3):495–522 CrossRefGoogle Scholar
  19. 19.
    Li H-F (2011) MHUI-max: an efficient algorithm for discovering high-utility itemsets from data streams. J Inf Sci 37(5):532–545 CrossRefGoogle Scholar
  20. 20.
    Li Y-C, Yeh J-S, Chang C-C (2008) Isolated items discarding strategy for discovering high utility itemsets. Data Knowl Eng 64(1):198–217 CrossRefGoogle Scholar
  21. 21.
    Lin C-W, Hong T-P, Lu W-H (2011) An effective tree structure for mining high utility itemsets. Expert Syst Appl 38(6):7419–7424 CrossRefGoogle Scholar
  22. 22.
    Liu Y, Liao W-K, Choudhary AN (2005) A two phase algorithm for fast discovery of high utility of itemsets. In: Proceedings of the 9th Pacific-Asia conference on knowledge discovery and data mining (PAKDD’05), pp 689–695 Google Scholar
  23. 23.
    Maunz A, Helma C, Kramer S (2011) Efficient mining for structurally diverse subgraph patterns in large molecular databases. Mach Learn 83(2):193–218 CrossRefzbMATHMathSciNetGoogle Scholar
  24. 24.
    Piatetsky-Shapiro G (2007) Data mining and knowledge discovery 1996 to 2005: overcoming the hype and moving from “university” to “business” and “analytics”. Data Min Knowl Discov 15(1):99–105 CrossRefMathSciNetGoogle Scholar
  25. 25.
    Rauch J (2005) Logic of association rules. Appl Intell 22(1):9–28 CrossRefzbMATHGoogle Scholar
  26. 26.
    Roscoe AW (2010) Understanding concurrent systems. Springer, London CrossRefzbMATHGoogle Scholar
  27. 27.
    Shie B-E, Yu PS, Tseng VS (2013) Mining interesting user behavior patterns in mobile commerce environments. Appl Intell 38(3):418–435 CrossRefGoogle Scholar
  28. 28.
    Song W, Yang BR, Xu ZY (2008) Index-BitTableFI: an improved algorithm for mining frequent itemsets. Knowl-Based Syst 21(6):507–513 CrossRefGoogle Scholar
  29. 29.
    Tatti N, Cule B (2012) Mining closed strict episodes. Data Min Knowl Discov 25(1):34–66 CrossRefzbMATHMathSciNetGoogle Scholar
  30. 30.
    Tseng M-C, Lin Y-Y, Jeng R (2008) Updating generalized association rules with evolving taxonomies. Appl Intell 29(3):306–320 CrossRefGoogle Scholar
  31. 31.
    Tseng VS, Wu C-W, Shie B-E, Yu PS (2010) UP-Growth: an efficient algorithm for high utility itemset mining. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’10), pp 253–262 CrossRefGoogle Scholar
  32. 32.
  33. 33.
    Frequent Itemset mining implementations repository. http://fimi.ua.ac.be/
  34. 34.
    Wang S-L, Patel D, Jafari A, Hong T-P (2007) Hiding collaborative recommendation association rules. Appl Intell 27(1):67–77 CrossRefzbMATHGoogle Scholar
  35. 35.
    Wang Y-T, Cheng J-T (2011) Mining periodic movement patterns of mobile phone users based on an efficient sampling approach. Appl Intell 35(1):32–40 CrossRefGoogle Scholar
  36. 36.
    Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: Proceedings of the 4th SIAM international conference on data mining (SDM’04), pp 482–486 CrossRefGoogle Scholar
  37. 37.
    Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases. Data Knowl Eng 59(3):603–626 CrossRefGoogle Scholar
  38. 38.
    Yu G, Shao S, Luo B, Zeng X (2009) A hybrid method for high-utility itemsets mining in large high-dimensional data. Int J Data Warehous Min 5(1):57–73 CrossRefGoogle Scholar
  39. 39.
    Zaki MJ, Gouda K (2003) Fast vertical mining using diffsets. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (KDD’03), pp 326–335 Google Scholar
  40. 40.
    Zhang S, Chen F, Wu X, Zhang C, Wang R (2012) Mining bridging rules between conceptual clusters. Appl Intell 36(1):108–118 CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. 1.College of Information EngineeringNorth China University of TechnologyBeijingChina

Personalised recommendations