Efficiently mining uncertain high-utility itemsets

Lin, Jerry Chun-Wei; Gan, Wensheng; Fournier-Viger, Philippe; Hong, Tzung-Pei; Tseng, Vincent S.

doi:10.1007/s00500-016-2159-1


Focus
Published: 02 May 2016

Volume 21, pages 2801–2820 (2017)

Soft Computing

Jerry Chun-Wei Lin¹, Wensheng Gan¹, Philippe Fournier-Viger², Tzung-Pei Hong³,⁴ & Vincent S. Tseng⁵


Abstract

Data mining consists of deriving implicit, potentially meaningful and useful knowledge, such as information about the most profitable items, from databases. High-utility itemset mining (HUIM) has thus emerged as an important research topic in data mining. However, most HUIM algorithms can only handle precise data, whereas big data collected in real-life applications through experimental measurements or noisy sensors is often uncertain. In this paper, an efficient algorithm named Mining Uncertain High-Utility Itemsets (MUHUI) is proposed to discover potential high-utility itemsets (PHUIs) in uncertain data. Based on the probability-utility-list (PU-list) structure, the MUHUI algorithm mines PHUIs directly without generating candidates, and it applies several efficient pruning strategies that avoid constructing PU-lists for numerous unpromising itemsets, which greatly improves its performance. Extensive experiments conducted on both real-life and synthetic datasets show that the proposed algorithm significantly outperforms the state-of-the-art PHUI-List algorithm in terms of efficiency and scalability, and that MUHUI scales well when mining PHUIs in large-scale uncertain datasets.
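The full text is not reproduced here, so the following is not MUHUI itself. It is a minimal Python sketch, under assumed definitions, of the kind of list-based, candidate-free search the abstract describes. Assumptions: each transaction carries a single existential probability; the expected support of an itemset is the sum of the probabilities of the transactions containing it; and an itemset counts as a PHUI when its utility reaches `min_util` and its expected support reaches `min_pro` (both absolute thresholds here). Each list entry stores a transaction id, that transaction's probability, the itemset's utility and its remaining utility, in the style of the utility-list of Liu and Qu (2012). The three pruning rules shown (TWU filtering of single items, anti-monotonicity of expected support, and the utility-plus-remaining-utility bound) are standard ones that stand in for, rather than reproduce, the paper's own strategies. All names (`Entry`, `build_lists`, `join`, `mine`, `phui_search`) and the toy data are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical model (an assumption, not the paper's exact definitions):
# db is a list of (probability, {item: quantity}) transactions, profit maps
# item -> unit profit, and the expected support Pro(X) of an itemset X is the
# sum of the probabilities of the transactions that contain X.


@dataclass(frozen=True)
class Entry:
    tid: int    # transaction identifier
    pro: float  # existential probability of the transaction
    iu: int     # utility of the itemset in this transaction
    ru: int     # utility of the items after the itemset in processing order


def build_lists(db, profit, order):
    """Build the probability-utility list of every single item in `order`."""
    rank = {item: k for k, item in enumerate(order)}
    lists = {item: [] for item in order}
    for tid, (prob, qty) in enumerate(db):
        # Keep only promising items, sorted in processing order.
        items = sorted((i for i in qty if i in rank), key=rank.__getitem__)
        utils = [qty[i] * profit[i] for i in items]
        remaining = sum(utils)
        for item, u in zip(items, utils):
            remaining -= u
            lists[item].append(Entry(tid, prob, u, remaining))
    return lists


def join(p_list, x_list, y_list):
    """Build the list of Pxy from the lists of P (None if P is empty), Px, Py."""
    p_by_tid = None if p_list is None else {e.tid: e for e in p_list}
    y_by_tid = {e.tid: e for e in y_list}
    out = []
    for ex in x_list:
        ey = y_by_tid.get(ex.tid)
        if ey is None:
            continue  # Pxy does not occur in this transaction
        iu = ex.iu + ey.iu
        if p_by_tid is not None:
            iu -= p_by_tid[ex.tid].iu  # the utility of P was counted twice
        out.append(Entry(ex.tid, ex.pro, iu, ey.ru))
    return out


def mine(prefix, p_list, ext, min_util, min_pro, out):
    """Depth-first search over the extensions `ext` of `prefix`."""
    for k, (x, x_list) in enumerate(ext):
        pro = sum(e.pro for e in x_list)
        if pro < min_pro:
            continue  # supersets cannot have a higher expected support
        util = sum(e.iu for e in x_list)
        itemset = prefix + (x,)
        if util >= min_util:
            out[itemset] = (util, round(pro, 6))
        # iu + ru bounds the utility of every extension explored below itemset.
        if util + sum(e.ru for e in x_list) >= min_util:
            children = []
            for y, y_list in ext[k + 1:]:
                xy = join(p_list, x_list, y_list)
                if xy:
                    children.append((y, xy))
            mine(itemset, x_list, children, min_util, min_pro, out)


def phui_search(db, profit, min_util, min_pro):
    """Return {itemset: (utility, expected support)} for all PHUIs."""
    twu = {}
    for _, qty in db:
        tu = sum(q * profit[i] for i, q in qty.items())
        for i in qty:
            twu[i] = twu.get(i, 0) + tu
    # Items whose TWU is below min_util cannot appear in any PHUI.
    order = sorted((i for i in twu if twu[i] >= min_util),
                   key=lambda i: (twu[i], i))
    lists = build_lists(db, profit, order)
    out = {}
    mine((), None, [(i, lists[i]) for i in order], min_util, min_pro, out)
    return out


if __name__ == "__main__":
    profit = {"a": 5, "b": 2, "c": 1, "d": 4}  # hypothetical unit profits
    db = [
        (0.9, {"a": 1, "c": 2}),
        (0.7, {"a": 2, "b": 1, "d": 1}),
        (0.5, {"b": 3, "c": 1}),
        (0.8, {"a": 1, "b": 2, "c": 1}),
    ]
    print(phui_search(db, profit, min_util=20, min_pro=1.0))
    # {('a',): (20, 2.4), ('a', 'b'): (21, 1.5)}
```

On this toy database, item `d` is discarded by the TWU filter (its TWU is 16), and only {a} and {a, b} meet both thresholds. Processing items in ascending TWU order follows the usual heuristic of utility-list miners; it tends to keep the joined lists short.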


Similar content being viewed by others

Efficient Mining of Uncertain Data for High-Utility Itemsets

Mining High-Average Utility Itemsets with Positive and Negative External Utilities (Article, 07 November 2019)

Support-Based High Utility Mining with Negative Utility Values

References

Aggarwal CC (2010) Managing and mining uncertain data
Aggarwal CC, Li Y, Wang J, Wang J (2009) Frequent pattern mining with uncertain data. In: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 29–38
Aggarwal CC, Yu PS (2009) A survey of uncertain data algorithms and applications. IEEE Trans Knowl Data Eng 21(5):609–623
Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: The ACM SIGMOD International Conference on Management of Data, pp 207–216
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: International Conference on Very Large Data Bases, pp 487–499
Agrawal R, Srikant R (1994) Quest synthetic data generator. http://www.Almaden.ibm.com/cs/quest/syndata.html
Ahmed CF, Tanbeer SK, Jeong BS, Lee YK (2009) Efficient tree structures for high utility pattern mining in incremental databases. IEEE Trans Knowl Data Eng 21(12):1708–1721
Bernecker T, Kriegel HP, Renz M, Verhein F, Zuefle A (2009) Probabilistic frequent itemset mining in uncertain databases. In: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 119–128
Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: IEEE International Conference on Data Mining, pp 19–26
Chen MS, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866–883
Chui CK, Kao B, Hung E (2007) Mining frequent itemsets from uncertain data. In: Advances in Knowledge Discovery and Data Mining, pp 47–58
Evfimievski A, Srikant R, Agrawal R, Gehrke J (2002) Privacy preserving mining of association rules. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 217–228
Fournier-Viger P, Wu CW, Zida S, Tseng VS (2014) FHM: Faster high-utility itemset mining using estimated utility co-occurrence pruning. Found Intell Syst 8502:83–92
Fournier-Viger P, Zida S (2016) FOSHU: Faster on-shelf high utility itemset mining—with or without negative unit profit. In: The 30th Symposium on Applied Computing, pp 857–864
Frequent itemset mining dataset repository (2012). http://fimi.ua.ac.be/data/
Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. ACM Comput Surv 38(3): Article 9
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min Knowl Disc 8(1):53–87
Lan GC, Hong TP, Tseng VS (2011) Discovery of high utility itemsets from on-shelf time periods of products. Expert Syst Appl 38(5):5851–5857
Lan GC, Hong TP, Huang JP, Tseng VS (2014) On-shelf utility mining with negative item values. Expert Syst Appl 41(7):3450–3459
Leung CKS, Mateo MAF, Brajczuk DA (2008) A tree-based approach for frequent pattern mining from uncertain data. In: Advances in Knowledge Discovery and Data Mining, pp 653–661
Lin JCW, Gan W, Fournier-Viger P, Hong TP (2015) Mining high-utility itemsets with multiple minimum utility thresholds. In: ACM International C* Conference on Computer Science & Software Engineering, pp 9–17
Lin JCW, Gan W, Fournier-Viger P, Hong TP, Tseng VS (2015) Mining potential high-utility itemsets over uncertain databases. In: ACM 5th ASE BigData & SocialInformatics, pp 25
Lin JCW, Gan W, Hong TP, Zhang B (2015) An incremental high-utility mining algorithm with transaction insertion. Sci World J
Lin CW, Hong TP, Lu WH (2011) An effective tree structure for mining high utility itemsets. Expert Syst Appl 38(6):7419–7424
Lin CW, Hong TP, Lan GC, Wong JW, Lin WY (2015) Efficient updating of discovered high-utility itemsets for transaction deletion in dynamic databases. Adv Eng Inform 29(1):16–27
Lin JCW, Gan W, Hong TP (2015) A fast updated algorithm to maintain the discovered high-utility itemsets for transaction modification. Adv Eng Inform 29(3):562–574
Lin JCW, Gan W, Hong TP, Tseng VS (2015) Efficient algorithms for mining up-to-date high-utility patterns. Adv Eng Inform 29(3):648–661
Lin CW, Hong TP (2012) A new mining approach for uncertain databases using CUFP trees. Expert Syst Appl 39(4):4084–4093
Liu C, Chen L, Zhang C (2013) Summarizing probabilistic frequent patterns: a fast approach. In: The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 527–535
Liu Y, Liao WK, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Advances in Knowledge Discovery and Data Mining, pp 689–695
Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: ACM International Conference on Information and Knowledge Management, pp 55–64
Microsoft (2016) Example database FoodMart of Microsoft Analysis Services. http://msdn.microsoft.com/en-us/library/aa217032(SQL.80).aspx
Dalvi N, Suciu D (2007) Efficient query evaluation on probabilistic databases. VLDB J 16(4):523–544
Rymon R (1992) Search through systematic set enumeration. In: International Conference on Principles of Knowledge Representation and Reasoning, pp 539–550
Sun L, Cheng R, Cheung DW, Cheng J (2010) Mining uncertain data with probabilistic guarantees. In: The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 273–282
Tong Y, Chen L, Cheng Y, Yu PS (2012) Mining frequent itemsets over uncertain databases. Proc VLDB Endow 5(11):1650–1661
Tseng VS, Wu CW, Shie BE, Yu PS (2010) UP-growth: an efficient algorithm for high utility itemset mining. In: The 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 253–262
Tseng VS, Shie BE, Wu CW, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans Knowl Data Eng 25(8):1772–1786
Wang L, Cheung DW, Cheng R, Lee SD, Yang XS (2012) Efficient mining of frequent item sets on large uncertain databases. IEEE Trans Knowl Data Eng 24(12):2170–2183
Wang L, Cheng R, Lee SD, Cheung D (2010) Accelerating probabilistic frequent itemset mining: a model-based approach. In: The 19th ACM International Conference on Information and Knowledge Management, pp 429–438
Wu CW, Shie BE, Tseng VS, Yu PS (2012) Mining top-k high utility itemsets. In: The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 78–86
Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: The SIAM International Conference on Data Mining, pp 211–225
Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases. Data Knowl Eng 59(3):603–626
Zihayat M, An A (2014) Mining top-k high utility patterns over data streams. Inf Sci 285:138–161

Acknowledgments

This research was partially supported by the National Natural Science Foundation of China (NSFC) under Grant No.61503092, by the Shenzhen Peacock Project, China, under Grant KQC201109020055A, by the Natural Scientific Research Innovation Foundation in Harbin Institute of Technology under Grant HIT.NSRIF.2014100, and by the Shenzhen Strategic Emerging Industries Program under Grant ZDSY20120613125016389.

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, 518055, People’s Republic of China
Jerry Chun-Wei Lin & Wensheng Gan
School of Natural Sciences and Humanities, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, 518055, People’s Republic of China
Philippe Fournier-Viger
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, Taiwan, ROC
Tzung-Pei Hong
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan, ROC
Tzung-Pei Hong
Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, ROC
Vincent S. Tseng

Corresponding author

Correspondence to Jerry Chun-Wei Lin.

Ethics declarations

Conflicts of interest

The authors declare that there are no conflicts of interest in this paper.

Ethical standard

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Communicated by C.-H. Chen.

About this article

Cite this article

Lin, J.CW., Gan, W., Fournier-Viger, P. et al. Efficiently mining uncertain high-utility itemsets. Soft Comput 21, 2801–2820 (2017). https://doi.org/10.1007/s00500-016-2159-1

Issue Date: June 2017
DOI: https://doi.org/10.1007/s00500-016-2159-1
