On efficiently mining high utility sequential patterns

Wang, Jun-Zhe; Huang, Jiun-Long; Chen, Yi-Cheng

doi:10.1007/s10115-015-0914-8

Regular Paper
Published: 11 January 2016

Volume 49, pages 597–627, (2016)

Knowledge and Information Systems

Jun-Zhe Wang¹,
Jiun-Long Huang¹ &
Yi-Cheng Chen²

Abstract

High utility sequential pattern mining is an emerging topic in pattern mining that aims to identify sequences with high utilities (e.g., profits) but possibly low frequencies. Because this problem lacks the downward closure property, most existing algorithms first generate candidate sequences with high sequence-weighted utilities (SWUs), where the SWU of a sequence is an upper bound on the utilities of the sequence and all its supersequences, and then calculate the actual utilities of these candidates. This produces a large number of candidates, since the SWU is usually much larger than the real utilities of a sequence and its supersequences. In view of this, we propose two tight utility upper bounds, prefix extension utility and reduced sequence utility, together with two companion pruning strategies, and devise the HUS-Span algorithm, which employs these two pruning strategies to identify high utility sequential patterns. In addition, since setting a proper utility threshold is usually difficult for users, we also propose the TKHUS-Span algorithm, which uses the same two pruning strategies to identify top-k high utility sequential patterns. Three search strategies, guided depth-first search (GDFS), best-first search (BFS) and a hybrid of BFS and GDFS, are also proposed to improve the efficiency of TKHUS-Span. Experimental results on real and synthetic datasets show that HUS-Span and TKHUS-Span with the BFS strategy generate fewer candidate sequences and thus outperform prior algorithms in terms of mining efficiency.
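To make the gap between SWU and actual utility concrete, the following is a minimal Python sketch on a toy database. It is not the authors' implementation and does not implement HUS-Span, prefix extension utility or reduced sequence utility, which the abstract does not define. It assumes the definitions commonly used in the high utility sequential pattern literature: the utility of a pattern in a quantitative sequence is the maximum, over all of its occurrences, of the summed item utilities (quantity times unit profit), and the SWU of a pattern is the sum of the total utilities of all sequences that contain it. For simplicity each sequence element is assumed to hold a single item. All names (`utility_in_sequence`, `utility_and_swu`, the toy `profit` table and `db`) are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# A quantitative sequence: ordered (item, quantity) pairs.
# Simplifying assumption: one item per element, no itemsets.
QSeq = List[Tuple[str, int]]


def utility_in_sequence(pattern: Sequence[str], qseq: QSeq,
                        profit: Dict[str, int]) -> Optional[int]:
    """Maximum utility over all occurrences of `pattern` in `qseq`,
    or None if the pattern does not occur."""
    # best[j] = best utility of matching pattern[:j] with the elements seen so far
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for item, qty in qseq:
        # Go through j downward so one element matches at most one pattern position.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item and best[j - 1] is not None:
                cand = best[j - 1] + qty * profit[item]
                if best[j] is None or cand > best[j]:
                    best[j] = cand
    return best[-1]


def total_utility(qseq: QSeq, profit: Dict[str, int]) -> int:
    """Utility of the whole quantitative sequence."""
    return sum(qty * profit[item] for item, qty in qseq)


def utility_and_swu(pattern: Sequence[str], db: List[QSeq],
                    profit: Dict[str, int]) -> Tuple[int, int]:
    """Return (actual utility, SWU) of `pattern` over the database."""
    utility = 0
    swu = 0
    for qseq in db:
        u = utility_in_sequence(pattern, qseq, profit)
        if u is not None:  # the sequence contains the pattern
            utility += u
            swu += total_utility(qseq, profit)
    return utility, swu


# Toy data (hypothetical): unit profits and three quantitative sequences.
profit = {"a": 2, "b": 5, "c": 1}
db = [
    [("a", 1), ("b", 2), ("a", 3), ("c", 4)],  # total utility 22
    [("b", 1), ("c", 2)],                      # total utility 7
    [("a", 2), ("c", 1), ("b", 1)],            # total utility 10
]

print(utility_and_swu(("a", "c"), db, profit))  # (15, 32)
```

On this toy input the pattern ("a", "c") has an actual utility of 15 but an SWU of 32, so any threshold between 16 and 32 would keep it as a candidate under SWU-based pruning even though it is not a high utility pattern. Tighter upper bounds are meant to reduce exactly this kind of surplus candidate.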

Acknowledgments

The authors were supported by the Ministry of Science and Technology, Taiwan, under Project Nos. MOST 104-2221-E-032-037-MY2, MOST 103-2221-E-009-126-MY2, MOST 104-2918-I-009-003, MOST 104-2218-E-009-009 and MOST 104-2218-E-009-029.

Author information

Authors and Affiliations

Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, ROC
Jun-Zhe Wang & Jiun-Long Huang
Department of Computer Science and Information Engineering, Tamkang University, New Taipei City, Taiwan, ROC
Yi-Cheng Chen

Corresponding author

Correspondence to Jiun-Long Huang.

About this article

Cite this article

Wang, JZ., Huang, JL. & Chen, YC. On efficiently mining high utility sequential patterns. Knowl Inf Syst 49, 597–627 (2016). https://doi.org/10.1007/s10115-015-0914-8

Received: 25 November 2014
Revised: 10 October 2015
Accepted: 29 December 2015
Published: 11 January 2016
Issue Date: November 2016
DOI: https://doi.org/10.1007/s10115-015-0914-8
