On efficiently mining high utility sequential patterns

  • Regular Paper
  • Knowledge and Information Systems

Abstract

High utility sequential pattern mining is an emerging topic in pattern mining that aims to identify sequences with high utilities (e.g., profits) but possibly low frequencies. Because the downward closure property does not hold for this problem, most existing algorithms first generate candidate sequences with high sequence-weighted utilities (SWUs), where the SWU of a sequence is an upper bound on the utilities of the sequence and all its supersequences, and then calculate the actual utilities of these candidates. This produces a large number of candidates, since the SWU is usually much larger than the real utilities of a sequence and its supersequences. In view of this, we propose two tight utility upper bounds, prefix extension utility and reduced sequence utility, together with two companion pruning strategies, and devise the HUS-Span algorithm, which employs these two pruning strategies to identify high utility sequential patterns. In addition, since setting a proper utility threshold is usually difficult for users, we also propose the TKHUS-Span algorithm, which uses the same two pruning strategies to identify top-k high utility sequential patterns. Three search strategies, guided depth-first search (GDFS), best-first search (BFS), and a hybrid of BFS and GDFS, are also proposed to improve the efficiency of TKHUS-Span. Experimental results on real and synthetic datasets show that HUS-Span and TKHUS-Span with the BFS strategy generate fewer candidate sequences and thus outperform prior algorithms in terms of mining efficiency.
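To make the gap between SWU and actual utility concrete, the following is a minimal sketch on a toy database. It assumes the definitions commonly used in the high utility sequential pattern literature, which the abstract itself does not spell out: the utility of a pattern in a sequence is the maximum over its occurrences, and the SWU of a pattern is the total utility of every sequence that contains it. For brevity each position holds a single item rather than an itemset. The function names, the profit table, and the data are all hypothetical, and the sketch does not implement prefix extension utility, reduced sequence utility, or HUS-Span.

```python
from typing import Dict, List, Optional, Tuple

# Simplifying assumption: a q-sequence is a list of (item, quantity) pairs,
# one item per position (real q-sequences hold an itemset per position).
QSequence = List[Tuple[str, int]]


def sequence_utility(qseq: QSequence, profit: Dict[str, int]) -> int:
    """Total utility of a whole q-sequence: sum of profit * quantity."""
    return sum(profit[item] * qty for item, qty in qseq)


def pattern_utility(pattern: List[str], qseq: QSequence,
                    profit: Dict[str, int]) -> Optional[int]:
    """Maximum utility over all occurrences of `pattern` as a subsequence
    of `qseq`, or None if the pattern does not occur."""
    neg = float("-inf")
    # best[j] = best utility of matching pattern[:j] in the part scanned so far
    best = [0] + [neg] * len(pattern)
    for item, qty in qseq:
        u = profit[item] * qty
        # Go from the back so one position matches at most one pattern element.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item and best[j - 1] != neg:
                best[j] = max(best[j], best[j - 1] + u)
    return None if best[-1] == neg else best[-1]


def utility(pattern: List[str], db: List[QSequence],
            profit: Dict[str, int]) -> int:
    """Actual utility of `pattern` in the database."""
    values = (pattern_utility(pattern, q, profit) for q in db)
    return sum(v for v in values if v is not None)


def swu(pattern: List[str], db: List[QSequence],
        profit: Dict[str, int]) -> int:
    """Sequence-weighted utility: total utility of every sequence
    that contains `pattern`."""
    return sum(sequence_utility(q, profit) for q in db
               if pattern_utility(pattern, q, profit) is not None)


# Hypothetical toy data.
profit = {"a": 2, "b": 5, "c": 1}
db = [
    [("a", 1), ("b", 2), ("c", 3)],
    [("b", 1), ("a", 4), ("b", 1)],
    [("c", 2), ("a", 1)],
]

for pat in (["a"], ["a", "b"]):
    print(pat, "utility =", utility(pat, db, profit),
          "SWU =", swu(pat, db, profit))
```

On this toy data the pattern a has utility 12 while its supersequence a, b has utility 25, so utility is not anti-monotone and cannot be used directly for pruning. The SWU values (37 and 33) do decrease as the pattern grows, which makes SWU safe for pruning, but they sit well above the actual utilities, which is the looseness the abstract describes.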

Acknowledgments

The authors were supported by the Ministry of Science and Technology, Taiwan, under Project Nos. MOST 104-2221-E-032-037-MY2, MOST 103-2221-E-009-126-MY2, MOST 104-2918-I-009-003, MOST 104-2218-E-009-009 and MOST 104-2218-E-009-029.

Author information

Corresponding author

Correspondence to Jiun-Long Huang.

About this article

Cite this article

Wang, JZ., Huang, JL. & Chen, YC. On efficiently mining high utility sequential patterns. Knowl Inf Syst 49, 597–627 (2016). https://doi.org/10.1007/s10115-015-0914-8

  • DOI: https://doi.org/10.1007/s10115-015-0914-8
