Mining interesting sequences with low average cost and high average utility

Truong, Tin; Duong, Hai; Le, Bac; Fournier-Viger, Philippe; Yun, Unil

doi:10.1007/s10489-021-02505-0

Published: 20 September 2021

Applied Intelligence, Volume 52, pages 7136–7157 (2022)

Tin Truong¹, Hai Duong¹, Bac Le²,³, Philippe Fournier-Viger⁴ & Unil Yun⁵

Abstract

Discovering high utility sequences in a quantitative database is a popular data mining task. The goal is to enumerate all sequences of items (symbols) that have a high value for the user, as measured by a utility function. A representative application of high utility sequence mining is the identification of profitable sequences of purchases in transactions from online stores. Though useful, this task has a drawback: it does not consider the cost of items, although cost is a key factor for decision-making in that domain and many others. To consider both the cost and utility of items for sequence mining, this paper defines a novel problem \( \mathcal{FLCHUSM} \) of mining frequent sequences having a high average utility and a low average cost. Though the proposed problem is a generalization of the traditional problem of frequent sequence mining, it is more challenging because the average utility and average cost functions do not satisfy the downward-closure property traditionally used to reduce the search space. To address this issue, this paper presents a lower bound on the cost and two novel upper bounds on the utility. In addition, four strategies (width pruning, depth pruning, reducing and tightening) are devised to eliminate unpromising patterns from the search space. Building on these theoretical results, a new CUL (Cost-Utility List) data structure is designed for storing and quickly updating the utility and cost information of patterns, and a novel algorithm named FLCHUSPM is proposed for \( \mathcal{FLCHUSM} \). Results from several experiments show that FLCHUSPM is efficient in terms of memory usage and runtime, and that interesting patterns can be discovered in real data.
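
The problem statement in the abstract can be illustrated with a small brute-force sketch. This is not the FLCHUSPM algorithm and it does not use the CUL structure, the bounds, or the pruning strategies. The data model is an assumption made only for illustration: each event carries one item with a utility and a cost, a pattern is matched at its leftmost occurrence, and "average" means the total over supporting input sequences divided by the support. The paper's exact definitions are not given in this preview, and the names `DB`, `first_match`, `measures` and `mine` are hypothetical.

```python
from itertools import product

# Toy database (made-up values): each input sequence is a list of
# (item, utility, cost) events. This layout is an assumption of the sketch.
DB = [
    [("a", 1, 4), ("b", 9, 1), ("c", 2, 2)],
    [("a", 2, 5), ("b", 8, 1)],
    [("a", 1, 3), ("c", 3, 2)],
]

def first_match(pattern, seq):
    """Return the events of the leftmost occurrence of pattern in seq, or None."""
    matched, pos = [], 0
    for item in pattern:
        while pos < len(seq) and seq[pos][0] != item:
            pos += 1
        if pos == len(seq):
            return None
        matched.append(seq[pos])
        pos += 1
    return matched

def measures(pattern, db=DB):
    """Return (support, average utility, average cost) of pattern over db."""
    hits = [m for m in (first_match(pattern, s) for s in db) if m is not None]
    if not hits:
        return 0, 0.0, 0.0
    total_u = sum(u for m in hits for _, u, _ in m)
    total_c = sum(c for m in hits for _, _, c in m)
    return len(hits), total_u / len(hits), total_c / len(hits)

def mine(min_sup, min_au, max_ac, max_len=2, db=DB):
    """Enumerate every pattern up to max_len and keep those meeting all thresholds."""
    items = sorted({e[0] for s in db for e in s})
    out = {}
    for n in range(1, max_len + 1):
        for p in product(items, repeat=n):
            sup, au, ac = measures(p, db)
            if sup >= min_sup and au >= min_au and ac <= max_ac:
                out[p] = (sup, au, ac)
    return out

if __name__ == "__main__":
    print(measures(("a",)))       # support 3, low average utility
    print(measures(("a", "b")))   # support 2, higher average utility
    print(mine(min_sup=2, min_au=5, max_ac=6))
```

Under these assumed definitions, the pattern `("a",)` has an average utility of about 1.33 while its extension `("a", "b")` has an average utility of 10.0. With a minimum average utility of 5, the shorter pattern fails and the longer one passes, so discarding extensions of a failing pattern would lose results. This is the kind of downward-closure violation the abstract refers to, and it is why bounds are needed in place of the measures themselves.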

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Dalat University, Dalat, Vietnam
Tin Truong & Hai Duong
Department of Computer Science, University of Science, Ho Chi Minh City, Vietnam
Bac Le
Vietnam National University, Ho Chi Minh City, Vietnam
Bac Le
School of Humanities and Social Sciences, Harbin Institute of Technology (Shenzhen), Shenzhen, China
Philippe Fournier-Viger
Department of Computer Engineering, Sejong University, Seoul, South Korea
Unil Yun

Corresponding author

Correspondence to Philippe Fournier-Viger.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Truong, T., Duong, H., Le, B. et al. Mining interesting sequences with low average cost and high average utility. Appl Intell 52, 7136–7157 (2022). https://doi.org/10.1007/s10489-021-02505-0

Accepted: 04 May 2021
Published: 20 September 2021
Issue Date: May 2022
DOI: https://doi.org/10.1007/s10489-021-02505-0
