A Spark-based high utility itemset mining with multiple external utilities

Sethi, Krishan Kumar; Ramesh, Dharavath; Trivedi, Munesh Chandra

doi:10.1007/s10586-021-03442-w

Published: 17 November 2021

Volume 25, pages 889–909, (2022)

Cluster Computing

Krishan Kumar Sethi¹,
Dharavath Ramesh¹ (ORCID: orcid.org/0000-0003-3338-6520) &
Munesh Chandra Trivedi²

Abstract

High utility itemset (HUI) mining is a data mining technique for discovering profitable patterns. The utility of an item is computed from two measures: its quantity (internal utility) and its per-unit profit (external utility). All existing HUI mining algorithms assume a single external utility value per item for the entire database. However, in many applications the per-unit profit of an item fluctuates over time. This research introduces three novel strategies for supplying multiple external utilities of items as input to the HUI mining algorithm. Traditional HUI mining algorithms were developed for standalone systems and are ill-suited to big data processing because of their limited computing resources (CPU, memory). Big data is processed efficiently on distributed frameworks such as Apache Hadoop and Spark. This paper introduces a distributed HUI mining algorithm named the Spark-based top-k high utility itemset (k-SHUI) miner. We also propose a fair load distribution strategy that divides the search space equally among the cluster nodes. k-SHUI produces the top-k HUIs without requiring a minimum utility threshold. We conducted extensive experiments on six real-life datasets to compare the performance of the proposed algorithm with that of existing algorithms. The experimental results demonstrate that the proposed algorithm outperforms the existing algorithms.
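
To make the utility computation concrete, the following minimal sketch computes itemset utilities when the per-unit profit varies by period and then selects the top-k itemsets by brute force. It is not the k-SHUI algorithm and it does not reproduce the three input strategies, which the abstract does not describe. The per-period profit table, the `period` tag on each transaction, the sample data, and the function names `itemset_utility` and `top_k` are all assumptions made for illustration.

```python
from itertools import combinations
import heapq

# Hypothetical transactions: each carries a period tag and item quantities.
transactions = [
    {"period": "Q1", "items": {"a": 2, "b": 1}},
    {"period": "Q1", "items": {"a": 1, "c": 3}},
    {"period": "Q2", "items": {"a": 1, "b": 2, "c": 1}},
]

# Hypothetical external utilities: one per-unit profit table per period.
profit = {
    "Q1": {"a": 5, "b": 2, "c": 1},
    "Q2": {"a": 4, "b": 3, "c": 2},
}


def itemset_utility(itemset, transactions, profit):
    """Sum quantity * per-unit profit over transactions containing the itemset."""
    total = 0
    for t in transactions:
        if all(item in t["items"] for item in itemset):
            period_profit = profit[t["period"]]  # profit valid for this period
            total += sum(t["items"][item] * period_profit[item] for item in itemset)
    return total


def top_k(transactions, profit, k):
    """Brute-force top-k: enumerate every itemset, keep the k highest utilities."""
    items = sorted({item for t in transactions for item in t["items"]})
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, profit)
            if utility > 0:
                scored.append((utility, itemset))
    return heapq.nlargest(k, scored)


print(top_k(transactions, profit, 3))
# [(22, ('a', 'b')), (19, ('a',)), (14, ('a', 'c'))]
```

In this toy data the itemset {a, b} has a higher utility (22) than its subset {a} (19), which shows why utility, unlike support, cannot be pruned with a simple downward-closure property. The exhaustive enumeration is exponential in the number of items and is only meant to fix the definitions; no minimum utility threshold is needed because only the k best itemsets are kept.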

Acknowledgements

This research work was supported by the Indian Institute of Technology (Indian School of Mines), Dhanbad, Government of India. The authors wish to express their gratitude and heartiest thanks to the Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, India, for their research support.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology (ISM), Dhanbad, Jharkhand, 826004, India
Krishan Kumar Sethi & Dharavath Ramesh
Department of Computer Science and Engineering, National Institute of Technology Agartala, Agartala, Tripura, 799046, India
Munesh Chandra Trivedi

Corresponding author

Correspondence to Dharavath Ramesh.

Ethics declarations

Conflict of interest

The authors do not have any conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sethi, K.K., Ramesh, D. & Trivedi, M.C. A Spark-based high utility itemset mining with multiple external utilities. Cluster Comput 25, 889–909 (2022). https://doi.org/10.1007/s10586-021-03442-w

Received: 14 January 2021
Revised: 24 September 2021
Accepted: 25 September 2021
Published: 17 November 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s10586-021-03442-w
