Mining Skyline Patterns from Big Data Environments based on a Spark Framework

Wu, Jimmy Ming-Tai; Zhou, Huiying; Lin, Jerry Chun-Wei; Srivastava, Gautam; Baza, Mohamed

doi:10.1007/s10723-023-09653-2

Published: 05 April 2023

Journal of Grid Computing, Volume 21, article number 22 (2023)

Jimmy Ming-Tai Wu¹,
Huiying Zhou¹,
Jerry Chun-Wei Lin²,
Gautam Srivastava³,⁴,⁵ &
Mohamed Baza⁶

Abstract

The application of resilient distributed datasets (RDDs) in cloud computing provides a good environment for big data analysis. In addition, combining Machine Learning (ML) algorithms from the edge computing paradigm with the SFUP-SP algorithm may improve local computing capabilities and speed up data analysis and user decision-making.

Data Availability

All data are available from the authors upon request.

Funding

No funding was obtained for this study.

Author information

Authors and Affiliations

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China
Jimmy Ming-Tai Wu & Huiying Zhou
Department of Computer Science, Electrical Engineering and Mathematical Sciences, Western Norway University of Applied Sciences, Bergen, Norway
Jerry Chun-Wei Lin
Department of Mathematics, Computer Science, Brandon University, Brandon, Manitoba, Canada
Gautam Srivastava
Research Centre for Interneural Computing, China Medical University, Taichung, Taiwan
Gautam Srivastava
Department of Computer Science and Mathematics, Lebanese American University, Beirut, 1102, Lebanon
Gautam Srivastava
Department of Computer Science, College of Charleston, Charleston, SC, 29424, USA
Mohamed Baza

Contributions

Conceptualization: Jimmy Ming-Tai Wu and Huiying Zhou; Methodology: Jerry Chun-Wei Lin; Formal analysis: Gautam Srivastava and Mohamed Baza; Original Draft: Jimmy Ming-Tai Wu, Huiying Zhou, and Mohamed Baza; Review & Editing: Gautam Srivastava and Jerry Chun-Wei Lin.

Corresponding author

Correspondence to Gautam Srivastava.

Ethics declarations

Competing interests

The authors have no competing interests to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, J.MT., Zhou, H., Lin, J.CW. et al. Mining Skyline Patterns from Big Data Environments based on a Spark Framework. J Grid Computing 21, 22 (2023). https://doi.org/10.1007/s10723-023-09653-2

Received: 26 September 2022
Accepted: 22 February 2023
Published: 05 April 2023
DOI: https://doi.org/10.1007/s10723-023-09653-2
