Mining Skyline Patterns from Big Data Environments based on a Spark Framework

Journal of Grid Computing

Abstract

Simultaneously, the application of resilient distributed datasets (RDD) in cloud computing provides a good environment for analyzing big data. In addition, combining Machine Learning (ML) algorithms from the edge computing paradigm with the SFUP-SP algorithm may also improve local computing capabilities and speed up data analysis and user decision-making.
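
The notion of a skyline pattern named in the title can be illustrated with a small sketch. It assumes the usual Pareto-dominance reading of skyline frequent-utility patterns: a pattern is kept only if no other pattern is at least as good on both frequency and utility and strictly better on one. The item names, frequencies, and utilities below are hypothetical, and the code runs on a single machine; it is not the SFUP-SP algorithm and does not use Spark or RDDs.

```python
from typing import Dict, FrozenSet, Tuple

# (frequency, utility) of one candidate pattern; values below are made up.
Stats = Tuple[int, float]


def dominates(a: Stats, b: Stats) -> bool:
    """True if a is at least as good as b on both measures and differs from b."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def skyline(patterns: Dict[FrozenSet[str], Stats]) -> Dict[FrozenSet[str], Stats]:
    """Keep only the patterns that no other pattern dominates (quadratic scan)."""
    return {
        pattern: stats
        for pattern, stats in patterns.items()
        if not any(dominates(other, stats) for other in patterns.values())
    }


# Hypothetical candidates with already-computed frequency and utility.
candidates = {
    frozenset({"a"}): (5, 10.0),
    frozenset({"b"}): (3, 25.0),
    frozenset({"a", "b"}): (2, 30.0),
    frozenset({"c"}): (3, 8.0),        # dominated by {"a"}
    frozenset({"b", "c"}): (2, 12.0),  # dominated by {"b"}
}

result = skyline(candidates)
for pattern, (freq, util) in sorted(result.items(), key=lambda kv: -kv[1][0]):
    print(sorted(pattern), freq, util)
```

The scan compares every pair of candidates, which is fine for a handful of patterns; a distributed setting would need a different strategy for computing the measures and pruning candidates.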

Data Availability

All data are available upon request from the authors.

Funding

No funding was obtained for this study.

Author information

Contributions

Conceptualization: Jimmy Ming-Tai Wu and Huiying Zhou; Methodology: Jerry Chun-Wei Lin; Formal analysis: Gautam Srivastava and Mohamed Baza; Original Draft: Jimmy Ming-Tai Wu, Huiying Zhou, and Mohamed Baza; Review & Editing: Gautam Srivastava and Jerry Chun-Wei Lin.

Corresponding author

Correspondence to Gautam Srivastava.

Ethics declarations

Competing interests

The authors have no competing interests to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, J.MT., Zhou, H., Lin, J.CW. et al. Mining Skyline Patterns from Big Data Environments based on a Spark Framework. J Grid Computing 21, 22 (2023). https://doi.org/10.1007/s10723-023-09653-2

  • DOI: https://doi.org/10.1007/s10723-023-09653-2
