A Novel Sequential Pattern Mining Algorithm for Large Scale Data Sequences

  • Conference paper
Computational Science and Its Applications – ICCSA 2022 Workshops (ICCSA 2022)

Abstract

Sequential pattern mining algorithms are unsupervised machine learning algorithms that find sequential patterns in data sequences whose elements are arranged in a particular order. Most of these algorithms are optimized for data sequences containing more than one element. We therefore argue that there is a need for algorithms optimized specifically for data sequences that contain only one element. In this research, we design and develop a novel algorithm that is optimized for data sets of single-element data sequences and that detects sequential patterns with high performance. The time and memory requirements of the proposed algorithm are examined experimentally. The results show that the proposed algorithm achieves low running times while matching the accuracy of comparable algorithms in the literature. The obtained results are promising.
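To make the task concrete, the following is a minimal sketch of sequential pattern mining in Python. It is not the algorithm proposed in the paper, whose details are not given in this abstract; it is a simplified pattern-growth search in the style of PrefixSpan. It assumes that "single element" means every position in a sequence holds exactly one item. The function name `mine_sequential_patterns`, the `min_support` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mine_sequential_patterns(sequences, min_support):
    """Return {pattern: support} for every ordered pattern that occurs
    (as a possibly non-contiguous subsequence) in at least `min_support`
    of the input sequences. Illustrative sketch only, not the paper's method."""
    results = {}

    def grow(prefix, projected):
        # `projected` holds (sequence index, start offset) pairs: the part of
        # each sequence that remains after the earliest match of `prefix`.
        counts = defaultdict(int)
        for seq_id, start in projected:
            # Count each item at most once per sequence.
            for item in set(sequences[seq_id][start:]):
                counts[item] += 1

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support

            # Project every sequence past the first occurrence of `item`.
            next_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue  # this sequence does not contain the item
                next_projected.append((seq_id, pos + 1))
            grow(pattern, next_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Toy data (hypothetical): three sequences with one item per position.
toy_sequences = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
patterns = mine_sequential_patterns(toy_sequences, min_support=2)
for pattern, support in patterns.items():
    print(pattern, support)
```

With a minimum support of 2, the sketch reports patterns such as `("a", "c")`, which occurs in order in all three toy sequences. Because each position holds a single item, the search only ever extends a pattern by appending one item, which is the simplification that single-item sequences allow in this sketch.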

Acknowledgements

This study was supported by BiletBank R&D Center. We would like to thank BiletBank for providing us with the necessary hardware and access to their datasets.

Author information

Corresponding author

Correspondence to Ali Burak Can.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Can, A.B., Uzun-Per, M., Aktas, M.S. (2022). A Novel Sequential Pattern Mining Algorithm for Large Scale Data Sequences. In: Gervasi, O., Murgante, B., Misra, S., Rocha, A.M.A.C., Garau, C. (eds) Computational Science and Its Applications – ICCSA 2022 Workshops. ICCSA 2022. Lecture Notes in Computer Science, vol 13377. Springer, Cham. https://doi.org/10.1007/978-3-031-10536-4_46

  • DOI: https://doi.org/10.1007/978-3-031-10536-4_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10535-7

  • Online ISBN: 978-3-031-10536-4

  • eBook Packages: Computer Science, Computer Science (R0)
