Abstract
Detecting sequential frequent itemsets is one of the core problems in data mining, with many applications in business, marketing, data stream analysis, and other fields. In this paper, we propose a new methodology, based on our previous work, for detecting all repeated patterns in a sequence, i.e., both frequent and non-frequent itemsets. By analyzing large datasets of up to one million transactions from the FIMI website, we were able to detect not only the most frequent sequential itemsets but also every sequential itemset that occurred at least twice in the dataset. This makes it possible to detect outliers that may be important, an analysis that no other methodology can perform. For this purpose, we used the novel LERP-RSA (Longest Expected Repeated Pattern-Reduced Suffix Array) data structure and the ARPaD algorithm, which detects all repeated patterns in a string. The methodology applies a pre-statistical analysis to the transactions, which allows a smaller LERP-RSA data structure to be constructed efficiently for each transaction. Integrating and classifying all the LERP-RSAs allows the ARPaD algorithm to be executed in parallel, which accelerates the process of finding the itemsets.
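The idea of finding every pattern that occurs at least twice in a string can be illustrated with a small sketch. This is not the LERP-RSA data structure or the ARPaD algorithm themselves, whose details are not given here; it is a naive stand-in that sorts suffixes truncated to an assumed maximum pattern length and groups equal prefixes. The function name `repeated_patterns` and the parameter `max_len` are hypothetical and chosen only for illustration.

```python
from itertools import groupby
from typing import Dict, List


def repeated_patterns(text: str, max_len: int) -> Dict[str, List[int]]:
    """Map every substring of length <= max_len that occurs at least twice
    in `text` to the sorted list of its start positions.

    Illustrative only: a naive stand-in, not the authors' LERP-RSA or ARPaD.
    """
    # Truncated suffixes: (first max_len characters of the suffix, start position).
    suffixes = sorted((text[i:i + max_len], i) for i in range(len(text)))
    found: Dict[str, List[int]] = {}
    for k in range(1, max_len + 1):
        # Only suffixes with at least k characters can contribute a length-k pattern.
        candidates = [(s[:k], i) for s, i in suffixes if len(s) >= k]
        # Lexicographic order keeps equal prefixes adjacent, so groupby collects them.
        for prefix, group in groupby(candidates, key=lambda entry: entry[0]):
            positions = sorted(i for _, i in group)
            if len(positions) >= 2:
                found[prefix] = positions
    return found
```

Calling `repeated_patterns("abcabc", 3)` reports `a`, `b`, `c`, `ab`, `bc`, and `abc`, each with two start positions, while substrings that occur once, such as `ca`, are left out. Bounding the suffix length is what keeps the sorted structure small; how that bound is chosen in practice is outside the scope of this sketch.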
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R. (2017). Frequent and Non-frequent Sequential Itemsets Detection. In: Kaya, M., Erdoǧan, Ö., Rokne, J. (eds) From Social Data Mining and Analysis to Prediction and Community Detection. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-51367-6_9
DOI: https://doi.org/10.1007/978-3-319-51367-6_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51366-9
Online ISBN: 978-3-319-51367-6
eBook Packages: Computer Science, Computer Science (R0)