Abstract
This paper proposes an approach for behavioural analysis on continuous data based on sequential pattern mining (SPM) and classification. Discretization and granularity issues associated with applying SPM and classification to time series data are examined, along with the many factors and parameters that control the granularity of the various dimensions of the problem. The key contribution of the paper is the seminal examination of machine learning-based classification driven by sequential pattern mining to classify behaviour patterns. Results are reported on the effectiveness of various levels and settings on acceleration data collected from the walking activity of 17 subjects, obtained from a publicly available dataset. Using the optimal set of parameter settings identified, the classifier correctly identified 70% of walking data segments from a held-out test set as belonging to a particular subject, and 78% of those not belonging to the subject. Moreover, a number of interesting findings were identified when varying the granularity and other settings for the various discretization parameters, solidifying the work as a significant advancement in the area of behavioural analysis with sequential pattern mining and classification on time series data.
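As a rough illustration of the kind of pipeline the abstract describes, the sketch below discretizes a continuous signal into symbols, cuts the symbol stream into segments, and turns each segment into binary features that record which candidate patterns it contains. It is a minimal sketch under stated assumptions and not the paper's implementation: equal-width binning, fixed-length non-overlapping segments, the toy signal, and all function names (`discretize`, `segment`, `contains`, `support`, `to_features`) are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple


def discretize(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each value to an equal-width bin index in [0, n_bins - 1].

    Equal-width binning is an assumption of this sketch; n_bins plays the
    role of a value-granularity parameter.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def segment(symbols: Sequence[int], length: int) -> List[Tuple[int, ...]]:
    """Split a symbol stream into non-overlapping segments of fixed length."""
    return [tuple(symbols[i:i + length])
            for i in range(0, len(symbols) - length + 1, length)]


def contains(sequence: Sequence[int], pattern: Sequence[int]) -> bool:
    """True if pattern occurs in sequence in order, gaps allowed."""
    it = iter(sequence)
    # Each membership test consumes the iterator up to the match,
    # which enforces the ordering of the pattern's symbols.
    return all(sym in it for sym in pattern)


def support(sequences: Sequence[Sequence[int]], pattern: Sequence[int]) -> float:
    """Fraction of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


def to_features(sequence: Sequence[int],
                patterns: Sequence[Sequence[int]]) -> List[int]:
    """Binary feature vector: one entry per candidate pattern."""
    return [int(contains(sequence, p)) for p in patterns]


# Toy signal standing in for acceleration readings (made-up values).
signal = [0.0, 0.5, 1.0, 0.5, 0.0, 0.5, 1.0, 0.5]
symbols = discretize(signal, n_bins=3)
segments = segment(symbols, length=4)
patterns = [(0, 2), (2, 0)]
features = [to_features(s, patterns) for s in segments]
```

The resulting feature vectors could then be passed to any standard classifier. Changing `n_bins` or the segment `length` changes which patterns are observable at all, which is the kind of granularity effect the paper examines.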
About this article
Cite this article
Buffett, S. Discretized sequential pattern mining for behaviour classification. Granul. Comput. 6, 853–866 (2021). https://doi.org/10.1007/s41066-020-00234-2
DOI: https://doi.org/10.1007/s41066-020-00234-2