Abstract
A contrast sequential pattern is a pattern that occurs frequently in one sequence dataset but not in the others. Contrast sequential pattern mining has been widely used in many fields, such as customer behavior analysis and medical diagnosis. Existing algorithms require users to set a distinguishing location first and then use this fixed location to identify differences in the distribution of subsequences, i.e., subsequence patterns that appear before the given location in one sequence dataset and after the same location in another. However, it is difficult for users to set an appropriate location without sufficient prior knowledge. Moreover, because the distinguishing location differs from subsequence to subsequence, a fixed location may miss many meaningful patterns. In addition, previous studies have rarely considered the variation in the time distribution of subsequences or the discreteness of patterns. To address these problems, we propose a novel method for mining contrast sequential patterns based on subsequence time distribution variation with discreteness constraints. We design a suffix-tree-based search algorithm, which transforms the dataset into a tree representation, to mine these patterns. Experiments on real-world time-series datasets show that our method outperforms other state-of-the-art methods in both effectiveness and efficiency.
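To make the opening definition concrete, the following is a minimal Python sketch of a support-based contrast test between two sequence datasets. It illustrates only the basic notion of a pattern that is frequent in one dataset and infrequent in another; it is not the suffix-tree algorithm proposed here and does not model time distribution variation or discreteness constraints. The function names and the thresholds `min_pos` and `max_neg` are assumptions introduced for illustration.

```python
from typing import Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # Each membership test consumes the iterator up to the matched item,
    # so later pattern items must appear after earlier ones.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], dataset: Sequence[Sequence[str]]) -> float:
    """Fraction of sequences in dataset that contain pattern as a subsequence."""
    if not dataset:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in dataset) / len(dataset)


def is_contrast_pattern(
    pattern: Sequence[str],
    positive: Sequence[Sequence[str]],
    negative: Sequence[Sequence[str]],
    min_pos: float = 0.6,   # hypothetical minimum support in the positive dataset
    max_neg: float = 0.2,   # hypothetical maximum support in the negative dataset
) -> bool:
    """Frequent in the positive dataset and infrequent in the negative one."""
    return (
        support(pattern, positive) >= min_pos
        and support(pattern, negative) <= max_neg
    )


positive = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
negative = [["c", "a"], ["b", "c"], ["c", "b", "a"]]

print(is_contrast_pattern(["a", "c"], positive, negative))
print(is_contrast_pattern(["c"], positive, negative))
```

In this toy data, the ordered pattern `a, c` appears in every positive sequence and in no negative sequence, whereas the single item `c` appears in all sequences of both datasets and therefore does not discriminate between them.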
Cite this article
Wu, R., Li, Q. & Chen, X. Mining contrast sequential pattern based on subsequence time distribution variation with discreteness constraints. Appl Intell 49, 4348–4360 (2019). https://doi.org/10.1007/s10489-019-01492-7