Distinguishing subsequence pattern mining aims to discover the differences between classes of sequence databases and to express the characteristics of each class. It plays an important role in biomedicine, feature information selection, time-series classification, and other areas. Existing methods consider only whether a pattern appears in a sequence, ignoring both the number of times the pattern occurs in the sequence and its proportion in the entire sequence database; this hampers the discovery of distinguishing patterns when there are many irrelevant occurrences. We therefore propose a nonoverlapping conditional distinguishing subsequence pattern mining algorithm. The algorithm counts nonoverlapping occurrences, which effectively reduces the number of irrelevant or redundant occurrences and so gives a more faithful measure of how often a pattern occurs. It also uses a specially designed data structure, the Nettree, to avoid backtracking. In addition, we use the distinguishing patterns as classification features and carry out classification experiments on two-class DNA sequence and time-series data. Extensive experimental results and comparisons demonstrate the efficiency of the proposed algorithm and the correctness of the feature extraction.
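To illustrate why occurrence counts can separate classes where presence alone cannot, the following is a minimal sketch under strong simplifying assumptions. It is not the proposed algorithm: it handles only contiguous patterns (no gap constraints), does not use a Nettree, and treats two occurrences as nonoverlapping only if they share no sequence position. The function names and the toy sequences are hypothetical.

```python
def presence_support(pattern, sequences):
    """Fraction of sequences that contain the pattern at least once."""
    return sum(pattern in s for s in sequences) / len(sequences)


def nonoverlapping_density(pattern, sequences):
    """Nonoverlapping occurrences per sequence position over a whole class.

    Simplification: str.count counts non-overlapping occurrences of a
    contiguous substring, so gapped subsequence patterns are not covered.
    """
    total_occurrences = sum(s.count(pattern) for s in sequences)
    total_length = sum(len(s) for s in sequences)
    return total_occurrences / total_length


# Hypothetical toy classes, for illustration only.
pos = ["ACACACAC", "ACGTACAC"]
neg = ["ACGTTTTT", "TTTTACGG"]

# Presence-only support is identical in both classes.
print(presence_support("AC", pos), presence_support("AC", neg))

# Occurrence density differs: 7/16 in pos versus 2/16 in neg.
print(nonoverlapping_density("AC", pos), nonoverlapping_density("AC", neg))
```

In this toy case the pattern "AC" appears in every sequence of both classes, so a presence-based measure cannot tell them apart, while the occurrence density is 0.4375 in one class and 0.125 in the other.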
This work was supported in part by the National Natural Science Foundation of China under Grant 61673159, in part by the Natural Science Foundation of Hebei Province under Grant F2016202145, in part by the Science and Technology Project of Hebei Province under Grant 15210325, and in part by the Graduate Student Innovation Program of Hebei Province under Grant CXZZSS2017037.