Cluster Computing, Volume 22, Supplement 3, pp 5905–5917

Mining distinguishing subsequence patterns with nonoverlapping condition

  • Youxi Wu
  • Yuehua Wang
  • Jingyu Liu
  • Ming Yu
  • Jing Liu
  • Yan Li (Email author)


Distinguishing subsequence pattern mining aims to discover the differences between categories of sequence databases and to express the characteristics of each class. It plays an important role in biomedicine, feature selection, time-series classification, and other areas. Existing distinguishing subsequence pattern mining methods consider only whether a pattern appears in a sequence. They ignore both the number of occurrences of the pattern in the sequence and the proportion of the pattern in the entire sequence database, which hampers the discovery of distinguishing patterns when there are many irrelevant occurrences. We therefore propose a distinguishing subsequence pattern mining algorithm under the nonoverlapping condition. In this paper, we focus on the number of nonoverlapping occurrences, which effectively reduces the number of irrelevant or redundant occurrences and allows the number of occurrences to be captured more accurately. We also use a specially designed data structure, the Nettree, to avoid backtracking. In addition, we use the distinguishing patterns as classification features and carry out classification experiments on two-class DNA sequence and time-series data. Extensive experimental results and comparisons demonstrate the efficiency of the proposed algorithm and the correctness of the feature extraction.
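To make the idea of counting nonoverlapping occurrences concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes the nonoverlapping condition as it is commonly stated in related work: two occurrences may not match the same sequence position with the same pattern index, although a position may be reused at a different pattern index. The function names, the optional gap parameters `min_gap` and `max_gap`, and the `contrast` score are all hypothetical. The sketch also uses plain greedy search with backtracking, which is exactly what the Nettree is designed to avoid, and it makes no claim to find the maximum number of occurrences.

```python
def count_nonoverlapping(seq, pattern, min_gap=0, max_gap=None):
    """Greedy count of occurrences of `pattern` in `seq`.

    Assumed condition: no sequence position is matched twice by the
    same pattern index. Gap bounds apply between consecutive matches.
    """
    m, n = len(pattern), len(seq)
    if m == 0:
        return 0
    if max_gap is None:
        max_gap = n  # effectively unbounded
    # used[j] holds the positions already matched by pattern[j]
    used = [set() for _ in range(m)]

    def extend(j, prev):
        # Match pattern[j:] to the right of position prev.
        # Returns the list of matched positions, or None on failure.
        if j == m:
            return []
        lo = prev + 1 + min_gap
        hi = min(n - 1, prev + 1 + max_gap)
        for k in range(lo, hi + 1):
            if seq[k] == pattern[j] and k not in used[j]:
                rest = extend(j + 1, k)  # backtracks if this k fails
                if rest is not None:
                    return [k] + rest
        return None

    count = 0
    for start in range(n):
        if seq[start] != pattern[0]:
            continue
        rest = extend(1, start)
        if rest is not None:
            for j, pos in enumerate([start] + rest):
                used[j].add(pos)
            count += 1
    return count


def occurrence_rate(db, pattern, **gaps):
    # Hypothetical measure: occurrences per sequence character in a class.
    total_len = sum(len(s) for s in db)
    total = sum(count_nonoverlapping(s, pattern, **gaps) for s in db)
    return total / total_len if total_len else 0.0


def contrast(pos_db, neg_db, pattern, **gaps):
    # Hypothetical contrast score between two classes of sequences.
    return occurrence_rate(pos_db, pattern, **gaps) - occurrence_rate(neg_db, pattern, **gaps)
```

For example, `count_nonoverlapping("AAA", "AA")` counts the occurrences at positions (0, 1) and (1, 2): position 1 is used twice, but by different pattern indices, so the assumed condition is not violated. A pattern whose `contrast` between the two classes is large would be a candidate distinguishing pattern under this illustrative score; the measure actually used in the paper may differ.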


Nonoverlapping occurrences · Distinguishing subsequence pattern · Nettree · Feature extraction



This work was supported in part by the National Natural Science Foundation of China under Grant 61673159, in part by the Natural Science Foundation of Hebei Province under Grant F2016202145, in part by the Science and Technology Project of Hebei Province under Grant 15210325, and in part by the Graduate Student Innovation Program of Hebei Province under Grant CXZZSS2017037.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Youxi Wu (1, 2)
  • Yuehua Wang (1, 2)
  • Jingyu Liu (1, 2)
  • Ming Yu (1, 2)
  • Jing Liu (1, 2)
  • Yan Li (3), Email author

  1. School of Computer Science and Engineering, Hebei University of Technology, Tianjin, China
  2. Hebei Province Key Laboratory of Big Data Calculation, Tianjin, China
  3. School of Economics and Management, Hebei University of Technology, Tianjin, China
