
Mining distinguishing subsequence patterns with nonoverlapping condition

Cluster Computing

Abstract

Distinguishing subsequence pattern mining aims to discover the differences between classes of sequence databases and to express the characteristics of each class. It plays an important role in biomedicine, feature selection, time-series classification, and other areas. Existing methods consider only whether a pattern appears in a sequence. They ignore both the number of occurrences of the pattern in the sequence and the proportion of the pattern in the entire sequence database, which hampers the discovery of distinguishing patterns when there are many irrelevant occurrences. We therefore propose an algorithm for mining distinguishing subsequence patterns under the nonoverlapping condition. Counting only nonoverlapping occurrences effectively reduces the number of irrelevant or redundant occurrences, so the number of occurrences is captured more accurately. We also use a specially designed data structure, the Nettree, to avoid backtracking. In addition, we use the distinguishing patterns as classification features and carry out classification experiments on two-class DNA sequence data and time-series data. Extensive experimental results and comparisons demonstrate the efficiency of the proposed algorithm and the correctness of the feature extraction.
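
To make the nonoverlapping condition concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes the definition used in the related nonoverlapping pattern mining literature: two occurrences may share a sequence position only if that position matches different positions of the pattern. The function name `count_nonoverlapping` and the gap parameters `min_gap` and `max_gap` are hypothetical and introduced only for illustration. The sketch is a greedy leftmost-first search with backtracking, so it may return fewer than the maximum number of nonoverlapping occurrences. It does not use a Nettree, which the abstract introduces precisely to avoid such backtracking.

```python
def count_nonoverlapping(seq, pattern, min_gap=0, max_gap=2):
    """Greedily count nonoverlapping occurrences of `pattern` in `seq`.

    Assumption: between two consecutive matched characters there must be
    between `min_gap` and `max_gap` skipped characters (hypothetical
    parameters). A sequence position may be reused by another occurrence
    only at a different pattern index.
    """
    m = len(pattern)
    if m == 0:
        return 0
    # used[j] holds the sequence positions already matched to pattern[j].
    used = [set() for _ in range(m)]

    def extend(j, pos, path):
        # Try to place pattern[j] at `pos`, then match the rest of the pattern.
        if seq[pos] != pattern[j] or pos in used[j]:
            return None
        path = path + [pos]
        if j == m - 1:
            return path
        lo = pos + min_gap + 1
        hi = min(pos + max_gap + 1, len(seq) - 1)
        for nxt in range(lo, hi + 1):
            found = extend(j + 1, nxt, path)
            if found is not None:
                return found  # leftmost completion wins
        return None  # dead end: the caller backtracks

    count = 0
    for start in range(len(seq)):
        occurrence = extend(0, start, [])
        if occurrence is not None:
            for j, pos in enumerate(occurrence):
                used[j].add(pos)
            count += 1
    return count
```

For example, `count_nonoverlapping("aaa", "aa", 0, 0)` returns 2: the occurrences at positions (0, 1) and (1, 2) share position 1, but it matches the second pattern character in one occurrence and the first in the other, so the two do not overlap under the assumed definition.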



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61673159, in part by the Natural Science Foundation of Hebei Province under Grant F2016202145, in part by the Science and Technology Project of Hebei Province under Grant 15210325, and in part by the Graduate Student Innovation Program of Hebei Province under Grant CXZZSS2017037.


Corresponding author

Correspondence to Yan Li.


Cite this article

Wu, Y., Wang, Y., Liu, J. et al. Mining distinguishing subsequence patterns with nonoverlapping condition. Cluster Comput 22 (Suppl 3), 5905–5917 (2019). https://doi.org/10.1007/s10586-017-1671-0


  • DOI: https://doi.org/10.1007/s10586-017-1671-0
