Optimizing shapelets quality measure for imbalanced time series classification

Applied Intelligence

Abstract

Time series classification is regarded as one of the most challenging problems in data mining and is used in a broad range of fields. A skewed class distribution makes classifying minority-class time series even harder. A common approach is to extract or select representative features that retain the structure of a time series object. However, when the data distribution is imbalanced, traditional features cannot represent time series effectively, especially in multi-class settings. In this paper, shapelets, a time series mining primitive, are applied to extract the most representative subsequences. In particular, we verify that information gain (IG) is unsuitable as a shapelet quality measure for imbalanced data sets. We therefore propose two quality measures for shapelets, one for imbalanced binary problems and one for imbalanced multi-class problems. From the extracted shapelet features, we select diversified top-k shapelets according to the new quality measures to serve as the k best features, and we implement this procedure on a MapReduce framework. Lastly, two oversampling methods based on shapelet features are proposed to re-balance binary and multi-class time series data sets. We validate our methods on benchmark data sets by comparing them with canonical classifiers and state-of-the-art time series algorithms. The results show that the proposed algorithms are more competitive than the compared methods, with statistical significance.
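
As a rough illustration of the IG-based shapelet quality measure that the abstract calls unsuitable for imbalanced data, the following minimal sketch computes the standard shapelet-to-series distance and the best information gain over all split thresholds. It is not the authors' proposed measure; the function names and the toy distances are assumptions made for illustration only. It shows one property that may contribute to the problem: IG is bounded by the class entropy, so even a perfectly separating shapelet scores much lower when the minority class is rare.

```python
import math


def subsequence_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    equal-length window of the series."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, shapelet)))
        best = min(best, d)
    return best


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * math.log2(p)
    return result


def best_information_gain(distances, labels):
    """Largest information gain over all thresholds that split the
    series by their distance to a candidate shapelet."""
    pairs = sorted(zip(distances, labels))
    n = len(pairs)
    base = entropy(labels)
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal distances
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) \
                    - (len(right) / n) * entropy(right)
        best = max(best, gain)
    return best


# Hypothetical toy data: distances from ten series to one candidate shapelet.
# In both cases the shapelet separates the two classes perfectly.
balanced_gain = best_information_gain(
    [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
)
imbalanced_gain = best_information_gain(
    [0.1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
)
print(balanced_gain, imbalanced_gain)  # 1.0 and roughly 0.469
```

With a 5:5 class split the perfect shapelet reaches an IG of 1.0, while with a 1:9 split the same perfect separation is capped at about 0.469, the entropy of the class distribution. This suggests why IG scores are hard to compare or threshold across data sets with different imbalance ratios.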

Notes

  1. https://github.com/itoshikiQAQ/DISM-and-Multi_DISM-algorithm

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61876186) and the Youth Science Foundation of China University of Mining and Technology (Grant No. 2013QNB16).

Author information

Corresponding author

Correspondence to Qiuyan Yan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, Q., Cao, Y. Optimizing shapelets quality measure for imbalanced time series classification. Appl Intell 50, 519–536 (2020). https://doi.org/10.1007/s10489-019-01535-z

  • DOI: https://doi.org/10.1007/s10489-019-01535-z
