Reliable early classification of time series based on discriminating the classes over time

Data Mining and Knowledge Discovery

Abstract

Early classification of time series aims to predict the class of a sequence as early as possible, before its full length is available. The problem arises naturally wherever data are collected over time and label predictions must be made as soon as possible. In this work, we propose a method for early classification of time series based on probabilistic classifiers. An important feature of the method is that, in its learning stage, it discovers the timestamps at which the prediction accuracy for each class begins to surpass a pre-defined threshold. The user sets this threshold as a percentage of the accuracy that would be obtained if the full series were available. Class predictions for new time series are made only at these timestamps or later. Furthermore, when the model is applied to a new time series, a class label is provided only if the difference between the two largest predicted class probabilities is greater than or equal to a certain threshold, which is calculated in the training step. The proposal is validated on 45 benchmark time series databases and compared with several state-of-the-art methods, obtaining superior results in both earliness and accuracy. In addition, we show the practical applicability of the method to a real-world problem: the detection and identification of bird calls in a biodiversity survey scenario.
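The two decision rules described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`safe_timestamps`, `early_decision`), the dictionary-based data layout, and the use of a single margin threshold `min_margin` are assumptions made for illustration only. The abstract does not say how the per-class accuracies are estimated or how the probability-difference threshold is computed during training, so both are taken here as given inputs.

```python
from typing import Dict, List, Optional, Tuple


def safe_timestamps(
    acc_over_time: Dict[str, Dict[int, float]],
    full_acc: Dict[str, float],
    fraction: float,
) -> Dict[str, Optional[int]]:
    """For each class, find the earliest timestamp at which the accuracy
    reaches `fraction` of the full-length accuracy (None if it never does).
    Simplifying assumption: the first crossing is taken as the safe point."""
    safe: Dict[str, Optional[int]] = {}
    for label, accs in acc_over_time.items():
        target = fraction * full_acc[label]
        safe[label] = next(
            (t for t, acc in sorted(accs.items()) if acc >= target), None
        )
    return safe


def early_decision(
    probs_over_time: List[Tuple[int, Dict[str, float]]],
    safe: Dict[str, Optional[int]],
    min_margin: float,
) -> Optional[Tuple[str, int]]:
    """Return (label, timestamp) for the first timestamp at which the top
    class is past its safe timestamp and the gap between the two largest
    class probabilities is at least `min_margin`; otherwise return None."""
    for t, probs in probs_over_time:
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        (best, p_first), (_, p_second) = ranked[0], ranked[1]
        start = safe.get(best)
        if start is not None and t >= start and p_first - p_second >= min_margin:
            return best, t
    return None  # no reliable prediction could be issued


# Hypothetical accuracies per class at timestamps 10, 20 and 30.
acc = {"a": {10: 0.5, 20: 0.8, 30: 0.9}, "b": {10: 0.9, 20: 0.9, 30: 0.95}}
full = {"a": 0.9, "b": 0.95}
safe = safe_timestamps(acc, full, fraction=0.85)

# Hypothetical class probabilities for one incoming series.
stream = [
    (10, {"a": 0.7, "b": 0.3}),    # class "a" is not yet safe at t=10
    (20, {"a": 0.55, "b": 0.45}),  # margin too small
    (30, {"a": 0.8, "b": 0.2}),    # safe and confident
]
print(safe, early_decision(stream, safe, min_margin=0.3))
```

In this toy run the prediction for class "a" is withheld at timestamp 10 because the class has not yet reached its safe timestamp, withheld again at 20 because the two leading probabilities are too close, and finally issued at 30.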

Notes

  1. http://www.sc.ehu.es/ccwbayes/members/umori/ECDIRE/ECDIRE.html.

  2. http://www.sc.ehu.es/ccwbayes/members/umori/ECDIRE/parameters.

  3. ECTS and EDCS: http://zhengzhengxing.blogspot.com.es/p/research.html.

  4. Rel.Class: http://www.mayagupta.org/publications/Early_Classification_For_Web.zip.

  5. http://www.sc.ehu.es/ccwbayes/members/umori/ECDIRE/pvalues.

  6. http://www.sc.ehu.es/ccwbayes/members/umori/ECDIRE/ECDIRE.html.

  7. http://www.sc.ehu.es/ccwbayes/members/umori/ECDIRE/runtimes.

  8. The p values for these tests can be seen in http://www.sc.ehu.es/ccwbayes/members/umori/ECDIRE/pvalues.

  9. http://www.sc.ehu.es/ccwbayes/members/umori/ECDIRE/runtimes.

Acknowledgments

We are deeply grateful to Jerónimo Hernández-González for his helpful comments. Also, thanks to Nurjahan Begum and Liudmila Ulanova for their useful advice, and for formatting and preparing the data used in the bird call identification case study. We would also like to thank the UCR archive and the Xeno-Canto Foundation for providing access to the data used in this study. This work has been partially supported by the Saiotek and IT-609-13 programs (Basque Government), TIN2013-41272P (Spanish Ministry of Science and Innovation) and by the NICaiA Project PIRSES-GA-2009-247619 (European Commission). Usue Mori holds a grant from the University of the Basque Country UPV/EHU.

Author information

Corresponding author

Correspondence to Usue Mori.

Additional information

Responsible editor: Toon Calders.

About this article

Cite this article

Mori, U., Mendiburu, A., Keogh, E. et al. Reliable early classification of time series based on discriminating the classes over time. Data Min Knowl Disc 31, 233–263 (2017). https://doi.org/10.1007/s10618-016-0462-1

  • DOI: https://doi.org/10.1007/s10618-016-0462-1