Abstract
The goal of early classification of time series is to predict the class value of a sequence early in time, before its full length is available. This problem arises naturally in many contexts where data are collected over time and label predictions must be made as soon as possible. In this work, we propose a method based on probabilistic classifiers for the early classification of time series. An important feature of this method is that, in its learning stage, it discovers the timestamps at which the prediction accuracy for each class begins to surpass a pre-defined threshold. This user-defined threshold is expressed as a percentage of the accuracy that would be obtained if the full series were available. Class predictions for new time series are only made at these timestamps or later. Furthermore, when the model is applied to a new time series, a class label is only provided if the difference between the two largest predicted class probabilities is greater than or equal to a certain threshold, which is calculated in the training step. The proposal is validated on 45 benchmark time series databases and compared with several state-of-the-art methods, obtaining superior results in both earliness and accuracy. In addition, we show the practical applicability of our method on a real-world problem: the detection and identification of bird calls in a biodiversity survey scenario.
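The two decision rules described above (a per-class earliest safe timestamp, and a minimum gap between the two largest class probabilities) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-class accuracy curves (for example, estimated by cross-validation on prefixes of the training series) and the class probabilities produced by the underlying probabilistic classifier are taken as given; the function names `safe_timestamps` and `early_predict` and the parameter `min_gap` are hypothetical; "begins to surpass" is read strictly as "stays at or above the target from that timestamp onward"; and a single gap threshold is used for all classes, whereas the actual method may compute it differently.

```python
from typing import Dict, List, Optional


def safe_timestamps(acc: Dict[str, List[float]], perc: float) -> Dict[str, int]:
    """Hypothetical helper: earliest prefix index per class that is 'safe'.

    acc[c][t] is the assumed accuracy for class c when only the t-th prefix
    of each series is used; the last entry corresponds to the full series.
    perc (0 < perc <= 1) is the user-defined fraction of full-length accuracy.
    Assumption: a timestamp is safe only if accuracy stays >= the target
    from that timestamp to the end of the series.
    """
    result = {}
    for c, curve in acc.items():
        target = perc * curve[-1]
        t = len(curve) - 1  # the full length always meets the target
        # Walk backwards while the earlier prefix still meets the target.
        while t > 0 and curve[t - 1] >= target:
            t -= 1
        result[c] = t
    return result


def early_predict(probs: Dict[str, float], t: int,
                  safe_t: Dict[str, int], min_gap: float) -> Optional[str]:
    """Hypothetical decision rule: return a label, or None to wait for more data."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    best, p1 = ranked[0]
    p2 = ranked[1][1] if len(ranked) > 1 else 0.0
    if t < safe_t[best]:
        return None  # too early for the currently most probable class
    if p1 - p2 < min_gap:
        return None  # top two classes are too close: prediction not reliable
    return best


safe = safe_timestamps({"A": [0.3, 0.8, 0.9, 0.9], "B": [0.1, 0.2, 0.5, 0.8]}, perc=0.9)
print(safe)                                                 # {'A': 2, 'B': 3}
print(early_predict({"A": 0.7, "B": 0.3}, 1, safe, 0.2))    # None (too early)
print(early_predict({"A": 0.7, "B": 0.3}, 2, safe, 0.2))    # A
print(early_predict({"A": 0.55, "B": 0.45}, 3, safe, 0.2))  # None (gap too small)
```

Returning `None` rather than a forced label mirrors the idea that the method abstains until both conditions hold; in a streaming setting the caller would simply re-evaluate as new points arrive.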
Notes
ECTS and EDCS: http://zhengzhengxing.blogspot.com.es/p/research.html.
The p-values for these tests are available at http://www.sc.ehu.es/ccwbayes/members/umori/ECDIRE/pvalues.
Acknowledgments
We are deeply grateful to Jerónimo Hernández-González for his helpful comments. We also thank Nurjahan Begum and Liudmila Ulanova for their useful advice and for formatting and preparing the data used in the bird call identification case study. We would also like to thank the UCR archive and the Xeno-Canto Foundation for providing access to the data used in this study. This work has been partially supported by the Saiotek and IT-609-13 programs (Basque Government), TIN2013-41272P (Spanish Ministry of Science and Innovation) and by the NICaiA Project PIRSES-GA-2009-247619 (European Commission). Usue Mori holds a grant from the University of the Basque Country UPV/EHU.
Additional information
Responsible editor: Toon Calders.
About this article
Cite this article
Mori, U., Mendiburu, A., Keogh, E. et al. Reliable early classification of time series based on discriminating the classes over time. Data Min Knowl Disc 31, 233–263 (2017). https://doi.org/10.1007/s10618-016-0462-1
DOI: https://doi.org/10.1007/s10618-016-0462-1