Abstract
This work presents a novel approach to multivariate time series classification. The method exploits the multivariate structure of the time series and the possibilities of the stacking ensemble method. The basics of the method may be described in three steps: first, decomposing the multivariate time series into its constituent univariate time series; second, inducing a classifier for each univariate time series plus an additional multivariate classifier for the whole time series; third, creating the final multivariate time series classifier by stacking the previous classifiers. The ensemble obtained has the potential to improve the accuracy of the single multivariate time series classifier. Several configurations of the stacking method have been tested on seven multivariate time series data sets. In five out of seven data sets, the proposed method obtains the smallest error rate. Moreover, in two out of seven data sets, stacking only the univariate time series classifiers provides the best results. The experimental results show that when a multivariate time series method does not produce an accurate classifier, stacking it with univariate time series classifiers is an alternative worthy of consideration.
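The three steps can be illustrated with a minimal sketch. It is not the configuration evaluated in this work: the 1-nearest-neighbour base learners, the Hamming-distance 1-nearest-neighbour meta-classifier, the leave-one-out construction of the level-1 data, the toy data, and all function names (views, level1_features, fit_stack, predict_stack) are assumptions chosen only to keep the example short and dependency-free.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def nn_predict(train_X, train_y, x, dist):
    """1-nearest-neighbour prediction (illustrative learner, an assumption)."""
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]

def views(series):
    """Step 1: split a multivariate series into its univariate series and
    append one extra view holding the whole (concatenated) series."""
    whole = [v for var in series for v in var]
    return list(series) + [whole]

def level1_features(train, labels, query, skip=None):
    """Step 2: one predicted label per view (each univariate classifier plus
    the multivariate one). `skip` leaves one training series out."""
    idx = [i for i in range(len(train)) if i != skip]
    y = [labels[i] for i in idx]
    feats = []
    for k, q in enumerate(views(query)):
        X = [views(train[i])[k] for i in idx]
        feats.append(nn_predict(X, y, q, euclid))
    return tuple(feats)

def fit_stack(train, labels):
    """Step 3 (training): build the level-1 data set from leave-one-out
    predictions so the meta-classifier never sees a base classifier's
    prediction for a series that classifier was trained on."""
    meta_X = [level1_features(train, labels, train[i], skip=i)
              for i in range(len(train))]
    return meta_X, list(labels)

def predict_stack(train, labels, meta, query):
    """Step 3 (prediction): the meta-classifier maps base predictions to a class."""
    meta_X, meta_y = meta
    return nn_predict(meta_X, meta_y, level1_features(train, labels, query), hamming)

# Toy data (hypothetical): variable 0 separates the classes, variable 1 does not.
train, labels = [], []
for j in range(4):
    eps = 0.1 * j
    train.append([[0.0 + eps] * 6, [1.0 - eps] * 6]); labels.append("a")
    train.append([[5.0 + eps] * 6, [1.0 - eps] * 6]); labels.append("b")

meta = fit_stack(train, labels)
print(predict_stack(train, labels, meta, [[0.05] * 6, [0.5] * 6]))
print(predict_stack(train, labels, meta, [[5.2] * 6, [0.5] * 6]))
```

In this toy setting the univariate classifier built on the second variable is systematically wrong under leave-one-out, and the meta-classifier learns that pattern from the level-1 data instead of treating all base classifiers as equally reliable, which is the property that distinguishes stacking from plain voting.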
Notes
Auslan, Japanese vowels and pendigits are available at the UCI repository [30]. Auslan is available at http://sites.google.com/site/waleedkadous/data-1. ECG and wafer are available at http://www.cs.cmu.edu/bobski/data/data.html.
References
Agrawal R, Lin KI, Sawhney HS, Shim K (1995) Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In: Dayal U, Gray PMD, Nishio S (eds) VLDB’95, Proceedings of 21th international conference on very large data bases, Zurich, September 1995. Morgan Kaufmann, Massachusetts, pp 490–501
Alimoglu F (1996) Combining multiple classifiers for pen-based handwritten digit recognition. Master’s thesis, Institute of Graduate Studies in Science and Engineering, Bogazici University, Istanbul
Alimoglu F, Alpaydin E (2001) Combining multiple representations for pen-based handwritten digit recognition. ELEKTRIK Turk J Electr Eng Comput Sci 9(1):1–12
Alonso C, Prieto O, Rodríguez JJ, Bregón A (2008) Multivariate time series classification via stacking of univariate classifiers. In: Okun O, Valentini G (eds) Supervised and unsupervised ensemble methods and their applications, Springer, New York, pp 135–152
Bahlmann C, Haasdonk B, Burkhardt H (2002) On-line handwriting recognition with support vector machines: a kernel approach. In: Proceedings of the 8th International workshop on frontiers in handwriting recognition (IWFHR), pp 49–54
Bankó Z, Abonyi J (2012) Correlation based dynamic time warping of multivariate time series. Expert Syst Appl 39(17):12814–12823
Baum LE, Petrie T (1966) Statistical inference for probabilistic functions of finite state Markov chains. Ann Math Stat 37:1554–1563
Box G, Jenkins G (1976) Time series analysis: forecasting and control. Prentice-Hall, USA
Box G, Jenkins G, Reinsel G (1994) Time series analysis, forecasting and control, 3rd edn. Prentice Hall, Englewood Cliffs
Bratko I, Mozetič I, Lavrač N (1989) KARDIO: a study in deep and qualitative knowledge for expert systems. MIT Press, Cambridge
Bregón A, Simón A, Rodríguez JJ, Alonso CJ, Pulido B, Moro I (2006) Early fault classification in dynamic systems using case-based reasoning. In: Marín R, Onaindía E, Bugarín A, Santos J (eds) Current topics in artificial intelligence. 11th conference of the Spanish association for artificial intelligence, revised selected papers. Lecture notes in artificial intelligence, vol 4177. Springer, New York, pp 211–220
Campbell JP (1997) Speaker recognition: a tutorial. Proc IEEE 85(9):1437–1462
Chakrabarti C, Rammohan R, Luger GF (2005) A first-order stochastic prognostic system for the diagnosis of helicopter rotor systems for the US Navy. In: Prasad B (ed) IICAI, pp 3645–3656
Chaovalitwongse W, Pardalos P (2008) On the time series support vector machine using dynamic time warping kernel for brain activity classification. Cybern Syst Anal 44(1):125–138. doi:10.1007/s10559-008-0012-y
Chen L, Kamel MS (2007) A new design of multiple classifier system and its application to the classification of time series data. In: Proceedings of the ISIC. IEEE international conference on systems, man and cybernetics, pp 385–391
Chen L, Kamel M, Jiang J (2004) A modular system for the classification of time series data. In: Roli F, Kittler J, Windeatt T (eds) Multiple classifier systems, pp 134–143
Clancey WJ (1985) Heuristic classification. Artif Intell 27:289–350
Cohen W (1995) Learning to classify English text with ILP methods. In: Raedt LD (ed) Proceedings of the 5th international workshop on inductive logic programming, pp 3–24
Console L, Picardi C, Dupre DT (2003) Temporal decision trees: model-based diagnosis of dynamic systems on-board. J Artif Intell Res 19:469–512
Cuturi M, Vert JP, Birkenes O, Matsui T (2007) A kernel for time series based on global alignments. In: Acoustics, speech and signal processing. Proceedings of the IEEE international conference on ICASSP 2007, vol 2. pp II-413–II-416. doi:10.1109/ICASSP.2007.366260
Das G, Lin KI, Mannila H, Renganathan G, Smyth P (1998) Rule discovery from time series. In: Proceedings of the 4th international conference of knowledge discovery and data mining. AAAI Press, California, pp 16–22
Dash M, Liu H (1997) Feature selection for classification. Int J Intell Data Anal 1(4):131–156
Dean T, Kanazawa K (1989) A model for reasoning about persistence and causation. Comput Intell 5(3):142–150
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Dietrich C, Schwenker F, Palm G (2001) Classification of time series utilizing temporal and decision fusion. In: MCS 2001. Springer, New York, pp 378–387
Dietterich TG (1997) Machine-learning research. AI Mag 18(4):97–136
Dong M, He D (2007) Hidden semi-Markov model-based methodology for multi-sensor equipment health diagnosis and prognosis. Eur J Oper Res 178:858–878
Dousson C, Duong TV (1999) Discovering chronicles with numerical time constraints from alarm logs for monitoring dynamic systems. In: Thomas D (ed) Proceedings of the 16th international joint conference on artificial intelligence (IJCAI-99), vol 1. pp 620–626
Esmael B, Arnaout A, Fruhwirth RK, Thonhauser G (2012) Multivariate time series classification by combining trend-based and value-based approximations. In: Computational science and its applications—ICCSA 2012. Lecture notes in computer science, vol 7336. pp 392–403. doi:10.1007/978-3-642-31128-4_29
Frank A, Asuncion A (2010) UCI machine learning repository. http://archive.ics.uci.edu/ml
Fu Tc (2011) A review on time series data mining. Eng Appl Artif Intell 24(1):164–181. doi:10.1016/j.engappai.2010.09.007
Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2012) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. Syst Man Cybern Part C Appl Rev IEEE Trans 42(4):463–484. doi:10.1109/TSMCC.2011.2161285
García S, Herrera F (2008) An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. J Mach Learn Res 9:2677–2694. http://www.jmlr.org/papers/volume9/garcia08a/garcia08a.pdf
García S, Fernández A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: experimental analysis of power. Inf Sci 180(10):2044–2064. doi:10.1016/j.ins.2009.12.010
van Gerven MA, Taal BG, Lucas PJ (2008) Dynamic Bayesian networks as prognostic models for clinical patient management. J Biomed Inform 41(4):515–529
Geurts P, Wehenkel L (2005) Segment and combine approach for non-parametric time-series classification. Knowl Discov Databases PKDD 478–485. doi:10.1007/11564126_48
Ghalwash MF, Obradovic Z (2012) Early classification of multivariate temporal observations by extraction of interpretable shapelets. BMC Bioinform 13:195
Ghosh J, Deuser L, Beck S (1992) A neural network based hybrid system for detection, characterization and classification of short-duration oceanic signals. IEEE J Ocean Eng 17(4):351–363
Hansen JV, Nelson RD (2002) Data mining of time series using stacked generalizers. Neurocomputing 43:173–184
Huang YS, Suen C (1995) A method of combining multiple experts for the recognition of unconstrained handwritten numerals. IEEE Trans Pattern Anal Mach Intell 17(1):90–94
Jelinek F (1997) Statistical methods for speech recognition. MIT Press, Cambridge
Kadous MW (2002) Temporal classification: extending the classification paradigm to multivariate time series. PhD thesis, The University of New South Wales, School of Computer Science and Engineering. http://sites.google.com/site/waleedkadous/publications
Kadous MW, Sammut C (2005) Classification of multivariate time series and structured data using constructive induction. Mach Learn 58(2–3):179–216
Keogh E, Pazzani M (2001) Derivative dynamic time warping. In: Proceedings of the first SIAM international conference on data mining. http://www.cs.ucr.edu/eamonn/sdm01.pdf
Keogh E, Ratanamahatana CA (2005) Exact indexing of dynamic time warping. Knowl Inf Syst 7(3):358–386
Kudo M, Toyama J, Shimbo M (1999) Multidimensional curve classification using passing-through regions. Pattern Recognit Lett 20(11–13):1103–1111
Kuncheva L (2004) Combining pattern classifiers: methods and algorithms. Wiley-Interscience, New York
Kuncheva LI (2001) Combining classifiers: soft computing solutions. In: Pal SK (eds) Pattern recognition: from classical to modern approaches, World Scientific, Singapore, pp 427–452
Kuncheva LI (2005) Diversity in multiple classifier systems. Inf Fusion 6(1):3–4
Lin HT, Li L (2008) Support vector machinery for infinite ensemble learning. J Mach Learn Res 9:285–312. http://www.jmlr.org/papers/volume9/lin08a/lin08a.pdf
Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: DMKD ’03 Proceedings of the 8th ACM SIGMOD workshop on research issues in data mining and knowledge discovery, ACM Press, New York, pp 2–11. http://portal.acm.org/citation.cfm?id=882086
Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing SAX: a novel symbolic representation of time series. Data Min Knowl Discov 15:107–144
Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502
Mikut R, Burmeister O, Groll L, Reischl M (2008) Takagi–Sugeno–Kang Fuzzy classifiers for a special class of time-varying systems. Fuzzy Syst IEEE Trans 16(4):1038–1049. doi:10.1109/TFUZZ.2008.917291
Minnen D, Zang P, Isbell C, Starner T (2007) Boosting diverse learners for domain agnostic time series classification. In: Proceedings of the workshop and challenge on time series classification at SIGKDD
Nadeau C, Bengio Y (2003) Inference for the generalization error. Mach Learn 52(3):239–281
Nakamura T, Taki K, Nomiya H, Seki K, Uehara K (2012) A shape-based similarity measure for time series data with ensemble learning. Pattern Anal Appl (in press). doi:10.1007/s10044-011-0262-6
Olszewski RT (2001) Generalized feature extraction for structural pattern recognition in time-series data. PhD thesis, Computer Science Department, Carnegie Mellon University. http://reports-archive.adm.cs.cmu.edu/anon/2001/abstracts/01-108.html
Orsenigo C, Vercellis C (2010) Combining discrete SVM and fixed cardinality warping distances for multivariate time series classification. Pattern Recognit 43(11):3787–3794
Papadimitriou S, Sun J, Faloutsos C (2005) Streaming pattern discovery in multiple time-series. In: VLDB ’05: Proceedings of the 31st international conference on very large data bases, VLDB Endowment, pp 697–708
Patton R, Chen J, Siew T (1994) Fault diagnosis in nonlinear dynamic systems via neural networks. In: Proceedings of the IEEE international conference control’94, vol 2. pp 1346–1351
Petitjean B, Barut S, Rolet S, Simonet D (2006) Damage detection on aerospace structures. In: Gemes A (ed) Proceedings of the third European workshop on structural health monitoring, pp 159–166
Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. In: Proceedings of the IEEE, vol 77(2)
Rodríguez JJ, Alonso CJ (2004) Support vector machines of interval-based features for time series classification. In: Bramer M, Coenen F, Allen T (eds) Research and development in intelligent systems XXI, Springer, New York, pp 244–257
Rodríguez JJ, Alonso CJ, Boström H (2001) Boosting interval based literals. Intell Data Anal 5(3):245–262
Rodríguez JJ, Alonso CJ, Maestro JA (2005) Support vector machines of interval-based features for time series classification. Knowl-Based Syst 18(4–5):171–178. doi:10.1016/j.knosys.2004.10.007
Ron D, Singer Y, Tishby N (1998) On the learnability and usage of acyclic probabilistic finite automata. J Comput Syst Sci 56:133–152
Rooney N, Patterson DW, Nugent CD (2007) Non-strict heterogeneous stacking. Pattern Recognit Lett 28(9):1050–1061
Roverso D (2000) Multivariate temporal classification by windowed wavelet decomposition and recurrent neural networks. In: Paper presented at the 3rd ANS international topical meeting on nuclear plant instrumentation, control and human-machine interface
Roychoudhury I, Biswas G, Koutsoukos X (2008) Comprehensive diagnosis of continuous systems using dynamic Bayes nets. In: Proceedings of the 19th international workshop on principles of diagnosis
Schreiber G, Akkermans H, Anjewierden A, de Hoog R, Shadbolt N, de Velde WV, Wielinga B (1999) Knowledge engineering and management, the CommonKADS methodology. The MIT Press, Cambridge
Sivaramakrishnan KR, Karthik K, Bhattacharyya C (2007) Kernels for large margin time-series classification. In: Proceedings of the international joint conference on neural networks, IJCNN 2007, pp 2746–2751. doi:10.1109/IJCNN.2007.4371393
Strickert M, Hammer B (2005) Merge SOM for temporal data. Neurocomputing 64:39–71. doi:10.1016/j.neucom.2004.11.014
Ting KM, Witten IH (1999) Issues in stacked generalization. J Artif Intell Res (JAIR) 10:271–289
Vincent RD, Pineau J, de Guzman P, Avoli M (2007) Recurrent boosting for classification of natural and synthetic time-series data. In: Proceedings of the Canadian conference on AI, pp 192–203
Weng X, Shen J (2008a) Classification of multivariate time series using locality preserving projections. Knowl-Based Syst 21(7):581–587
Weng X, Shen J (2008b) Classification of multivariate time series using two-dimensional singular value decomposition. Knowl-Based Syst 21(7):535–539
Wolpert DH (1992) Stacked generalization. Neural Netw 5(2):241–259
Yoon H, Yang K, Shahabi C (2005) Feature subset selection and feature ranking for multivariate time series. Trans Knowl Data Eng 17(9):1186–1198. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1490526
Zang P, Isbell C (2007) Managing domain knowledge and multiple models with boosting. In: Proceedings of international joint conference on artificial intelligence, pp 1144–1149
Zhang X, Xiao X, Tao L, Xu G (2007) Real-time situation detection based on Rao-Blackwellized particle filters in meetings. In: Proceedings of the 2007 IEEE international conference on robotics and biomimetics
Zhao JH, Dong ZY, Xu Z (2006) Effective feature preprocessing for time series forecasting. In: ADMA, pp 769–781
Zhao ZYJ, Sun J, Ge SS (2007) High performance quadratic classifier and the application on pendigits recognition. In: Proceedings of the 46th IEEE conference on decision and control, pp 3072–3077. doi:10.1109/CDC.2007.4434191
Acknowledgements
We express our gratitude to the donors of the different data sets.
Cite this article
Prieto, O.J., Alonso-González, C.J. & Rodríguez, J.J. Stacking for multivariate time series classification. Pattern Anal Applic 18, 297–312 (2015). https://doi.org/10.1007/s10044-013-0351-9