Abstract
As industrial systems become more complex and monitoring sensors for everything from surveillance to our health become more ubiquitous, multivariate time series prediction is taking an increasingly important place in the smooth running of our society. Recurrent neural networks (RNNs) with attention, which helps extend the prediction window, are the current state of the art for this task. However, we argue that their vanishing gradients, short memories, and serial architecture make RNNs fundamentally unsuited to long-horizon forecasting with complex data. Temporal convolutional networks (TCNs) do not suffer from gradient problems and support parallel computation, making them a more appropriate choice. They also have longer memories than RNNs, albeit with some stability and efficiency trade-offs. Hence, we propose a framework, called PSTA-TCN, that combines a parallel spatio-temporal attention mechanism, which extracts dynamic internal correlations, with stacked TCN backbones, which extract features from different window sizes. The framework makes full use of parallel computation to dramatically reduce training times, while substantially increasing accuracy with stable prediction windows up to 13 times longer than the status quo.
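The long memory claimed for TCNs comes from dilated causal convolutions: each layer looks only at past time steps, and doubling the dilation at each layer makes the receptive field grow exponentially with depth. A minimal sketch of this building block, using plain NumPy (the function name, shapes, and weights here are illustrative, not the paper's implementation):

```python
import numpy as np

def dilated_causal_conv(x, weights, dilation):
    """1-D causal convolution with dilation: the output at time t depends
    only on x[t], x[t - d], x[t - 2d], ... (no future leakage)."""
    k = len(weights)
    pad = dilation * (k - 1)            # left-pad so output length == input length
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(weights[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

# Stacking layers with dilations 1, 2, 4, ... grows the receptive field
# exponentially: kernel size k over L layers covers 1 + (k-1)*(2**L - 1)
# past steps, which is how a TCN attains a longer memory than an RNN
# of comparable depth.
x = np.arange(8, dtype=float)
w = np.array([1.0, 1.0])                # kernel size 2
y = dilated_causal_conv(x, w, dilation=2)  # y[t] = x[t] + x[t-2]
```

Because every output step is an independent weighted sum over a fixed window, all time steps can be computed in parallel, unlike the step-by-step recurrence of an RNN.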
Acknowledgements
This work was supported by a grant from the National Natural Science Foundation of China (No. U1609211) and the National Key Research and Development Project (2019YFB1705100). The corresponding author is Baiping Chen.
Ethics declarations
Conflict of interest
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Fan, J., Zhang, K., Huang, Y. et al. Parallel spatio-temporal attention-based TCN for multivariate time series prediction. Neural Comput & Applic 35, 13109–13118 (2023). https://doi.org/10.1007/s00521-021-05958-z