Machine Learning, Volume 108, Issue 8–9, pp 1421–1441

Temporal pattern attention for multivariate time series forecasting

  • Shun-Yao Shih
  • Fan-Keng Sun
  • Hung-yi Lee
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2019 Journal Track


Forecasting of multivariate time series data, for instance the prediction of electricity consumption, solar power production, and polyphonic piano pieces, has numerous valuable applications. However, complex and non-linear interdependencies between time steps and series complicate this task. To obtain accurate predictions, it is crucial to model long-term dependencies in time series data, which can be achieved by recurrent neural networks (RNNs) with an attention mechanism. The typical attention mechanism reviews the information at each previous time step and selects relevant information to help generate the outputs; however, it fails to capture temporal patterns that span multiple time steps. In this paper, we propose using a set of filters to extract time-invariant temporal patterns, similar to transforming time series data into its “frequency domain”. We then propose a novel attention mechanism that selects relevant time series and uses their frequency domain information for multivariate forecasting. We apply the proposed model to several real-world tasks and achieve state-of-the-art performance in almost all cases. Our source code is available at
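The two ideas in the abstract, filters that summarize each series over a window and an attention step that weights series rather than time steps, can be illustrated with a small NumPy sketch. This is a rough illustration under stated assumptions, not the authors' released code: the function name, the shapes, the random weights, the use of filters that span the whole window, and the sigmoid scoring are all choices made here for demonstration.

```python
import numpy as np


def temporal_pattern_attention(H, h_t, filters, W_a):
    """Hypothetical sketch of attention over filtered series.

    H       : (m, w) past RNN hidden states, one row per hidden dimension
    h_t     : (m,)   current hidden state, used as the query
    filters : (k, w) k filters, each spanning the full window (assumption)
    W_a     : (k, m) scoring matrix (assumption)
    """
    # Apply every filter to every row of H: each row becomes k pattern responses.
    HC = H @ filters.T                      # shape (m, k)
    # Score each row's pattern vector against the current hidden state.
    scores = HC @ (W_a @ h_t)               # shape (m,)
    # Sigmoid instead of softmax, so several rows can be selected at once.
    alpha = 1.0 / (1.0 + np.exp(-scores))   # shape (m,)
    # Context vector: attention-weighted sum of the rows of HC.
    v_t = alpha @ HC                        # shape (k,)
    return v_t, alpha


# Toy usage with random inputs (all sizes are arbitrary).
rng = np.random.default_rng(0)
m, w, k = 4, 8, 3
H = rng.standard_normal((m, w))
h_t = rng.standard_normal(m)
filters = rng.standard_normal((k, w))
W_a = rng.standard_normal((k, m))
v_t, alpha = temporal_pattern_attention(H, h_t, filters, W_a)
```

Because the weights `alpha` are indexed by row rather than by time step, the attention selects which series (or hidden dimensions) matter, while the filters carry the information about patterns across time.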


Multivariate time series · Attention mechanism · Recurrent neural network · Convolutional neural network · Polyphonic music generation



This work was financially supported by the Ministry of Science and Technology of Taiwan.



Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  1. National Taiwan University, Taipei, Taiwan
