Transformers in Time-Series Analysis: A Tutorial

Circuits, Systems, and Signal Processing

Abstract

Transformer architectures have widespread applications, particularly in Natural Language Processing and Computer Vision. Recently, Transformers have been employed in various aspects of time-series analysis. This tutorial provides an overview of the Transformer architecture, its applications, and a collection of examples from recent research in time-series analysis. We explain the core components of the Transformer, including the self-attention mechanism, positional encoding, multi-head attention, and the encoder/decoder structure. Several enhancements to the original Transformer architecture are highlighted for tackling time-series tasks. The tutorial also covers best practices and techniques for overcoming the challenge of effectively training Transformers for time-series analysis.
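For orientation, the sketch below illustrates two of the components named above, sinusoidal positional encoding and single-head scaled dot-product self-attention, applied to a toy multivariate time-series window. It is a minimal NumPy illustration written for this summary, not code from the tutorial or its authors; all names, shapes, and dimensions are illustrative assumptions.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as in the original Transformer."""
    pos = np.arange(seq_len)[:, None]                   # (seq_len, 1) time-step indices
    i = np.arange(d_model)[None, :]                     # (1, d_model) embedding dims
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])                # even dimensions: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])                # odd dimensions: cosine
    return pe

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over time steps."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                 # queries, keys, values: (seq_len, d_k)
    scores = q @ k.T / np.sqrt(k.shape[-1])             # pairwise time-step affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over time steps
    return weights @ v                                   # context-mixed representation

# Toy usage: a window of 16 time steps embedded into a 32-dim model space.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 16, 32, 8
x = rng.standard_normal((seq_len, d_model)) + positional_encoding(seq_len, d_model)
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)                  # shape (16, 8)
print(out.shape)
```

Because self-attention itself is permutation-invariant across time steps, the positional encoding is what injects the ordering information that time-series models rely on; a multi-head layer simply runs several such attention operations in parallel and concatenates their outputs.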


Data Availability

The manuscript has no associated data.

Acknowledgements

This work was partly supported by the National Science Foundation Awards ECCS-1903466, OAC-2008690, and OAC-2234836.

Author information

Corresponding author

Correspondence to Sabeen Ahmed.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest or competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ahmed, S., Nielsen, I.E., Tripathi, A. et al. Transformers in Time-Series Analysis: A Tutorial. Circuits Syst Signal Process 42, 7433–7466 (2023). https://doi.org/10.1007/s00034-023-02454-8
