Transformers in Time-Series Analysis: A Tutorial

Circuits, Systems, and Signal Processing

Abstract

Transformer architectures have widespread applications, particularly in Natural Language Processing and Computer Vision. Recently, Transformers have been employed in various aspects of time-series analysis. This tutorial provides an overview of the Transformer architecture, its applications, and a collection of examples from recent research in time-series analysis. We explain the core components of the Transformer, including the self-attention mechanism, positional encoding, multi-head attention, and the encoder/decoder structure. Several enhancements to the original Transformer architecture are highlighted for tackling time-series tasks. The tutorial also covers best practices and techniques for overcoming the challenge of effectively training Transformers for time-series analysis.
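For orientation, the sketch below illustrates two of the components named above, sinusoidal positional encoding and single-head scaled dot-product self-attention, applied to a toy multivariate time-series window. It is a minimal NumPy illustration written for this summary, not code from the tutorial or its authors; all names, shapes, and dimensions are illustrative assumptions.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding as in the original Transformer."""
    pos = np.arange(seq_len)[:, None]                   # (seq_len, 1) time-step indices
    i = np.arange(d_model)[None, :]                     # (1, d_model) embedding dims
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])                # even dimensions: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])                # odd dimensions: cosine
    return pe

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over time steps."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                 # queries, keys, values: (seq_len, d_k)
    scores = q @ k.T / np.sqrt(k.shape[-1])             # pairwise time-step affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over time steps
    return weights @ v                                   # context-mixed representation

# Toy usage: a window of 16 time steps embedded into a 32-dim model space.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 16, 32, 8
x = rng.standard_normal((seq_len, d_model)) + positional_encoding(seq_len, d_model)
w_q, w_k, w_v = (rng.standard_normal((d_model, d_k)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)                  # shape (16, 8)
print(out.shape)
```

Because self-attention itself is permutation-invariant across time steps, the positional encoding is what injects the ordering information that time-series models rely on; a multi-head layer simply runs several such attention operations in parallel and concatenates their outputs.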


Data Availability

The manuscript has no associated data.

Acknowledgements

This work was partly supported by the National Science Foundation Awards ECCS-1903466, OAC-2008690, and OAC-2234836.

Author information

Corresponding author

Correspondence to Sabeen Ahmed.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest or competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ahmed, S., Nielsen, I.E., Tripathi, A. et al. Transformers in Time-Series Analysis: A Tutorial. Circuits Syst Signal Process 42, 7433–7466 (2023). https://doi.org/10.1007/s00034-023-02454-8
