Abstract
For many medical applications, a single static image may not be sufficient for detecting subtle pathology. Advances in fields such as computer vision have produced robust deep learning (DL) techniques that can effectively learn complex interactions between space and time for prediction. This chapter presents an overview of medical applications of spatiotemporal DL for prognostic and diagnostic prediction tasks, and of how they build on important advances in DL from other domains. Many current approaches draw heavily on previous work in other fields, but adaptation to the medical domain brings unique challenges, which are discussed along with the techniques being used to address them. Although the use of spatiotemporal DL in medical applications is still relatively new and lags behind the progress seen with still images, it provides unique opportunities to incorporate information about functional dynamics into prediction, which could be vital in many medical applications. Current medical applications of spatiotemporal DL have demonstrated the potential of these models, and recent advances leave this area poised to produce state-of-the-art models for many medical applications.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Harris, J.K., Greiner, R. (2023). Deep Learning Approaches for End-to-End Modeling of Medical Spatiotemporal Data. In: Ali, H., Rehmani, M.H., Shah, Z. (eds) Advances in Deep Generative Models for Medical Artificial Intelligence. Studies in Computational Intelligence, vol 1124. Springer, Cham. https://doi.org/10.1007/978-3-031-46341-9_5
DOI: https://doi.org/10.1007/978-3-031-46341-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46340-2
Online ISBN: 978-3-031-46341-9
eBook Packages: Intelligent Technologies and Robotics (R0)