
1 Introduction

Recently, the analysis of spatio-temporal variation patterns in functional Magnetic Resonance Imaging (fMRI) [1] has been substantially advanced by machine learning (e.g., independent component analysis (ICA) [2, 3] and sparse representation [4, 5]) and deep learning methods [6, 7]. As fMRI data are acquired as a series of 3D brain volumes over time to capture the functional dynamics of the brain, spatio-temporal relationships are intrinsically embedded in the acquired 4D data and need to be characterized and recovered.

In the literature, spatio-temporal analysis methods fall into two groups. The first group performs the analysis in either the spatial or the temporal domain based on the corresponding priors, then regresses out the variation patterns in the other domain. For example, temporal ICA identifies the temporally independent "signal sources" in the 4D fMRI data, then obtains the spatial patterns of those sources through regression. Similarly, in the recently proposed deep learning-based Convolutional Auto-Encoder (CAE) model [8], temporal features are learned first, and the spatial maps are then regressed from the resulting temporal features. Sparse representation methods, on the other hand, identify the spatially sparse components of the data, while the temporal dynamics of these components are obtained through regression. The work in [9] utilizes a Restricted Boltzmann Machine (RBM) for spatial feature analysis while ignoring the temporal features.

The second group performs the analysis in the spatial and temporal domains simultaneously. For example, [10] applies a Hidden Process Model with spatio-temporal "prototypes" to perform the spatio-temporal modeling. Another effective approach to incorporating temporal dynamics (and relationships between time frames) into network modeling is the Recurrent Neural Network [11]. Inspired by the superior performance and better interpretability of simultaneous spatio-temporal modeling, in this work we propose a deep spatio-temporal convolutional neural network (ST-CNN) to model 4D fMRI data. The goal of the model is to pinpoint a targeted functional network (e.g., the Default Mode Network, DMN) directly from the 4D fMRI data. The framework is based on two simultaneous mappings: the first maps the input 3D spatial image series to the spatial pattern of the targeted network using a 3D U-Net; the second maps the temporal pattern regressed from the 3D U-Net output to the temporal dynamics of the targeted network, using a 1D CAE. The summed loss from the two mappings is back-propagated to the two networks in an integrated framework, thus achieving simultaneous modeling of the spatial and temporal domains. Experimental results show that both the spatial pattern and the temporal dynamics of the DMN can be extracted accurately without hyper-parameter tuning, despite remarkable cortical structural and functional variability across individuals. Further investigation shows that the framework trained on one fMRI dataset (motor task fMRI) can be effectively applied to other datasets, indicating that ST-CNN offers sufficient generalizability for the identification task. With this capability of pinpointed network identification, ST-CNN can serve as a useful tool for cognitive and clinical neuroscience studies. Further, as the spatio-temporal variation patterns of the data are intrinsically intertwined within an integrated framework, ST-CNN can potentially offer new perspectives for modeling the brain's functional architecture.

2 Method and Materials

ST-CNN takes 4D fMRI data as input and generates both the spatial map and the temporal time series of the targeted brain functional network (DMN) as output. Different from CNNs for image classification (e.g. [12]), ST-CNN consists of a spatial convolutional network and a temporal convolutional network, as illustrated in Fig. 1(a). Targeted spatial network maps from sparse representation of fMRI data [4] are used to train the spatial network of ST-CNN, while the corresponding temporal dynamics of those spatial networks are used to train the temporal network.
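To make the pipeline concrete, the following is a minimal sketch in Python/PyTorch of one forward pass through the framework. The module names spatial_net and temporal_cae are placeholders for the networks detailed in Sect. 2.2; this illustrates the data flow only, not the authors' implementation.

```python
import torch

def st_cnn_forward(fmri_4d, spatial_net, temporal_cae):
    """One ST-CNN forward pass (illustrative interface).

    fmri_4d: tensor of shape (T, X, Y, Z), i.e. T time frames of 3D volumes.
    spatial_net: 3D U-Net mapping the frame series to a 3D DMN map.
    temporal_cae: 1D CAE reconstructing the DMN temporal dynamics.
    """
    # Spatial mapping: the T frames are treated as input channels.
    dmn_map = spatial_net(fmri_4d.unsqueeze(0))          # (1, 1, X, Y, Z)
    # Combination: a full-size "valid" convolution of each frame with the
    # estimated map reduces that frame to a single scalar (see Sect. 2.2).
    ts = torch.stack([(v * dmn_map.squeeze()).sum() for v in fmri_4d])
    # Temporal mapping: the 1D CAE refines the regressed time series.
    ts_out = temporal_cae(ts.view(1, 1, -1))             # (1, 1, T)
    return dmn_map, ts_out
```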

Fig. 1. (a) Algorithmic pipeline of ST-CNN; (b) spatial network structure, temporal network structure, and the combination of the spatial and temporal domains.

2.1 Experimental Data and Preprocessing

We use the Human Connectome Project (HCP) Q1 and S900 release datasets [13] for the experiments. Specifically, we use motor task-evoked fMRI (tfMRI) data for training the ST-CNN, and test its performance on the motor and emotion tfMRI data from the Q1 release and the motor tfMRI data from the S900 release. The preprocessing pipeline for tfMRI data includes skull removal, motion correction, slice time correction, spatial smoothing, and global drift removal (high-pass filtering), all implemented by FSL FEAT.

After preprocessing, we apply the sparse representation method [4] to decompose the tfMRI data into functional networks for both the training and testing datasets. The decomposition results consist of both the temporal dynamics (i.e. "dictionary atoms") and the spatial patterns (i.e. "sparse weights") of the functional networks. The targeted DMN of each individual is then manually selected based on the spatial patterns of the resulting networks. The selection process is assisted by sorting the resulting networks by their spatial overlap with the DMN template (from [14]), measured by the Jaccard similarity (i.e. intersection over union). We use the dictionary atom (1-D time series) of the selected network as the ground-truth time series for training the CAE.
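As a concrete reference, a minimal Python/NumPy sketch of the Jaccard-based sorting used in this selection step might look like the following (the threshold and the binarization convention are assumptions, since the text does not specify them):

```python
import numpy as np

def jaccard_overlap(map_a, map_b, thr=0.0):
    """Spatial overlap rate (intersection over union) between two
    binarized 3D activation maps."""
    a = np.asarray(map_a) > thr
    b = np.asarray(map_b) > thr
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union > 0 else 0.0

# Rank the decomposed networks by overlap with the DMN template:
# ranked = sorted(networks,
#                 key=lambda net: jaccard_overlap(net, dmn_template),
#                 reverse=True)
```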

2.2 ST-CNN Framework

Spatial Network

The spatial network is inspired by the 2D U-Net [15] for semantic image segmentation. Extending and adapting the 2D classification U-Net to a 3D regression network (Fig. 1(b)), the spatial network takes the 4D fMRI data as input, with each 3D brain volume along the time axis assigned to an independent channel. This 3D U-Net is constructed from a contracting CNN and an expanding CNN; in the expanding CNN, the pooling layers of the contracting path (red arrows in Fig. 1(b)) are replaced by up-sampling layers (green arrows in Fig. 1(b)). This 3D U-shaped CNN contains only convolutional layers and no fully connected layers. The loss function for training the spatial network is the mean squared error between the network output (a 3-D image) and the targeted DMN.
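The exact depth, channel counts, and kernel sizes of the spatial network are given in Fig. 1(b) rather than in the text, so the following PyTorch sketch is a minimal two-level stand-in that illustrates the contracting/expanding structure, the time-frames-as-channels input, and the fully convolutional regression head:

```python
import torch
import torch.nn as nn

class UNet3D(nn.Module):
    """Minimal 3D U-Net regressor (illustrative; depth and channel
    counts are assumptions). Input: (B, T, X, Y, Z) with the T time
    frames as channels; output: one 3D map. X, Y, Z must be even."""
    def __init__(self, t_frames, base=16):
        super().__init__()
        self.enc1 = self._block(t_frames, base)
        self.pool = nn.MaxPool3d(2)                      # contracting path
        self.enc2 = self._block(base, base * 2)
        self.up = nn.Upsample(scale_factor=2, mode='trilinear',
                              align_corners=False)       # expanding path
        self.dec1 = self._block(base * 2 + base, base)   # skip connection
        self.out = nn.Conv3d(base, 1, kernel_size=1)     # regression head

    @staticmethod
    def _block(c_in, c_out):
        return nn.Sequential(
            nn.Conv3d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.out(d1)                              # (B, 1, X, Y, Z)
```

The mean squared error of the text is then simply torch.nn.functional.mse_loss between this output and the targeted DMN map.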

Temporal Network

The temporal network (Fig. 1(b)) is inspired by the 1-D Convolutional Auto-Encoder (CAE) for fMRI modeling [8]. Both the encoder and the decoder of the 1-D CAE have a depth of 3. The encoder takes a 1-D signal as input and convolves it with a kernel of size 3, yielding 8 feature map channels, which are down-sampled by a pooling layer. A convolutional layer with kernel size 5 follows, yielding 16 feature map channels, which are again down-sampled by a pooling layer. The last part of the encoder is a convolutional layer with kernel size 8, yielding 32 feature map channels. The decoder takes the output of the encoder as input and mirrors the encoder, as in a traditional auto-encoder. The loss function for training the temporal network is the negative Pearson correlation (1) between the CAE output time series and the temporal dynamics of the manually-selected DMN.

$$ \text{Temporal loss} = - \frac{N\sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{\sqrt{\left(N\sum_{i=1}^{N} x_i^{2} - \left(\sum_{i=1}^{N} x_i\right)^{2}\right)\left(N\sum_{i=1}^{N} y_i^{2} - \left(\sum_{i=1}^{N} y_i\right)^{2}\right)}} $$
(1)
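Following the architecture described above, a PyTorch sketch of the temporal network and its loss could look like this. The kernel sizes (3, 5, 8) and channel counts (8, 16, 32) follow the text; the padding values are assumptions chosen so that input and output lengths match (which requires T divisible by 4 here), and the loss function is algebraically equivalent to Eq. (1):

```python
import torch
import torch.nn as nn

class TemporalCAE(nn.Module):
    """1D convolutional auto-encoder; the decoder mirrors the encoder.
    Padding values are assumptions for a runnable sketch."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(8, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=8, padding=4), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Conv1d(32, 16, kernel_size=8, padding=3), nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv1d(16, 8, kernel_size=5, padding=2), nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv1d(8, 1, kernel_size=3, padding=1))

    def forward(self, x):                        # x: (B, 1, T)
        return self.decoder(self.encoder(x))

def negative_pearson_loss(x, y, eps=1e-8):
    """Eq. (1): negative Pearson correlation between two time series;
    mean-centering reduces the textbook formula to this form."""
    x = x - x.mean(dim=-1, keepdim=True)
    y = y - y.mean(dim=-1, keepdim=True)
    r = (x * y).sum(-1) / (x.norm(dim=-1) * y.norm(dim=-1) + eps)
    return -r.mean()
```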

Combination Operator

This combination procedure (Fig. 1(b)) connects the spatial network and the temporal network through a convolution operator. The inputs to the combination are the 4-D fMRI data and the 3-D output of the spatial network (i.e., the spatial pattern of the estimated DMN). The 3-D output is used as a 3-D convolutional kernel to perform a valid (no-padding) convolution over each 3-D volume at each time frame of the 4-D fMRI data (2). Since the convolutional kernel has the same size as each 3D brain volume along the 4th (time) dimension, the no-padding convolution yields a single value at each time frame, thus forming a time series for the estimated DMN. This output time series ts is used as the input to the temporal 1-D CAE, as described above.

$$ ts \in \mathbb{R}^{T \times 1} = \left\{ t_{1}, t_{2}, \ldots, t_{T} \;\middle|\; t_{i} = V_{i} * DMN \in \mathbb{R} \right\}, $$
(2)

where ti is the convolution result at each time frame, Vi is the 3-D fMRI volume at time frame i, and DMN is the 3-D spatial network output used as convolution kernel.
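In code, this full-size valid convolution collapses each frame to a single scalar, i.e. a voxel-wise inner product between the frame and the estimated map. A minimal PyTorch sketch follows (note that deep learning frameworks implement cross-correlation rather than flipped convolution, which here is exactly the intended per-frame weighting):

```python
import torch
import torch.nn.functional as F

def combine(fmri_4d, dmn_map):
    """Eq. (2): valid (no-padding) convolution of every frame with the
    spatial map. fmri_4d: (T, X, Y, Z); dmn_map: (X, Y, Z) -> ts: (T,)."""
    kernel = dmn_map.view(1, 1, *dmn_map.shape)   # one output channel
    frames = fmri_4d.unsqueeze(1)                 # (T, 1, X, Y, Z)
    ts = F.conv3d(frames, kernel)                 # (T, 1, 1, 1, 1)
    return ts.view(-1)
```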

2.3 Training Process and Model Convergence

Since the temporal network relies on the DMN spatial map produced by the spatial network, we split the training process into three stages: in the first stage, only the spatial network is trained (Fig. 2(a)); in the second stage, the temporal network is trained on the spatial network results (Fig. 2(b)); finally, the entire ST-CNN is trained for fine-tuning (Fig. 2(c)). As seen in Fig. 2, the temporal network converges much faster (around 10 times faster) than the spatial network. Thus, during the fine-tuning stage, the loss function of ST-CNN is a weighted sum (10:1) of the spatial and temporal losses.
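A schematic of the fine-tuning step might look as follows (a sketch only: the optimizer settings are unspecified in the text, negative_pearson_loss is the function sketched in Sect. 2.2, and we assume the 10:1 weighting puts the larger weight on the slower-converging spatial loss):

```python
import torch
import torch.nn.functional as F

def fine_tune_step(model, fmri_4d, target_map, target_ts, optimizer):
    """One fine-tuning update of the full ST-CNN (stage 3)."""
    dmn_map, ts_out = model(fmri_4d)
    spatial_loss = F.mse_loss(dmn_map, target_map)
    temporal_loss = negative_pearson_loss(ts_out, target_ts)
    loss = 10.0 * spatial_loss + temporal_loss    # assumed 10:1 weighting
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```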

Fig. 2. Training losses (y-axis) versus training steps (x-axis): (a) first-stage spatial network training loss; (b) second-stage temporal network training loss; (c) fine-tuning training loss.

2.4 Model Evaluation and Validation

We first calculate the spatial overlap rate between the spatial pattern of the ST-CNN output and a well-established DMN template to evaluate the spatial network. We then calculate the Pearson correlation between the output time series and the ground-truth time series from the sparse representation results to evaluate the temporal network. Finally, we utilize a supervised dictionary learning method [16] to reconstruct the spatial pattern of the network from the temporal network result, to investigate whether the spatio-temporal relationship is correctly captured by the framework.

3 Results

We use 52 subjects’ motor tfMRI data from the HCP Q1 release to train the ST-CNN. We test the same trained network on three datasets: (1) motor tfMRI data from the remaining 13 subjects; (2) motor tfMRI data from 100 randomly-selected subjects in the HCP S900 release; and (3) emotion tfMRI data from 67 subjects in the HCP Q1 release. Testing results show consistently good performance for DMN identification, demonstrating that the trained network is not limited to a specific population or cognitive task.

3.1 MOTOR Task Testing Results

The trained ST-CNN is tested on two different motor task datasets: 13 subjects from HCP Q1 and 100 subjects from HCP S900. As shown in Fig. 3, the resulting spatial and temporal patterns are consistent with the ground-truth. The quantitative analyses in Table 1 demonstrate that ST-CNN performs better than the sparse representation method, even though it is trained on manually-selected sparse representation results. The rationale is that ST-CNN can better adapt to the input data through its co-learned spatial and temporal networks, while sparse representation relies on a simple sparsity prior that can be invalid in certain cases. As shown in Fig. 4, sparse representation fails to identify the DMN in certain subjects while ST-CNN succeeds. In the HCP Q1 dataset, we observed 20% of cases (13 out of 65 subjects) in which sparse representation fails while ST-CNN succeeds. Considering that the DMN should be consistently present in the functioning brain regardless of task, this is an intriguing and desirable characteristic of the ST-CNN model.

Fig. 3. Examples of comparisons between ST-CNN outputs and the ground-truth from sparse representation, for two subjects from two different datasets (one from HCP Q1 and one from HCP S900). The spatial maps are very similar, and the time series have Pearson correlation coefficients of 0.878 (HCP Q1) and 0.744 (HCP S900). Red curves: ground-truth; blue curves: ST-CNN temporal outputs.

Table 1. Performance of ST-CNN measured by spatial overlap rate
Fig. 4. Example of better DMN identification by ST-CNN than by sparse representation (denoted by red arrows). The temporal dynamics of the two networks also differ; the output from ST-CNN (blue) is more reasonable.

3.2 EMOTION Task Testing Results

We further tested 67 subjects’ emotion task-evoked fMRI data (HCP Q1) to demonstrate that the network trained on the motor task is not tied to a specific cognitive task. The framework's ability to extract the DMN both spatially and temporally shows that the intrinsic features of the DMN are well captured. As shown in Fig. 5, the spatial maps resemble the ground-truth sparse representation results, and so do the temporal outputs. The quantitative analyses in Table 1 show that our outputs also have larger spatial overlap with the DMN template than the outputs from sparse representation. The temporal outputs are also accurate, with an average Pearson correlation coefficient of 0.51.

Fig. 5. Example of ST-CNN outputs and ground-truth (sparse representation) for the EMOTION task. The spatial maps are very similar, and the time series have a Pearson correlation of 0.754. Red curve: ground-truth; blue curve: ST-CNN temporal output.

3.3 Spatial Output and Temporal Output Relationship

For further validation, supervised sparse representation [16] is applied to the 13 testing subjects’ HCP Q1 motor tfMRI data. We set the temporal output of ST-CNN as a predefined dictionary atom and obtain the sparse representation of the data by learning the rest of the dictionary. The resulting network corresponding to the predefined atom, whose temporal dynamics are fixed during learning, is compared with the ST-CNN spatial output. We found that the temporal output of ST-CNN leads to an accurate estimation of the DMN spatial pattern, as shown in Fig. 6. The average spatial overlap rate between the supervised results and the ST-CNN spatial output is 0.144, suggesting that the spatial output of ST-CNN is closely related to its temporal output.
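As a rough illustration of this validation step, the following NumPy/scikit-learn sketch alternates sparse coding and dictionary updates while keeping the first atom fixed to the ST-CNN temporal output. It is a simplified stand-in, not the exact algorithm of [16]; the atom count, penalty, and iteration numbers are arbitrary assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

def supervised_sparse_representation(X, fixed_atom, n_free=49,
                                     n_iter=10, alpha=0.1, seed=0):
    """X: (T, V) voxel time-series matrix; fixed_atom: (T,).
    Returns dictionary D and codes A; A[0] is the spatial pattern
    associated with the fixed (ST-CNN) atom."""
    T, V = X.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((T, 1 + n_free))
    D[:, 0] = fixed_atom / (np.linalg.norm(fixed_atom) + 1e-8)
    D[:, 1:] /= np.linalg.norm(D[:, 1:], axis=0, keepdims=True)
    for _ in range(n_iter):
        # Sparse coding: X ~ D @ A with an l1 penalty on the codes A.
        A = Lasso(alpha=alpha, fit_intercept=False,
                  max_iter=500).fit(D, X).coef_.T       # (atoms, V)
        # Dictionary update on the free atoms only; the fixed atom's
        # temporal dynamics never change during learning.
        R = X - np.outer(D[:, 0], A[0])                 # residual
        D_free, *_ = np.linalg.lstsq(A[1:].T, R.T, rcond=None)
        D_new = D_free.T
        D[:, 1:] = D_new / (np.linalg.norm(D_new, axis=0,
                                           keepdims=True) + 1e-8)
    return D, A
```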

Fig. 6. Comparison of the spatial maps between ST-CNN results and supervised sparse representation, which takes the temporal output of ST-CNN as a pre-defined atom.

4 Discussion

In this work, we proposed a novel spatio-temporal CNN model to identify functional networks (with the DMN as an example) from 4D fMRI data. The effectiveness of ST-CNN is validated by experimental results on different testing datasets. From an algorithmic perspective, the results show that ST-CNN embeds the spatio-temporal variation patterns of the 4D fMRI signal into the network, rather than merely learning the matrix decomposition performed by sparse representation. It will therefore be important to further refine the framework by training it on DMNs identified by other methods (such as temporal ICA). We use the DMN as the sample targeted network in the current work because it should be present in virtually any fMRI data. As detecting the absence or disruption of a functional network is as important as identifying it (e.g., for early detection of AD/MCI), in future work we will extend the current framework to pinpoint more functional networks, including task-related networks that are present only in a limited range of datasets. We will also test ST-CNN on fMRI data from abnormal brains to assess its capability of characterizing the spatio-temporal patterns of disrupted DMNs.