fMRI-based Decoding of Visual Information from Human Brain Activity: A Brief Review

One of the most significant challenges in the neuroscience community is to understand how the human brain works. Recent progress in neuroimaging techniques has shown that it is possible to decode a person’s thoughts, memories, and emotions via functional magnetic resonance imaging (fMRI), since fMRI can measure neural activation in the human brain with satisfactory spatiotemporal resolution. However, the unprecedented scale and complexity of fMRI data have created critical computational bottlenecks that call for new scientific analytic tools. Given the increasingly important role of machine learning in neuroscience, a great many machine learning algorithms have been proposed to analyze brain activity from fMRI data. In this paper, we provide a comprehensive and up-to-date review of machine learning methods for analyzing neural activity along the following three axes, i.e., brain image functional alignment, brain activity pattern analysis, and visual stimuli reconstruction. In addition, online resources and open research problems on brain pattern analysis are provided for the convenience of future research.


Introduction
One of the most significant challenges in the fields of neuroscience and machine learning is comprehending how the human brain works. As the seat of human memory, emotion and thought, a better comprehension of the brain will expedite progress across society, including science, medicine, education, etc. [1−3] In order to measure neural activities, different modalities of measurement can be utilized, including event-related optical signals (EROS), positron emission tomography (PET), single-photon emission computed tomography (SPECT), near-infrared spectroscopy (NIRS), magnetoencephalography (MEG), electrocorticography (ECoG), electroencephalography (EEG), and functional magnetic resonance imaging (fMRI). Among all of the above imaging modalities, fMRI is a non-invasive technique for probing the neurobiological substrates of various cognitive functions, providing an indirect estimate of brain activity by measuring metabolic changes in blood flow [4−7]. Another advantage of fMRI is that it provides unprecedented spatiotemporal resolution without known side effects, which in turn yields more accurate information for the analysis of neural activities.
Based on the fMRI images, many machine learning models have been applied to analyze the visual and subjective contents of human brains [8−10]. Generally, machine learning-based methods aim to build a mathematical model from sample fMRI data, namely the training data, in order to make predictions or decisions on the testing set without being explicitly programmed to perform the neural activity prediction task. For instance, Kamitani and Tong [11] applied a linear regression model to classify brain states and found that the cognitive trials of subjects could be reliably predicted from ensemble fMRI signals recorded in early visual areas. Kay et al. [12] proposed a brain decoding method based on quantitative receptive field models, which learn a representation of the relationship between the stimulus images and the evoked fMRI data in early visual areas. Noting that the proportion of voxels conveying discriminative information is small compared to the total number of measured voxels, Martino et al. [13] applied a recursive feature elimination (RFE) algorithm to eliminate irrelevant voxels and estimate informative spatial patterns. In another work, Yamashita et al. [14] proposed a linear classification algorithm called sparse logistic regression (SLR), which can automatically select relevant voxels and estimate their weight parameters for brain state estimation.
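As an illustration of this decoding-as-classification setup, the following sketch trains a linear classifier on synthetic data standing in for trial-wise fMRI responses; the sizes, the two stimulus classes, and the informative-voxel layout are invented for the example:

```python
# Decoding-as-classification sketch on synthetic "fMRI" trials.
# Sizes, classes and informative voxels are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 50            # trials x voxels (toy sizes)
y = rng.integers(0, 2, size=n_trials)   # two hypothetical stimulus classes
X = rng.normal(size=(n_trials, n_voxels))
X[:, :5] += y[:, None] * 1.5            # class signal in 5 "informative" voxels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)             # well above the 0.5 chance level here
```

Real pipelines add run-wise preprocessing (detrending, standardization) and proper cross-validation rather than a single train/test split.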
Although much progress has been achieved, major computational and statistical challenges must still be overcome before the unprecedented scale and complexity of the valuable fMRI data can be fully exploited. Overcoming these challenges has become a major and active research topic in statistics and machine learning. Here, we summarize the main challenges for brain pattern analysis as follows: First of all, a key component of fMRI research is the use of multi-subject datasets. However, both anatomical structure and functional topography (brain activity patterns) vary across subjects [15−17], so authentic functional and anatomical alignment among different subjects′ neural activities must be addressed before classification models can be developed. Secondly, the dimensionality of fMRI datasets is always high, with redundant noise [18,19]. For some specific experiments, such as visual or auditory stimulation, only part of the brain is activated, so selecting key brain areas is a prerequisite for accurate brain research. Last but not least, although researchers have successfully improved classification performance for identifying brain activity patterns, the reconstruction of visual stimuli from brain images remains a challenging task [8,20]. Compared with classification tasks, reconstructing visual images can provide more detailed information for understanding human minds. In recent years, several surveys [21−23] have reviewed the mechanisms of brain encoding and decoding as well as common and classic methods. These surveys not only summarize the up-to-date methods, but also present the challenges in the field of brain decoding and neuroscience.
In view of the above challenges, the majority of this review will be devoted to the discussion of the machine learning algorithms for solving the following four types of problems in the field of brain decoding, and we show the flowchart of our paper in Fig. 1.
Firstly, in Section 2, we examine the problem of functional alignment for fMRI analysis across subjects, a pre-processing step for brain decoding analysis that accounts for between-subject variability. Since much of the research reviewed here belongs to this category, we review several fundamental brain alignment strategies in Section 2, including linear functional alignment, non-linear functional alignment, etc. Secondly, in Section 3, we explore the problems of multivariate pattern classification and representational similarity analysis, which predict the neural patterns associated with distinct stimuli and evaluate the similarities (or distances) between different cognitive tasks. Thirdly, in Section 4, we review methods for brain image reconstruction that generate the stimulus image from the corresponding fMRI signals. Finally, online resources and open research problems on brain pattern analysis are provided in Section 5.

Functional alignment
One of the challenges in the field of brain decoding is multi-subject fMRI analysis [1,9,16]. Multi-subject fMRI data analysis is critical for evaluating the generality of research findings across subjects. However, due to the heterogeneous patterns in multi-subject datasets, the fMRI data collected from different subjects must be aligned into a common space to overcome between-subject variability [18]. From the perspective of machine learning, we can regard the alignment problem as a multi-view representation learning problem [1,13]. The underlying assumption is that some information is shared across subjects, and aligning the data means extracting this common information. Generally, there are two kinds of alignment methods: anatomical alignment and functional alignment. The most popular approach is anatomical alignment, which is based on anatomical features derived from structural MRI images, e.g., Talairach alignment [24] or Montreal Neurological Institute (MNI) normalization [25,26]. However, anatomically based alignment cannot substantially improve accuracy, since it is insufficient to address the variability in the functional topography of brains. The goal of functional alignment, on the other hand, is to precisely align the fMRI response space across subjects. In other words, it seeks a common space that maximizes the within-class stimulus correlation and minimizes the between-class stimulus correlation, so that neural activities evoked by different classes remain clearly separated [15,16]. During the past decade, some research has combined both anatomical and functional features for fMRI functional alignment. For example, Conroy et al. [27] proposed an alignment method that uses cortical warping to maximize inter-subject pattern alignment.
Similarly, cortical warping was used in [28] to maximize the inter-subject correlation (ISC). In another project focused on maximizing ISC, Dmochowski et al. [29] aggregated the data collected from different subjects into a common matrix that takes cross-subject variability into consideration. Further, Michael et al. [4] proposed group independent component analysis and independent vector analysis for the functional alignment of resting-state fMRI (rs-fMRI). The algorithm does not assume that stimuli are simultaneous, so it concatenates data along the temporal dimension, implying spatial consistency, and learns spatially independent components. Building on these considerations, a well-known alignment method called hyperalignment (HA) was proposed by Haxby et al. [1] to align neural activity patterns across subjects in a common high-dimensional space. Hyperalignment is a functional alignment method that does not rely on anatomical features. As shown in Fig. 2, a basic hypothesis of the originally proposed hyperalignment model is that each subject′s representation space is a noisily rotated version of a common template. HA uses the Procrustes transform [30] to rotate the coordinate axes of each subject′s representation space so as to align the response vectors from different subjects. The representation spaces of the different subjects are aligned iteratively until a common space is generated for all subjects.
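The iterative Procrustes scheme described above can be sketched as follows; the data sizes, noise level, and iteration count are illustrative assumptions, not settings from Haxby et al.:

```python
# Iterative Procrustes-based alignment toward a shared template, in the
# spirit of hyperalignment. All sizes and noise levels are toy assumptions.
import numpy as np
from scipy.linalg import orthogonal_procrustes

def hyperalign(subjects, n_iter=5):
    """subjects: list of equally shaped (time x voxel) arrays.
    Returns one rotation per subject and the common template."""
    template = np.mean(subjects, axis=0)               # initial common space
    rotations = [np.eye(subjects[0].shape[1]) for _ in subjects]
    for _ in range(n_iter):
        aligned = []
        for i, X in enumerate(subjects):
            R, _ = orthogonal_procrustes(X, template)  # best rotation: X @ R ~ template
            rotations[i] = R
            aligned.append(X @ R)
        template = np.mean(aligned, axis=0)            # update the template
    return rotations, template

# Toy data: each subject is a randomly rotated copy of a shared signal.
rng = np.random.default_rng(1)
base = rng.normal(size=(100, 20))
subjects = []
for _ in range(3):
    Q, _ = np.linalg.qr(rng.normal(size=(20, 20)))
    subjects.append(base @ Q + 0.01 * rng.normal(size=base.shape))
rotations, template = hyperalign(subjects)
residual = np.mean([np.linalg.norm(X @ R - template)
                    for X, R in zip(subjects, rotations)])
```

After a few iterations the rotated subjects collapse onto a common template, so the residual is dominated by the per-subject noise rather than the random rotations.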
Following the work of Haxby et al. [1], many improved alignment methods have been proposed to achieve better performance. These methods can be categorized by different criteria: by whether label information is available, they fall into supervised, semi-supervised and unsupervised functional alignment methods; by how the transformation matrix is derived, they fall into linear and non-linear models. In this paper, we introduce several classic and state-of-the-art functional alignment methods following the second scheme.

Linear transformation methods for functional alignment
Let the matrices $X_i \in \mathbb{R}^{T \times V}$ ($i = 1, \dots, S$) record the data of the subjects. Here, $S$, $T$ and $V$ represent the number of subjects, the number of TRs (times of repetition) and the number of voxels, respectively. Mathematically, HA can be formulated through the framework of canonical correlation analysis (CCA) [9]:

$$\max_{R_1, \dots, R_S} \sum_{i=1}^{S} \sum_{j=i+1}^{S} \rho\left(X_i R_i,\; X_j R_j\right) \tag{1}$$

where $R_i \in \mathbb{R}^{V \times V}$ is the transformation matrix of the $i$-th subject and $\rho(\cdot, \cdot)$ denotes the correlation between two aligned response matrices. Based on Haxby et al.′s study [1], several improved methods have been proposed to ameliorate the performance of hyperalignment. Xu et al. [31] proposed a regularized hyperalignment (RHA) method, which iteratively finds the optimal regularization parameters using the expectation-maximization (EM) algorithm. RHA showed that the weights of the singular vectors in each normalized dataset are controlled by the corresponding regularization parameters, and that classification accuracy can be improved by adjusting those parameters. Chen et al. [32] proposed singular value decomposition hyperalignment (SVDHA), which uses joint singular value decomposition to decompose the response matrices, thereby reducing the dimensionality of the fMRI data for the first time. HA is then used to align the subjects in the new lower-dimensional feature space, which reduces computation time while retaining classification accuracy. Furthermore, a shared response model (SRM) was proposed by Chen et al. [33] as another functional alignment method. Indeed, SRM can be viewed as a variant of probabilistic principal component analysis (PCA), obtained by imposing orthogonality constraints on the loading matrix. One of the key attributes of SRM is its dimensionality reduction mechanism, which reduces the dimensions of the shared feature space. In other studies, Sui et al. [34,35] applied multimodal CCA and independent component analysis (ICA) to multimodal data, identifying specific and shared variance associations across modalities. The above studies are all based on unsupervised machine learning. However, in visual stimulation tasks, supervision such as stimulus image labels can also be collected. Therefore, Yousefnezhad and Zhang [16] proposed a supervised HA method named local discriminant hyperalignment (LDHA), which brings the concept of linear discriminant analysis (LDA) into CCA and improves on the performance of the unsupervised HA methods.
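To make the orthogonality-constrained factorization behind SRM concrete, here is a minimal deterministic sketch; it is not the probabilistic SRM of Chen et al. [33], and the shapes and alternating-update scheme are simplifying assumptions:

```python
# Minimal deterministic variant of the shared response model: factor each
# subject as X_i ~ W_i @ S with orthonormal W_i and a shared response S.
# A sketch only, not the probabilistic SRM of Chen et al.
import numpy as np

def srm_lite(subjects, k, n_iter=20, seed=0):
    """subjects: list of (voxels x time) arrays. Returns (Ws, S)."""
    rng = np.random.default_rng(seed)
    v = subjects[0].shape[0]
    Ws = [np.linalg.qr(rng.normal(size=(v, k)))[0] for _ in subjects]
    for _ in range(n_iter):
        # Given orthonormal W_i, the optimal shared response is the mean.
        S = np.mean([W.T @ X for W, X in zip(Ws, subjects)], axis=0)
        # Given S, each W_i is an orthogonal Procrustes solution.
        for i, X in enumerate(subjects):
            U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
            Ws[i] = U @ Vt
    return Ws, S

# Toy data generated from the model itself, plus noise.
rng = np.random.default_rng(2)
S_true = rng.normal(size=(5, 80))
subjects = [np.linalg.qr(rng.normal(size=(30, 5)))[0] @ S_true
            + 0.05 * rng.normal(size=(30, 80)) for _ in range(3)]
Ws, S = srm_lite(subjects, k=5)
rel_err = np.mean([np.linalg.norm(X - W @ S) / np.linalg.norm(X)
                   for W, X in zip(Ws, subjects)])
```

The orthonormality constraint $W_i^{\top} W_i = I_k$ is what distinguishes this factorization from plain PCA-style decompositions, and the small shared dimension $k$ is the dimensionality-reduction mechanism mentioned above.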

Non-linear transformation methods for functional alignment
All the HA methods mentioned above attempt to find each subject′s transformation matrix by solving a linear model and projecting the response matrices of different subjects into a common space. However, non-linearity and high dimensionality are pervasive in real data. Therefore, several non-linear HA methods have been proposed for aligning different subjects. For example, Lorbert and Ramadge [9] proposed a non-linear method called kernel hyperalignment (KHA), which performs the non-linear transformation in an embedded kernel space. KHA can simultaneously address the voxel and feature expansion problems, and it shifts the difficulty of HA from being limited by the number of voxels to being limited by the number of subjects. Chen et al. [36] developed a convolutional auto-encoder (CAE) for functional alignment on whole-brain fMRI data. As another non-linear HA method, CAE first recasts SRM as a multi-view autoencoder and then applies the standard searchlight (SL) to improve the stability and robustness of the cognitive classification model. With the fast development of deep neural networks, their powerful fitting ability provides another effective form of transformation for non-linear HA. Yousefnezhad and Zhang [17] proposed a deep hyperalignment (DHA) method as an unsupervised kernel model. As can be seen from Fig. 3, DHA uses deep networks, i.e., multiple stacked layers of non-linear transformations, as the kernel function, which can be solved via rank-m SVD and stochastic gradient descent (SGD). DHA not only handles non-linear, high-dimensional transformations, but also performs well on classification tasks.
Recently, a cross-subject graph was used by Li et al. [15] to describe the similarities or dissimilarities among different subjects for HA on fMRI datasets. One advantage of this method is a new kernel-based optimization algorithm for non-linear feature extraction. Here, we report the alignment results of several existing methods in Table 1; the datasets used by these methods are also shown in Table 1. More information about the presented datasets can be found in Section 5.1. It is worth noting that figures in parentheses indicate the number of categories in the different datasets (ROI: region-of-interest; WB: whole-brain; PMC: post medial cortex).

Brain activity pattern analysis
Research on brain activity pattern analysis can be traced back to 2001, when Haxby et al. [38] found that when different images are presented to a subject, different categories of visual stimuli induce different fMRI response patterns. Following their work, several brain activity pattern analysis methods [39−42] have been proposed during the last two decades. A key concept of brain encoding and decoding is representation in high-dimensional vector spaces. Neural responses, also known as patterns of brain activity, exist as vectors in neural representation spaces. Patterns of brain activity are distributed both spatially and temporally [39]. The features (elements) of these patterns are local measurements of brain activity, and each local measurement is expressed as one dimension of the space. Currently, numerous techniques can be used to work with task-based fMRI datasets. These techniques, including multi-voxel pattern analysis (MVPA) and representational similarity analysis (RSA), can effectively extract and decode brain activity patterns. In this section, we introduce these two high-dimensional feature analysis techniques.

Multi-voxel pattern analysis
In the early days of fMRI data analysis, univariate methods were mainly used for brain activity pattern recognition. In most of these univariate methods, a general linear model (GLM) [43] was used to estimate each voxel in the brain separately, and the analysis results were shown in an image of model parameters or derived statistics [40] . However, with the development of research techniques, researchers found that the univariate method is not sufficient to support the analysis of fMRI data. In this case, multivariate analysis received more and more attention.
Due to the high spatial resolution of fMRI and the particularity of the imaging method, fMRI data have the features of high dimensionality and low signal-to-noise ratio. The traditional univariate method treats each voxel as an independent feature, ignoring the correlation between features, which makes it difficult to detect spatial patterns [40]. Multivariate pattern analysis, as an alternative to the traditional univariate method, can more accurately detect the activation distribution of the brain and decode the cognitive state. Therefore, multivariate pattern (MVP) analysis is widely used in many studies in the field of neuroimaging. Information is encoded into brain activity patterns; this information comes from people′s experience, or their thinking about and imagination of the world. MVP analysis is a modern approach drawn from computational advances of the last two decades [7,41]. In one of the early studies, Haxby et al. [1] illustrated how cognitive states can be distinguished by multi-voxel brain activity patterns. They proposed a new classifier based on split-half correlation [7]. The experimental results showed that a distributed representation of eight categories, such as bottles, faces, houses, etc., is contained in the ventral temporal (VT) cortex. Furthermore, these categories could be decoded from human brain activity [42].
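A minimal MVPA-style analysis, i.e., cross-validated multivariate classification over voxel patterns, might look like the following on synthetic data; the toy sizes and signal placement are assumptions for illustration:

```python
# MVPA sketch: cross-validated linear classification over voxel patterns.
# All data are synthetic; sizes and effect strength are assumptions.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n_trials, n_voxels = 120, 200
y = np.repeat([0, 1], n_trials // 2)      # two stimulus categories
X = rng.normal(size=(n_trials, n_voxels))
X[y == 1, :10] += 1.0                     # weak category signal in 10 voxels

scores = cross_val_score(LinearSVC(max_iter=20000), X, y, cv=5)
mean_acc = scores.mean()                  # clearly above chance despite p >> n
```

Note that the multivariate model recovers the category from a pattern spread over several voxels, which a voxel-by-voxel univariate test would treat as many weak, independent effects.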

Recently, sparsity learning methods have also been used to select the most discriminative voxels for brain activity pattern analysis [5,14,44]. Specifically, Yamashita et al. [14] proposed a sparse logistic regression (SLR) method, a linear model used for feature selection. SLR automatically chooses the most discriminative voxels in the brain and estimates their weight parameters for cognitive state identification. Moreover, Ryali et al. [44] proposed a logistic regression-based method with a combination of $\ell_1$- and $\ell_2$-norm regularizations to select discriminant brain regions across multiple conditions or groups. Grosenick et al. [45] developed a graph-constrained elastic-net (GraphNet) based whole-brain regression and classification method that can automatically provide interpretable coefficient maps. In addition, Yousefnezhad and Zhang [41] proposed an MVP analysis method based on the AdaBoost algorithm, named imbalance AdaBoost binary classification (IABC). IABC converts an imbalanced MVP analysis problem into a set of balanced problems, which significantly improves fMRI analysis performance. Meel et al. [46] used MVP and functional connectivity analysis to study the (vertical) symmetry representation in regions of the ventral visual stream. Wen et al. [5] proposed a feature selection method based on group sparse Bayesian logistic regression (GSBLR), which was applied to select the most relevant voxels for binary brain decoding. Grouped automatic relevance determination (GARD) is used in this model as the prior over the parameters, which is consistent with the group sparsity of fMRI data.
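In the spirit of SLR-style voxel selection, the sketch below uses an $\ell_1$-penalized logistic regression (a stand-in, not the actual SLR algorithm of [14]) to drive uninformative voxel weights exactly to zero on synthetic data; the sizes and the penalty strength C are illustrative assumptions:

```python
# L1-penalized logistic regression as a stand-in for SLR-style voxel
# selection: weights of uninformative voxels are driven exactly to zero.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n_trials, n_voxels = 150, 100
y = rng.integers(0, 2, size=n_trials)
X = rng.normal(size=(n_trials, n_voxels))
X[:, :8] += y[:, None] * 1.2              # only the first 8 voxels are informative

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.2).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])   # indices of voxels with non-zero weight
```

The non-zero support of the fitted weights is the selected voxel set, which mostly recovers the informative voxels while discarding pure-noise dimensions.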

Representation similarity analysis
RSA is another well-known method in the field of brain activity pattern analysis, used to evaluate the similarities between various cognitive states [47,48]. In a visual stimulation task, fMRI signals are acquired while subjects watch different categories of images or videos. Different categories of stimuli evoke corresponding activity patterns in a subject′s brain. RSA is then used to calculate the similarities between the various cognitive states. This process generates the representational similarity matrix (RSM), which encodes the similarity structure of the different cognitive tasks. Fig. 4 shows the computational steps for deriving the RSM. In the RSM, each block represents the correlation distance between the activity patterns of a pair of stimuli (i.e., conditions in the experiment). The diagonal elements of the RSM are equal to 1. The value of each off-diagonal element represents the similarity of the brain′s responses to two different stimuli: the larger the value, the higher the similarity, and vice versa. Classic RSA is mainly based on traditional linear methods, e.g., GLM [43], ordinary least squares (OLS) [47], etc. In fact, we can regard RSA as a multi-task regression problem. Kriegeskorte et al. [47] used ordinary least squares to fit a linear model of the time course of each voxel in order to measure the spatial activity patterns evoked in each condition. This linear model includes a hemodynamic response predictor for each condition, as well as optional further predictors modeling nuisance factors, such as trends, head movement effects, and baseline shifts between measurement runs. RSA [48] assumes that the brain activity patterns are related to stimulus events, which can be formulated as

$$Y_i = D B_i + \epsilon_i$$

where $Y_i \in \mathbb{R}^{T \times V}$ denotes the fMRI time series of the $i$-th subject, $T$ is the number of repetition times (TRs) and $V$ is the number of voxels of the brain.
The design matrix is denoted by $D \in \mathbb{R}^{T \times C}$, and it can be obtained by convolving the stimulus time series with a typical hemodynamic response function (HRF). Here, $C$ denotes the number of categories of stimuli, $B_i \in \mathbb{R}^{C \times V}$ denotes the estimated regression matrix, and its entry $b_{cv}$ is an amplitude reflecting the response of the $v$-th voxel to the $c$-th stimulus. GLM is based on a linear model and cannot achieve satisfactory results, since the representation matrix is usually wide, i.e., the voxel count far exceeds the number of time points in an fMRI dataset. Moreover, this makes the matrix computations ill-conditioned [41]. The method′s stability and robustness also decrease as the signal-to-noise ratio (SNR) drops [2]. Further, GLM and OLS face the problem of overfitting. Most existing studies avoid overfitting by adding regularization terms. For instance, the least absolute shrinkage and selection operator (LASSO) [49] solves the regression problem using the $\ell_1$-norm, whereas the $\ell_2$-norm is used in ridge regression [50] to address the same problem. The elastic net [51], as a modified model, was developed to address the above issues by combining the $\ell_1$ and $\ell_2$ norms.
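The RSM computation described above reduces, in its simplest form, to correlating condition-wise response patterns with each other; a toy sketch (the patterns and the deliberately similar condition pair are fabricated for illustration):

```python
# RSM sketch: correlate condition-wise activity patterns with each other.
# Patterns are synthetic; conditions 0 and 1 are made deliberately similar.
import numpy as np

rng = np.random.default_rng(5)
n_conditions, n_voxels = 6, 50
patterns = rng.normal(size=(n_conditions, n_voxels))
patterns[1] = patterns[0] + 0.1 * rng.normal(size=n_voxels)

rsm = np.corrcoef(patterns)   # (6 x 6) similarity matrix with unit diagonal
```

The unit diagonal and the high entry for the engineered similar pair match the RSM properties described above; in practice the patterns would be the estimated rows of $B_i$ rather than raw draws.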
On the other hand, a concept called the searchlight was introduced by researchers as an alternative to region-of-interest (ROI) based fMRI analysis. SL implements MVP analysis on sphere-shaped groups of voxels centered on each voxel in turn [5]. As mentioned before, due to the high spatial resolution of fMRI data, whole-brain datasets are high dimensional. With traditional RSA methods, it was difficult to handle the data in matrix form, and the inversion of the voxel matrix could not be avoided. In addition, when the number of voxels is very large, RSA optimization is also hampered by the high-dimensional data. Fortunately, compared with traditional RSA algorithms, modern RSA algorithms can optimize the solution process [52]. Su et al. [53] proposed an RSA method that uses the searchlight technique for EMEG (a combination of MEG and EEG). This method directly implements MVP analysis of information flow in the human brain, with spatial and temporal identification of fine-grained dynamic neural computations. As an extended application, the SL-based RSA method has also been applied to analyze the structure of the space of ethical violations [54].
In short, RSA provides researchers with a new perspective for comparing neural representations across different subjects, different ROIs of one subject, different measurement modalities, and even different species. Since similarity structures can be estimated from imaging data even without encoding models, RSA can be used not only for model testing but also for exploratory research [48]. RSA was initially used to study visual representations [43,55,56], semantic representations [52,57] and lexical representations [53]. Last but not least, RSA can also be applied to reveal the representations of social networks [58,59].

Visual stimuli reconstruction
Like the classification and regression tasks in machine learning, the purpose of brain decoding is to analyze a subject′s brain activity patterns in order to identify the visual stimuli or reconstruct their details. In recent years, many studies have addressed the classification of brain activity patterns [1,38,45]. However, the reconstruction of brain images remains a challenging task. A general conceptual framework for visual stimuli reconstruction is shown in Fig. 5, which can be regarded as cross-modal reconstruction (the green line represents image reconstruction while the blue line denotes the fMRI). Visual stimuli reconstruction focuses on acquiring the relevant features between the stimulus images and the fMRI data, in order to generate the stimulus images from the corresponding fMRI signals.
Many researchers have made preliminary explorations in the field of visual stimuli reconstruction. In an early exploratory study, Thirion et al. [60] used rotating Gabors to reconstruct dot patterns from stimuli and imagery. They predicted the visual stimuli of both real and imaginary scenes from the evoked brain activity elicited in the visual cortex. Moreover, Miyawaki et al. [61] first asked volunteers to watch a large number of flashing checkerboard images as visual stimuli, recorded the evoked brain activity patterns in the early visual cortex (V1/V2/V3), and then built a sparse multi-scale multinomial logistic regression (SMLR) local decoder model for visual stimuli reconstruction. The experimental results showed that this method provides a new way to interpret the visual perception of the brain.
In recent years, many methods have been proposed for visual stimuli reconstruction. These methods can be divided into traditional machine learning methods and the latest deep network frameworks. Among the traditional machine learning methods, the Bayesian model is the most common one. In this paper, we review recent progress from the following two aspects: Bayesian-based reconstruction models and deep generative model-based reconstruction methods.

Bayesian-based reconstruction model
Inspired by the work of Miyawaki et al. [61], several reconstruction models based on Bayesian modeling have been proposed to explore the correlations among the fMRI signals that reflect features of the corresponding stimulus images. For example, Naselaris et al. [62] proposed a joint model that combines structural and semantic features of brain activity patterns; a Bayesian framework was then used to infer the stimulus images from a large-scale dataset given the evoked brain activities. Nishimoto et al. [63] used a Bayesian decoding framework for movie scene reconstruction from the given blood-oxygen-level-dependent (BOLD) signals. The motion-energy encoding model they proposed largely overcomes the sluggishness of BOLD signals measured via fMRI. Further, a model called Bayesian canonical correlation analysis (BCCA) was proposed by Fujiwara et al. [64] to automatically learn image bases; CCA was used to construct an invertible mapping within the Bayesian model. Zhan et al. [10] proposed a reconstruction method based on a support vector machine (SVM) and a Bayesian classifier, followed by ICA, to improve the efficiency of feature extraction and the reconstruction performance. Cowen et al. [65] used PCA to transform human face stimuli into a new feature space, then established the relationship between the new features and fMRI signals, realizing reconstruction of human face stimuli for the first time. Du et al. [66] proposed a Bayesian-based reconstruction method that infers missing latent variables by Bayesian inference. Their joint generative model of external stimuli and brain activities can not only extract non-linear features of the stimulus images, but also capture the correlations among brain activities.
Reconstruction models based on the Bayesian framework aim to find the relationship between the visual stimuli and the corresponding fMRI signals, establishing a linear mapping between them to achieve image reconstruction. However, a linear mapping often cannot faithfully capture the relationship between the two cross-modal data sources, and the reconstruction results are often coarse-grained, making it difficult to recover the details of the images.
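A linear reconstruction pipeline of the kind discussed here can be sketched in a few lines; the forward model, noise level, ridge penalty and all sizes are hypothetical stand-ins for real stimulus/fMRI pairs:

```python
# Linear reconstruction sketch: ridge regression mapping voxel responses
# back to pixel intensities. Forward model and sizes are hypothetical.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
n_trials, n_voxels, n_pixels = 300, 60, 16
stimuli = rng.normal(size=(n_trials, n_pixels))       # flattened toy "images"
encoding = rng.normal(size=(n_pixels, n_voxels))      # assumed forward model
fmri = stimuli @ encoding + 0.1 * rng.normal(size=(n_trials, n_voxels))

decoder = Ridge(alpha=1.0).fit(fmri[:250], stimuli[:250])
recon = decoder.predict(fmri[250:])
corr = np.corrcoef(recon.ravel(), stimuli[250:].ravel())[0, 1]
```

Because the toy forward model is itself linear, the ridge decoder inverts it well; the coarse-grained results criticized above arise precisely because real stimulus-to-fMRI mappings are not linear like this.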

Deep network-based reconstruction model
In the last decade, deep learning has drawn significant attention for its powerful fitting and generative capabilities. The variational autoencoder (VAE) [67] and the generative adversarial network (GAN) [68] are two of the most popular approaches. A VAE describes the latent space in a probabilistic manner: instead of an encoder that outputs a single value for each latent state attribute, the encoder describes a probability distribution over each latent attribute. By sampling from the latent space, the decoder network forms a generative model that can create new data resembling the training observations. In other words, we can sample from the prior distribution $p(z)$, which is assumed to follow a unit Gaussian distribution. Recently, Du et al. [8] proposed a deep generative multi-view model (DGMM) for stimulus image reconstruction from evoked brain activity patterns. DGMM can be regarded as a non-linear extension of BCCA, combining image generation models with Bayesian inference to accomplish reconstruction tasks.
Among other deep learning approaches, Horikawa and Kamitani [69] presented a brain decoding method based on computer vision principles, which represents categories with a group of latent features obtained through hierarchical processing. In this way, they found that the features of visual images can be predicted from subjects′ brain activities. A model based on a deep neural network (DNN) was trained by Shen et al. [20] to establish an end-to-end reconstruction model from visual stimulus images and the evoked brain activity patterns. Experimental results showed that the proposed model can learn a direct mapping for perceptual reconstruction.
GAN is another relatively important model in the field of deep learning. The original GAN model was proposed by Goodfellow et al. [68] in 2014, where the discriminator and generator play the following mini-max game:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\rm data}(x)}[\log D(x)] + \mathbb{E}_{y \sim p_g(y)}[\log(1 - D(y))],$$
where $p_{\rm data}$ is the distribution of real data, $p_g$ is the distribution of generated data, $D$ is the discriminator with its parameters, and $\mathbb{E}$ denotes the mathematical expectation. The training stage of a GAN can be seen as a zero-sum game, in which the generator tries to generate data that can fool the discriminator, and the discriminator is used to distinguish the real data $x$ from the generated fake data $y$, labeling them 1 and 0, respectively.
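One step of this zero-sum game can be sketched with toy data: the discriminator is trained to label real samples 1 and generated samples 0, while the generator is trained to fool it. Network sizes and data below are illustrative placeholders only.

```python
# Sketch of one GAN training step for the mini-max game described above.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))                 # generator
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())   # discriminator
bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

x_real = torch.randn(64, 2)   # stand-in for samples from p_data
z = torch.randn(64, 8)        # noise input to the generator

# Discriminator step: push D(x_real) toward 1 and D(G(z)) toward 0.
d_loss = bce(D(x_real), torch.ones(64, 1)) + bce(D(G(z).detach()), torch.zeros(64, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step (non-saturating variant): push D(G(z)) toward 1.
g_loss = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```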
Some GAN-based visual stimuli reconstruction models have been proposed and have greatly improved the precision of the reconstruction results. For instance, St-Yves and Naselaris [70] used a GAN architecture to learn an image generation model and performed perceptual stimuli reconstruction through this model; in this way, the noise model can be inferred from the measured brain activity. Furthermore, some GAN-based approaches have been proposed to reconstruct human face images. Güçlütürk et al. [71] proposed a joint model combining probabilistic inference with the GAN architecture for reconstructing face stimuli from human brain activities. They used maximum a posteriori estimation to invert the linear transformation from latent-space features to brain activity patterns; a convolutional neural network (CNN) was then used to invert the non-linear transformation from visual stimuli to latent features. Seeliger et al. [72] introduced a deep convolutional generative adversarial network (DCGAN) architecture to reconstruct the stimulus images, using a linear model to predict the latent space of the generative model from the evoked brain activity patterns. More recently, VanRullen and Reddy [73] presented thousands of celebrity face images from a large dataset to subjects as the stimulus task. They then trained a VAE neural network with a GAN architecture on this dataset and learned a linear mapping between face images and fMRI activity patterns. Compared with classic linear reconstruction methods, models based on deep networks can implement non-linear transformations that greatly improve the accuracy of image reconstruction and describe images at fine granularity. Fig. 6 shows experimental results from several visual stimuli reconstruction tasks in recent years.
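The two-stage pattern shared by several of the methods above (e.g., Seeliger et al. [72] and VanRullen and Reddy [73]) can be sketched as follows: learn a linear map from voxel responses to a generator's latent space, then reconstruct by pushing predicted latents through the frozen generator. The data here are synthetic, and `generator` is a hypothetical stand-in for a pretrained DCGAN or VAE decoder.

```python
# Hedged sketch of linear brain-to-latent mapping followed by generative decoding.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_train, n_voxels, z_dim = 200, 500, 16

W_dec = rng.standard_normal((z_dim, 64))
def generator(z):
    # Placeholder for a pretrained generative model's decoder.
    return np.tanh(z @ W_dec)

# Synthetic latents of training stimuli and noisy voxel responses to them.
z_true = rng.standard_normal((n_train, z_dim))
fmri = z_true @ rng.standard_normal((z_dim, n_voxels)) \
       + 0.1 * rng.standard_normal((n_train, n_voxels))

lin = Ridge(alpha=1.0).fit(fmri, z_true)   # linear map from voxels to latent space
z_pred = lin.predict(fmri[:5])
reconstructions = generator(z_pred)        # decoded stimulus estimates
```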
In addition to the visual stimuli reconstruction methods based on Bayesian frameworks or deep neural networks mentioned above, and considering that it is difficult to collect a large amount of paired image-fMRI data for training, several methods [74−78] use semi-supervised learning (SSL) to improve brain decoding performance by leveraging a large number of images.

Open resources
As is well known, the collection of high-quality datasets is an important prerequisite for research on data-driven machine learning methods. For decoding visual information from human brain activity, the OpenNeuro project is a free and open platform for sharing MRI, MEG, EEG, iEEG, and ECoG data. As an extended version of the OpenfMRI project, it now hosts 404 available datasets with 12 037 participants across all datasets. Table 2 shows some datasets in the OpenNeuro project.
In order to promote the rapid development of the field, several open-source software toolboxes have also been released.
[Figure: timeline of visual stimuli reconstruction methods — Thirion et al. [60] (2006), Miyawaki et al. [61] (2008), Naselaris et al. [62] (2009), Nishimoto et al. [63] (2011), Zhan et al. [10] (2013), Cowen et al. [65] (2014), Horikawa and Kamitani [69] (2017), Du et al. [8] (2018), St-Yves and Naselaris [70] (2018), Shen et al. [20] (2019), Seeliger et al. [72] (2019), VanRullen and Reddy [73] (2019)]
PyMVPA is an open-source software toolbox, based on Python, for applying classifier-based analysis techniques to fMRI datasets. It is a cross-platform toolbox that exploits Python's ability to access libraries written in various programming languages and computing environments, interfacing with a wealth of existing machine learning packages [89,90]. Recently, a new toolbox called easy fMRI was developed for analyzing fMRI datasets (shown in Fig. 7). Easy fMRI is a free and open-source toolbox for decoding and visualizing the human brain, developed by the iBRAIN research group of Nanjing University of Aeronautics and Astronautics. It is designed around the brain imaging data structure (BIDS) file format, and supports automatic labelling of the design matrix.
Easy fMRI uses advanced machine learning techniques and high-performance computing to analyze task-based fMRI datasets. It provides a friendly graphical user interface for feature analysis, HA, MVPA, RSA, etc. In addition, easy fMRI is integrated with the FMRIB Software Library (for the preprocessing step), SciKit-Learn (for model analysis), PyTorch (for deep learning methods), and AFNI/SUMA (for 3D visualization). AFNI stands for analysis of functional neuroimages; SUMA allows viewing 3D cortical surface models and mapping volumetric data onto them.
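The core MVPA workflow these toolboxes support can be illustrated directly with scikit-learn (which easy fMRI uses for model analysis): classify the stimulus category from multi-voxel patterns with cross-validation. The data below are synthetic stand-ins for preprocessed, trial-wise fMRI features, with a weak category signal injected into a few voxels.

```python
# Minimal MVPA-style decoding sketch on synthetic trial-by-voxel data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_voxels = 120, 300
y = rng.integers(0, 2, n_trials)              # two stimulus categories
X = rng.standard_normal((n_trials, n_voxels))
X[y == 1, :20] += 0.8                         # weak category signal in 20 voxels

# Standardize voxels, then fit a linear SVM; score with 5-fold cross-validation.
clf = make_pipeline(StandardScaler(), LinearSVC())
scores = cross_val_score(clf, X, y, cv=5)     # decoding accuracy per fold
```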

Future work
In this paper, we reviewed the methods developed and employed for decoding visual information from human brain activity. Several issues remain to be addressed in future studies. For instance, task-based fMRI is difficult to collect because subjects' heads must be kept stationary, so the sample sizes of most task-based fMRI datasets are small. Some studies [91−93] address this by applying domain adaptation and transfer learning algorithms. Making brain decoding algorithms applicable to large-scale and multi-site fMRI datasets is an important open issue that requires further study.
Furthermore, the three aspects mentioned above, i.e., brain image alignment, brain decoding, and brain image reconstruction, are usually studied independently. In the future, we will consider combining them to deal with more complex real-world problems. For example, Du et al. [8] noted that in the visual stimuli reconstruction task, the reconstruction results of different subjects differed significantly. To address this, we can combine HA with the reconstruction task to reduce reconstruction differences across subjects. Finally, most current methods do not make good use of the structural information in whole-brain data. In future studies, we plan to develop models that exploit the intrinsic information of whole-brain data to smooth the information of small areas, making the informative regions in whole-brain data clearer and providing better input for subsequent feature selection and representational similarity analysis.
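The alignment step suggested above can be sketched with an orthogonal Procrustes transform, a simple form of functional alignment: it maps one subject's voxel responses to common stimuli into another subject's response space, after which a decoder or reconstruction model trained on the first subject can be applied to the second. The data below are synthetic, and this is only one of several possible HA formulations.

```python
# Hedged sketch of functional alignment via orthogonal Procrustes.
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(0)
n_stimuli, n_voxels = 100, 50

# Subject A's responses, and subject B's as a rotated, noisy version of them.
subj_a = rng.standard_normal((n_stimuli, n_voxels))
R_true = np.linalg.qr(rng.standard_normal((n_voxels, n_voxels)))[0]
subj_b = subj_a @ R_true + 0.05 * rng.standard_normal((n_stimuli, n_voxels))

# Find the rotation that best maps B's response space onto A's.
R, _ = orthogonal_procrustes(subj_b, subj_a)
subj_b_aligned = subj_b @ R
err = np.linalg.norm(subj_b_aligned - subj_a) / np.linalg.norm(subj_a)
```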

Conclusions
In this paper, we have reviewed the mechanisms and strategies of machine learning methods for analyzing neural activities via fMRI data. As an interdisciplinary field of research, computational neuroscience can break the neural code by drawing on concepts from different subjects such as mathematics, psychology, and machine learning. However, there are still challenges in the field of fMRI research, such as multi-subject datasets, high-dimensional feature analysis, and the generation of visual images from fMRI. We conducted a brief review of state-of-the-art machine learning techniques for addressing these challenges, including linear and nonlinear functional alignment, multi-voxel pattern analysis, representational similarity analysis, and visual stimuli reconstruction based on Bayesian frameworks or deep neural networks. Last but not least, we also provided online resources and open research problems on brain pattern analysis for the convenience of future research, and put forward some ideas for future work in the field of brain science and neural computing.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.