1 Introduction

Medical imaging plays a key role in brain disorders. In clinical care, it is vital for detection, diagnosis, and treatment monitoring. It is also an essential tool for research to characterize the anatomical, functional, and molecular alterations in brain disorders, to better understand the pathophysiology, or to evaluate the effects of new treatments in clinical trials, for instance. Medical imaging of the brain is referred to as neuroimaging and involves different modalities such as X-ray computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), or single-photon emission computed tomography (SPECT).

Most neuroimaging modalities were developed in the 1970s (Fig. 1). The first CT image of a brain was acquired in 1971 [1, 2]. This technology results from the discovery of X-rays by Wilhelm Röntgen in 1895 [3]. A few years later, PET [4] and then SPECT [5, 6] cameras were developed. Both modalities result from the discovery of natural radioactivity in 1896 by Henri Becquerel [7]. The first MR image of a brain dates back to 1978 [8], following the discovery of nuclear magnetic resonance in 1946 by Felix Bloch [9]. Some of these imaging modalities were later combined into hybrid scanners. The first prototype combining PET and CT was introduced into the clinical arena in 1998 [10], while the first simultaneously acquired PET and MR images of a brain were reported in 2007 [11, 12]. The first commercial SPECT/CT system dates back to 1999 [13], while SPECT/MR systems are still under development [14].

Fig. 1
A timeline diagram of the 10 main developments in neuroimaging. It includes the following. X-ray discovery, 1895. CT scan of the brain, 1971. PET camera, 1975. SPECT cameras, 1976. MRI of a brain, 1978. PET/CT prototype, 1998. Commercial SPECT/CT system, 1999. PET/MRI of a brain, 2007.

Timeline of the main developments in neuroimaging

CT and MRI are the modalities of choice when studying brain anatomy, while SPECT and PET are used to image particular biological processes. Note that MRI is a versatile modality that allows studying both the structure and function of the brain, through the acquisition of different sequences. The use of these imaging modalities differs between clinical practice and research contexts. For example, CT is the main modality used in hospitals on adults [15], while MRI is by far the most used modality for the study of brain disorders with machine learning (Fig. 2, top). The two disorders most studied with machine learning are brain tumors and dementia, mainly Alzheimer’s disease (Fig. 2, bottom).

Fig. 2
Two pie charts, in percentages. 1. Out of 6 imaging modalities, MRI has the largest share (70%) and SPECT the smallest (3%). 2. Out of 8 brain disorders, brain tumors have the largest share (37%) and epilepsy the smallest (3%).

Distribution by imaging modality (top) and brain disorder (bottom) of 1327 articles presenting a study using machine learning. Note that these numbers should only be taken as rough indicators as they result from a non-exhaustive literature search. The Scopus query and the resulting articles (after some manual filtering) are available as a public Zotero library (https://www.zotero.org/groups/4623150/neuroimaging_with_ml_for_brain_disorders/library)

This chapter will start by briefly describing the nature of neuroimages, detailing the type of features that can be extracted from them, and listing software tools that can be used to do so. We will then briefly describe the principles of the imaging modalities most used in machine learning studies: anatomical, diffusion, and functional MRI, CT, PET, and SPECT. For each modality, we will report the processing steps often performed to extract features, explain the type of information provided, and give examples of their use in machine learning studies.

2 Manipulating Neuroimages

In clinical routine, neuroimages are primarily exploited through visual inspection by a radiologist (or a neuroradiologist, who is a radiologist with an additional specialization in brain imaging, in expert hospitals) or a nuclear medicine physician. This results in a radiological report that is a written text describing the characteristics of the brain of the patient, its alterations, and possibly the most likely diagnosis. Note that neuroimaging exploration is usually requested by a neurologist or a psychiatrist and is associated with an indication that may correspond to the exploration of a set of symptoms (for instance, the exploration of a dementia syndrome) or to the confirmation of a potential diagnosis. Neuroimaging alone will thus usually not provide a diagnosis. It will rather bring arguments in favor, or against, a potential diagnosis (for instance, in the exploration of a dementia syndrome, MRI can bring positive arguments for a diagnosis of Alzheimer’s disease due to the observed atrophy in specific areas or on the contrary exclude this diagnosis by showing that the syndrome is due to a different cause such as a brain tumor). Overall, the diagnosis will generally be made by the neurologist or the psychiatrist based on a combination of clinical examination and a set of multimodal data (clinical and cognitive tests, radiological report, biomarkers, etc.).

However, the use of neuroimages goes well beyond visual inspection: images can also be quantified using image processing procedures. This is particularly true in research, even though image processing tools are also increasingly used in clinical routine. A characteristic of these tools that differentiates them from general-purpose image processing tools is their ability to handle three-dimensional (3D) images.

2.1 The Nature of 3D Medical Images

Most medical imaging devices acquire 3D images. This is the case for all the modalities presented in this chapter (MRI, CT, PET, and SPECT). Whereas 2D images are essentially 2D arrays of elements called pixels (for picture elements), 3D images are 3D arrays of elements called voxels (for volume elements). Depending on the imaging modality, voxel values will represent different properties of the underlying tissues. For example, in a CT image, they will be proportional to linear attenuation coefficients. The shape and size of a voxel will also depend on the imaging modality (or the type of sequence in MRI). When its three dimensions are of equal lengths, the voxel is isotropic; otherwise, it is anisotropic (see Fig. 3). For instance, a typical voxel size for a T1-weighted MR image is about 1 × 1 × 1 mm³, while it is about 3 × 3 × 3 mm³ for a functional MR image. Most neuroimaging modalities will have a voxel dimension between 0.5 mm and 5 mm.
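As a minimal sketch of these notions, the following Python snippet checks whether a voxel is isotropic and computes its volume and the physical extent of an image. The function names, the anisotropic voxel, and the image shape are hypothetical and only serve as an illustration; voxel sizes are expressed in millimeters.

```python
from math import isclose, prod


def is_isotropic(voxel_size_mm, rel_tol=1e-6):
    """Return True when the three voxel edge lengths are (nearly) equal."""
    x, y, z = voxel_size_mm
    return isclose(x, y, rel_tol=rel_tol) and isclose(y, z, rel_tol=rel_tol)


def voxel_volume_mm3(voxel_size_mm):
    """Volume of a single voxel in cubic millimeters."""
    return prod(voxel_size_mm)


def field_of_view_mm(shape, voxel_size_mm):
    """Physical extent along each axis: number of voxels times edge length."""
    return tuple(n * s for n, s in zip(shape, voxel_size_mm))


t1w_voxel = (1.0, 1.0, 1.0)          # typical T1-weighted MRI (from the text)
thick_slice_voxel = (1.0, 1.0, 3.0)  # hypothetical anisotropic acquisition
example_shape = (256, 256, 176)      # hypothetical image dimensions in voxels

print(is_isotropic(t1w_voxel), is_isotropic(thick_slice_voxel))
print(voxel_volume_mm3(thick_slice_voxel))
print(field_of_view_mm(example_shape, t1w_voxel))
```

In practice, the voxel size is read from the image header rather than typed by hand; the sketch only illustrates how it relates to isotropy and physical extent.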

Fig. 3
A chart showing an MRI cross-section of a brain and 3D models of the brain with isotropic and anisotropic voxels.

Most neuroimaging modalities are three-dimensional. Left: volume rendering of an excavated T1-weighted MR image. Middle: voxel grid with isotropic, i.e., cubic, voxels overlaid on the MRI. Right: voxel grid with anisotropic, i.e., rectangular, voxels overlaid on the MRI

Even though most neuroimages are 3D, they are visualized as 2D slices along different planes: axial, coronal, or sagittal (see Fig. 4). Multiple tools exist to visualize neuroimages. Several are available within suites such as FSLeyes (Footnote 1), Freeview (Footnote 2), or medInria (Footnote 3), while others are independent such as Vinci (Footnote 4), Mango (Footnote 5), or Horos (Footnote 6). Note that viewers may interpolate the images they display, which may be misleading (see Fig. 5 for an illustration).

Fig. 4
A chart showing axial, coronal, and sagittal MRI slices of a brain, with 6 different slices for each plane.

Axial, coronal, and sagittal slices extracted from a T1-weighted MR image

Fig. 5
Two pairs of axial MRI views of a brain, displayed without interpolation and with linear interpolation. The voxel sizes are indicated below each pair.

Axial slice of a T1-weighted MRI with an isotropic voxel size originally of 1 × 1 × 1 mm³ (left) and downsampled to 2 × 2 × 2 mm³ (right), displayed without interpolation or with linear interpolation. Although the difference between the displays with and without interpolation is subtle at 1 × 1 × 1 mm³, it is clearly visible at 2 × 2 × 2 mm³

2.2 Extracting Features from Neuroimages

When using machine learning to analyze images, one will often extract features. These features can be grouped into four categories, which we now describe and which are illustrated in Fig. 6. Note that these features are conceptually the same for the different modalities but their actual content will differ (e.g., volume of a region for anatomical MRI vs average uptake in this region for PET). Modality-specific preprocessing and corrections often need to be applied before neuroimages can be analyzed; these will be described in Subheadings 3, 4, and 5.

Fig. 6
An illustrative chart of voxel-, vertex-, regional, and graph-based features, with examples for each: CT and FP-CIT SPECT images (voxel), cortical thickness and F-FDG PET SUVR (vertex), regional fractional anisotropy (regional), and a connectome (graph).

Examples of voxel, vertex, regional, and graph features that can be extracted from neuroimages. It is, for instance, possible to extract voxel-based features from CT and SPECT images, vertex-based features from anatomical T1-weighted (T1w) MRI or PET images, regional features from diffusion MRI, and graph-based features from functional MRI. Note that the modalities are just examples. For instance, voxel-based features can be extracted for any modality. See Subheadings 3, 4, and 5 for a description of the imaging modalities

Voxel-Based Features

As mentioned previously, all the imaging modalities described in this chapter produce volumetric images. The whole 3D image can be used as input of a machine learning algorithm. In that case, each subject is seen as a collection of values at each voxel of the image. These values can simply be the intensity of the image at each voxel after some minimal preprocessing (which is very often what is used in deep learning) or some more complex value extracted from the image (for instance, gray-level density from anatomical MRI; see Subheading 3.1). A prerequisite is often to align the images studied in a common space, by registering each image to a template and/or by performing a group-wise registration, thus guaranteeing a voxel-wise correspondence across subjects [16]. Note that this correspondence becomes particularly important when using a machine learning algorithm that takes as input a vector in which each element implicitly represents the same information for each subject (e.g., logistic regression or support vector machine).
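To make the idea of "a collection of values at each voxel" concrete, the sketch below flattens 3D images into one feature vector per subject. It assumes that the images have already been registered to a common space and resampled on the same grid; nested Python lists stand in for image arrays, and all names are hypothetical.

```python
def flatten_volume(volume):
    """Flatten a 3D nested list indexed as [x][y][z] into a 1D feature vector."""
    return [value for plane in volume for row in plane for value in row]


def build_feature_matrix(volumes):
    """Build one row per subject; every column then refers to the same voxel.

    The voxel-wise correspondence only holds if all images share the same
    grid, which is why registration to a common space comes first.
    """
    rows = [flatten_volume(volume) for volume in volumes]
    n_features = len(rows[0])
    if any(len(row) != n_features for row in rows):
        raise ValueError("All images must be resampled to the same grid first.")
    return rows


# Two hypothetical 2 x 2 x 2 images standing in for two registered subjects.
subject_a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
subject_b = [[[8, 7], [6, 5]], [[4, 3], [2, 1]]]
feature_matrix = build_feature_matrix([subject_a, subject_b])
print(feature_matrix)
```

Such a matrix (subjects in rows, voxels in columns) is the kind of input expected by algorithms such as logistic regression or support vector machines.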

Vertex-Based Features

Studying the surface of the cortex is natural given its shape: it is a convoluted ribbon delimited by inner and outer surfaces. Moreover, surface-based characteristics can provide useful information, for instance, on developmental or neurodegenerative diseases. Surfaces can be represented as meshes consisting of vertices, edges, and faces. The vertices encode position and properties such as cortical thickness. In the vertex-based feature scenario, each subject is seen as a collection of values at each vertex of the surface. Classical values computed at each vertex include cortical thickness and local surface area (see Subheading 3.1). As for voxel-based features, the images studied are usually aligned in a common space to ensure a vertex-wise correspondence across subjects [17, 18].

Regional Features

The brain can be divided into subregions according to different criteria that can be anatomical or functional [16]. When considering regional features, each subject is seen as a collection of values for each region of the brain defined by an atlas. Many atlases exist, either anatomical or functional, with different degrees of granularity. A list can be found online (Footnote 7). Classical values include the volume of a given region or the average image signal within a region.
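The two classical regional values mentioned above can be sketched as follows. The example assumes that the image and the atlas label map are defined on the same voxel grid, that both are given as flat lists, and that label 0 denotes background; these conventions and all names are assumptions made for illustration.

```python
from collections import defaultdict


def regional_features(image, atlas, voxel_volume_mm3=1.0):
    """Return the mean signal and the volume of each atlas region.

    `image` and `atlas` are flat lists of equal length; `atlas` holds integer
    region labels, with 0 assumed to denote background.
    """
    if len(image) != len(atlas):
        raise ValueError("Image and atlas must be defined on the same grid.")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(image, atlas):
        if label == 0:  # skip background voxels
            continue
        sums[label] += value
        counts[label] += 1
    return {
        label: {
            "mean": sums[label] / counts[label],
            "volume_mm3": counts[label] * voxel_volume_mm3,
        }
        for label in sorted(counts)
    }


# Hypothetical six-voxel image with two regions (labels 1 and 2).
image = [1.0, 3.0, 10.0, 20.0, 30.0, 5.0]
atlas = [1, 1, 2, 2, 2, 0]
features = regional_features(image, atlas, voxel_volume_mm3=8.0)
print(features)
```

Each subject is then described by a short vector (one or a few values per region) instead of one value per voxel.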

Graph-Based Features

A final way to represent an image is through a graph where nodes correspond to brain regions and edges encode a particular property (for instance, anatomical or functional connections, possibly together with their strength). Graphs can directly be used as features, but network indices characterizing global and local graph topology, e.g., efficiency or degree, can also be computed [19].
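As an illustration of such network indices, the sketch below computes the degree of each node and the global efficiency of an unweighted, undirected graph given as an adjacency matrix. Global efficiency is taken here as the average inverse shortest path length over all pairs of distinct nodes, which is one common definition and an assumption of this example; the function names and the toy graph are hypothetical.

```python
from collections import deque


def degrees(adjacency):
    """Number of connections of each node (the diagonal is assumed to be zero)."""
    return [sum(1 for weight in row if weight) for row in adjacency]


def shortest_path_lengths(adjacency, source):
    """Breadth-first search: hop distance from `source` to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor, weight in enumerate(adjacency[node]):
            if weight and neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def global_efficiency(adjacency):
    """Average of 1/d(i, j) over ordered pairs of distinct nodes.

    Unreachable pairs contribute zero, since they never appear in the
    distance dictionary returned by the breadth-first search.
    """
    n = len(adjacency)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        distance = shortest_path_lengths(adjacency, i)
        total += sum(1.0 / d for j, d in distance.items() if j != i)
    return total / (n * (n - 1))


# Toy graph with three regions connected in a chain: 0 - 1 - 2.
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(degrees(chain), global_efficiency(chain))
```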

2.3 Neuroimaging Software Tools

The features described above can be obtained using neuroimaging software tools. However, an important step before any preprocessing or analysis is to properly organize data. The neuroimaging community proposed the Brain Imaging Data Structure [20], which specifies how to organize data in folders and sub-folders on disk and how to name the files. It also details the metadata necessary to describe neuroimaging experiments.
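To give an idea of what such an organization looks like, the sketch below builds the path of an anatomical image following the general BIDS naming pattern (a subject folder, an optional session folder, a data type folder, and a file name made of the same entities followed by a suffix). The helper function and the labels are hypothetical, and the sketch covers only this simple case; the BIDS specification remains the authoritative reference.

```python
from pathlib import PurePosixPath


def bids_anat_path(subject, session=None, suffix="T1w", extension=".nii.gz"):
    """Build a relative path such as sub-01/ses-M00/anat/sub-01_ses-M00_T1w.nii.gz.

    Hypothetical helper: it only handles the subject and session entities.
    """
    entities = [f"sub-{subject}"]
    folder = PurePosixPath(f"sub-{subject}")
    if session is not None:
        entities.append(f"ses-{session}")
        folder = folder / f"ses-{session}"
    filename = "_".join(entities + [suffix]) + extension
    return folder / "anat" / filename


print(bids_anat_path("01", session="M00"))
print(bids_anat_path("01"))
```

Because file names are predictable, processing tools can locate the images they need without any dataset-specific configuration.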

Many tools exist to process neuroimages (Footnote 8). The historical generic frameworks include SPM (Footnote 9) [21], FSL (Footnote 10) [22], FreeSurfer (Footnote 11) [23], or ANTs (Footnote 12) [24]. Some tools are modality-specific such as MRtrix (Footnote 13) [25], dedicated to diffusion MRI, or AFNI (Footnote 14) [26], dedicated to functional MRI. Recent initiatives aim to make the use of neuroimaging tools easier by distributing them in containers (e.g., BIDSApps (Footnote 15) [27]), by providing in a single environment tools from preprocessing to machine learning (e.g., Nilearn (Footnote 16) [28]), or by providing automatic pipelines that do not require a particular expertise in image processing (e.g., Clinica (Footnote 17) [29]). Other tools facilitate the application of deep learning approaches to neuroimages or medical images in general: for instance, MONAI (Footnote 18), TorchIO (Footnote 19) [30], or ClinicaDL (Footnote 20) [31].

3 Magnetic Resonance Imaging

Magnetic resonance imaging is the modality of choice to study brain anatomy, thanks to its high resolution and excellent soft-tissue contrast, but the applications of MRI go well beyond studying anatomy. This technique can be used to examine tissue micro-architecture (diffusion MRI, covered in Subheading 3.2) or neuronal activity (functional MRI, covered in Subheading 3.3) but also to visualize the brain vasculature (MR angiography), study tissue perfusion and permeability (perfusion MRI), assess iron deposits and calcifications (susceptibility-based imaging), or measure the levels of different metabolites (MR spectroscopy). Note that MRI is an extremely versatile modality and that new sequences are constantly being developed to study other brain characteristics.

3.1 Anatomical MRI

3.1.1 Basic Principles

In MRI, most images are obtained by exploiting a magnetic property, called spin, of the hydrogen atomic nuclei found in the water molecules present in the human body. In the absence of a strong external magnetic field, the directions of the protons’ spins are random, thus cancelling each other out (Fig. 7a). When the spins enter a strong external magnetic field (B0), they align either parallel or antiparallel, and they all precess around the B0 axis, referred to as the z axis (Fig. 7b). As a result, they cancel each other out in the transverse (x, y) plane, but they add up along the z axis. The result of this vector addition, called net magnetization M0, is proportional to the proton density (Fig. 7c). With the application of a radio frequency pulse denoted as B1, the system of spins and the net magnetization are tipped by an angle determined by the strength and duration of the radio frequency pulse. For a 90° radio frequency pulse, the magnetization along the z axis (Mz) becomes zero and the magnetization in the transverse plane (Mxy) becomes equal to M0 (Fig. 7d). As this radio frequency pulse provides energy to the spins, i.e., excites them, we also talk of radio frequency excitation.

Fig. 7
Four illustrations. (a) A cluster of randomly oriented proton spins. (b) The proton spins aligned in a magnetic field. (c) The net magnetization along the z axis, parallel to the magnetic field. (d) The net magnetization along the y axis, perpendicular to the magnetic field.

MRI physics in a nutshell. (a) In the absence of a magnetic field, the directions of the protons’ spins are random. (b) When the spins enter a strong external magnetic field (B0), they align either parallel or antiparallel, and they all precess around the B0 axis. (c) The net magnetization M0 is proportional to the proton density. (d) With the application of a radio frequency pulse, the system of spins is tipped

When the radio frequency pulse is then turned off, two phenomena occur. First, the system of spins relaxes back to its preferred energy state of being parallel with B0 in a time T1, called longitudinal or spin-lattice relaxation time, and the longitudinal magnetization Mz slowly recovers to its original magnitude M0. Second, each spin starts precessing at a frequency that is slightly different from the one of its neighboring spins because the field of the MRI scanner is not uniform and because each spin is influenced by the small magnetic fields of the neighboring spins. When the spins are completely dephased, they are evenly spread in the transverse plane, and Mxy becomes zero. Mxy decreases at a much faster rate than that at which Mz recovers to M0. The transverse relaxation time T2, also called spin-spin relaxation time, describes the Mxy decrease because of interference from neighboring spins, while T2* describes the decrease because of both spin-spin interactions and nonuniformities of B0. Finally, the MRI signal is obtained by measuring the transverse magnetization as an electrical current by induction.
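Under the commonly used mono-exponential model (an assumption, as the text above does not specify a functional form), the two relaxation processes following a 90° pulse applied at t = 0 can be written as below. The symbol T2' is introduced here only to denote the dephasing caused by the nonuniformities of B0 alone.

```latex
\begin{align}
M_z(t)    &= M_0 \left( 1 - e^{-t/T_1} \right), \\
M_{xy}(t) &= M_0 \, e^{-t/T_2^{*}}, \\
\frac{1}{T_2^{*}} &= \frac{1}{T_2} + \frac{1}{T_2'} .
\end{align}
```

With this model, Mz has recovered about 63% of M0 at t = T1, and Mxy has dropped to about 37% of M0 at t = T2*. In a perfectly uniform field, the 1/T2' term vanishes and T2* reduces to T2.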

The contrast in MR images depends on three main parameters: the proton density, the longitudinal relaxation time T1, and the transverse relaxation time T2. The relative contribution of these parameters to the contrast can be adjusted by changing the time at which the signal is recorded, called echo time, and the interval between successive excitation pulses, called repetition time. A T1-weighted image is created by choosing a short repetition time (together with a short echo time), a T2-weighted image by choosing a long echo time (together with a long repetition time), and a proton density (PD)-weighted image by minimizing both T1 and T2 weighting of the image (long repetition time and short echo time). The corresponding images are referred to as T1-weighted MRI, T2-weighted MRI, and PD-weighted MRI. Note that many variations of these sequences exist (for instance, gradient-echo vs spin-echo) and the corresponding implementation by different manufacturers usually comes with a specific commercial name (e.g., MPRAGE is a T1-weighted sequence available on Siemens scanners). Furthermore, many more anatomical sequences exist including T2*-weighted, T2-FLAIR (fluid-attenuated inversion recovery), or DIR (double inversion recovery). Examples are displayed in Fig. 8. The set of sequences chosen by the radiologist will depend on the potential disease that is being investigated. Some examples in the context of machine learning are given in Subheading 3.1.3.
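The effect of the repetition and echo times can be illustrated with the simplified spin-echo signal model S = PD · (1 − exp(−TR/T1)) · exp(−TE/T2). This model, the tissue parameters, and the acquisition settings below are assumptions chosen for illustration only (rough orders of magnitude, not measured values).

```python
from math import exp

# Assumed tissue parameters: (relative proton density, T1 in ms, T2 in ms).
# Rough orders of magnitude for illustration, not reference values.
TISSUES = {
    "white matter": (0.7, 600.0, 80.0),
    "gray matter": (0.8, 950.0, 100.0),
    "cerebrospinal fluid": (1.0, 4000.0, 2000.0),
}


def spin_echo_signal(pd, t1_ms, t2_ms, tr_ms, te_ms):
    """Simplified spin-echo signal: PD * (1 - exp(-TR/T1)) * exp(-TE/T2)."""
    return pd * (1.0 - exp(-tr_ms / t1_ms)) * exp(-te_ms / t2_ms)


def tissue_signals(tr_ms, te_ms):
    """Signal of each tissue for a given repetition time and echo time."""
    return {
        name: spin_echo_signal(pd, t1, t2, tr_ms, te_ms)
        for name, (pd, t1, t2) in TISSUES.items()
    }


# Hypothetical acquisition settings (in ms).
t1_weighted = tissue_signals(tr_ms=500.0, te_ms=15.0)    # short TR, short TE
t2_weighted = tissue_signals(tr_ms=4000.0, te_ms=100.0)  # long TR, long TE

for label, signals in (("T1-weighted", t1_weighted), ("T2-weighted", t2_weighted)):
    print(label, {name: round(value, 3) for name, value in signals.items()})
```

With these assumed values, cerebrospinal fluid is the darkest of the three tissues for the T1-weighted settings and the brightest for the T2-weighted ones, while the ordering of white and gray matter is reversed between the two.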

Fig. 8
Three MR images of the brain. In the T1-weighted image, the X-shaped central part is dark. In the T2-weighted image, the X-shaped central part is bright. In the T2-FLAIR image, the X-shaped central part is prominently dark.

Example of anatomical MR images. T1-weighted, T2-weighted, and T2-FLAIR images of a patient with multiple sclerosis from the MSSEG MICCAI 2016 challenge data set [32, 33]

3.1.2 Extracting Features from Anatomical MRI

Several preprocessing steps are often necessary before analyzing anatomical MR images to correct imperfections and ease their comparison.

Bias Field Correction

MR images can be corrupted by a low-frequency and smooth signal caused by magnetic field inhomogeneity. This bias field induces variations in the intensity of the same tissue in different locations of the image, which deteriorates the performance of image analysis algorithms such as registration or segmentation. Several methods exist to correct these intensity inhomogeneities, the most popular being the N4 algorithm [34] available in ANTs [24].

Intensity Rescaling and Standardization

As MRI is usually not a quantitative imaging modality, MR images can have different intensity ranges, and the intensity distribution of the same tissue type may be different between two images, which might affect the subsequent image preprocessing steps. The first point can be dealt with by globally rescaling the image, for example, between 0 and 1, using the minimum and maximum intensity values. More robust choices exist such as the z-score normalization (at each voxel, one subtracts the mean intensity of the image, and the result is divided by the standard deviation across the image), which can be made even more robust by only considering the intensities within a given percentile range when computing the mean and standard deviation. Intensity standardization, to solve the second point, can be achieved using techniques such as histogram matching [35].
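The rescaling strategies described above can be sketched as follows. Intensities are given as a flat list, the percentile bounds use a simple nearest-rank rule, and the function names are hypothetical; this is an illustration under these assumptions rather than the procedure of any specific tool.

```python
from statistics import fmean, pstdev


def min_max_rescale(values):
    """Rescale intensities linearly to the [0, 1] range."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("Cannot rescale a constant image.")
    return [(v - lowest) / (highest - lowest) for v in values]


def z_score(values, lower_pct=0.0, upper_pct=100.0):
    """Z-score normalization of all voxels.

    The mean and standard deviation are estimated only from the intensities
    lying between the two percentiles (nearest-rank approximation), which
    limits the influence of extreme values.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[int(round(lower_pct / 100.0 * last))]
    high = ordered[int(round(upper_pct / 100.0 * last))]
    kept = [v for v in values if low <= v <= high]
    mean, std = fmean(kept), pstdev(kept)
    if std == 0:
        raise ValueError("Standard deviation is zero; cannot normalize.")
    return [(v - mean) / std for v in values]


intensities = [1.0, 2.0, 3.0, 4.0, 1000.0]  # hypothetical image with one outlier
print(min_max_rescale(intensities))
print(z_score(intensities))
print(z_score(intensities, upper_pct=75.0))  # outlier excluded from the statistics
```

In the last call, the outlier is still normalized, but it no longer drives the mean and standard deviation used for the normalization.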

Skull Stripping

Extracranial tissues can be an obstacle for image analysis algorithms [36]. A large number of methods have been developed for brain extraction, also called skull stripping. Some are available in neuroimaging software platforms, such as FSL [22] or BrainSuite [37], and others as independent tools (Footnotes 21 and 22) [38, 39]. Note that these methods can be sensitive to the presence of noise and artefacts, which can result in over- or under-segmentation of the brain.

Image Registration

Medical image registration consists in spatially aligning two or more images, either globally (rigid and affine registration) or locally (nonrigid registration), so that voxels in corresponding positions contain comparable information. A large number of software tools have been developed for MRI-based registration [40]. They are available in all the major platforms (e.g., SPM [21], FSL [22], FreeSurfer [23], or ANTs [24]).

Image Segmentation

Medical image segmentation consists in partitioning an image into a set of nonoverlapping regions. When processing brain images, these regions can correspond to tissue types, e.g., gray matter, white matter, and cerebrospinal fluid [41], but also to anatomical (e.g., hippocampus, pons) or functional (e.g., language network, sensorimotor network) regions defined by an atlas [42]. As for registration, many tools have been developed for MRI-based segmentation and are available, among others, in SPM [21], FSL [22], FreeSurfer [23], or ANTs [24].

Resulting Features

Based on the combination of one, several, or all, of the previously mentioned preprocessing steps, various types of features can be extracted that correspond to those described in Subheading 2.2. For deep learning algorithms, which usually exploit voxel-based features, it is quite common to perform only very basic preprocessing. At the simplest, it can be intensity normalization (this step is mandatory for deep learning methods to work correctly). It is often combined with a bias field correction and a linear registration to a common space. Another common type of voxel-based features is that of tissue density maps (e.g., gray matter or white matter density) [43]. Their extraction involves bias field correction, registration to a common space, and tissue segmentation. Common vertex-based features are the local thickness and the local surface area [44]. Regional features are usually the volume of different regions of the brain, but they can also be the average intensity within the region or the average of another image-derived value. They can as well be related to lesions (for instance, the volume of multiple sclerosis lesions or of different compartments of a brain tumor) rather than anatomical regions. Finally, graph-based features can also be computed from anatomical MRI [45] even though this representation is more common for diffusion MRI and functional MRI.

3.1.3 Examples of Applications in Machine Learning Studies

T1-weighted MRI is the sequence most commonly found in machine learning studies applied to brain disorders. Several features can be extracted from T1-weighted MRI such as the volume of the whole brain or of regions of interest; the density of a particular tissue, e.g., gray matter; or the local cortical thickness and surface area. All these features, as well as the raw T1-weighted MR images, have, for example, largely been used for the computer-aided diagnosis of dementia, in particular Alzheimer’s disease, as they highlight atrophy, i.e., the neuronal loss that is a marker of neurodegenerative diseases [46, 47, 48, 49].

T1-weighted MR images acquired with and without the injection of a contrast agent are often used in the context of brain tumor detection and segmentation, progression assessment, and survival prediction as they allow distinguishing active tumor structures [50]. Such tasks also typically rely on another sequence called T2-weighted fluid-attenuated inversion recovery (T2-FLAIR) that allows visualizing a wide range of lesions in addition to tumors [51], such as those appearing with multiple sclerosis [52, 53] or age-related white matter hyperintensities (also called leukoaraiosis, which is linked to small vessel disease).

3.2 Diffusion-Weighted MRI

3.2.1 Basic Principles

Diffusion MRI [54, 55] allows visualizing tissue micro-architecture, thanks to the diffusion of water molecules. Depending on their surroundings, water molecules are able to either move freely, e.g., in the extracellular space, or move following surrounding constraints, e.g., within a neuron. In the former situation, the diffusion is isotropic, while in the latter it is anisotropic. Contrast in a diffusion MR image originates from the fact that, following the application of an excitation pulse, water molecules that move in a particular direction, and so the protons they contain, do not have the same magnetic properties as the ones that move randomly but not far from their origin point. The diffusion weighting is parametrized by a coefficient b, called the b-value, which depends on the strength and timing of the diffusion gradients: the higher the b-value, the more sensitive the acquisition is to water diffusion, but the lower the signal-to-noise ratio. Several diffusion MRI volumes, each volume corresponding to a particular b-value and gradient direction, are usually acquired. See examples in Fig. 9 (top row).
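The role of the b-value can be illustrated with the mono-exponential model S(b) = S0 · exp(−b · ADC), where S0 is the signal without diffusion weighting and ADC is the apparent diffusion coefficient along the gradient direction. This model is a common simplification and an assumption of this sketch; the diffusivities used below are approximate orders of magnitude and the function names are hypothetical.

```python
from math import exp, log


def diffusion_signal(s0, b_value, adc):
    """Mono-exponential attenuation: S(b) = S0 * exp(-b * ADC).

    `b_value` is in s/mm^2 and `adc` in mm^2/s.
    """
    return s0 * exp(-b_value * adc)


def estimate_adc(s0, s_b, b_value):
    """Invert the model from one b = 0 volume and one diffusion-weighted volume."""
    if b_value <= 0 or s0 <= 0 or s_b <= 0:
        raise ValueError("Signals and b-value must be strictly positive.")
    return log(s0 / s_b) / b_value


# Assumed diffusivities in mm^2/s (approximate, for illustration only).
ADC_FREE_WATER = 3.0e-3
ADC_TISSUE = 0.8e-3

for b in (0.0, 1000.0, 3000.0):
    print(b, diffusion_signal(1.0, b, ADC_FREE_WATER), diffusion_signal(1.0, b, ADC_TISSUE))
```

The higher the b-value, the stronger the attenuation, especially where water diffuses freely; this increases the sensitivity to diffusion but leaves less signal relative to noise.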

Fig. 9
Two rows of four MR images of the brain. Top: the image with b = 0 shows prominent bright areas near the border and at the center, while the three images with b = 1000 show dark regions at the center. Bottom: the resulting FA, AD, RD, and MD maps.

Example of diffusion-weighted MR images. Top: diffusion volumes acquired using different b-values (0 and 1000 s/mm²) and gradient directions. Bottom: parametric maps resulting from diffusion tensor modeling (fractional anisotropy, FA; axial diffusivity, AD; radial diffusivity, RD; and mean diffusivity, MD)

3.2.2 Extracting Features from Diffusion MRI

Diffusion MR images are typically acquired with echo-planar imaging, a technique that spatially encodes the MRI signal in a way that enables fast acquisitions with a relatively high signal-to-noise ratio. However, echo-planar imaging induces geometric distortions and signal losses known as magnetic susceptibility artifacts. Other artifacts include eddy currents (due to the rapid switching of diffusion gradients), intensity inhomogeneities (as for anatomical MRI), and potential movements of the subject during the acquisition. These artifacts need to be corrected before further analyzing the images. Various methods exist to do so; they are reviewed in [56]. Two widely used tools enabling the preprocessing of diffusion MR images are FSL [22] and MRtrix [25], but others exist (Footnote 23) [56].

Once artifacts have been corrected, diffusion MR images can be analyzed in different ways. One of the earliest strategies for modeling water diffusion is the diffusion tensor imaging (DTI) model [57]. Such a model can output parametric maps describing several diffusion properties: fractional anisotropy (FA, directional preference of diffusion), mean diffusivity (MD, overall diffusion rate, also called apparent diffusion coefficient), axial diffusivity (AD, diffusion rate along the main axis of diffusion), and radial diffusivity (RD, diffusion rate in the transverse direction). Examples of parametric maps are displayed in Fig. 9 (bottom row). DTI tractography [58] goes one step further by reconstructing white matter tracts. Other diffusion models have been developed to better characterize tissue micro-architecture. This is, for example, the case of neurite orientation dispersion and density imaging (NODDI) [59], which enables the study of neurite morphology by disentangling neurite density and orientation dispersion that both independently influence fractional anisotropy.
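The four DTI parametric maps are computed, at each voxel, from the three eigenvalues of the diffusion tensor. The sketch below uses the standard definitions of these scalars; the function name and the example eigenvalues are hypothetical, and the tensor fitting step that produces the eigenvalues is not shown.

```python
from math import sqrt


def dti_scalars(eigenvalues):
    """Compute FA, MD, AD, and RD from the three tensor eigenvalues (mm^2/s)."""
    l1, l2, l3 = sorted(eigenvalues, reverse=True)  # l1 is the main diffusion axis
    md = (l1 + l2 + l3) / 3.0   # mean diffusivity
    ad = l1                     # axial diffusivity
    rd = (l2 + l3) / 2.0        # radial diffusivity
    norm = sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    if norm == 0.0:
        fa = 0.0                # no diffusion at all: anisotropy set to zero
    else:
        spread = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
        fa = sqrt(1.5 * spread) / norm
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}


# Hypothetical eigenvalues: an elongated tensor and an isotropic one.
elongated = dti_scalars((1.7e-3, 0.3e-3, 0.3e-3))
isotropic = dti_scalars((1.0e-3, 1.0e-3, 1.0e-3))
print(elongated)
print(isotropic)
```

FA ranges from 0 (identical eigenvalues, isotropic diffusion) to 1 (diffusion along a single axis), which is why it is described as the directional preference of diffusion.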

One can then compute most of the different types of features covered in Subheading 2.2. Voxel-based features will represent the value of a given parametric map (e.g., FA, MD). Surface-based features are seldom used because diffusion MRI often focuses on the white matter, even though it is in principle possible to project maps that are of interest in the gray matter onto the cortical surface. Regional features represent the average of a given map (e.g., FA, MD) in a set of anatomical regions. Graph-based features can be computed as follows: vertices are often regions of the cortex, and edges correspond to the connection strength, which can be derived, for instance, from the number of tracts connecting two regions or the average FA within those tracts.

3.2.3 Examples of Applications in Machine Learning Studies

Machine learning studies have mainly used diffusion MRI to assess white matter integrity. This has been done in a very wide variety of disorders. For example, fractional anisotropy and mean diffusivity have been used to differentiate cognitively normal subjects from patients with mild cognitive impairment or Alzheimer’s disease [60, 61]. Diffusion MRI has also been exploited to perform tumor grading or subtyping [62] following the assumption that the cellular structure may differ between cancerous and healthy tissues.

3.3 Functional MRI

3.3.1 Basic Principles

When a region of the brain gets activated by a cognitive task, two phenomena occur: a local increase in cerebral blood flow and changes in blood oxygenation [63]. Functional MRI (fMRI) is used to measure the latter phenomenon. The blood-oxygen-level-dependent (BOLD) contrast originates from the fact that hemoglobin molecules that carry oxygen have different magnetic properties than hemoglobin molecules that do not carry oxygen.

Task fMRI consists in inducing particular neural states, for example, by having the subject perform tasks involving the visual or auditory systems, and then comparing the signals recorded during the different states. As the differences observed are small, it is important to preserve as much as possible the signal-to-noise ratio, which can be degraded by head motion or contaminated by fluctuations related to the cardiac and respiratory cycles. This is done by quickly acquiring multiple image volumes with echo-planar imaging. The BOLD signal also varies when the brain is not performing any particular task [64]. These spontaneous fluctuations are studied with resting-state fMRI.

3.3.2 Extracting Features from Functional MRI

The preprocessing of functional MR images has two main objectives: limit the effect of nonneural sources of variability and correct acquisition-related artifacts [65]. Preprocessing steps can, for example, include susceptibility distortion correction (as for diffusion MRI); head motion correction, by registering each volume in the time series to a reference volume (e.g., the first volume); slice-timing correction, to eliminate differences between the time of acquisition of each slice in the volume; or physiologic noise correction, by temporal filtering [63, 65]. These preprocessing steps can be performed using tools such as SPM [21], FSL [22], or AFNI [26], but also using the dedicated fMRIPrep workflow [65].

The majority of machine learning studies in brain disorders focus on resting-state rather than task fMRI [66]. This can be explained by the fact that the resting-state protocol is simpler and allows multi-site studies (as it is less sensitive to changes in local experimental settings) [66], which should result in larger samples. Depending on the application, preprocessed resting-state fMRI data may be further processed to extract features. One can directly use voxel-based features (or vertex-based features by projecting the functional MRI signal onto the cortical surface) [67]. Nevertheless, to the best of our knowledge, the most common features are graph-based. Indeed, most supervised algorithms for classification or regression use brain networks extracted from resting-state time series. In these networks, also called connectomes, the vertices correspond to brain regions, whose size may vary, and the edges encode the functional connectivity strength, which corresponds to the correlation between the time series of the two regions they connect.
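
As an illustration of how such a connectome can be built, the following minimal Python sketch computes a functional connectivity matrix as the Pearson correlation between every pair of regional time series. The function names and the toy signals are hypothetical and chosen for illustration only; in practice, the regional time series would be obtained by averaging the preprocessed resting-state signal within each region of a parcellation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def connectivity_matrix(time_series):
    """Region-by-region correlation matrix (list of lists)."""
    n_regions = len(time_series)
    return [
        [pearson(time_series[i], time_series[j]) for j in range(n_regions)]
        for i in range(n_regions)
    ]


# Toy data (hypothetical): one time series per brain region.
regional_signals = [
    [1.0, 2.0, 3.0, 4.0, 5.0],   # region A
    [2.0, 4.0, 6.0, 8.0, 10.0],  # region B, perfectly correlated with A
    [5.0, 4.0, 3.0, 2.0, 1.0],   # region C, perfectly anti-correlated with A
]
connectome = connectivity_matrix(regional_signals)
```

Each entry of the matrix can serve as an edge weight of the graph. As the matrix is symmetric, its upper triangle is sufficient to form a feature vector.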

3.3.3 Examples of Applications in Machine Learning Studies

Machine learning methods exploiting resting-state fMRI data have been used to investigate brain development and aging, but also neurodegenerative and psychiatric disorders [66]. Functional connectivity patterns have, for instance, been used to distinguish patients with schizophrenia from healthy controls [68] or discriminate schizophrenia and bipolar disorder from healthy controls [69].

4 X-Ray Imaging

X-ray imaging is built on the work of Röntgen who observed that if a “hand be held before the fluorescent screen, the shadow shows the bones darkly, with only faint outlines of the surrounding tissues” [3].

4.1 X-Ray and Angiography

When an X-ray beam passes through the body, part of its energy is absorbed or scattered: the number of X-ray photons is reduced by attenuation (Fig. 10, left). On the opposite side of the body, detectors capture the remaining X-ray photons, and an image is generated. In an X-ray image, the contrast, defined as the relative intensity change produced by an object, originates from the variations in linear attenuation coefficient with tissue type and density.

Fig. 10
A 2-part illustration. 1. A rectangle-shaped material with thickness delta X, input, I subscript i, I subscript o, and marked others. 2. It exhibits a 3-D image with detectors and an X-ray source.

Left: attenuation of X-rays by matter. As it passes through a material of thickness Δx and linear attenuation coefficient μ, the X-ray beam is attenuated. Its intensity decreases exponentially with the distance travelled: Io = Ii exp(−μΔx), where Ii and Io are the input and output X-ray intensities. Right: third-generation CT. A 3D image is created by rotating the X-ray source and detectors around the body
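
The exponential attenuation law, Io = Ii exp(−μΔx), can be illustrated with a short Python sketch. The attenuation coefficient and thicknesses below are hypothetical values used only to show the behavior of the formula; they are not measurements of any particular tissue.

```python
import math


def transmitted_intensity(input_intensity, mu_per_cm, thickness_cm):
    """Output intensity after a slab of thickness `thickness_cm` (in cm)
    with linear attenuation coefficient `mu_per_cm` (in cm^-1)."""
    return input_intensity * math.exp(-mu_per_cm * thickness_cm)


MU_ILLUSTRATIVE = 0.2  # hypothetical attenuation coefficient, in cm^-1

# A single 5 cm slab: mu * dx = 1, so the intensity is divided by e.
i_out = transmitted_intensity(1000.0, MU_ILLUSTRATIVE, 5.0)

# Two successive slabs (2 cm then 3 cm) attenuate like one 5 cm slab.
i_after_first = transmitted_intensity(1000.0, MU_ILLUSTRATIVE, 2.0)
i_after_both = transmitted_intensity(i_after_first, MU_ILLUSTRATIVE, 3.0)
```

The second computation shows why the image contrast depends on the attenuation accumulated along the whole path of the beam: the contributions of successive materials multiply.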

X-ray imaging provides excellent contrast between bone, air, and soft tissue but very little contrast between the different types of soft tissue, hence its limited use when studying brain disorders. However, coupled with the injection of an iodine-based contrast agent, X-ray imaging enables visualizing cerebral blood vessels and detecting potential abnormalities such as an aneurysm. This technique is called X-ray angiography.

4.2 Computed Tomography

4.2.1 Basic Principles

Although the X-ray images produced were originally in 2D, X-ray computed tomography enables the reconstruction of 3D images by rotating the X-ray source and detectors around the body (Fig. 10, right). Rather than using the absolute values of the linear attenuation coefficients, CT image intensities are expressed in a standard unit, the Hounsfield unit (HU). The tissue attenuation coefficient is compared to the attenuation value of water and displayed on the Hounsfield scale:

$$ {x}_{HU}=1000\times \frac{x_{\mu }-{\mu}_{water}}{\mu_{water}-{\mu}_{air}} $$

where xμ is the linear attenuation coefficient of the tissue, and μwater and μair are the linear attenuation coefficients of water and air, respectively. For example, air has an attenuation of −1000 HU, water of 0 HU, and cortical bone between 500 and 1900 HU.
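
The conversion to the Hounsfield scale can be sketched in a few lines of Python. The numerical value used for the attenuation coefficient of water is a hypothetical placeholder (it depends on the X-ray energy), and the attenuation of air is approximated by zero, which is an assumption made for simplicity.

```python
def to_hounsfield(mu, mu_water, mu_air=0.0):
    """Convert a linear attenuation coefficient to Hounsfield units."""
    return 1000.0 * ((mu - mu_water) / (mu_water - mu_air))


MU_WATER = 0.19  # hypothetical value, in cm^-1
MU_AIR = 0.0     # assumption: attenuation of air taken as negligible

hu_water = to_hounsfield(MU_WATER, MU_WATER, MU_AIR)      # reference point: 0 HU
hu_air = to_hounsfield(MU_AIR, MU_WATER, MU_AIR)          # reference point: -1000 HU
hu_dense = to_hounsfield(2 * MU_WATER, MU_WATER, MU_AIR)  # twice water: about +1000 HU
```

By construction, water and air are the two fixed points of the scale, which makes the intensities comparable across scanners.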

As for 2D X-ray imaging, the injection of an iodine-based contrast agent improves the visualization of cerebral blood vessels. This technique, called CT angiography, is not the only one relying on a contrast agent. CT perfusion tracks the bolus of contrast agent over time and measures the resulting change in signal intensity. Perfusion parameters such as the cerebral blood flow or volume can then be derived [70].

4.2.2 Extracting Features from CT Images

Contrary to MRI, CT images usually do not require extensive preprocessing steps [71]. It can however be useful to extract the head from the hardware elements visible on the image (e.g., the bed or pillow) or extract the brain. This can be done using thresholding and morphological operators. Another common step is spatial normalization.

In the context of stroke, non-contrast CT is useful to detect an intracranial hemorrhage, which appears brighter than the surrounding tissues, or to estimate the extent of early ischemic injury, which results in a loss of gray-white matter differentiation. CT angiography can help identify a potential intracranial arterial occlusion, and CT perfusion allows differentiating the regions with nonviable/non-salvageable tissue, which have very low cerebral blood flow and volume, from the viable and potentially salvageable regions [70]. These techniques may also be employed in the context of brain tumors. In particular, contrast-enhanced CT can detect areas presenting a blood-brain barrier breakdown [72]. An example of CT acquired before and after contrast injection is displayed in Fig. 11.

Fig. 11
3 C T scans of the brain. In the non-contrast C T image, bone window, the bright outline is visible. The non-contrast C T, brain window represents a bright outline and some dark tissue at the center. The contrast-enhanced C T, brain window exhibits a bright outline, a bright part on the left hemisphere.

Example of CT images. Non-contrast CT images, whose window levels were adjusted to better visualize bone or brain tissues, and contrast-enhanced CT image of a patient with lymphoma. Case courtesy of Dr Yair Glick, Radiopaedia.org, rID: 94844
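
Adjusting the window amounts to mapping a chosen range of Hounsfield units to the display range. A minimal sketch in Python is given below, assuming a window defined by its center (level) and its width. The brain and bone settings are illustrative values only, not the ones used to produce Fig. 11.

```python
def apply_window(hu, level, width):
    """Map a value in Hounsfield units to [0, 1] for display.

    Values below the window appear black (0), values above appear white (1).
    """
    low = level - width / 2.0
    high = level + width / 2.0
    clipped = min(max(hu, low), high)
    return (clipped - low) / (high - low)


# Illustrative window settings (assumptions): (level, width) in HU.
BRAIN_WINDOW = (40.0, 80.0)
BONE_WINDOW = (500.0, 2000.0)

# A bone-like voxel at 1000 HU saturates in the brain window
# but keeps an intermediate gray level in the bone window.
bone_in_brain_window = apply_window(1000.0, *BRAIN_WINDOW)
bone_in_bone_window = apply_window(1000.0, *BONE_WINDOW)
```

A narrow window spreads the display range over a small interval of Hounsfield units, which is what makes subtle differences between soft tissues visible.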

To the best of our knowledge, CT is most often used in machine learning in the form of voxel-based features (the image intensities after some minimal preprocessing steps).

4.2.3 Examples of Applications in Machine Learning Studies

The vast majority of machine learning studies relying on CT images, particularly non-contrast CT, focus on cerebrovascular disorders [73, 74]. Non-contrast CT images were, for example, used for the detection of intracranial hemorrhage and its five subtypes [75]. A first neural network was in charge of identifying the presence or absence of intracranial hemorrhage and a second of determining the intracranial hemorrhage subtype, which depends on the bleeding location [75]. In [76], non-contrast CT and CT perfusion images were used to segment the core of stroke lesions, as the lesion volume is a key measurement to assess the prognosis of acute ischemic stroke patients.

5 Nuclear Imaging

In X-ray CT imaging, the photons that are detected originate from an X-ray source. In nuclear imaging, and more precisely emission computed tomography, the photons detected are emitted from a radiopharmaceutical that has been intravenously injected into the patient.

5.1 Positron Emission Tomography

5.1.1 Basic Principles

Positron emission tomography is an imaging technique that requires the injection of a substance labeled with a positron-emitting radioactive isotope [77]. The labeled substance is distributed throughout the patient’s body by the blood circulation and accumulates in target regions. The positrons emitted by the radioactive isotope combine with the electrons present in the tissues and annihilate. Each annihilation produces two nearly collinear photons (Fig. 12). The two photons are simultaneously detected by two opposing detectors, and a coincidence event is assigned to a line of response connecting the two detectors.

Fig. 12
An illustration of P E T annihilation. It represents a positron and an electron strike each other and produce gamma rays with 511-kilo electron volts that moves in the opposite direction.

PET annihilation. When a positron (e+) and an electron (e−) collide, they annihilate and create a pair of nearly collinear gamma rays (γ)

Note that the most common isotope in clinical routine is fluorine-18 (18F), which has the advantage of a relatively long half-life (110 min) and thus does not require the presence of a cyclotron at the scanning site. Nevertheless, other isotopes are used. In particular, carbon-11 (11C), which has a shorter half-life (20 min), is often used in research facilities equipped with a cyclotron.
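
The practical consequence of these half-lives can be illustrated with the radioactive decay law, under which the remaining fraction of activity after a time t is (1/2)^(t/T), with T the half-life. The following Python sketch uses the half-lives given above; the one-hour delay is a hypothetical transport time chosen for illustration.

```python
def remaining_fraction(elapsed_min, half_life_min):
    """Fraction of the initial radioactivity left after `elapsed_min` minutes."""
    return 0.5 ** (elapsed_min / half_life_min)


HALF_LIFE_F18_MIN = 110.0  # fluorine-18
HALF_LIFE_C11_MIN = 20.0   # carbon-11

DELAY_MIN = 60.0  # hypothetical delay between production and injection

f18_left = remaining_fraction(DELAY_MIN, HALF_LIFE_F18_MIN)  # roughly two thirds
c11_left = remaining_fraction(DELAY_MIN, HALF_LIFE_C11_MIN)  # three half-lives: 1/8
```

After one hour, about two thirds of the fluorine-18 activity remains, against one eighth for carbon-11, which is why the latter is mostly used where a cyclotron is available on site.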

In a time-of-flight PET system, the difference in arrival times between the two coincident photons is measured. Without time-of-flight information, the annihilation is located with equal probability along the line of response, while with time-of-flight information, the annihilation site can be reduced to a limited range (Fig. 13), thus decreasing the spatial uncertainty and increasing the signal-to-noise ratio. Once reconstructed, the PET image is a map of the radioactivity distribution throughout the body.

Fig. 13
2 illustrations represent without time of flight and with a time of flight. the line of response is limited.

Illustration of PET data detection. Without time-of-flight, the annihilation is located with equal probability along the line of response, while with time-of-flight it is located in a limited portion of the line of response
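
The localization provided by time-of-flight can be quantified with the relation Δx = c·Δt/2, where Δt is the difference in arrival times of the two photons, c the speed of light, and Δx the distance between the annihilation site and the midpoint of the line of response. The sketch below applies this relation in Python. The timing value of 400 ps is a hypothetical example, not the specification of a particular scanner.

```python
SPEED_OF_LIGHT_MM_PER_PS = 0.299792458  # speed of light in mm per picosecond


def tof_offset_mm(delta_t_ps):
    """Distance (in mm) between the annihilation site and the midpoint of
    the line of response, for an arrival-time difference in picoseconds."""
    return SPEED_OF_LIGHT_MM_PER_PS * delta_t_ps / 2.0


# With a hypothetical timing uncertainty of 400 ps, the annihilation site
# can be restricted to a segment of about 6 cm along the line of response.
uncertainty_mm = tof_offset_mm(400.0)
```

This range is larger than the size of a voxel, so time-of-flight information does not replace reconstruction; it weights the possible positions along the line of response instead of treating them as equally likely.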

Two main protocols exist when acquiring PET data. Most acquisitions are static: the radiotracer is injected several minutes before the acquisition (e.g., between 30 and 60 min), which gives the tracer time to diffuse in the body and accumulate in the target regions. The subject is then placed in the scanner, and the acquisition typically lasts around 15 min. In the dynamic protocol, the subject is first installed in the scanner, and the acquisition starts at the same time as the tracer is injected. This allows recording how the tracer diffuses in the body. Dynamic acquisitions are less common than static ones because of their duration of 60–90 min, which reduces patient throughput. In both static and dynamic protocols, the acquisition is often split into frames of fixed (in the static case) or increasing (in the dynamic case) duration. A static acquisition of 15 min can typically be split into three frames of 5 min, resulting in three PET volumes, each corresponding to the average amount of radioactivity detected at each voxel during the time frame.

18F-fluorodeoxyglucose (FDG) is the most widely used PET radiopharmaceutical [77, 78]. As an analogue of glucose, FDG is transported to a cell, but, unlike glucose, it remains trapped in the cell. This radiopharmaceutical is an excellent marker of changes in glucose metabolism. In the brain, FDG acts as an indirect marker of synaptic dysfunction and is part of the diagnosis of epilepsy and neurodegenerative diseases, such as Alzheimer’s disease [79].

While 18F-FDG is a nonspecific tracer, other radiopharmaceuticals target specific molecular or biological processes and are thus preferentially used for studying specific diseases. Amyloid tracers, such as the 11C Pittsburgh compound B, 18F-florbetapir, 18F-florbetaben, and 18F-flutemetamol, which bind to fibrillar Aβ plaques, or tau tracers, such as 18F-flortaucipir and 18F-MK-6240, which bind to neurofibrillary tangles, are, for example, used in the diagnosis of dementia syndromes [80]. Examples are displayed in Fig. 14. Of note, the so-called amyloid tracers are in fact not specific to amyloid and also bind to myelin in the white matter, making them of interest for demyelinating disorders such as multiple sclerosis [81]. 11C-methionine and 18F-fluoroethyltyrosine are both used in neuro-oncology [82]. Note that these are just examples and that dozens of tracers exist for imaging specific molecular or biological processes.

Fig. 14
3 P E T scans of the brain. 1. The F D G P E T scan represents some prominent bright parts with contrast borders. 2. The tau P E T image exhibits some colored patches throughout the region. 3. In the amyloid P E T most of the regions are bright in color.

Example of PET images. Left: 18F-FDG PET displaying brain glucose metabolism. Middle: 18F-flortaucipir PET displaying the presence of tau neurofibrillary tangles. Right: 18F-florbetapir PET displaying the presence of amyloid plaques. All the images correspond to the same Alzheimer’s disease patient from the ADNI study [83]

5.1.2 Extracting Features from PET Images

The reconstruction procedure of the PET signal already includes several corrections (e.g., attenuation and scatter corrections), but several processing steps can be performed before further analyzing PET images. The first one is often motion correction. This is typically done by rigidly registering each frame to a reference frame. The registered frames are then averaged to form a single volume. To allow for intersubject comparison, brain PET images need to be intensity normalized, for example, to compensate for variations in the patients’ weight or dose injected. Standardized uptake value ratios (SUVRs) are generated by dividing a PET image by the mean uptake in a reference region. This region can be obtained from an atlas, and in this case chosen depending on the tracer and disorder suspected, or in a data-driven manner [84]. Partial volume correction can be performed to limit the spill out of activity outside of the region where the tracer is meant to accumulate [85] using tools such as PETPVC [86]. Finally, PET images can also be spatially normalized. If an anatomical image (preferably MRI but also CT) of the subject is available, the PET image is rigidly registered to the anatomical image, and the anatomical image is registered to a template, often in standard space. By composing the two transformations, the PET image is spatially normalized. Alternatively, if no anatomical image is available, the PET image can directly be registered to a PET template, for example, as implemented in SPM [87]. Dynamic PET images are further processed to extract quantitative physiological data using kinetic modeling, which is introduced in [77, 78].
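
Two of the steps above, frame averaging and intensity normalization, can be sketched in Python as follows. This is a simplified illustration that assumes the frames have already been registered to each other and that the image is stored as a flat list of voxel values; the function names, the toy values, and the reference mask are all hypothetical.

```python
def average_frames(frames):
    """Voxel-wise average of registered frames of equal duration."""
    return [sum(voxel_values) / len(frames) for voxel_values in zip(*frames)]


def compute_suvr(pet_values, reference_mask):
    """Divide each voxel by the mean uptake in the reference region."""
    reference = [v for v, in_ref in zip(pet_values, reference_mask) if in_ref]
    reference_mean = sum(reference) / len(reference)
    return [v / reference_mean for v in pet_values]


# Toy static acquisition (hypothetical): three frames of five voxels each.
frames = [
    [1.0, 3.0, 5.0, 2.0, 0.0],
    [2.0, 4.0, 6.0, 3.0, 1.0],
    [3.0, 5.0, 7.0, 4.0, 2.0],
]
# Hypothetical reference region made of the last two voxels.
reference_mask = [False, False, False, True, True]

averaged = average_frames(frames)
suvr = compute_suvr(averaged, reference_mask)
```

After normalization, a value of 1 corresponds to the mean uptake of the reference region, so that images acquired with different injected doses become comparable.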

One can then obtain different types of features, as described in Subheading 2.2. Voxel-based features will very often be the SUVR at each voxel, usually after spatial normalization. Vertex-based features will generally be the SUVR projected onto the cortical surface [88]. Regional features will usually correspond to the average SUVR in each region of a parcellation. Graph-based features are less used than for diffusion or functional MRI but are still employed to study the so-called metabolic connectivity [89].
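
The extraction of regional features can be sketched as follows in Python, assuming that the SUVR image and the parcellation are aligned and stored as flat lists of the same length. The label values and toy data are hypothetical, and the label 0 is assumed to denote the background.

```python
from collections import defaultdict


def regional_means(values, labels, background=0):
    """Average value within each labeled region, ignoring the background."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(values, labels):
        if label == background:
            continue
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sorted(sums)}


# Toy SUVR values and parcellation labels (hypothetical).
suvr_values = [1.0, 2.0, 3.0, 1.5, 0.5, 9.0]
parcellation = [1, 1, 2, 2, 2, 0]

features = regional_means(suvr_values, parcellation)
```

The resulting dictionary contains one feature per region, which is a much lower-dimensional representation than the voxel-based one.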

5.1.3 Examples of Applications in Machine Learning Studies

Machine learning studies have mainly exploited brain PET images in the context of dementia [90]. For example, the usefulness of 18F-FDG PET to differentiate patients with Alzheimer’s disease from healthy controls and patients with stable mild cognitive impairment from those who subsequently progressed to Alzheimer’s disease has been shown in [48, 91, 92]. 18F-FDG PET has also been used to differentiate frontotemporal dementia from Alzheimer’s disease [93]. In neuro-oncology, 11C-methionine has been used to predict glioma survival [94] or to differentiate recurrent brain tumor from radiation necrosis [95].

5.2 Single-Photon Emission Computed Tomography

5.2.1 Basic Principles

Single-photon emission computed tomography is an imaging technique that requires the injection of a substance labeled with an isotope that directly emits gamma radiation. Typical isotopes employed in neurology are technetium-99m (99mTc) and iodine-123 (123I). As for PET, the labeled substance is distributed throughout the patient’s body by the blood circulation and accumulates in target regions. The photons emitted are detected by one to three detector heads, called gamma cameras, that rotate around the patient. Having multiple heads allows reducing image acquisition time and improving sensitivity as more photons can be detected. Collimators are placed in front of the detector heads to localize the origin of the gamma rays: a gamma ray moving from the patient toward the camera has a higher probability of being detected if its direction aligns with the collimator (Fig. 15) [96]. Once reconstructed, the SPECT image is a map of the radioactivity distribution throughout the body. Both dynamic and static protocols exist when acquiring SPECT data.

Fig. 15
An illustration of a two-head S P E C T system with detectors on either side. The parallel hole collimators are placed in front of the detectors. Some gamma rays run perpendicular to the detectors.

Illustration of a two-head SPECT system with a parallel hole collimator. The photons whose emission direction is perpendicular to the detector heads have a higher probability of being detected (solid lines)

SPECT is able to visualize and quantify changes in cerebral blood flow and neurotransmitter systems, such as the dopamine system [97, 98]. To image cerebral blood flow, the two most widely used tracers are 99mTc-HMPAO and 99mTc-ECD [97, 99]. These tracers can, for example, be employed in the context of dementia as a decrease in neural function will result in a decrease in cerebral blood flow in different regions. SPECT plays a key role when studying Parkinsonian syndromes, which are characterized by a loss of dopaminergic neurons. In this context, tracers targeting the dopaminergic system, such as 123I-β-CIT and 123I-FP-CIT (also called DaTscan), are employed to differentiate essential tremor from neurodegenerative Parkinsonian syndromes or distinguish dementia with Lewy bodies from other dementias [98]. Examples of SPECT images are displayed in Fig. 16.

Fig. 16
2 pairs of S P E C T scans of the brain. Each pair contains 2 scans. 1. They represent dark-colored patches with contrasting borders. The epileptic patient's brain border seems thick and irregular. 2. They exhibit 2 attached, irregularly-shaped, bright patches with a dark contrasting outline, at the top.

Examples of SPECT images. Left: 99mTc-HMPAO SPECT images of a normal control and an epileptic patient (http://spect.yale.edu) [100]. Right: 123I-FP-CIT SPECT images of a normal control and a patient with Parkinson’s disease from the PPMI study [101]

5.2.2 Extracting Features from SPECT Images

After the reconstruction of a SPECT image, which includes several corrections, two processing steps are typically performed: intensity normalization and spatial normalization [97, 98]. As for PET, the intensity of a SPECT image can be normalized using a reference region, and the image can be spatially normalized by directly registering it to a SPECT template or by registering it first to an anatomical image.

As for PET, the most common feature types are voxel-based (the normalized signal at each voxel) and regional features (often the average normalized signal within a region). To the best of our knowledge, vertex-based and graph-based features are rarely used although they could in principle be computed.

5.2.3 Examples of Applications in Machine Learning Studies

Machine learning studies have mainly exploited brain SPECT images for the computer-aided diagnosis of Parkinsonian syndromes [102]. 123I-FP-CIT SPECT was, for instance, used to distinguish Parkinson’s disease from healthy controls [103, 104], predict future motor severity [105], discriminate Parkinson’s disease from non-Parkinsonian tremor [104], or identify patients clinically diagnosed with Parkinson’s disease but who have scans without evidence of dopaminergic deficit [104].

In studies targeting dementia, both 99mTc-HMPAO [106] and 99mTc-ECD [107] tracers were used to differentiate between images from healthy subjects and images from Alzheimer’s disease patients.

6 Conclusion

Neuroimaging plays a key role in the study of brain disorders. While some modalities provide information regarding the anatomy of the brain (CT and MRI), others provide functional or molecular information (MRI, PET, and SPECT). To provide a complete picture of biological processes and their alterations, it is often necessary to combine multiple brain imaging modalities (Fig. 17). This can be done by acquiring images with multiple standalone systems or with hybrid systems such as SPECT/CT, PET/CT, or PET/MRI scanners [108].

Fig. 17
A neuroimaging chart represents C T, P E T-C T fusion, F D G P E T, P E T-T 1 fusion, and T 1-w M R I scans of the brain. It provides anatomical, functional, and molecular information.

Example of 18F-FDG PET, CT, T1-weighted MRI, and fused images

When analyzing neuroimages, both modality-specific and modality-agnostic processing steps must often be performed. These should be performed with care to obtain reliable features. Machine learning and deep learning are widely used to analyze neuroimaging data. The most common tasks are classification for computer-aided diagnosis, prognosis and disease subtyping, and segmentation to characterize anatomical structures and lesions.