Introduction

As magnetic resonance imaging (MRI) and positron emission tomography (PET) have become more widely available, the field has seen a proliferation of imaging studies designed to improve our scientific understanding of neurodegenerative diseases. Readers who are not themselves in the brain imaging field may find these studies' findings harder to interpret because of the wide range of imaging and analysis types employed. In this review, we describe the most popular types of imaging used in neurodegeneration research and the analyses for which each is primarily used. We do not attempt to review or summarize research findings, only to describe the major methodologies that drive them. We also review and discuss one of the largest challenges in this field: heterogeneity across scanners, sites, and studies, both in the images themselves and in how research groups analyze them.

We begin by discussing each image type individually, describing why it is included in neuroimaging research and how it is analyzed. These are arranged by modality: first MRI, then PET, then computed tomography (CT). We also provide a summary in Table 1 and examples in Fig. 1. Lastly, we describe the challenge of data heterogeneity and the major strategies employed to combat it.

Table 1 Summary of major image types in neurodegeneration research
Fig. 1

Examples of imaging sequences discussed in this work. MRI examples are from one participant in the Alzheimer’s Disease Neuroimaging Initiative (ADNI), while PET examples are from a different ADNI participant. PET scans are shown in false color to emphasize findings.

MRI

Broadly speaking, brain MRI scans are used in neurodegeneration studies to provide high-resolution images of brain structure and physiology. The participant is placed in a very strong magnetic field that aligns the magnetic moments of protons (hydrogen nuclei) in water and fat, which allows these protons to absorb and re-emit radiofrequency (RF) energy. The scanner transmits RF pulses and switches magnetic field gradients in specific patterns called “pulse sequences”; the protons absorb this energy and re-emit RF signals, which the scanner detects and spatially localizes to image the density and locations of water and fat. By varying these pulse sequences, a wide variety of imaging contrasts may be obtained (described below), making MRI highly versatile.
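
The RF pulses described above must match the frequency at which protons precess in the main field, given by the Larmor relation f = (γ/2π)·B0, with γ/2π ≈ 42.58 MHz/T for hydrogen. The short Python sketch below evaluates this relation for common field strengths; the function and constant names are our own, chosen only for illustration.

```python
# Proton (1H) gyromagnetic ratio divided by 2*pi, in MHz per tesla (approximately 42.577).
GAMMA_BAR_MHZ_PER_T = 42.577


def larmor_frequency_mhz(b0_tesla: float) -> float:
    """Return the proton precession (Larmor) frequency, in MHz, for a main field B0 in tesla."""
    if b0_tesla <= 0:
        raise ValueError("B0 must be positive")
    return GAMMA_BAR_MHZ_PER_T * b0_tesla


# Common clinical and research field strengths (illustrative loop).
for b0 in (1.5, 3.0, 7.0):
    print(f"{b0:.1f} T -> {larmor_frequency_mhz(b0):.1f} MHz")
```

At 3 T this gives roughly 128 MHz, which is why the scanner's RF hardware is tuned to a frequency specific to its field strength.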

MRI is generally more expensive than CT, but less expensive than PET. All three modalities are generally considered minimal risk to participants when performed within approved protocols, but they may require mild sedation for participants with severe claustrophobia, and some participants may be excluded by physical size or weight limitations. Unlike PET and CT, MRI has additional contraindications for participants with metallic implants and foreign bodies such as pacemakers, aneurysm clips, neurostimulators, shrapnel or bullets, insulin pumps, metallic intraocular foreign bodies, and some classes of tattoos or permanent make-up. However, unlike both PET and CT, MRI does not use ionizing radiation, which has been associated with an increased risk of cancer. In the following sections, we describe each of the major classes of MRI pulse sequences used in brain neuroimaging research and the types of analyses in which they are typically used.

T1-Weighted (Structural) MRI

T1-weighted (T1-w) scans are the most standard MRI contrast. They show tissue anatomy and density: roughly, white matter (WM) appears white, gray matter (GM) gray, and cerebrospinal fluid (CSF) dark (i.e., CSF < GM < WM). In neuroimaging research, these scans have two major uses: (1) localization of brain regions and (2) estimation of tissue density. We show example images of these processes in Fig. 2.

Fig. 2

Examples of processing of T1-w MRI. The top row depicts images in “native space” and the bottom row depicts images in “template space”. All images in the top row, after the original input image, were produced automatically by segmentation and nonlinear registration (warping) using the predefined template/atlas information in the bottom row.

Localization of Brain Regions

Arguably, the most fundamental research use of T1-w images is to automatically calculate a mapping between every voxel (3D volumetric pixel) location and its analogous location in a standard template brain. This process is called registration or normalization, and its mappings can be either linear (e.g., rigid or affine) or nonlinear (sometimes called warping), depending on the application and accuracy needed. These mappings allow automatic localization of any number of named regions of interest (ROIs) from an atlas (each drawn/defined on the template brain) by propagating them through the mapping onto the participant scan, enabling comparison of analogous brain regions across participants. Such mappings can be generated by a wide variety of software, most of which were designed specifically for brain images [1,2,3,4].
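
To make the mapping idea concrete, the sketch below chains three 4×4 affine matrices: participant voxel indices to millimeters (the kind of voxel-to-world matrix stored in image headers such as NIfTI), participant millimeters to template millimeters (as a linear registration might produce), and template millimeters back to template voxel indices, where an atlas label is looked up. All matrices, the toy atlas, and the function names are hypothetical; a nonlinear warp would replace the middle matrix with a deformation field, and dedicated registration software also handles resampling and interpolation.

```python
from __future__ import annotations

from typing import Sequence

Matrix4 = Sequence[Sequence[float]]  # 4x4 homogeneous affine, last row [0, 0, 0, 1]


def apply_affine(affine: Matrix4, point: Sequence[float]) -> tuple[float, ...]:
    """Map a 3D point (x, y, z) through a 4x4 homogeneous affine matrix."""
    x, y, z = point
    return tuple(
        affine[r][0] * x + affine[r][1] * y + affine[r][2] * z + affine[r][3]
        for r in range(3)
    )


def atlas_label_for_voxel(
    voxel: Sequence[int],
    native_vox_to_mm: Matrix4,
    native_mm_to_template_mm: Matrix4,
    template_mm_to_vox: Matrix4,
    atlas: dict[tuple[int, ...], str],
) -> str | None:
    """Chain voxel -> native mm -> template mm -> template voxel, then look up the atlas label."""
    native_mm = apply_affine(native_vox_to_mm, voxel)
    template_mm = apply_affine(native_mm_to_template_mm, native_mm)
    template_vox = tuple(round(c) for c in apply_affine(template_mm_to_vox, template_mm))
    return atlas.get(template_vox)  # None if the voxel falls outside every atlas region


# Hypothetical matrices and atlas, for illustration only.
NATIVE_VOX_TO_MM = [[1, 0, 0, -90], [0, 1, 0, -126], [0, 0, 1, -72], [0, 0, 0, 1]]
NATIVE_MM_TO_TEMPLATE_MM = [[1.1, 0, 0, 0], [0, 1.1, 0, 0], [0, 0, 1.1, 0], [0, 0, 0, 1]]
TEMPLATE_MM_TO_VOX = [[1, 0, 0, 90], [0, 1, 0, 126], [0, 0, 1, 72], [0, 0, 0, 1]]
TOY_ATLAS = {(101, 126, 72): "region_A"}

print(atlas_label_for_voxel((100, 126, 72), NATIVE_VOX_TO_MM,
                            NATIVE_MM_TO_TEMPLATE_MM, TEMPLATE_MM_TO_VOX, TOY_ATLAS))
```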

The most ubiquitous brain template is MNI152 (also known as ICBM152), which was generated by averaging 152 scans at the Montreal Neurological Institute (MNI) as part of an International Consortium for Brain Mapping (ICBM) initiative [5,6,7]. Many other templates have been created (e.g., for specific ages/populations [8,9,10] or image types [11,12,13]), but the most widely used remain MNI152 and its predecessor, MNI305. These “MNI coordinate spaces” are fundamental to each of the most popular software packages for MRI analyses, including FreeSurfer [14], SPM (Statistical Parametric Mapping [15]), FSL (FMRIB Software Library [16]), and AFNI (Analysis of Functional NeuroImages [17]).

Estimation of Tissue Density (Segmentation)

Localizing brain regions via T1-w images enables many analyses of all kinds of MRI and PET, but analyses of T1-w images themselves are principally focused on estimating tissue density in each voxel (i.e., its estimated probability of being gray matter, white matter, CSF, etc.). This process is typically called tissue segmentation and is commonly performed by many of the same software toolkits, e.g., SPM, FreeSurfer, and FSL. These packages typically use T1-w image intensity (brightness) in conjunction with tissue probability maps (TPMs: segmentation maps that are defined in template space and used as statistical priors to guide the segmentation). Most often, these segmentations are used to measure gray matter volume: the total of all gray matter in a region or across the whole brain, in mm³ or cm³. The loss of a person’s brain tissue over time, due to healthy aging and/or disease, is called atrophy. Measuring atrophy is the second core use of T1-w images; this can also include measurements of white matter volumes/atrophy, or atrophy of both GM and WM via expansion of CSF volume [18].

Comparisons of volume measurements across individuals are inherently confounded by head size (which itself is heavily correlated with sex). To avoid these biases, analyses typically normalize by or regress out a measurement of total intracranial volume (TIV or ICV) [19, 20], which is also typically measured from T1-w images. Another alternative is to measure cortical thickness, which removes the surface area component of volume measurements to directly capture the width of the cortical ribbon in millimeters [21,22,23], avoiding the confound between volume and TIV/head size. However, the resolution of most MRI (approximately 1 mm) limits the accuracy and precision of cortical thickness measurements relative to volume [24, 25], and it prevents measuring thickness in regions where the cortex is too small to resolve, e.g., hippocampus, amygdala, and cerebellum.
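
As a minimal sketch, assuming a segmentation has already produced per-voxel GM probabilities, regional or total GM volume is the probability-weighted voxel count times the voxel volume, and one common way to regress out head size is the residual (covariance) method. Dividing by TIV (proportional normalization) is another common option not shown here. Function names and values are hypothetical.

```python
from __future__ import annotations

from statistics import mean
from typing import Sequence


def gm_volume_ml(gm_probabilities: Sequence[float], voxel_volume_mm3: float) -> float:
    """Sum per-voxel gray matter probabilities and convert mm^3 to mL (1 mL = 1 cm^3 = 1000 mm^3)."""
    return sum(gm_probabilities) * voxel_volume_mm3 / 1000.0


def tiv_adjusted_volumes(volumes: Sequence[float], tivs: Sequence[float]) -> list[float]:
    """Residual (covariance) adjustment: remove the linear dependence of volume on TIV.

    Fits volume = a + slope * TIV by ordinary least squares and returns each volume as it
    would be at the sample-mean TIV. Many studies fit the slope in controls only; this
    sketch fits it on whatever sample it is given.
    """
    tiv_bar, vol_bar = mean(tivs), mean(volumes)
    sxx = sum((t - tiv_bar) ** 2 for t in tivs)
    if sxx == 0:
        raise ValueError("TIV must vary across participants to estimate a slope")
    sxy = sum((t - tiv_bar) * (v - vol_bar) for t, v in zip(tivs, volumes))
    slope = sxy / sxx
    return [v - slope * (t - tiv_bar) for v, t in zip(volumes, tivs)]


# Hypothetical participants whose GM volume (mL) scales exactly with TIV (mL).
tivs = [1400.0, 1500.0, 1600.0]
gm = [560.0, 600.0, 640.0]
print(tiv_adjusted_volumes(gm, tivs))  # all close to 600 once head size is removed
```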

Comparative Analyses

By combining estimates of tissue density with localization, researchers can compare measurements across participants. Analyses typically occur either in “native space” or “template space”. In native-space analyses, software automatically propagates atlas regions from the template space to the “native” space of the individual participant scan. From there, one can sum the volume in each brain region to produce numeric values (e.g., a spreadsheet of regional volumes per participant) and compare them across participants. In template-space analyses, software transforms the native-space MRI into the standard template space, where every voxel location across all participants/scans should be anatomically analogous. After this transformation, comparisons are performed across participants in each voxel (e.g., voxel-based morphometry (VBM)) [26]. Typically, these transformations between native and template space are invertible, so for each scan they can be computed once and then used in both directions for both types of analyses. Some software pipelines instead analyze the transformations themselves (either participant-to-template or within-participant across time); these analyses are known as tensor-based morphometry (TBM), and they are considered the most powerful way to measure within-participant atrophy over time [27, 28]. Some analyses define their template space as a cortical surface, an “unfolded” model of the gray matter ribbon; these surface-based analyses co-register and compare scans at each vertex of the surface model rather than in each voxel of an image-based template space [29,30,31].
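
In TBM, the quantity usually analyzed is the Jacobian determinant of the deformation at each voxel: under the convention that the deformation maps baseline (or template) coordinates onto the follow-up (or participant) image, values below 1 indicate local contraction and values above 1 local expansion. The sketch below estimates it by central differences for an arbitrary deformation function; the two deformations are hypothetical stand-ins for a registration output, which real pipelines store as a sampled displacement field.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]


def jacobian_determinant(phi: Callable[[Point], Point], x: Point, h: float = 0.5) -> float:
    """Estimate det(J) of a deformation phi at point x using central differences (h in mm)."""
    jac: List[List[float]] = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        plus, minus = list(x), list(x)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = phi(tuple(plus)), phi(tuple(minus))
        for i in range(3):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * h)  # d phi_i / d x_j
    (a, b, c), (d, e, f), (g, k, m) = jac
    # Cofactor expansion of the 3x3 determinant along the first row.
    return a * (e * m - f * k) - b * (d * m - f * g) + c * (d * k - e * g)


def uniform_shrink(p: Point) -> Point:
    """Hypothetical deformation: every coordinate contracts by 10% toward the origin."""
    return (0.9 * p[0], 0.9 * p[1], 0.9 * p[2])


def shear(p: Point) -> Point:
    """Hypothetical volume-preserving shear."""
    return (p[0] + 0.3 * p[1], p[1], p[2])


print(jacobian_determinant(uniform_shrink, (10.0, 20.0, 30.0)))  # ~0.729: 27.1% local volume loss
print(jacobian_determinant(shear, (10.0, 20.0, 30.0)))           # ~1.0: shape change, no volume change
```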

Analyses of Other Sequences and Modalities

In the previous paragraphs, we described how T1-w images are used both to measure a quantity of interest (tissue density) and to localize and compare these quantities in analogous brain regions. The same processes can be used to measure other quantities from other brain images (other MRI sequences, PET, CT, etc.): each image is aligned to the participant’s T1-w image (typically with a rigid or affine registration), and the existing T1-w-to-template mapping is then used to perform native-space or template-space comparisons. In the research setting, analyses of most of the image types described below follow this approach. Some image types have spatial distortions that make good alignment with T1-w images challenging, such as echo-planar imaging sequences (e.g., dMRI/fMRI); analyses of these modalities either use specialized distortion corrections to better match T1-w images or bypass T1-w images and use modality-specific templates directly [32].
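
When both steps are affine, the other-image-to-T1-w and T1-w-to-template transforms can be multiplied into a single matrix, so the image needs to be resampled only once. The sketch below shows this composition with hypothetical matrices and names; when the T1-w-to-template step is nonlinear, pipelines instead chain the affine with a deformation field rather than multiplying matrices.

```python
from typing import List, Sequence, Tuple

Matrix4 = List[List[float]]


def matmul4(a: Matrix4, b: Matrix4) -> Matrix4:
    """Product a @ b of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def apply_affine(m: Matrix4, p: Sequence[float]) -> Tuple[float, float, float]:
    """Map a 3D point through a 4x4 homogeneous affine (last row assumed [0, 0, 0, 1])."""
    x, y, z = p
    return (
        m[0][0] * x + m[0][1] * y + m[0][2] * z + m[0][3],
        m[1][0] * x + m[1][1] * y + m[1][2] * z + m[1][3],
        m[2][0] * x + m[2][1] * y + m[2][2] * z + m[2][3],
    )


# Hypothetical transforms (millimeter coordinates):
# a rigid translation from a FLAIR or PET image into the participant's T1-w space ...
T1_FROM_OTHER = [[1.0, 0, 0, 2.0], [0, 1.0, 0, -3.0], [0, 0, 1.0, 1.5], [0, 0, 0, 1.0]]
# ... and an affine (anisotropic scaling) from T1-w space into template space.
TEMPLATE_FROM_T1 = [[1.05, 0, 0, 0], [0, 0.95, 0, 0], [0, 0, 1.1, 0], [0, 0, 0, 1.0]]

# The transform applied first goes on the right, giving one matrix from the other image to the template.
TEMPLATE_FROM_OTHER = matmul4(TEMPLATE_FROM_T1, T1_FROM_OTHER)

point = (10.0, 20.0, 30.0)
print(apply_affine(TEMPLATE_FROM_OTHER, point))
```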

T2-Weighted and FLAIR MRI

While T1-weighted MRI is typically used to measure atrophy and localize tissue, T2-weighted MRI is sensitive to different tissue properties, mostly those associated with inflammation and edema. In T2-w images, contrast is reversed from T1-w: in order of brightness, CSF > GM > WM. Inflammation and edema typically appear bright (brighter than or similar to CSF), as do white matter hyperintensities (WMH), which are associated with small vessel disease and demyelination in aging [33,34,35] or multiple sclerosis [36, 37]. To separate the signal of CSF from the pathologies of interest (both appear bright), FLAIR (fluid-attenuated inversion recovery) sequences were introduced. In this variant of T2-w imaging, CSF signal is suppressed (appears dark). Consequently, the signal of interest (WMH, inflammation, etc.) is mostly brighter than the rest of the image, aiding both visual and quantitative analyses. FLAIR’s CSF suppression is particularly helpful at the borders between tissue and ventricles, where WMH are very common and cannot be separated from CSF in standard T2-w images. FLAIR has thus become the most popular sequence in clinical neuroradiology for general pathology detection and for cerebrovascular disease [38].

Compared with T2-w sequences, FLAIR scan times may be longer and/or noise levels may be higher. In research imaging studies of aging and dementia, FLAIR or T2-w sequences are typically used for (a) measuring WMH and (b) monitoring for exclusionary criteria or adverse events, such as infarcts or immune/inflammatory responses to treatment [39, 40]. White matter hyperintensities are typically measured from FLAIR scans (sometimes in conjunction with T1-w scans) using automated or semi-automated algorithms that compute regional or whole-brain volumes, implemented in tools such as the Lesion Segmentation Tool (LST) and FSL BIANCA [36, 37, 41]. Another use of T2-w (but not FLAIR) images is to estimate TIV better than is possible with T1-w alone: T2-w images show bright signal in the extracerebral CSF between the brain and the skull, which appears dark in FLAIR and T1-w images but should be included in TIV measurements [42, 43].
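
LST and BIANCA use considerably more sophisticated models than this, but the underlying idea of flagging white matter voxels that are much brighter on FLAIR than typical white matter, then converting the voxel count to a volume, can be sketched as follows. The robust z-score rule, the cutoff, and the function name are our own illustrative assumptions, not the method of either tool.

```python
from statistics import median
from typing import Sequence

MAD_TO_SD = 1.4826  # scales the median absolute deviation to a Gaussian standard deviation


def wmh_volume_ml(
    flair: Sequence[float],
    wm_mask: Sequence[bool],
    voxel_volume_mm3: float,
    z_threshold: float = 3.0,  # hypothetical cutoff; real tools tune or learn their decision rule
) -> float:
    """Toy WMH estimate: count white matter voxels far brighter than typical white matter."""
    wm_values = [v for v, in_wm in zip(flair, wm_mask) if in_wm]
    center = median(wm_values)  # robust to the bright lesion voxels themselves
    spread = MAD_TO_SD * median([abs(v - center) for v in wm_values])
    if spread == 0:
        raise ValueError("white matter intensities have no spread")
    n_hyper = sum(
        1 for v, in_wm in zip(flair, wm_mask) if in_wm and (v - center) / spread > z_threshold
    )
    return n_hyper * voxel_volume_mm3 / 1000.0  # mm^3 -> mL


# Hypothetical 1 mm^3 voxels: 99 normal-appearing WM voxels, one bright WM voxel,
# and one bright voxel outside the WM mask (excluded from the count).
flair = [99.0] * 50 + [101.0] * 49 + [200.0] + [300.0]
wm_mask = [True] * 100 + [False]
print(wmh_volume_ml(flair, wm_mask, voxel_volume_mm3=1.0))
```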

Iron-Sensitive Sequences: T2*, SWI, and QSM

MRI scans contain susceptibility artifacts in areas where the magnetic field is distorted by materials with differing magnetic susceptibility; these occur at interfaces between bone, blood, and air, and around metallic implants such as those common in dentistry. In the brain, hemorrhages and microhemorrhages also produce these local artifacts because of iron deposits left by the blood. To measure the prevalence of these hemorrhages and microhemorrhages, MRI sequences were developed that highlight these regions of magnetic susceptibility. These sequences are called T2*-w (“T2 star weighted”) or gradient-recalled echo (GRE) imaging. Later variants that enhance resolution and sensitivity are known as susceptibility-weighted imaging (SWI) [44]. Still newer techniques allow quantitative susceptibility mapping (QSM), which has the additional benefit of providing quantitative measures of susceptibility rather than only relative intensities within the image (as in T1-w, T2-w, SWI, and T2*-w/GRE) [45]. In neurodegeneration research, all of these sequences are principally used to detect microhemorrhages, primarily through manual marking and counting by image analysts and radiologists [38, 46,47,48].

Diffusion MRI

Diffusion MRI or diffusion-weighted MRI (dMRI or DWI) measures the strength and direction of water molecules’ movement during the scan [49]. Clinically, dMRI is primarily used to image pathologies that restrict diffusion (e.g., ischemic strokes). In the research setting, it is used primarily to image the structure and integrity of white matter. Water diffuses preferentially along axons, and its motion becomes less directional (more isotropic) as axons degrade. These diffusion changes can reflect multiple etiologies at the molecular level (below the resolution of MRI), and thus changes measured in dMRI are referred to nonspecifically as changes in tissue integrity or microstructure [50].

In dMRI, multiple 3D images are acquired sequentially that each measure motion in one specific direction and at a particular strength or b-value. These sets of diffusion volumes are analyzed jointly to compute derived scalar quantities for each 3D voxel location. In its most basic form, typically six directions are acquired at a single b-value, and these are averaged to produce a trace image, which is a mixture of diffusion-weighted signal with underlying T2-w signal. This is the form most used clinically to image strokes, and in clinical contexts, the term DWI is sometimes used specifically to refer to these images.
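
Under the common mono-exponential assumption S_b = S_0·exp(−b·ADC), each diffusion-weighted image and the b = 0 image together give an apparent diffusion coefficient (ADC) along one direction, and combining directions gives a rotation-invariant summary. The sketch below assumes that model and uses the geometric mean of the directional images as the trace-weighted signal, which is one common way to form it; names and values are hypothetical.

```python
import math
from typing import Sequence


def adc(s0: float, sb: float, b: float) -> float:
    """ADC (mm^2/s) from the mono-exponential model S_b = S_0 * exp(-b * ADC); b in s/mm^2."""
    if s0 <= 0 or sb <= 0 or b <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / sb) / b


def trace_signal(directional_signals: Sequence[float]) -> float:
    """Geometric mean of the diffusion-weighted signals from the individual directions."""
    return math.exp(sum(math.log(s) for s in directional_signals) / len(directional_signals))


def mean_adc(s0: float, directional_signals: Sequence[float], b: float) -> float:
    """Average of the per-direction ADCs; equals the ADC of the geometric-mean trace signal."""
    return sum(adc(s0, s, b) for s in directional_signals) / len(directional_signals)


# Hypothetical voxel: b = 1000 s/mm^2, three directions with assumed true diffusivities (mm^2/s).
b_value, s0 = 1000.0, 1000.0
true_d = [1.7e-3, 0.3e-3, 0.3e-3]
signals = [s0 * math.exp(-b_value * d) for d in true_d]
print(mean_adc(s0, signals, b_value), adc(s0, trace_signal(signals), b_value))
```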

In the research setting, more volumes (e.g., 30+ directions) are acquired to increase angular (directional) resolution, which allows tensor-based analyses; such sequences are referred to as diffusion-tensor imaging (DTI). These sequences have become so standard in research neuroimaging that all diffusion sequences are sometimes called DTI, but strictly, DTI refers only to this specific class of diffusion sequences and their tensor-based analyses. In DTI analyses, a tensor model is estimated at each voxel location and used to compute several derived scalars. The most common are fractional anisotropy (FA) and mean diffusivity (MD). FA measures the degree (a fraction from 0 to 1) of anisotropy (predominance of diffusion in one direction). Functionally, FA images provide a map of the major white matter tracts, which degrade (reducing FA) with age and pathology. In FA images, the order of brightness is CSF < GM ≪ WM. Directional information is sometimes added to FA images to produce a color FA map that coarsely shows whether the principal diffusion direction in each voxel runs anterior–posterior, superior–inferior, or right–left. White matter hyperintensities (WMH), typically imaged with T2-w and FLAIR sequences, also reduce WM FA [51]. MD estimates the per-voxel average rate of diffusion across all directions, rather than its directionality. CSF is bright in MD images, and WMH and other degradation increase MD toward CSF-like values [50]. Together, FA and MD are typically used to investigate nonspecific, mostly vascular, WM pathology associated with aging and neurodegenerative diseases [52,53,54,55].

Another class of dMRI analyses is tractography, which attempts to trace diffusion along axons to produce maps of their trajectories (tractograms) [50]. Tractograms can be used to estimate the structural connectivity (i.e., connection through WM) between pairs of cortical regions. The minimal or optimal configuration of directions, b-values, and spatial resolution needed for accurate and reproducible tractography is a matter of considerable debate [56,57,58,59].
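
Given the three eigenvalues λ1, λ2, λ3 of the fitted tensor in a voxel, MD and FA follow standard closed-form definitions: MD = (λ1 + λ2 + λ3)/3 and FA = sqrt(3/2) · sqrt(Σ(λi − MD)²) / sqrt(Σλi²). A minimal sketch, assuming the tensor has already been fitted and diagonalized by diffusion software; the eigenvalues below are hypothetical.

```python
import math
from typing import Tuple


def md_and_fa(l1: float, l2: float, l3: float) -> Tuple[float, float]:
    """Mean diffusivity and fractional anisotropy from the three diffusion tensor eigenvalues."""
    md = (l1 + l2 + l3) / 3.0
    norm = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    if norm == 0.0:
        return 0.0, 0.0  # no diffusion at all; FA is undefined, so report 0
    deviation = math.sqrt((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
    fa = math.sqrt(1.5) * deviation / norm  # 0 = isotropic, 1 = diffusion along one axis only
    return md, fa


# Hypothetical eigenvalues in mm^2/s.
print(md_and_fa(1.7e-3, 0.3e-3, 0.3e-3))  # coherent white matter: high FA
print(md_and_fa(3.0e-3, 3.0e-3, 3.0e-3))  # CSF-like free water: high MD, FA near 0
```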

Several more advanced forms of diffusion imaging exist in the research setting; these increase the number of directions and strengths (b-values) to enable more complex analyses. They include multi-shell (multiple “shells” of b-values) dMRI, diffusion spectrum imaging (DSI), diffusion kurtosis imaging (DKI), high angular resolution diffusion imaging (HARDI), and others. These richer acquisitions support diffusion models more sophisticated than the tensor, such as neurite orientation dispersion and density imaging (NODDI) [60]. Such models can enhance voxel-, region-, and tractography-based analyses by better separating diffusion-related changes from general tissue atrophy and by better separating signals from multiple WM tracts in regions where they overlap.

In the research setting, diffusion images are quantified using automated software pipelines. Some of the most popular include DTI-TK [61], DSI Studio [62], and multiple tools within the DIPY [63] and FSL packages [16, 64].

BOLD: Functional and Resting State MRI

MRI sequences in the previous sections all image brain structure. By contrast, functional MRI (fMRI) images brain function using the blood-oxygen-level-dependent (BOLD) signal. Oxygenated and deoxygenated hemoglobin have different magnetic susceptibility, which makes it possible to track the hemodynamic response, in which oxygenated blood is delivered to replenish recently active neurons; the expected time course of this response is described by the hemodynamic response function (HRF) [65]. BOLD fMRI sequences rapidly and repeatedly image the brain while a stimulus is presented, and statistical analyses (e.g., relating each region’s signal over time to the stimulus timing) use the BOLD signal, which follows each stimulus after a delay of several seconds, to infer which brain regions were involved (“activated”) during that stimulus.

In resting-state (RS-) or task-free (TF-) fMRI, these same scans are performed without any stimulus; the participant is instructed to lie awake in the scanner without thinking of anything in particular. These analyses are used to identify the brain’s functional networks and measure their strengths: sets of spatially distant regions whose activity is correlated over time, which typically fluctuate and change throughout the imaging session. The strength of the temporal correlation between a pair of brain regions is called their functional connectivity [66]. Together, functional connections and structural connections (measured from dMRI) are called the brain’s connectome, and their study is called connectomics. The best-known functional network is the default-mode network (DMN) and its variants, thought to be active when the brain is at wakeful rest [67, 68]. Measurable changes in DMN integrity have been associated with Alzheimer’s disease, schizophrenia, autism-spectrum disorders, and others [68]. How long a time series (scanning time) is required to produce clinically meaningful RS-fMRI data is itself an area of debate, but typical times in the research setting range from approximately 5 to 20 min [69]. In recent years, fMRI and RS-fMRI analyses have come under some controversy due to concerns that their statistical significance may be inflated or their findings unreproducible [70,71,72].
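
In its simplest form, the functional connectivity described above is the Pearson correlation between two regions’ mean BOLD time series, often Fisher z-transformed before group statistics. The sketch below assumes time series that have already been extracted and preprocessed (motion correction, nuisance regression, and filtering are omitted); region names and values are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long time series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity_matrix(timeseries: Dict[str, Sequence[float]]) -> Tuple[List[str], List[List[float]]]:
    """Region-by-region matrix of correlations between regional mean BOLD time series."""
    names = list(timeseries)
    matrix = [[pearson_r(timeseries[a], timeseries[b]) for b in names] for a in names]
    return names, matrix


def fisher_z(r: float) -> float:
    """Fisher r-to-z transform, often applied before averaging or comparing correlations."""
    return math.atanh(r)


# Hypothetical, already-preprocessed regional time series (arbitrary units).
ts = {
    "region_A": [1.0, 2.0, 3.0, 2.0, 1.0, 2.0],
    "region_B": [1.1, 2.1, 2.9, 2.2, 0.9, 2.0],
    "region_C": [3.0, 2.0, 1.0, 2.0, 3.0, 2.0],
}
names, fc = connectivity_matrix(ts)
for name, row in zip(names, fc):
    print(name, [round(r, 2) for r in row])
```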

Other MRI Sequences

MRI is extremely versatile, and there are a variety of more experimental, less common sequences that also play a role in neurodegeneration research. Arterial spin labeling (ASL) is one such sequence, in which blood flowing into the brain through the neck is magnetically “tagged” (labeled) with RF pulses, which changes its MRI signal. After a short post-labeling delay (typically on the order of 1–2 s), subtracting tagged images from untagged (control) images yields a perfusion-weighted image that can be converted into a map of cerebral blood flow (CBF), or perfusion [73, 74]. Others include magnetic resonance angiography (MRA) and 4-D flow MRI, used to image blood vessels and blood flow [75, 76], and magnetic resonance elastography (MRE), which measures tissue stiffness, akin to palpation [77]. MRI with intravenous gadolinium-based contrast agents (GBCA) has also been proposed for measuring the integrity of the blood-brain barrier, which may be implicated in several neurodegenerative diseases, but this is rarely included in large imaging studies due to participant discomfort, concerns about adverse reactions, and potential long-term gadolinium retention in the brain [78].
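
Converting the control-minus-tagged difference into absolute CBF requires a kinetic model and a proton-density (M0) reference image. The sketch below uses the single-compartment model commonly recommended for pseudo-continuous ASL (pCASL); the default parameter values (labeling efficiency, arterial blood T1 at 3 T, blood-brain partition coefficient, labeling duration, and post-labeling delay) are typical assumptions that a real study would set from its own protocol, and the function name is hypothetical.

```python
import math


def pcasl_cbf(
    control: float,
    label: float,
    m0: float,
    pld_s: float = 1.8,        # assumed post-labeling delay (s)
    tau_s: float = 1.8,        # assumed labeling duration (s)
    t1_blood_s: float = 1.65,  # assumed arterial blood T1 at 3 T (s)
    alpha: float = 0.85,       # assumed pCASL labeling efficiency
    lam: float = 0.9,          # assumed blood-brain partition coefficient (mL/g)
) -> float:
    """CBF in mL/100 g/min from a single-compartment pCASL model (all defaults are assumptions)."""
    delta_m = control - label  # perfusion-weighted difference signal
    # 6000 converts mL/g/s to mL/100 g/min (60 s/min x 100 g).
    numerator = 6000.0 * lam * delta_m * math.exp(pld_s / t1_blood_s)
    denominator = 2.0 * alpha * t1_blood_s * m0 * (1.0 - math.exp(-tau_s / t1_blood_s))
    return numerator / denominator


# Hypothetical gray matter voxel: a 1% control-label difference relative to M0.
print(round(pcasl_cbf(control=1000.0, label=990.0, m0=1000.0), 1))
```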

Nuclear Medicine (PET and SPECT)

Positron emission tomography (PET) scans use radioactive tracers (radioligands) that are injected intravenously and accumulate in the body at a specific target of interest. The tracers’ radioactive isotopes emit positrons, which annihilate with nearby electrons to produce pairs of gamma rays traveling in opposite directions; the scanner detects these pairs to image the tracer. PET is used to image the density and location of specific targets, e.g., amyloid-beta, tau, and metabolic demand, which is not (currently) possible with MRI.

Compared with MRI, PET is significantly more expensive, has much lower resolution (approximately 5–8 mm vs. approximately 1 mm with clinical MRI), and it carries different participant risks in the form of ionizing radiation. However, these scans are currently the only way to image amyloid, tau, metabolism, and dopamine transporters in living brains, which are of critical interest to neurodegeneration research and clinical trials [79,80,81].

PET measurements can be quantitative if they are acquired using full-dynamic scans (the participant is in the scanner for approximately 1–2 h, depending on the tracer used) in conjunction with arterial sampling. These scans allow better accuracy and precision (particularly important for longitudinal measurements) by avoiding the influences of perfusion and blood flow [82,83,84], but they are often prohibitive because of participant burden and risk (from arterial sampling) and the practical cost of occupying the PET scanner for so long. More commonly, research studies acquire static or late-uptake scans: after tracer injection, participants wait (in a waiting room, not in the scanner) for an uptake period (e.g., 30–90 min depending on the tracer) before being scanned for approximately 20 min [85]. These late-uptake scans are quantified using the standardized uptake value ratio (SUVR), which expresses the PET signal in a region as a ratio to the signal in a reference region that is thought to be free of the target pathology. SUVR measurements are only semiquantitative [83, 84], in contrast to quantitative distribution volume ratio (DVR) measurements from full-dynamic PET scans. There is much debate over the best SUVR reference region for each tracer and type of analysis, but the most common choices are cerebellar, pontine, or white matter regions [86,87,88,89,90]. Middle-ground approaches between quantitative and late-uptake scans include full-dynamic scans without arterial sampling [91], and coffee-break protocols in which participants are in the scanner for the first 5–10 min after injection, then leave and return in time for the typical late-uptake acquisition. For some tracers, this “coffee break” is long enough to scan another participant, while still producing DVR measurements closer in quality to full-dynamic than to late-uptake scans [84, 92].
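
Computationally, SUVR is a simple ratio of regional means once the PET image has been registered to MRI and the regions have been defined. In the minimal sketch below, the region names and values are hypothetical, and pooling all voxels in the target region is only one of several possible choices (volume-weighting per-region means is another).

```python
from statistics import mean
from typing import Sequence


def suvr(target_values: Sequence[float], reference_values: Sequence[float]) -> float:
    """Mean late-uptake PET signal in a target region divided by the mean in a reference region."""
    reference = mean(reference_values)
    if reference <= 0:
        raise ValueError("reference region signal must be positive")
    return mean(target_values) / reference


# Hypothetical voxel values (e.g., kBq/mL) from a static scan, already registered to T1-w MRI.
cortical_composite = [2.0, 2.2, 2.4, 2.2]
cerebellar_reference = [1.0, 1.1, 0.9, 1.0]
print(round(suvr(cortical_composite, cerebellar_reference), 2))  # 2.2
```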

FDG PET

Fluorodeoxyglucose (FDG) is a glucose analog labeled with the positron-emitting isotope F-18. In brain PET imaging, FDG measures the glucose metabolism associated with neuronal and synaptic activity. In the brain, reduced FDG signal (hypometabolism) is associated with neurodegeneration from all causes. FDG is highly effective at capturing spatial patterns that discriminate between diseases, and between clinical subtypes of diseases, at the individual level [93,94,95,96], making it especially useful in clinical differential diagnosis. In the research setting, FDG can also serve as a measure of nonspecific neurodegeneration (in place of MRI-based atrophy) when MRI is unavailable [79]. It is also relatively inexpensive (compared with other PET tracers) and widely available due to its widespread use in cancer imaging. However, as more research imaging studies begin using amyloid and tau PET, their use of FDG is declining, because study participants can only tolerate (for both practical and radiation-safety reasons) a finite number of PET scans.

Amyloid PET

With the introduction of the Pittsburgh Compound B (PiB) tracer in 2004, researchers first became able to image brain amyloid pathology in living people [97], and amyloid PET has become a cornerstone of the imaging and diagnosis (in the research setting) of Alzheimer’s disease (AD) and related dementias [79, 80]. PiB is labeled with the C-11 isotope, which has a half-life of approximately 20 min. Consequently, facilities using PiB must manufacture it on-site, which requires a costly cyclotron and significant nuclear expertise. Since then, many other amyloid PET tracers have been developed using F-18, which has a half-life of 109.7 min and is thus more practical to manufacture centrally and distribute to smaller facilities. These tracers include Florbetapir (aka FBP, AV-45, Amyvid), Florbetaben (aka FBB), Flutemetamol (aka FMT), and Flutafuranol (aka AZD4694, NAV4694) [98]. Each of these F-18-based amyloid tracers has utility comparable to PiB, but their measurements are confounded by relatively more off-target binding (nuisance signal) in WM; the exception is Flutafuranol, which is relatively recent and has WM binding approximately equal to that of PiB [99]. Because the distribution of amyloid in the brain is relatively homogeneous, amyloid PET is often analyzed to produce only a single numeric measurement of total “global” amyloid [100, 101], which can be thresholded to produce a binary measure of amyloid positivity [79, 100, 102]. Individuals typically become amyloid-positive 10–15 years before the clinical onset of AD symptoms [103, 104]. In AD, amyloid PET and other amyloid biomarkers (such as those from cerebrospinal fluid obtained via lumbar puncture) typically become abnormal before tau biomarkers (PET or CSF), and both typically become abnormal before related neurodegeneration (FDG PET, MRI, or CSF) [79, 103,104,105].
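
The practical difference between C-11 and F-18 tracers follows from exponential decay, N/N0 = 0.5^(t/T½). A minimal worked example, assuming a hypothetical two-hour delay between production and use (e.g., transport to another site); C-11’s half-life is taken as 20.4 min, consistent with the approximate value above.

```python
HALF_LIFE_MIN = {"C-11": 20.4, "F-18": 109.7}  # physical half-lives in minutes


def fraction_remaining(isotope: str, minutes: float) -> float:
    """Fraction of the initial radioactivity left after a delay: N/N0 = 0.5 ** (t / T_half)."""
    return 0.5 ** (minutes / HALF_LIFE_MIN[isotope])


# Hypothetical 2-hour delay between production and injection.
for isotope in ("C-11", "F-18"):
    print(f"{isotope}: {fraction_remaining(isotope, 120.0):.1%} remaining after 120 min")
```

With these numbers, less than 2% of a C-11 batch survives a two-hour delay, whereas nearly half of an F-18 batch does.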

Tau PET

Approximately a decade after the introduction of PiB, tracers were developed to image the tau aggregates associated with Alzheimer’s disease. First-generation tracers include Flortaucipir (aka FTP, AV-1451, Tauvid, T-807) [106], THK5317, THK5351, and PBB3. Second-generation tracers include MK-6240, RO-948, PI-2620, GTP1, and PM-PBB3 [107]. Apart from PBB3, which is labeled with C-11, these tracers are F-18-based, and thus they can be transported more practically than PiB. FTP and MK-6240 have emerged as arguably the most common tau tracers in AD research. Each of these tracers has varying binding affinity (i.e., signal strength) to AD-associated tau and varying levels of problematic off-target binding in other regions, such as the basal ganglia, choroid plexus, and even the skull/bone surrounding the brain, which can “bleed into” regions of interest and make quantification and interpretation a challenge [107, 108]. Despite these challenges, tau PET has become widespread and crucial to AD research [79, 109,110,111,112,113]. Spatial patterns of tau PET are much more heterogeneous across individuals than those of amyloid PET [114,115,116]. Within the time course of Alzheimer’s disease, highly elevated tau PET signal typically occurs only after substantial elevation of amyloid PET, and compared with amyloid, tau is more strongly associated (both in time and spatially within the brain) with neurodegeneration (atrophy) and with clinical symptoms [105, 117].

The use of current tau PET tracers for non-AD tauopathies, such as progressive supranuclear palsy (PSP), corticobasal degeneration (CBD), chronic traumatic encephalopathy (CTE), and some subtypes of frontotemporal lobar degeneration (FTLD), is much more controversial. Studies with early tracers (mainly Flortaucipir) have shown relatively weak signal in these diseases that is detectable at the group level, but individual measurements fall in ranges much lower than those of AD patients [112, 118, 119]. This weak signal is typically located in white matter adjacent to the cortical regions where signal would be expected [119], and autoradiography studies have repeatedly found minimal binding of these tracers to non-AD tau ex vivo [120, 121]; consequently, the specific pathology underlying this signal in non-AD tauopathies is uncertain, and related findings must be interpreted with caution [107, 120]. Some tracers (PI-2620, PM-PBB3) have shown relatively more signal in these diseases [122, 123], but no tau PET tracer has been widely accepted for non-AD tau [107].

Other Tracers

Although none has become as common as amyloid, tau, and FDG, several other classes of PET tracers are being explored to study the molecular pathology of dementing illnesses. These include tracers for synaptic density (e.g., UCB-J) [124] and neuroinflammation (e.g., ER176, PBR28, SMBT-1) [125,126,127].

Alpha-synuclein is the key protein of interest for Parkinson’s disease, dementia with Lewy bodies (DLB), and other parkinsonian disorders. There are currently no released PET tracers for imaging alpha-synuclein in vivo. Some early results from in-development tracers (e.g., ACI-3847) have been presented at research meetings, but none is yet available. However, DaTScan (aka Ioflupane) is a single photon emission computed tomography (SPECT) tracer that images dopamine transporters (DaT); the appearance of these images in the striatum can be used to differentiate “true” parkinsonian disorders from essential tremor or drug-induced parkinsonism [128, 129]. These scans are also sometimes used for differential diagnosis in the research setting, particularly in imaging studies of parkinsonian disorders, but they do not allow direct study of the underlying alpha-synuclein pathology. Similarly, tracers for TDP-43 (TAR DNA-binding protein 43) would be of high interest for neurodegeneration research, but none has yet become available.

Quantifying PET Scans

Quantifying PET scans in neurodegeneration research is typically based on registration with T1-w MRI (which is almost always acquired as well), allowing the same types of template- and atlas-driven region- and voxel-based approaches described above, with largely the same software packages, e.g., SPM, FreeSurfer, and FSL [31, 100]. Once PET scans are registered to T1-w MRI, nonlinear registrations between the MRI and standard templates allow propagation of PET scans into template space, or of atlas regions onto the native PET images. There are also specialized pipelines for PET-only measurements, and these can achieve performance similar to MRI-based approaches [130, 131], but MRI-based approaches are considered the gold standard because their higher resolution allows more accurate tissue segmentation and regional localization, and because they can better correct for partial volume effects (described below).

A major concern in PET quantification is partial volume: PET signal is reduced in regions of tissue loss (atrophy), and from PET alone it is impossible to determine whether a target (e.g., glucose, amyloid, or tau) is sparsely present within dense underlying tissue or densely present within sparse underlying tissue (i.e., in regions of atrophy). Many algorithms for partial volume correction (PVC) use the corresponding T1-w MRI to estimate the hypothetical PET signal in each voxel or region as if there were no atrophy. PVC methods inherently amplify measurement noise because they scale up signal in areas with relatively low signal. Which PVC method, if any, should be applied in a given study is a very active area of research and discussion [132,133,134,135]. Applying PVC to PET analyses comparing two groups typically increases the statistical power to detect group differences; proponents of PVC argue that this shows it has removed the biasing effect of atrophy from the PET signal. Opponents argue that PVC effectively multiplies the PET signal by an MRI-derived factor, so the larger group differences come from combining the MRI (atrophy) and PET (molecular) signals rather than from better molecular measurement. Studies examining correlations between in vivo amyloid scans and quantitative pathology at autopsy have found that all major classes of PVC reduced these correlations [134], which argues against the pro-PVC hypothesis that the larger group differences reflect more accurate quantification of the underlying molecular pathology.
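To make the noise-amplification point concrete, here is a minimal sketch of one of the simplest classes of PVC, a one-compartment voxel-wise correction. A binary tissue mask (assumed to come from the segmented T1-w MRI, already registered to PET) is blurred with a Gaussian approximation of the scanner's point-spread function (PSF), and the PET image is divided by the resulting tissue fraction. Dividing by a fraction below 1 scales that voxel's noise by the same factor, which is why voxels with very low tissue fraction are excluded. The isotropic Gaussian PSF, its width in voxels, the `min_fraction` cutoff, and the function names are illustrative assumptions; the major published PVC methods are more elaborate.

```python
import numpy as np


def gaussian_smooth(volume, sigma_vox):
    """Separable Gaussian blur with zero padding, standing in for the PSF."""
    radius = max(1, int(np.ceil(3 * sigma_vox)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma_vox) ** 2)
    kernel /= kernel.sum()
    out = np.asarray(volume, dtype=float)
    if min(out.shape) < kernel.size:
        raise ValueError("every image axis must be at least as long as the kernel")
    # Convolve along each axis in turn (the kernel is symmetric and odd-length).
    for axis in range(out.ndim):
        out = np.apply_along_axis(np.convolve, axis, out, kernel, mode="same")
    return out


def one_compartment_pvc(pet, tissue_mask, sigma_vox, min_fraction=0.25):
    """Divide PET by the PSF-blurred tissue fraction inside the tissue mask.

    Voxels outside the mask, or whose blurred tissue fraction is below
    `min_fraction`, are set to 0 to avoid extreme noise amplification.
    """
    pet = np.asarray(pet, dtype=float)
    mask = np.asarray(tissue_mask, dtype=bool)
    fraction = gaussian_smooth(mask.astype(float), sigma_vox)
    corrected = np.zeros_like(pet)
    valid = mask & (fraction >= min_fraction)
    corrected[valid] = pet[valid] / fraction[valid]
    return corrected


# Synthetic cube of tissue with uniform true uptake, blurred by the "PSF".
true_uptake = 2.0
mask = np.zeros((20, 20, 20), dtype=bool)
mask[6:14, 6:14, 6:14] = True
observed = gaussian_smooth(true_uptake * mask, sigma_vox=1.5)
corrected = one_compartment_pvc(observed, mask, sigma_vox=1.5)
```

In this synthetic case, the uncorrected voxels at the edge of the cube read well below the true uptake, and the correction restores the true value wherever the tissue fraction passes the cutoff. With real atrophy and real noise, the same division is what amplifies noise in low-fraction voxels.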

CT

Computed tomography (CT) scans have a limited role in neurodegeneration research because CT differentiates brain tissues poorly compared with MRI. However, most PET scanners are actually PET/CT scanners that acquire a low-dose CT for attenuation correction (compensation for the loss of PET signal as photons pass through tissue, especially dense tissue such as bone) during reconstruction of the PET images. These CT scans are typically not analyzed after on-scanner PET reconstruction is complete. PET/MRI scanners have slowly grown in popularity; they replace the low-dose CT with MRI- and artificial intelligence-based substitutes, eliminating the participant’s radiation dose from the CT. However, achieving CT-like bone contrast with MRI is not straightforward [136], and in neurodegeneration research studies PET/CT scanners still far outnumber PET/MRI scanners.
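As a rough illustration of what the low-dose CT contributes, the sketch below converts CT numbers (Hounsfield units, HU) along one line of response into 511 keV linear attenuation coefficients and computes the attenuation correction factor, exp(∫μ dl), by which the measured coincidences on that line are scaled up. Only the air-to-water segment of the CT-to-μ conversion is modeled, using an approximate μ of 0.096 cm⁻¹ for water; bone requires a separate, scanner-specific conversion that is deliberately omitted. The function names are hypothetical, and actual scanners perform this step inside their own reconstruction software.

```python
import numpy as np

# Approximate linear attenuation coefficient of water at 511 keV (cm^-1).
MU_WATER_511_KEV = 0.096


def hu_to_mu_soft_tissue(hu):
    """Map CT numbers (HU) to 511 keV attenuation using the air-water line only.

    Air (-1000 HU) maps to 0 and water (0 HU) to MU_WATER_511_KEV. Bone
    (well above 0 HU) needs a separate, scanner-specific slope that this
    sketch does not model.
    """
    hu = np.asarray(hu, dtype=float)
    return np.clip(MU_WATER_511_KEV * (1.0 + hu / 1000.0), 0.0, None)


def attenuation_correction_factor(mu_per_cm, step_cm):
    """ACF = exp(integral of mu along the line of response), as a Riemann sum."""
    mu = np.asarray(mu_per_cm, dtype=float)
    return float(np.exp(np.sum(mu) * step_cm))


# A 20 cm path of water-equivalent tissue (0 HU) sampled every 0.1 cm.
path_hu = np.zeros(200)
acf = attenuation_correction_factor(hu_to_mu_soft_tissue(path_hu), step_cm=0.1)
print(round(acf, 2))  # 6.82
```

Even this simplified case shows why attenuation correction cannot be skipped: without it, counts from deep structures on a roughly head-sized path would be underestimated several-fold relative to superficial ones.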

Challenges to Multi-site and Meta-analyses: Data Standardization and Harmonization

A major challenge in neuroimaging is that images and imaging-derived measurements cannot be compared directly across differences in imaging protocols (pulse sequences, PET tracers, scan times), imaging hardware (scanner vendor/model, head coil, on-board reconstruction software/version), and analysis software and measurements. Each imaging site or research group typically makes these choices according to its own preferences and hardware availability, so its images and measurements require a harmonizing transformation [100, 137] that removes non-biologic sources of variability before they can be combined with others. In an ideal study, every participant would be imaged on the same scanner with the same protocols, and all data would be analyzed identically; however, very large studies are typically multi-site, and even within a single site, practical concerns often necessitate the use of multiple scanners. In longitudinal studies, keeping participants on the same scanner can become impossible over long periods, as scanners age, break down, and are replaced with newer technology. In multi-site trials, each site typically has a different mix of scanners of varying ages, manufacturers, and models, and images from these scanners are rarely directly comparable.

To address these challenges, researchers first minimize as many technical sources of variation as they can, i.e., by using the same hardware and methods wherever possible. Standardized pulse sequences and imaging protocols have been designed to produce more comparable results across a wide range of scanner manufacturers and models [138, 139]; these protocols have been shared for use by other research studies and reduce compatibility issues, but they do not eliminate them. Scanners can also be validated, repeatedly, for use in a research study, for example by scanning standardized hardware phantoms and ensuring that the resulting measurements fall within expected ranges [140]. Within a study, analysis software and pipelines can also be harmonized; images can always be re-analyzed retrospectively with consistent software, but this is costly in both human and computational resources. Even from identical images, differences in analysis software (even seemingly small ones such as a minor version update or a change of operating system) can produce very different findings [141,142,143], which is a major challenge for scientific reproducibility and meta-analyses.

When standardizing every technical aspect of imaging and analysis is not feasible, statistical approaches can reduce the effects of these confounding factors. For example, the Centiloid Project, a coalition of amyloid PET researchers, corrects for the effects of different tracers and analysis pipelines by using linear regressions to transform each tracer-pipeline combination onto a standard Centiloid scale, anchored at 0 (the mean of young healthy controls) and 100 (the mean of typical mild-to-moderate Alzheimer’s disease) [87, 144]. Nonlinear mappings for amyloid PET have also been proposed to increase agreement across tracers and methods [145]. Other techniques post-process images to better match one another, e.g., blurring higher-quality images or adding random noise to them to match a common lower standard; this can be effective but inherently sacrifices quality in the deliberately degraded data [146, 147]. Researchers are also increasingly applying techniques designed for generic numeric data (i.e., not specific to images), such as ComBat [148], to remove non-biologic factors. These methods can reduce, but do not eliminate, the effects of non-biologic sources of variance [149].
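The two statistical strategies named above can be sketched as follows, under stated assumptions. The first function applies the Centiloid transform. A tracer's SUVR is mapped to a PiB-equivalent SUVR by a calibration line, then rescaled so that the young-control anchor maps to 0 and the typical-AD anchor maps to 100. The slope, intercept, and anchor values are placeholders that each tracer and pipeline must calibrate. The second function is a simplified location-scale harmonization in the spirit of ComBat. It estimates per-feature site offsets jointly with covariates whose biological effects should be preserved, then aligns each site's standardized mean and variance to the pooled values. It omits ComBat's empirical Bayes shrinkage of the site estimates, so it is not ComBat itself. Both function names are hypothetical.

```python
import numpy as np


def centiloid(tracer_suvr, slope, intercept, yc_anchor, ad100_anchor):
    """Map tracer SUVR to Centiloids; all parameters come from calibration."""
    pib_equivalent = slope * np.asarray(tracer_suvr, dtype=float) + intercept
    return 100.0 * (pib_equivalent - yc_anchor) / (ad100_anchor - yc_anchor)


def location_scale_harmonize(features, site, covariates=None):
    """Simplified ComBat-style harmonization without empirical Bayes.

    features:   (n_subjects, n_features) array of imaging measurements
    site:       length-n sequence of site/scanner labels
    covariates: optional (n_subjects, n_covariates) biological effects to keep
    """
    y = np.asarray(features, dtype=float)
    levels, site_idx = np.unique(np.asarray(site), return_inverse=True)
    n = y.shape[0]
    site_onehot = np.eye(len(levels))[site_idx]
    cov = (np.zeros((n, 0)) if covariates is None
           else np.asarray(covariates, dtype=float).reshape(n, -1))

    # Joint least-squares fit of site offsets and covariate effects per feature.
    design = np.hstack([site_onehot, cov])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    site_coef, cov_coef = beta[:len(levels)], beta[len(levels):]

    # Grand mean: site offsets weighted by each site's share of subjects.
    grand_mean = (site_onehot.sum(axis=0) / n) @ site_coef
    biology = cov @ cov_coef
    pooled_sd = np.sqrt(np.mean((y - design @ beta) ** 2, axis=0))

    # Standardize, then remove each site's own mean (location) and SD (scale).
    z = (y - grand_mean - biology) / pooled_sd
    for i in range(len(levels)):
        rows = site_idx == i
        z[rows] = (z[rows] - z[rows].mean(axis=0)) / z[rows].std(axis=0)

    # Return to the original units with covariate effects restored.
    return z * pooled_sd + grand_mean + biology


# With hypothetical anchors of 1.0 and 2.0 (PiB itself: slope 1, intercept 0),
# an SUVR halfway between the anchors maps to 50 Centiloids.
print(float(centiloid(1.5, slope=1.0, intercept=0.0, yc_anchor=1.0, ad100_anchor=2.0)))  # 50.0
```

In practice, the calibration slope and intercept are typically derived from paired scans of the same participants with PiB and the new tracer. The harmonization sketch assumes every site contributes enough subjects to estimate its own mean and variance, which is the situation in which ComBat's shrinkage matters least.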

Conclusion

In this review, we have summarized the major types of brain images acquired in neurodegeneration research studies, their major purposes, and how they are analyzed. We also discussed some of the larger technical challenges in this area and how researchers are working to address them. We hope that this work will help readers who are less familiar with neuroimaging technology to better understand the wide breadth of imaging-based neurodegeneration research.