All raw and processed data along with the processing scripts that were used in this manuscript are available at https://github.com/afids/afids-clinical. This repository is licensed under the MIT License.
Subject demographics and MRI acquisition
Subject scans used in this study were obtained from 39 individuals diagnosed with PD (age: 60.2 ± 6.8 years; sex: 33.3% female). For all subjects, the MRI sequence was a post-gadolinium-enhanced volumetric T1-weighted (T1w) image (echo time = 1.5 ms, inversion time = 300 ms, flip angle = 20°, receiver bandwidth = 22.73 kHz, field of view = 26 cm × 26 cm, matrix size = 256 × 256, slice thickness = 1.4 mm, resolution = 1.25 × 1.25 × 1.50 mm) acquired on a 1.5 T scanner (Signa, General Electric, Milwaukee, Wisconsin, USA). The subject data were collected at University Hospital in London, ON, Canada. The study was approved by the Human Subject Research Ethics Board (HSREB) office at the University of Western Ontario (REB #109045).
AFID placement
The individual scans were imported into 3D Slicer version 4.10.0 (Fedorov et al. 2012). Each subject scan was first transformed into anterior commissure (AC)–posterior commissure (PC) space (AC–PC space). To do so, the raters initially placed 4 of the AFIDs: AC (AFID01), PC (AFID02) and two additional midline AFIDs. The built-in “AC–PC transform” function in 3D Slicer was then used to align the AC–PC line horizontally along the anteroposterior axis. Adequate alignment was judged subjectively by each rater, who then placed the remaining AFIDs as previously outlined (Lau et al. 2019). An interactive three-dimensional schematic brain with all AFIDs labelled is provided in the supplementary material (Online Resource 1) for reference.
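The geometry behind an AC–PC alignment can be illustrated with a short sketch. This is not 3D Slicer's implementation; it is a minimal illustration assuming RAS coordinates in mm (x: right, y: anterior, z: superior), and the function names `acpc_rotation` and `to_acpc` are hypothetical. For simplicity it uses a single midline point, whereas the protocol places two.

```python
import numpy as np


def acpc_rotation(ac, pc, midline):
    """Hypothetical sketch: rotation whose rows are the new x, y, z axes.

    The new y-axis runs along the AC-PC line (PC -> AC, i.e. anterior), and
    `midline` is forced into the new y-z (midsagittal) plane. Assumes RAS mm.
    """
    ac, pc, midline = (np.asarray(p, dtype=float) for p in (ac, pc, midline))
    # New anterior axis: unit vector from PC towards AC.
    y = ac - pc
    y /= np.linalg.norm(y)
    # Superior axis: component of (midline - AC) orthogonal to the AC-PC line.
    v = midline - ac
    z = v - np.dot(v, y) * y
    z /= np.linalg.norm(z)
    # Right-handed frame: x = y cross z points to the right.
    x = np.cross(y, z)
    return np.vstack([x, y, z])


def to_acpc(points, ac, pc, midline):
    """Express points in a hypothetical AC-centred, AC-PC-aligned frame."""
    rotation = acpc_rotation(ac, pc, midline)
    shifted = np.asarray(points, dtype=float) - np.asarray(ac, dtype=float)
    return shifted @ rotation.T
```

After this transform, AC sits at the origin, PC lies on the negative y-axis, and the midline point lies in the x = 0 plane. A fuller version could fit the midsagittal plane to several midline AFIDs rather than one.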
Five raters were initially trained to place AFIDs on publicly available brain templates: MNI152NLin2009bAsym (Fonov et al. 2011; Ciric et al. 2021), deepbrain7t (Lau et al. 2017) and PD25-T1MPRAGE (Xiao et al. 2017). Each template has a set of reference AFID coordinates (ground truth), defined as the mean coordinate of each AFID across a set of experienced raters. The ground truth standards are included in the GitHub repository (https://github.com/greydongilmore/afids-clinical/data/fid_standards). Quality assurance was performed to ensure that each rater placed the AFIDs on the templates within an acceptable error threshold (Euclidean error < 2.00 mm relative to the ground truth placements). Once the raters had received adequate feedback on their initial placements during the training phase, they independently performed the AFIDs protocol on the subject scans. Two raters (MA and GG) had prior neuroanatomy experience and were deemed “expert”, while three (AT, MJ and RC) had no prior neuroanatomy experience and were deemed “novice”. Because the novice raters had no experience with medical imaging, they received additional training on navigating an MRI sequence in 3D Slicer (e.g. left/right orientation and the axial, coronal and sagittal views). A total of 6240 AFIDs were placed (39 subjects × 32 AFIDs × 5 raters).
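The quality-assurance criterion and the AFID count can be expressed as a short sketch. The text does not state whether the 2.00 mm threshold was applied to each AFID or to the mean error, so both options are shown; the function names and the `per_afid` switch are hypothetical.

```python
import numpy as np

N_SUBJECTS, N_AFIDS, N_RATERS = 39, 32, 5
QA_THRESHOLD_MM = 2.00

# 39 subjects x 32 AFIDs x 5 raters = 6240 placed AFIDs.
TOTAL_AFIDS = N_SUBJECTS * N_AFIDS * N_RATERS


def qa_errors(rater_coords, ground_truth):
    """Euclidean error (mm) of each template AFID vs. ground truth; inputs are (32, 3)."""
    diff = np.asarray(rater_coords, dtype=float) - np.asarray(ground_truth, dtype=float)
    return np.linalg.norm(diff, axis=1)


def passes_qa(rater_coords, ground_truth, threshold=QA_THRESHOLD_MM, per_afid=True):
    """Hypothetical check: every AFID (per_afid=True) or the mean error below threshold."""
    errors = qa_errors(rater_coords, ground_truth)
    if per_afid:
        return bool(np.all(errors < threshold))
    return bool(errors.mean() < threshold)
```

Whichever variant was used in practice, the per-AFID errors returned by `qa_errors` are also what a rater would receive as feedback during training.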
Analysis in subject space
The 3D coordinates of each AFID were exported and subsequently analyzed in MATLAB (R2018b). The anatomical fiducial localization error (AFLE) was calculated, for each of the 32 AFIDs in each scan, as the Euclidean distance between each individually placed AFID and the group mean coordinate of that AFID across all raters. One AFLE measurement was therefore obtained for each manually placed AFID, giving 6240 AFLE measurements in total. Outliers were defined as placements with an AFLE greater than 10.0 mm and are reported in the results.
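The AFLE definition and the outlier rule can be sketched in Python (the analysis itself was done in MATLAB). The sketch assumes the coordinates are stored in an array of shape (raters, subjects, AFIDs, 3) in mm; this layout and the function names are assumptions. As described above, the group mean includes the rater's own placement.

```python
import numpy as np

OUTLIER_MM = 10.0


def compute_afle(coords):
    """AFLE for every placement.

    coords: (n_raters, n_subjects, n_afids, 3) subject-space coordinates (mm).
    Returns an array of shape (n_raters, n_subjects, n_afids).
    """
    coords = np.asarray(coords, dtype=float)
    # Group mean of each AFID in each scan, taken across raters (axis 0).
    group_mean = coords.mean(axis=0, keepdims=True)
    return np.linalg.norm(coords - group_mean, axis=-1)


def find_outliers(afle, threshold=OUTLIER_MM):
    """(rater, subject, afid) index triples whose AFLE exceeds the threshold."""
    return np.argwhere(afle > threshold)
```

With 5 raters, 39 subjects and 32 AFIDs, `compute_afle` returns 6240 values, one per placed AFID.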
To determine each rater’s deviation from the group mean, the mean AFLE of each rater across all 39 subjects was calculated for each AFID. AFLE was then stratified by expertise by calculating the mean AFLE within the expert and novice groups. Wilcoxon rank-sum tests were used to test for significant differences in AFLE between expert and novice raters. Bonferroni correction was applied to account for multiple comparisons, with an adjusted threshold for significance of p < 0.05/32 (≈ 0.0016). The overall AFLE for each AFID was then calculated as the mean AFLE across all raters.
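The per-rater averaging, expert/novice comparison and Bonferroni threshold can be sketched with `scipy.stats.ranksums`. The text does not specify which samples entered each test; this sketch assumes that, for each AFID, all per-subject AFLEs of the expert raters are pooled and compared with those of the novice raters. The rater order along the first axis is a hypothetical convention.

```python
import numpy as np
from scipy.stats import ranksums

# Hypothetical rater order along axis 0 of the AFLE array.
RATERS = ["MA", "GG", "AT", "MJ", "RC"]
EXPERT = [0, 1]
NOVICE = [2, 3, 4]


def expert_vs_novice(afle, alpha=0.05):
    """afle: (n_raters, n_subjects, n_afids).

    Returns per-AFID expert mean, novice mean, overall mean AFLE,
    rank-sum p-values, and Bonferroni-corrected significance flags.
    """
    afle = np.asarray(afle, dtype=float)
    n_afids = afle.shape[2]
    # Mean AFLE of each rater across subjects: shape (n_raters, n_afids).
    rater_mean = afle.mean(axis=1)
    expert_mean = rater_mean[EXPERT].mean(axis=0)
    novice_mean = rater_mean[NOVICE].mean(axis=0)
    overall = rater_mean.mean(axis=0)
    # Assumption: pool all per-subject AFLEs of each group for every AFID.
    pvals = np.array([
        ranksums(afle[EXPERT, :, j].ravel(), afle[NOVICE, :, j].ravel()).pvalue
        for j in range(n_afids)
    ])
    # Bonferroni: compare against alpha / number of AFIDs (0.05/32).
    return expert_mean, novice_mean, overall, pvals, pvals < alpha / n_afids
```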
Rater reliability was assessed using the intraclass correlation coefficient (ICC), calculated separately for each coordinate dimension (x, y and z). A two-way random-effects, single-measurement model, ICC(2,1) (Shrout and Fleiss 1979), was used. ICC was calculated among all raters, among expert raters only and among novice raters only.
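ICC(2,1) can be computed from the two-way ANOVA mean squares given by Shrout and Fleiss (1979): with n targets and k raters, ICC(2,1) = (BMS − EMS) / (BMS + (k − 1)EMS + k(JMS − EMS)/n), where BMS, JMS and EMS are the between-target, between-rater and residual mean squares. The sketch below implements that formula. How targets were defined is not stated; here each (subject, AFID) pair is assumed to be one target, rated once per dimension by each rater.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single measurement (Shrout & Fleiss 1979).

    ratings: (n_targets, k_raters) array.
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)   # between targets
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)   # between raters
    ss_err = np.sum((y - grand) ** 2) - ss_rows - ss_cols  # residual
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_err / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def icc_per_dimension(coords):
    """coords: (n_raters, n_subjects, n_afids, 3).

    Assumption: each (subject, AFID) pair is one target. Returns [ICC_x, ICC_y, ICC_z].
    """
    coords = np.asarray(coords, dtype=float)
    targets = coords.reshape(coords.shape[0], -1, 3)  # (k_raters, n_targets, 3)
    return [icc_2_1(targets[:, :, d].T) for d in range(3)]
```

Restricting `coords` to the expert or novice rows along axis 0 gives the group-specific ICCs.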
Analysis in MNI space
To assess and quantify registration error, the subject scans were non-linearly transformed to MNI152NLin2009cAsym template space using fMRIPrep 1.5.4 (Esteban et al. 2019; RRID:SCR_016216), which is based on Nipype 1.3.1 (Gorgolewski et al. 2011; RRID:SCR_002502). Specifically, the T1w image was corrected for intensity non-uniformity (INU) with N4BiasFieldCorrection (Tustison et al. 2010), distributed with ANTs 2.2.0 (Avants et al. 2008; RRID:SCR_004757), and used as the T1w reference throughout the workflow. The T1w reference was then skull-stripped with a Nipype implementation of the antsBrainExtraction.sh workflow (from ANTs), using OASIS30ANTs as the target template. Brain tissue segmentation of cerebrospinal fluid, white matter and gray matter was performed on the brain-extracted T1w image using the fast algorithm from FSL 5.0.9 (Zhang et al. 2001; RRID:SCR_002823). Volume-based spatial normalization to one standard space (MNI152NLin2009cAsym) was performed using a symmetric diffeomorphic image registration method (antsRegistration; ANTs 2.2.0), with brain-extracted versions of both the T1w reference and the T1w template. The following template was selected for spatial normalization: ICBM 152 Nonlinear Asymmetrical template version 2009c (Fonov et al. 2009; RRID:SCR_008796; TemplateFlow ID: MNI152NLin2009cAsym). Many internal operations of fMRIPrep use Nilearn 0.6.0 (Abraham et al. 2014; RRID:SCR_001362), mostly within the functional processing workflow. For more details of the pipeline, see the workflows section of the fMRIPrep documentation.
We transformed each individually placed AFID, as well as the mean coordinates of each AFID across all raters, to MNI space. We then calculated the Euclidean distance between each individually placed AFID transformed to MNI space and the group mean of the corresponding AFID placed in MNI space; we term this the real-world anatomical fiducial registration error (AFRE). The mean real-world AFRE across all subjects and raters was calculated in the same manner as for the AFLE. We also calculated the Euclidean distance between the mean AFID (obtained by averaging the coordinates across all raters) transformed to MNI space and the same MNI-space group mean, and termed this the consensus AFRE, consistent with our definition in the original manuscript (Lau et al. 2019). The real-world AFRE represents the expected AFRE for a single rater, and we focused on this analysis because it best reflects the situation in a clinical setting. We nevertheless also computed the consensus AFRE, since it is a better overall measure of registration error within our clinical sample and is directly comparable to our prior work. A schematic illustrating these measures is presented in Fig. 1.
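The difference between the real-world and consensus AFRE lies in what is transformed: each rater's placement, or the rater-averaged placement. The sketch below makes this explicit. Each subject's registration is represented by a hypothetical callable `warp` that maps an (m, 3) array of subject-space points to MNI space; in practice this would be derived from the fMRIPrep/ANTs transforms, and in ANTs point coordinates are mapped with the inverse of the transform used to warp images. The array layout is also an assumption.

```python
import numpy as np


def afre(subject_coords, warps, reference):
    """Real-world and consensus AFRE.

    subject_coords: (n_raters, n_subjects, n_afids, 3) subject-space AFIDs (mm).
    warps: list of n_subjects hypothetical callables mapping (m, 3) points to MNI space.
    reference: (n_afids, 3) group-mean AFID coordinates placed in MNI space.
    Returns (real_world, consensus) with shapes
    (n_raters, n_subjects, n_afids) and (n_subjects, n_afids).
    """
    coords = np.asarray(subject_coords, dtype=float)
    ref = np.asarray(reference, dtype=float)
    n_raters, n_subjects, n_afids, _ = coords.shape
    real_world = np.empty((n_raters, n_subjects, n_afids))
    consensus = np.empty((n_subjects, n_afids))
    for s, warp in enumerate(warps):
        # Real-world: transform every rater's placement separately.
        for r in range(n_raters):
            real_world[r, s] = np.linalg.norm(warp(coords[r, s]) - ref, axis=1)
        # Consensus: average across raters in subject space, then transform the mean.
        consensus[s] = np.linalg.norm(warp(coords[:, s].mean(axis=0)) - ref, axis=1)
    return real_world, consensus
```

The mean real-world AFRE is then `real_world.mean()`. For a non-linear warp, transforming the mean is not the same as averaging the transformed points, which is why the consensus AFRE transforms the rater-averaged coordinates.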
We calculated the mean AFRE for linearly and non-linearly registered images. Wilcoxon rank-sum tests were used to test for significant differences between real-world AFREs obtained after linear versus non-linear registration, and between real-world and consensus AFREs after non-linear registration. Bonferroni correction was applied to account for multiple comparisons, with an adjusted threshold for significance of p < 0.05/32 (≈ 0.0016).
Distance between AFIDs as a biomarker of disease
We sought to investigate a possible secondary benefit of the AFIDs protocol: examining morphometric features specific to our PD patient population. To this end, we computed all pairwise Euclidean distances between AFIDs, generating 496 distance measures (32 × 31/2) per subject. We compared these values with distances obtained from a control group of 30 subjects from the OASIS-1 database with previously placed AFIDs (Lau et al. 2019). All 30 control subjects had the maximum Mini-Mental State Examination (MMSE) score (30 out of 30). Their mean age was 58.0 ± 17.9 years, and 17 subjects (56.7%) were female. Age was compared between the two groups using an unpaired two-tailed t test, and sex using a chi-square test. Wilcoxon rank-sum tests were used to test for significant differences in pairwise distances between the two groups, and Bonferroni correction was applied to account for multiple comparisons, with an adjusted threshold for significance of p < 0.05/496 (≈ 1.0 × 10⁻⁴).
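The pairwise-distance features and the group comparison can be sketched as follows. The array layout (subjects, 32 AFIDs, 3) and the function names are assumptions. The pair ordering from `itertools.combinations` matches the condensed ordering used by `scipy.spatial.distance.pdist`, so either could be used to generate the 496 features.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ranksums

N_AFIDS = 32
# All unordered AFID pairs: 32 * 31 / 2 = 496.
PAIRS = list(combinations(range(N_AFIDS), 2))


def pairwise_distances(afids):
    """afids: (n_subjects, 32, 3) coordinates (mm) -> (n_subjects, 496) distances."""
    afids = np.asarray(afids, dtype=float)
    i, j = np.array(PAIRS).T
    return np.linalg.norm(afids[:, i] - afids[:, j], axis=-1)


def compare_groups(dist_a, dist_b, alpha=0.05):
    """Per-pair Wilcoxon rank-sum p-values and Bonferroni flags (alpha / 496)."""
    n_pairs = dist_a.shape[1]
    pvals = np.array([
        ranksums(dist_a[:, m], dist_b[:, m]).pvalue for m in range(n_pairs)
    ])
    return pvals, pvals < alpha / n_pairs
```

Usage would be `compare_groups(pairwise_distances(pd_afids), pairwise_distances(control_afids))`, where the two hypothetical inputs hold the 39 PD and 30 control AFID sets.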