Introduction

Imaging databases are very useful to re-analyse data in a different context, to increase the number of subjects of a study, and to develop new methods. Imaging databases play a crucial role in numerous analysis methods that rely in the comparison between the data of a group or of an individual and a group of reference. This includes studies using a normative database for analysis and quantification purposes (such as partial volume correction), machine learning approaches, multi-atlas techniques, and validation of image processing pipelines. Databases with different modalities per participant also allow approaches that derive “missing” modalities, e.g. creating pseudo-CTs for attenuation correction in PET-MR [1,2,3,4].

In the last years, an increasing number of neuroimaging databases has been made available. These databases generally consist of MR images (such as ADNI http://adni.loni.usc.edu, OASIS https://www.oasis-brains.org; [5, 6], for a review see [7, 8]). There is also a large database of PET from the Copenhagen group, CIMBI, containing mainly serotonine receptor PET and associated data [9]. We are aware of very few datasets for [18F]fluorodeoxyglucose ([18F]FDG) PET imaging that have been published [10]) or are available on request ([11]; [12]; Alzheimer’s Disease Neuroimaging Initiative, ADNI http://adni.loni.usc.edu).

Acquisition of imaging data, such as MRI scanning and in particular PET imaging that requires the injection of a radiotracer, represents an important logistical and monetary cost. In addition, participants have to consent to data acquisition and dissemination, and many countries have restrictions on using ionising radiation in healthy controls, adding to difficulties in acquiring such databases. Database sharing thus contributes to reduce research costs and reduces radiation exposure of healthy controls.

In order to make database sharing more efficient, the scientific community has implemented a database standardisation to organize and describe the data (Brain Imaging Data Structure (BIDS), https://bids.neuroimaging.io, [13]) and more specifically for PET modality (https://bids-specification.readthedocs.io/en/bep-009/04-modality-specific-files/09-positron-emission-tomography.html, [14]). In this work we introduce a multi-modal database of 37 healthy subjects constructed with MRI, CT and [18F]FDG PET images to BIDS standard. We have obtained ethical permission to share the data on request.

Materials and methods

Recruitment and cohort characteristics

All enrolled subjects provided written informed consent to participate in the study (EudraCT: 2014-000610-56). The subjects were informed that their anonymized images could be used for methodological development and had been given the option to oppose this use of their data. The inclusion criteria were adult healthy subject and aged between 20 and 65 years. Exclusion criteria were (1) children and adults older than 65 years, (2) woman of childbearing potential without effective contraception, (3) history of neurological disorders, (4) any contraindication for MRI scanning, (5) active infectious disease. Thirty-nine subjects were included in the study. Each subject had a T1-weighted MRI, a T2 fluid-attenuated inversion recovery (FLAIR) MRI and an [18F]FDG PET/CT brain scan. For all participants, the PET/CT scan and the MRI session took place on the same day (between 8 a.m. and 14 p.m.). The subjects’ MR and PET images were visually reviewed by two neurologists for conspicuous brain abnormalities. Two subjects showing brain lesions on the MR images (one probable insular cavernoma, one cerebellar lesion with hyperintense signal in the FLAIR sequence suggesting possible inflammatory disease of the central nervous system) were excluded from the database.

MRI acquisition and reconstruction

MRI sequences were obtained on a Siemens Sonata 1.5 T scanner. Three-dimensional anatomical T1-weighted sequences (MPRAGE) were acquired in sagittal orientation (TR 2400 ms, TE 3.55 ms, inversion time 1000 ms, flip angle 8°). The images were reconstructed into a 160 × 192 × 192 matrix with voxel dimensions of 1.2 × 1.2 × 1.2 mm3 (axial field of view 230.4 mm). Sagittal Fluid-Attenuated Inversion Recovery (FLAIR, [15]) images (TR 6000 ms, TE 354 ms, Inversion time 2200 ms, flip angle 180°) were acquired with a 176 × 196 × 256 matrix and a voxel size of 1.2 × 1.2 × 1.2 mm3 (axial field of view 307.2 mm).

PET and CT acquisition and reconstruction

PET and CT data were acquired on a Siemens Biograph mCT64. During the uptake period, participants were instructed to rest with their eyes closed and without auditory stimulation. A static PET data acquisition started 50 min after the injection of 122.30 ± 21.29 MBq of [18F]FDG (individual doses are provided in the demographics table) and lasted 10 min [16]. PET images were reconstructed using 3D ordinary Poisson-ordered subsets expectation maximization (OP-OSEM 3D), incorporating the system point spread function and time of flight, and using 12 iterations and 21 subsets (Siemens’ “HD reconstruction”). Data correction (normalization, attenuation and scatter correction) was fully integrated within the reconstruction process. Gaussian post-reconstruction 3D filtering (FWHM = 4 mm isotropic) was applied to all PET images [17]. Reconstructions were performed with a zoom of 2 yielding a voxel size of 2.04 × 2.04 × 2.03 mm3 in a matrix of 200 × 200 × 109 voxels (axial field of view 221.27 mm). Low-dose CT images for attenuation correction were acquired with a tube voltage of 100 keV and reconstructed in a 512 × 512 × 233 matrix with a voxel size of 0.6 × 0.6 × 1.5 mm3 (axial field of view 349.5 mm).

Processing pipeline

Data anonymisation and pre-processing

Data anonymisation was performed on the DICOM files using the gdcmanon function (http://gdcm.sourceforge.net/html/gdcmanon.html). DICOM files were converted to NIFTI format with dcm2niix software (https://github.com/rordenlab/dcm2nii). The background of CT images was cleaned in order to remove the scanner table and other objects such as the pillow included in the background of the image. For this, a binary mask of the head of the subject was automatically generated following a procedure described in [18] using tools from the FSL (Version 6.0, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/) and NiftySeg (http://cmictig.cs.ucl.ac.uk/wiki/index.php/NiftySeg) suites. Finally, the binary mask was applied to the CT image.

Coregistration

As first step, the origin of each NIFTI image was set to the matrix centre. Then, CT, T1 MRI and FLAIR MRI images were coregistered to the [18F]FDG PET image using the Coregister & Estimate function from the SPM 12 toolbox (https://www.fil.ion.ucl.ac.uk/spm/software/spm12/).

Spatial normalisation

All images were normalized to MNI space through the tissue classification into grey and white matter probability maps of the T1 image. For that, individual subject’s deformation fields were calculated by the Segment function of SPM 12 [19] from the T1 images previously coregistered to the PET image (but not resliced to preserve native resolution). Transformations for MR to PET space coregistration and PET to MNI space normalisation were concatenated and applied at once to avoid an intermediate resampling of the MRI data. All normalized images were resampled at 1 × 1x1 mm using 4th degree B-spline interpolation.

Intensity normalisation

Reconstructed PET images were normalized by the subjects’ weight and injected dose to obtain Standard Uptake Value (SUV) images (radioactivity concentration [kBq/cm3] / (dose [kBq] / weight [kg])). In addition, reconstructed PET images were normalized by each subject’s mean activity within the intracranial volume (ICV) mask provided by SPM12 to obtain Standard Uptake Value ratio (SUVr).

Regional analysis

The T1 MR images were anatomically segmented into 83 regions using the Hammers_mith maximum probability atlas n30r83, which is based on the mutli-atlas fusion of 30 manually delineated MRIs of healthy young adults [5, 6], available at http://brain-development.org. The atlas was wrapped to each individual MRI space via the inverse transformation of the deformation fields from subject’s space to the MNI space computed at the spatial normalisation step. Grey matter and white matter probability maps obtained with the Segment function were thresholded at 0.5 and combined with the 83-ROI anatomical segmentation in order to separate their grey and white matter parts, expect for pure white matter regions like the corpus callosum, and pure grey matter regions like the basal ganglia. Mean regional SUV and SUVr were extracted in a selection of grey matter anatomical regions of the Hammers_mith segmentation.

Leave-one-out SPM analysis on [18F]FDG images

Leave-one-out ANCOVA was performed on SPM12 in order to compare each subject (healthy control) of the database to the others. For the statistical analysis, PET images were smoothed with a Gaussian filter at 8 mm FWHM. This further smoothing is always used in voxel-based analysis to accommodate interindividual anatomical variability and improve the sensitivity of the statistical analysis [20]. We used age and the global mean calculated within the intracranial volume mask as covariates. Two different contrasts were explored: Hyper-metabolism, i.e. activity of one subject > activity of the remaining subjects in the database, and hypo-metabolism, i.e. activity of one subject < activity of the remaining subjects in the database. Significant differences where defined at p < 0.05 FWE at the cluster level.

The database outliers were assessed with three criteria, for both hypometabolism and hypermetabolism.

  • Subject-level: number of subjects with significant differences / total number of subjects in the database × 100

  • Cluster-level: total number of significant clusters across all subjects / average number of resolution elements (resells) in the mask × 100

  • Voxel-level: total number of voxels among the significant clusters across all subjects / number of voxels in the SPM mask × 100

Results

Database IDB-MRXFDG

The final database consists of 37 participants (17 male/20 female, mean age ± SD, 38.11 ± 11.36 years; range, 23–65 years). Each participant has [18F]FDG PET, T1 MRI, FLAIR MRI, and CT images. An example of coregistered T1, FLAIR, CT and [18F]FDG PET images in the subject space are shown in Fig. 1 and the same images in normalized space are shown in Fig. 2. Table 1 summarizes the demographic information for each participant: subject ID, acquisition date, age of the participant at the time of the imaging session, sex, weight, size, injected dose of [18F]FDG, handedness and a comment if any hypersignal was observed on the FLAIR MRI.

Fig. 1
figure 1

Example of coregistered T1 MRI, FLAIR MRI, CT and [18F]FDG PET images (sagittal plane) for one subject of the database

Fig. 2
figure 2

Example of normalized T1 MRI, FLAIR MRI, CT and [18F]FDG PET images (sagittal plane) in MNI space, for one subject of the database

Table 1 Demographics table

The database is available in three different formats, following the BIDS common specification:

  • DICOM (data not processed)

  • NIFTI (multimodal images coregistered to PET subject space)

  • NIFTI normalized (images normalized to MNI space)

Table 2 lists the regional volumes obtained via the Hammers_mith maximum probability atlas. Coefficients of variation were as expected, without obvious outliers. The structure sizes were also in line with expectations [5, 6].

Table 2 Regional volumes in native space (in cm3)

Regional analysis

Figures 3 and 4 show boxplots of mean regional SUV and SUVr respectively, extracted in a selection of grey matter anatomical regions, for all subjects in the database. Each region is composed of left and right sub-regions. Mean regional SUV values were 5.36 ± 1.32, range 1.35–8.54 (Fig. 3). Three subjects in the database had lower SUV values (between 1.35 and 3). The distribution of SUVr values (Fig. 4) remains very similar to the distribution of SUV values (1.49 mean ± 0.26 SD, range 0.85–2.22), except that the dispersion is reduced and the outlier values from the three participants with unusually low SUVs are regularized. Normalizing with the ICV mean value thus acts as an efficient way for regularizing the SUV distribution leaving the inter-regional variability intact.

Fig. 3
figure 3

Boxplot of regional SUV for all subjects in the database. Centre lines correspond to medians, boxes to interquartile ranges, and whiskers to robust ranges. Outliers are represented as dots. Each dot represents a participant for unpaired regions and a participant’s right or left SUV value for paired regions

Fig. 4
figure 4

Boxplot pf regional activity normalized by mean activity in ICV mask (SPM) for all subjects in the database. Centre lines correspond to medians, boxes to interquartile ranges, and whiskers to robust ranges. Outliers are represented as dots

Leave-one-out SPM analysis

Results for the leave-one-out analysis of [18F]FDG PET are reported in Table 3. At the subject-level, 5/37 (13.5%) of the participants had any significant increases in [18F]FDG uptake (hypermetabolism) relative to the other 36 participants. Any significant decreases (hypometabolism) was found for 11/37 (29.7%) of the participants.

Table 3 % of abnormality in the database

At the cluster-level, significant changes were found in at most 5.21% of resolution elements, and at the voxel-level, in at most 0.32% of voxels. All abnormalities in controls compared with controls are by definition false positives. We examined all 16 and present our findings in the Additional file 1 (Table S1 and Table S2). Virtually all false positives had an anatomical or artefactual explanation.

Discussion

A new database of 37 healthy subjects including T1 and FLAIR MRI, CT, and [18F]FDG PET images, called IDB-MRXFDG, has been created.

The age range has been selected to reflect the ages of participants in cognitive and clinical research studies at the CERMEP imaging centre, encompassing amongst others epilepsy, movement disorders, multiple sclerosis and disorders of consciousness and will align with the research priorities of many similar centres.

We performed quality control of all images visually and by screening for volumetric and regional SUV abnormalities. Three subjects had unusually low SUVs; this may be due to imperfect observation of the need for fasting ahead of the scan. This could have been ascertained by measuring the blood glucose level which was not measured here, which is a limitation of the study. We show that a simple global normalisation procedure removes the resulting outliers (Fig. 4); depending on the application more sophisticated intra-scan normalisation procedures are conceivable [21, 22]. We also performed SPM leave-one-out studies for [18F]FDG. The relatively high false-positive rates per subject are explained by the existence of significant clusters of small size (from 1 to 95 voxels). Areas of apparent hypermetabolism were either at the edge of the brain or at the bottom of a particularly deep sulcus (see Additional file 1: Table S1); areas of apparent hypometabolism (Additional file 1: Table S2) were clearly linked to the participant’s anatomy, typically to a wide sulcus or fissure (7/11 cases). The other 4 cases were extracerebral or at the edge of the brain, probably linked to imperfect normalisation. We believe none would have been considered abnormal had they been seen in an analysis comparing one research subject with a particular condition against a group of controls. When testing the normality of the database at the cluster and voxel-level, the expected threshold of 5% of abnormality or lower was found for both hyper- and hypo-metabolism. The database therefore appears suitable for voxel-based [18F]FDG PET analysis with a ≤ 5% risk of Type 1 error.

The IDB-MRXFDG database could be used in many different applications such as the statistical comparison of a patient (or group of patients) to a database of healthy subjects, automatic quantitative analyses, and more generally methodology development in neuroimaging.

The inclusion of [18F]FDG PET in IDB-MRXFDG is particularly important. While there are now many MR databases covering, with varying density, the human lifespan as reviewed in [7], we are aware of very few [18F]FDG PET databases. Wei et al. [10]) scanned 78 healthy subjects aged 3–78 years on a PET/CT scanner; it is not clear whether this database is available on request, and there is no mention of MRI. The Marseille database (used e.g. in [12]) contains data from 60 healthy adults aged 21–78; [18F]FDG PET, T1 weighted MRI, and CT data are available by arrangement. A rare paediatric database [11] contains 24 datasets of participants aged 4.5–17.9 years (mean ± SD 10.06 ± 3.1 years) and may be shared on request. These are “pseudo-controls” derived from epilepsy patients, selected from among a total of 71 children as the subgroup with both a normal visual analysis and a normal SPM analysis derived iteratively. They have been scanned on a traditional PET scanner with transmission-based attenuation correction which makes comparison with PET/CT data difficult [23]; no MRI is available. A large database available on request is the Alzheimer’s Disease Neuroimaging Initiative, ADNI (http://adni.loni.usc.edu/about/) which comprises over 300 healthy control [18F]FDG PET datasets; however, participants are aged 55–90 and therefore more suited to dementia research but outside the typical age range used for studies in normal cognition or epilepsy, one of the main clinical indications for brain FDG PET. Similar concerns about the age of participants apply to those databases from the world-wide ADNI (WW-ADNI) networks that do contain FDG, as for example the Japan ADNI (J-ADNI; age 60–84) [24].

It should be noted that we used very high-quality reconstructions incorporating both the system point spread function and time-of-flight information, which will not be available on all machines. If lower resolution images are required, the images could simply be filtered with a gaussian kernel (e.g. [25]).

Examples of database uses for work in MR include the voxel-wise comparison of a patient with a control group to detect abnormalities from T1 images via voxel-based morphometry [26, 27] and its variants that use T1 derivatives like grey-white matter junction images [28, 29] for the detection of specific pathologies like Focal Cortical Dysplasia. FLAIR as a sequence highly sensitive to pathology has similarly been used at the single-subject level in comparison to control groups (e.g. [30, 31]). Another group of examples is the region-wise comparison of the size of cerebral structures between groups or between individuals and a control group (e.g. [32,33,34,35]). Importantly, such work has been successfully undertaken with control groups scanned on a different scanner (e.g. [36, 37]), and IDB-MRXFDG could be used to increase the size of control groups.

The multimodality aspect of IDB-MRXFDG is particularly important.

Since PET-CT scanners rapidly displaced PET-only scanners in the early 2000s, low-dose CT has been coupled to brain [18F]FDG PET for estimation of tissue density and attenuation correction. With the advent of commercial PET-MR scanners since 2011, there has been no direct way of measuring electron density in the head, and alternative approaches have had to be found. Synthesis of “pseudo-CTs” via atlas approaches [1, 2] is a successful approach that performs well overall [38] but requires pairs of MR and CT images to achieve the synthesis. IDB-MRXFDG has already been used for such approaches [39].

The latter application of databases—MR-based attenuation based on MR-CT pairs—is one domain where Deep Learning methods, notably with Convolutional Neuronal Networks, have recently become very successful [3, 40]. However, they often require substantially larger training datasets or priors than multi-atlas methods, in the case of MR-based attenuation recently estimated at 100–400 pairs, with an influence of MR heterogeneity [40]. More widespread availability of databases will further Deep Learning approaches, particularly when multiple modalities are available per subject, allowing e.g. synthesis of missing modalities [41].

Pairs of data are also required for partial volume effect correction methods incorporating structural MRI information (PET-MR pairs; e.g. [42]). The additional availability of FLAIR-T1 pairs can be exploited e.g. for detection of focal cortical dysplasias as the underlying substrate of medically refractory focal epilepsies [30].