Clinical evaluation of automated quantitative MRI reports for assessment of hippocampal sclerosis

Goodkin, Olivia; Pemberton, Hugh G.; Vos, Sjoerd B.; Prados, Ferran; Das, Ravi K.; Moggridge, James; De Blasi, Bianca; Bartlett, Philippa; Williams, Elaine; Campion, Thomas; Haider, Lukas; Pearce, Kirsten; Bargalló, Nuria; Sanchez, Esther; Bisdas, Sotirios; White, Mark; Ourselin, Sebastien; Winston, Gavin P.; Duncan, John S.; Cardoso, Jorge; Thornton, John S.; Yousry, Tarek A.; Barkhof, Frederik

doi:10.1007/s00330-020-07075-2

Neuro
Open access
Published: 04 August 2020

Volume 31, pages 34–44, (2021)

European Radiology

Olivia Goodkin ORCID: orcid.org/0000-0001-9283-9716^1,2^na1,
Hugh G. Pemberton^1,2^na1,
Sjoerd B. Vos^1,2,3,
Ferran Prados^1,4,
Ravi K. Das⁵,
James Moggridge^2,6,
Bianca De Blasi⁷,
Philippa Bartlett^3,8,
Elaine Williams⁹,
Thomas Campion⁶,
Lukas Haider^10,11,
Kirsten Pearce⁶,
Nuria Bargalló¹²,
Esther Sanchez¹³,
Sotirios Bisdas^2,6,
Mark White¹⁴,
Sebastien Ourselin^7,15,
Gavin P. Winston^3,8,16,
John S. Duncan^3,8,
Jorge Cardoso¹⁵,
John S. Thornton^2,6^na1,
Tarek A. Yousry^2,6^na1 &
…
Frederik Barkhof^1,2,6,13^na1

Abstract

Objectives

Hippocampal sclerosis (HS) is a common cause of temporal lobe epilepsy. Neuroradiological practice relies on visual assessment, but quantification of HS imaging biomarkers—hippocampal volume loss and T2 elevation—could improve detection. We tested whether quantitative measures, contextualised with normative data, improve rater accuracy and confidence.

Methods

Quantitative reports (QReports) were generated for 43 individuals with epilepsy (mean age ± SD 40.0 ± 14.8 years, 22 men; 15 histologically confirmed unilateral HS; 5 bilateral HS; 23 MR-negative). Normative data were generated from 111 healthy individuals (age 40.0 ± 12.8 years, 52 men). Nine raters with different levels of experience (neuroradiologists, trainees, and image analysts) assessed subjects’ imaging with and without QReports. Raters classified imaging as normal, right HS, left HS, or bilateral HS. Confidence was rated on a 5-point scale.

Results

Correct designation (normal/abnormal) was high and showed further trend-level improvement with QReports, from 87.5 to 92.5% (p = 0.07, effect size d = 0.69). The largest improvement (84.5 to 93.8%) was for image analysts (d = 0.87). For bilateral HS, QReports significantly improved overall accuracy, from 74.4 to 91.1% (p = 0.042, d = 0.7). Agreement with the correct diagnosis (kappa) tended to increase from 0.74 (‘moderate’) to 0.86 (‘strong’) with the report (p = 0.06, d = 0.81). Confidence increased when correctly assessing scans with the QReport (p < 0.001, η²_p = 0.945).

Conclusions

QReports of HS imaging biomarkers can improve rater accuracy and confidence, particularly in challenging bilateral cases. Improvements were seen across all raters, with large effect sizes, greatest for image analysts. These findings may have positive implications for clinical radiology services and justify further validation in larger groups.

Key Points

• Quantification of imaging biomarkers for hippocampal sclerosis—volume loss and raised T2 signal—could improve clinical radiological detection in challenging cases.

• Quantitative reports for individual patients, contextualised with normative reference data, improved diagnostic accuracy and confidence in a group of nine raters, in particular for bilateral HS cases.

• We present a pre-use clinical validation of an automated imaging assessment tool to assist clinical radiology reporting of hippocampal sclerosis, which improves detection accuracy.

Introduction

Hippocampal sclerosis (HS) is the most common cause of temporal lobe epilepsy worldwide [1] and can be effectively treated with surgical excision of the epileptogenic focus [2]. The hallmark pathological features of HS are neuronal loss and gliosis [3], which are characterised on MRI as hippocampal atrophy and T2 signal hyperintensity [4,5,6]. These qualitative imaging features are used in combination with other clinical data to decide whether surgery is recommended, indicating the central role of imaging in the decision-making process. Importantly, successful seizure-free postoperative outcome depends on precisely identifying and removing the seizure focus [7, 8].

Correct interpretation of MRI findings can be straightforward if the volume loss and increased T2 or FLAIR signal are unilateral and unequivocal. Volume loss assessment can be challenging if the subject’s head is positioned asymmetrically, if the changes are subtle, or if there is some concurrent age-related volume loss. A previous inter-rater agreement study demonstrated a threshold effect at which hippocampal volume difference was only visually detected at a volume asymmetry ratio of 0.7 or lower, meaning many subtle pathological changes could be missed [9]. Assessment of subtle T2/FLAIR signal change can be difficult because the hippocampus, like other components of the limbic lobe (archicortex and periarchicortex), has an intrinsically higher T2/FLAIR signal [10, 11]. When the volume and signal changes are both subtle as well as bilateral, the lack of a clear reference makes a correct diagnosis very difficult if not impossible. Quantification of hippocampal volume and signal intensity [12] as an adjunctive tool to visual assessment has the potential of improving detection accuracy and reducing inter-rater variability.

We have recently proposed a new framework to address key factors for translating quantitative imaging biomarkers from inception to clinical radiology practice [13]. The quantitative neuroradiology initiative (QNI) framework specifies six steps (Table 1). Having identified the appropriate imaging biomarkers (step 1), we developed a dual-algorithm quantification process (step 2). Although hippocampal segmentation in the presence of HS is challenging, recent automated techniques like the Hipposeg algorithm have been sensitive to pathology [14]. These segmentations can then be used for automated quantification of T2 signal in the hippocampus [15]. We developed and technically validated an automated pipeline, combining the two algorithms for the quantification of both hippocampal volume and T2 (qT2) [15, 16]. We encoded the pipeline’s output into a quantitative report (step 3), which includes novel representations of measures or ‘profiles’ along the anterior-posterior longitudinal axis of the hippocampus [17].

Table 1 The six steps for imaging biomarker translation outlined by the quantitative neuroradiology initiative (QNI) framework and how each is being addressed in the context of HS

We are now working towards the introduction of this pipeline into the clinical workflow. This study is a proof-of-concept clinical validation study, representing the clinical pre-use validation (step 4) designed to assess whether the addition of a quantitative report to the neuroradiologist’s workflow enhances detection accuracy and confidence.

We hypothesise that such a quantitative report will (1) decrease inter-rater variability whilst increasing diagnostic accuracy and confidence for determining the presence of HS, and (2) have an identifiable effect across 3 ‘experience levels’ (neuroradiology consultant, neuroradiology specialist registrar, non-clinical image analyst), most pronounced in the less experienced group.

Methods

Test dataset

Our study group consisted of 43 subjects who had been scanned on a 3T GE MR750 scanner with a 32-channel coil at our centre. This dataset included patients with HS (15 histologically confirmed unilateral HS; 5 bilateral HS based on consensus of semiology, neurophysiology, and MRI) and 23 age-matched MR-negative epilepsy patients (mean age ± SD 40.0 ± 14.8 years, range 21.1–76.1 years, 22 men).

The imaging protocol consisted of:

(1) three-dimensional (3D) T1-weighted inversion recovery fast spoiled gradient recalled echo (3D-T1) sequence for volumetric assessments; field of view (FOV), 224 × 256 × 256 mm (antero-posterior, left-right, inferior-superior); acquisition matrix, 224 × 256 × 256; voxel size, 1 mm isotropic; echo/repetition/inversion time (TE/TR/TI) = 3.1/7.4/400 ms; flip angle 11°; parallel imaging acceleration factor 2;
(2) 3D T2-weighted fluid attenuation inversion recovery (T2-FLAIR) sequence; a 3D fast spin echo (FSE) sequence with variable flip angle readout (CUBE); FOV, matrix, and angulation identical to the 3D-T1, but with TE/TR/TI = 137/6200/1882 ms [18];
(3) coronal dual-contrast fast recovery fast spin echo proton density/T2-weighted (PD/T2) sequence for T2 quantification; FOV, 220 × 220 mm; matrix, 512 × 512; in-plane resolution, 0.43 × 0.43 mm; 55 slices of 4 mm thickness (TE effective 30 and 119 ms, TR 7600 ms, SENSE factor 2).

Reference dataset

A normative dataset of 111 healthy controls (age 40.0 ± 12.8, range 17.0–66.6 years; 52 men) was created from subjects scanned on the same scanner with the same protocol, as detailed in Vos et al. [17].

Quantitative report generation and display

Hippocampal segmentation was performed using Hipposeg (http://niftyweb.cs.ucl.ac.uk/), which uses non-linear registration and a template database of 400 epilepsy patients with heterogeneous pathologies [14]. Quantitative T2 maps were generated voxel-wise from the two FSE effective echo time images using a monoexponential fit [15]. A group template was aligned to the long axis of the hippocampus to calculate cross-sectional volume and qT2 values for slice-wise localisation [16]. The reference data was used to create normative reference ranges for total hippocampal volume and qT2, and for the left:right ratios of total hippocampal volume and T2. Additionally, we have created novel hippocampal profiles [17] by producing group templates for the control population, aligning them to the long axis of the hippocampus and calculating cross-sectional area and qT2 for each subject, contextualised with normative reference data.
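
For illustration, a monoexponential fit to two echoes has a closed-form solution: if S(TE) = S0 · exp(−TE/T2), then T2 = (TE2 − TE1) / ln(S1/S2). The sketch below applies this to a single synthetic voxel using the two effective echo times of the PD/T2 sequence (30 and 119 ms). The function name, the handling of invalid voxels, and the synthetic signal values are assumptions made for this example; the published pipeline [15] may differ in its implementation details.

```python
import math

def two_point_t2(s_short, s_long, te_short_ms=30.0, te_long_ms=119.0):
    """Monoexponential T2 (ms) from two echoes, assuming S(TE) = S0 * exp(-TE / T2).

    Hypothetical helper for illustration; not the pipeline's actual code.
    """
    # Without a measurable signal decay, T2 is undefined (noise or background).
    if s_short <= 0 or s_long <= 0 or s_long >= s_short:
        return float("nan")
    return (te_long_ms - te_short_ms) / math.log(s_short / s_long)

# Synthetic voxel with a known T2 of 110 ms (illustrative value, not study data).
s0, true_t2 = 1000.0, 110.0
s1 = s0 * math.exp(-30.0 / true_t2)   # signal at the first effective echo
s2 = s0 * math.exp(-119.0 / true_t2)  # signal at the second effective echo
t2 = two_point_t2(s1, s2)             # recovers the T2 used to simulate the voxel
```

With only two echoes the fit is exact rather than least-squares, so noise in either echo propagates directly into the T2 estimate.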

The quantitative report (QReport) displays non-identifying demographics (age, gender, scan date, scanner type, hospital), quality control measures, the global volume of each hippocampus, and hippocampal volume and qT2 values along its long axis. All values are presented with left:right ratios and normative reference ranges. Snapshots of the hippocampal segmentation are also displayed (Figs. 1 and 2).
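
As a minimal sketch of how a single-subject value might be contextualised against normative data, the example below computes a left:right ratio and compares it with a reference range derived from control values. The text does not state how the reference ranges were derived; the mean ± 1.96 SD rule used here is an assumption chosen purely for illustration, as are the function names and all numbers.

```python
from statistics import mean, stdev

def reference_range(control_values, z=1.96):
    """Normative range as mean +/- z * SD of control values (assumed rule)."""
    m, s = mean(control_values), stdev(control_values)
    return m - z * s, m + z * s

def left_right_ratio(left, right):
    """Left:right ratio of a measure such as hippocampal volume or qT2."""
    return left / right

def flag(value, ref_range):
    """Describe where a value falls relative to the reference range."""
    low, high = ref_range
    if value < low:
        return "below range"
    if value > high:
        return "above range"
    return "within range"

# Hypothetical control left:right volume ratios (not the study's normative data).
controls = [0.98, 1.02, 1.00, 0.97, 1.03, 1.01, 0.99]
volume_range = reference_range(controls)

# Hypothetical patient with a smaller left hippocampus (volumes in arbitrary units).
patient_ratio = left_right_ratio(2.1, 3.0)
patient_flag = flag(patient_ratio, volume_range)
```

A percentile-based range would be a reasonable alternative when the control distribution is skewed.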

Assessment task

Three groups of raters were invited to assess the test dataset with and without the QReport available, in a fully randomised order. Each group comprised three raters with a pre-defined level of previous reporting experience: experts (consultant neuroradiologists); trainees (specialty registrars with an interest in neuroradiology); and non-clinical image analysts (MRI radiographers working in neurology centres, non-clinical epilepsy research fellows).

We designed a web platform to facilitate participation from various centres and provide consistent assessment conditions for all raters. The website included instructions for the raters, who were blinded to the diagnosis, followed by the cases displayed in a pre-defined randomly generated order, once with and once without the QReport available (Fig. 3). Each MR study was visualised in three orthogonal planes to mimic the routine neuroradiological environment. Raters were asked to assess each case, stating whether the images were normal or abnormal, and if abnormal, to choose between right, left, or bilateral HS. They were also asked to rate their degree of confidence for both decisions on a scale of 1 (not at all confident) to 5 (extremely confident). The exercise was not timed.

Statistical analysis

We used signal detection theory tests to determine the effects of the QReport on diagnostic accuracy. Assessments were defined as correctly ‘abnormal’ (true positive, TP), correctly ‘normal’ (true negative, TN), erroneously ‘abnormal’ (false positive, FP), or erroneously ‘normal’ (false negative, FN). Accuracy was determined as:

$$ \mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}\times 100 $$
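
The accuracy formula can be sketched in a few lines of code. The counts in the example are hypothetical values for a single rater assessing 43 cases (20 HS, 23 MR-negative) and are not taken from the study results; the sensitivity helper uses the standard definition TP / (TP + FN).

```python
def accuracy_percent(tp, tn, fp, fn):
    """Accuracy (%) = (TP + TN) / (TP + TN + FP + FN) * 100."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one assessment is required")
    return 100.0 * (tp + tn) / total

def sensitivity_percent(tp, fn):
    """Sensitivity (%) = TP / (TP + FN) * 100 (standard definition)."""
    if tp + fn == 0:
        raise ValueError("at least one abnormal case is required")
    return 100.0 * tp / (tp + fn)

# Hypothetical counts for one rater (illustrative only).
tp, fn = 18, 2   # 20 HS cases
tn, fp = 21, 2   # 23 MR-negative cases
acc = accuracy_percent(tp, tn, fp, fn)
sens = sensitivity_percent(tp, fn)
```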

Data were analysed hierarchically. First, counts were made of correct and incorrect classifications as normal or abnormal against our clinicopathological gold standard, both with and without the QReport, and a McNemar test was applied. Mean accuracy and sensitivity were analysed using paired t tests (report present vs. absent). Effect size was expressed as Cohen’s d, which assesses the standardised difference in mean values; d > 0.8 is classified as a large effect size [19]. Cohen’s kappa, a measure which accounts for ‘chance’ agreement [20], was used to assess agreement between each rater and the gold standard. Kappa of 0.60–0.79 can be defined as moderate and 0.80–0.90 as strong agreement [21]. Paired t tests were then applied to kappa values (QReport vs. no QReport). The same steps were applied for correct and incorrect lateralisation as right, left, or bilateral HS.
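
The statistics named above can be sketched as follows. The analyses in this study were run in SPSS, so this is not the code that produced the reported values. Two choices here are assumptions: the McNemar test is shown in its exact binomial form, and Cohen’s d for paired data is computed as the mean difference divided by the SD of the differences. All labels and scores are hypothetical.

```python
from collections import Counter
from math import comb
from statistics import mean, stdev

def cohens_kappa(rater, truth):
    """Chance-corrected agreement between one rater and the gold standard."""
    n = len(truth)
    p_observed = sum(r == t for r, t in zip(rater, truth)) / n
    rater_counts, truth_counts = Counter(rater), Counter(truth)
    # Agreement expected by chance from the two sets of marginal frequencies.
    p_chance = sum(rater_counts[c] * truth_counts[c] for c in truth_counts) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)

def paired_cohens_d(with_report, without_report):
    """Standardised mean difference for paired scores (assumed d_z variant)."""
    diffs = [a - b for a, b in zip(with_report, without_report)]
    return mean(diffs) / stdev(diffs)

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical labels: N = normal, L = left HS, R = right HS, B = bilateral HS.
truth = ["N", "N", "N", "N", "L", "L", "R", "R", "B", "B"]
rater = ["N", "N", "N", "L", "L", "L", "R", "R", "B", "R"]
kappa = cohens_kappa(rater, truth)

# Hypothetical per-rater accuracies (%) without and with the report.
d = paired_cohens_d([90, 90, 95], [80, 85, 90])

# Hypothetical discordant pairs: 1 case correct only without the report,
# 6 cases correct only with it.
p = mcnemar_exact_p(1, 6)
```

On the scale cited above [21], the kappa obtained for these hypothetical labels (about 0.73) would fall in the ‘moderate’ band.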

Difference in mean confidence ratings with and without the QReport was assessed with paired t tests. In exploratory analyses, mean confidence ratings were calculated for each rater, split by whether the correct or incorrect diagnosis was made and whether the QReport was present or absent. This was analysed using a 2 (correct vs. incorrect) × 2 (QReport present vs. absent) repeated measures ANOVA. We calculated Cronbach’s alpha and intra-class correlation (ICC) as measures of inter-rater agreement and reliability.
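
As an illustration of one of the reliability measures, Cronbach’s alpha can be computed by treating each rater as an ‘item’ and each case as an observation: alpha = k/(k − 1) × (1 − Σ item variances / variance of the summed scores). How the categorical ratings were coded before this calculation is not described in the text, so the numeric ratings below are hypothetical and serve only to show the arithmetic.

```python
from statistics import variance

def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha with raters as items and cases as observations.

    ratings_by_rater: list of equal-length lists, one list per rater.
    """
    k = len(ratings_by_rater)
    if k < 2:
        raise ValueError("at least two raters are required")
    # Sum of each rater's sample variance across cases.
    item_variance_sum = sum(variance(r) for r in ratings_by_rater)
    # Variance of the per-case total score across raters.
    totals = [sum(case) for case in zip(*ratings_by_rater)]
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical numeric ratings for 3 raters on 5 cases (illustrative only).
ratings = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(ratings)
```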

All statistical analyses were performed with SPSS Statistics for Mac, Version 24.0. IBM Corp.

Results

Test dataset characteristics

The mean age (standard deviation) in years (y) and gender ratio for each group of patients were (a) MR-negative 33.8 y (10.1 y), M:F 13:10; (b) left HS 39.2 y (13.5 y), M:F 3:3; (c) right HS 44.7 y (16 y), M:F 4:5; and (d) bilateral HS 42.3 y (17.3 y), M:F 2:3. ANOVA between HS and MR-negative patients showed no significant age difference (F(1,8) = 1.83, p = 0.159). Percentage ratios for volume and qT2 generated by our pipeline for test dataset subjects are presented in Table 2. Values for left and right HS are combined as ‘unilateral’, where volume ratio is calculated as unaffected side:affected side and qT2 as affected side:unaffected side.

Table 2 Quantitative characteristics of the test dataset by disease group

Detection accuracy

Detection accuracy for all raters was already high at 87.5% without the QReport, yet still showed trend-level improvement to 92.5% with the QReport (p = 0.07, d = 0.69) (Table 3a). Large improvements were seen in the consultant and image analyst groups (Table 3); although these did not reach nominal significance, the effect sizes were large [19].

Table 3 Correct detection as normal or abnormal, irrespective of lateralisation, by rater group

Lateralisation accuracy improved with the QReport. When correctly rating a patient’s scan as abnormal, raters made an incorrect lateralisation of the HS (incorrectly choosing right, left, or bilateral) in 8.3% of cases without the QReport and only 3.3% of cases with the QReport. Correct lateralisation of HS by rater tended to increase with the QReport, from 83.5 to 91.5%, p = 0.075, with a moderate effect size d = 0.68.

For bilateral vs. all unilateral cases, the QReport improved overall accuracy in detecting bilateral cases (p = 0.028). Assessment accuracy for bilateral HS increased significantly with the QReport, from a mean (SD) of 74.4% (28.77) to 91.1% (17.64), p = 0.042, d = 0.7.

Individual rater agreement with the gold standard

Across all rater groups, kappa scores for correct lateralisation increased with the report from 0.74 (SD 0.19), ‘moderate’, to 0.86 (SD 0.09), ‘strong’, with a large effect size (p = 0.06, d = 0.81) (Table 4).

Table 4 Kappa scores for agreement of each rater with the gold standard

Inter-rater agreement

Cronbach’s alpha for agreement across raters increased from 0.452 without the QReport to 0.598 with it, indicating some improvement in overall rating reliability. The ICC increased with the QReport from 0.073 to 0.138 for single measures and from 0.417 to 0.591 for average measures, again indicating a small improvement in rater agreement when using the report.

Rater confidence

Difference in subjective confidence levels reported by raters when assessing scans with and without the QReports was evaluated in a series of paired samples t tests (Table 5). These showed that with the QReport, raters were significantly more confident when correctly rating both normal (p < 0.01, Hedges’ g_z = 1.78) and abnormal scans (p < 0.01, g_z = 1.28).

Table 5 Rater confidence for normal and abnormal classification for all raters assessed by paired samples t tests

To assess whether the effects of the QReport on confidence in correct diagnostic decisions depended upon experience level and scan normality, a 2 (QReport/no report) × 2 (normal vs. abnormal diagnosis) × 3 (experience level) mixed ANOVA was run on self-reported diagnostic confidence ratings in correctly diagnosed scans. Although power was limited by the small N, there was a very large main effect of the QReport, with raters being more confident in their correct diagnoses with the QReport (F(1,6) = 102.65, p < 0.001, effect size partial eta squared η²_p = 0.945). Raters were also significantly more confident in making abnormal diagnoses than normal diagnoses (F(1,6) = 8.911, p = 0.024, η²_p = 0.598), although this was unaffected by the QReport. The QReport’s effects on confidence were moderated by experience level (QReport*Experience Interaction F(2,6) = 7.748, p = 0.022, η²_p = 0.721), indicating a greater confidence increase in the non-clinical image analyst group (F(1,6) = 81.491, p < 0.001, η²_p = 0.931).
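
The partial eta squared values above can be recovered from the reported F statistics and degrees of freedom using the standard identity η²_p = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the four reported tests; the function name is chosen for this example.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F statistics and degrees of freedom as reported in the text above.
eta_report = partial_eta_squared(102.65, 1, 6)      # main effect of QReport
eta_normality = partial_eta_squared(8.911, 1, 6)    # abnormal vs. normal diagnoses
eta_interaction = partial_eta_squared(7.748, 2, 6)  # QReport * Experience
eta_analysts = partial_eta_squared(81.491, 1, 6)    # image analyst group
```

Rounded to three decimal places, these reproduce the reported values of 0.945, 0.598, 0.721, and 0.931.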

Discussion

We have performed a novel proof-of-concept clinical validation study to determine the effect of the availability of an automatically generated quantitative MRI report for HS on diagnostic accuracy and confidence across 3 levels of experience. Using previously tested algorithms, we developed a novel automated QReport pipeline for hippocampal volume and qT2, and evaluated the benefit of this QReport following a previously proposed scheme [13]. We found that the availability of a QReport increased accuracy and confidence in diagnosing HS, whilst decreasing inter-rater variability, evidenced by large effect sizes, although not always reaching statistical significance. The pilot data thus acquired will inform a future larger study.

In patients with temporal lobe epilepsy, the correct identification of MR changes typical for HS is central to their management and treatment. This process is often straightforward, but if the changes are subtle, making the correct diagnosis can be challenging. Previous studies using T2 relaxometry, or quantitative T2, have demonstrated high sensitivity and specificity for HS pathology [5, 22, 23], even when there was no obvious loss of hippocampal volume [24]. The importance of the clinical impact as well as the availability of postprocessing solutions led us to adopt hippocampal quantification into our QNI framework (Table 1). We have selected techniques that are currently the most suitable for translation into clinical service to support single-subject assessment using clinical quality MRI data. Based on previously published methodology [15, 16], we have implemented a fully automated pipeline whose outputs we combined into novel graphical representations embedded in a QReport intended for use in the neuroradiologist’s clinical workflow.

Overall, the availability of the QReport led to an increase, with large effect sizes, in assessment accuracy and rater agreement with the gold standard. QReports improved accuracy in all rater groups regardless of prior expertise, and increased correct lateralisation of pathology. Confidence in assessment increased significantly with quantification, consistent with previous outcomes when rating hippocampal atrophy in the case of dementia [25]. Our test dataset represents a broad spectrum of disease severity, evidenced by the spread of volume and qT2 ratios (Table 2). Importantly, the dataset included a substantial number of subtle unilateral HS cases with volume ratios > 0.7, the range in which unassisted visual detection can be very challenging [9]. We have successfully demonstrated the proof-of-concept for combining single-subject quantification with normative reference data for HS assessment, with potential relevance to clinical assessment and decision-making.

Previous HS biomarker validation studies have demonstrated enhanced assessment accuracy when using quantitative measures along with visual assessment, or ability to outperform visual inspection. These quantitative measures however have been applied as research paradigms, some using arbitrary thresholds for abnormality [26] and others comparing volume quantification alone to visual assessment alone [27, 28]. Our study presents raters with quantitative information of both volume and T2 signal, allowing them to assimilate the quantitative data with their visual qualitative impressions, as they would do in a clinical reporting setting. This novelty and similarity to the clinical reporting workflow supports a viable translational opportunity for quantitative HS reporting as an adjunct to neuroradiologists’ assessments.

Another important aspect of our study is the use of multiple groups of raters with different experience levels, again reflecting the clinical situation. The largest QReport-associated improvements in both assessment accuracy and confidence were seen in the image analyst group of raters. This aligned with our hypothesis that less experienced raters would benefit from having individual quantified results contextualised within what is expected as normal reference ranges. In addition, we saw large effect sizes for individual rater agreement with the gold standard (kappa) for the expert group of raters. Even more interesting is the finding that the experts’ kappa scores were highest of the three groups without the QReport and they became higher still with the QReport. We assume that raters with higher levels of expertise have built up an internal normative reference based on their own years of practice, which would account for their high baseline scores. The quantitative report would then further assist them in the challenging or subtle cases. Presenting this information to the less experienced raters could level out the baseline discrepancy of expertise and afford the individual patient with a more objective and informed assessment by any imaging specialist.

Interestingly, we saw that the image analysts’ results improved more than the trainees’ with the QReport available. This possibly reflects that image analysts, with no radiological experience, rely more strongly on the report than the trainees, who may struggle to balance the quantitative information against their own assessment in some subtle cases. The improvement in the consultant group indicates that they were able to integrate the QReport information where it was helpful.

Our study also addresses the challenging issue of bilateral HS, which can be particularly subtle and difficult to detect visually, making treatment decisions challenging to reach. Despite the small sample size, we found a significant subgroup effect of increased detection accuracy for bilateral HS when a QReport was available. Correct assessment of bilateral HS is clinically very important. Incorrectly diagnosing bilateral HS as unilateral HS, or as normal, could severely impact outcome, as surgical resection of one hippocampus is unlikely to result in seizure freedom postoperatively, whilst likely to cause significant memory impairment. Indeed, it is thought that some surgical failures may be due to a subtle bilateral component that had not been appreciated on imaging [29]. Graphical depiction of subtle raised signal or volume loss along the length of the hippocampus that we provide in our reports may be very useful in helping to elucidate focal abnormalities that are not readily detected visually.

Limitations

There were several potential limitations to our study. The overall number of subjects enrolled was limited as was the number of raters. Many of the beneficial effects of the QReport were therefore only demonstrated at trend-level significance, albeit with robust effect sizes. Since raters were starting from a high baseline accuracy of detection, a larger test subject population may be needed to demonstrate significant benefit.

Although raters were not informed of the number of positive cases to expect, it is possible that they were primed to expect HS cases at a higher rate than would be encountered in routine clinical practice in which most scans are negative. Contrary to the clinical environment, they were also deprived of any clinical referral data to which they would usually have access.

We also considered the potential for raters to misjudge the QReport. Although we did see instances where a correct assessment was made without a QReport and an incorrect one made with a QReport, this only occurred in 1.7 cases per rater on average, and was even lower for experienced raters at 1.3 cases per rater in the consultant group.

In constructing a dataset with a clinical/pathological gold standard to allow statistical analysis, we may have chosen histologically confirmed or bilateral HS cases with high clinical certainty that were inevitably more visually apparent than more subtle or equivocal cases. This approach is, however, difficult to avoid if a gold standard is required for reference. Furthermore, our control subjects were MRI-negative patients with epilepsy, and their underlying diagnoses were not established prior to this study. It is possible that subtle hippocampal pathology was present in some of these cases. In addition, although our cohort had a wide age range, it was skewed towards younger individuals, the age at which HS is likely to come to medical attention.

Finally, all data was collected on a single scanner with a uniform imaging protocol. Although providing favourable study conditions, this does not reflect the clinical variability in scanner, imaging protocol, and image quality usually encountered in a radiology department. This variability is a limitation that would need to be assessed and mitigated prior to widespread adoption of our pipeline.

Conclusion

This proof-of-concept clinical validation represents a key step for the translation of HS imaging biomarkers into clinical practice. We have shown that single-subject quantitative measures, presented in the context of normative data in a novel report format, can improve assessment accuracy, inter-rater agreement, and well-placed rater confidence. Based on the positive results of this study, we now plan to proceed to a supervised introduction into our local clinical service for in-use validation, as well as longer-term outcome and efficiency evaluation to assess the impact on treatment decisions for patients with HS.

Abbreviations

3D: Three dimensional
ANOVA: Analysis of variance
Corp: Corporation
df: Degrees of freedom
FLAIR: Fluid attenuation inversion recovery
FN: False negative
FOV: Field of view
FP: False positive
FSE: Fast spin echo
GE: General Electric
HS: Hippocampal sclerosis
IBM: International Business Machines
ICC: Intra-class correlation
MR: Magnetic resonance
MRI: Magnetic resonance imaging
P-A: Posterior-anterior
PACS: Picture archiving and communication system
PD: Proton density
QNI: Quantitative neuroradiology initiative
QReports: Quantitative reports
qT2: Quantitative T2
SD: Standard deviation
SENSE: Sensitivity encoding
SPSS: Statistical Package for Social Sciences
TE: Echo time
TI: Inversion time
TN: True negative
TP: True positive
TR: Repetition time
y: Years
η²_p: Partial eta squared

References

Morita ME, Cendes F (2010) Quantitative MRI techniques in MTLE: toward a better understanding of hippocampal sclerosis. Epilepsia 51:76–79. https://doi.org/10.1111/j.1528-1167.2009.02454.x
Article PubMed Google Scholar
De Tisi J, Bell GS, Peacock JL et al (2011) The long-term outcome of adult epilepsy surgery, patterns of seizure remission, and relapse: a cohort study. Lancet 378:1388–1395. https://doi.org/10.1016/S0140-6736(11)60890-8
Article PubMed Google Scholar
Thom M, Eriksson S, Martinian L et al (2009) Temporal lobe sclerosis associated with hippocampal sclerosis in temporal lobe epilepsy: neuropathological features. J Neuropathol Exp Neurol 68:928–938. https://doi.org/10.1097/NEN.0b013e3181b05d67
Article PubMed PubMed Central Google Scholar
Van Paesschen W (2004) Qualitative and quantitative imaging of the hippocampus in mesial temporal lobe epilepsy with hippocampal sclerosis. Neuroimaging Clin N Am 14:373–400. https://doi.org/10.1016/j.nic.2004.04.004
Article PubMed Google Scholar
Briellmann RS, Kalnins RM, Berkovic SF, Jackson GD (2002) Hippocampal pathology in refractory temporal lobe epilepsy: T2-weighted signal change reflects dentate gliosis. Neurology 58:265–271. https://doi.org/10.1212/wnl.58.2.265
Article PubMed Google Scholar
Peixoto-Santos JE, Kandratavicius L, Velasco TR et al (2017) Individual hippocampal subfield assessment indicates that matrix macromolecules and gliosis are key elements for the increased T2 relaxation time seen in temporal lobe epilepsy. Epilepsia 58:149–159. https://doi.org/10.1111/epi.13620
Article CAS PubMed Google Scholar
Lencz T, McCarthy G, Bronen RA et al (1992) Quantitative magnetic resonance imaging in temporal lobe epilepsy: relationship to neuropathology and neuropsychological function. Ann Neurol 31:629–637. https://doi.org/10.1002/ana.410310610
Article CAS PubMed Google Scholar
Duncan JS, Sagar HJ (1987) Seizure characteristics, pathology, and outcome after temporal lobectomy. Neurology 37:405–409. https://doi.org/10.1212/wnl.37.3.405
Article CAS PubMed Google Scholar
Reutens DC, Stevens JM, Kingsley D et al (1996) Reliability of visual inspection for detection of volumetric hippocampal asymmetry. Neuroradiology 38:221–225. https://doi.org/10.1007/BF00596533
Article CAS PubMed Google Scholar
Hirai T, Korogi Y, Yoshizumi K, Shigematsu Y, Sugahara T, Takahashi M (2000) Limbic lobe of the human brain: evaluation with turbo fluid-attenuated inversion-recovery MR imaging. Radiology 215:470–475. https://doi.org/10.1148/radiology.215.2.r00ma06470
Asao C, Hirai T, Yoshimatsu S et al (2008) Human cerebral cortices: signal variation on diffusion-weighted MR imaging. Neuroradiology 50:205–211. https://doi.org/10.1007/s00234-007-0327-9
Article PubMed Google Scholar
Sullivan DC, Obuchowski NA, Kessler LG et al (2015) Metrology standards for quantitative imaging biomarkers. Radiology 277:813–825. https://doi.org/10.1148/radiol.2015142202
Article PubMed PubMed Central Google Scholar
Goodkin O, Pemberton H, Vos SB et al (2019) The quantitative neuroradiology initiative framework: application to dementia. Br J Radiol 20190365. https://doi.org/10.1259/bjr.20190365
Winston GP, Cardoso MJ, Williams EJ et al (2013) Automated hippocampal segmentation in patients with epilepsy: available free online. Epilepsia 54:2166–2173. https://doi.org/10.1111/epi.12408
Winston GP, Vos SB, Burdett JL, Cardoso MJ, Ourselin S, Duncan JS (2017) Automated T2 relaxometry of the hippocampus for temporal lobe epilepsy. Epilepsia 58:1645–1652. https://doi.org/10.1111/epi.13843
Vos SB, Winston GP, Toussaint N, Cardoso MJ, Ourselin S, Duncan JS (2017) Automated hippocampal volumetry profiles along the anterior-posterior axis for hippocampal sclerosis detection. Epilepsia 58:S183–S184
Vos SB, Winston GP, Goodkin O et al (2019) Hippocampal profiling: localized magnetic resonance imaging volumetry and T2 relaxometry for hippocampal sclerosis. Epilepsia. https://doi.org/10.1111/epi.16416
Vos SB, Micallef C, Barkhof F et al (2018) Evaluation of prospective motion correction of high-resolution 3D-T2-FLAIR acquisitions in epilepsy patients. J Neuroradiol 45:368–373. https://doi.org/10.1016/j.neurad.2018.02.007
Cohen J (2013) Statistical power analysis for the behavioral sciences, 2nd edn. Routledge, New York
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46. https://doi.org/10.1177/001316446002000104
McHugh ML (2012) Interrater reliability: the kappa statistic. Biochem Med (Zagreb) 22:276–282. https://doi.org/10.11613/bm.2012.031
Jackson GD, Connelly A, Duncan JS, Grünewald RA, Gadian DG (1993) Detection of hippocampal pathology in intractable partial epilepsy: increased sensitivity with quantitative magnetic resonance T2 relaxometry. Neurology 43:1793–1799. https://doi.org/10.1212/wnl.43.9.1793
Van Paesschen W, Revesz T, Duncan JS, King MD, Connelly A (1997) Quantitative neuropathology and quantitative magnetic resonance imaging of the hippocampus in temporal lobe epilepsy. Ann Neurol 42:756–766. https://doi.org/10.1002/ana.410420512
Bernasconi A, Bernasconi N, Caramanos Z et al (2000) T2 relaxometry can lateralize mesial temporal lobe epilepsy in patients with normal MRI. Neuroimage 12:739–746. https://doi.org/10.1006/nimg.2000.0724
Bosco P, Redolfi A, Bocchetta M et al (2017) The impact of automated hippocampal volumetry on diagnostic confidence in patients with suspected Alzheimer’s disease: a European Alzheimer’s disease consortium study. Alzheimers Dement 13:1013–1023. https://doi.org/10.1016/j.jalz.2017.01.019
Hu WH, Liu LN, Zhao BT et al (2018) Use of an automated quantitative analysis of hippocampal volume, signal, and glucose metabolism to detect hippocampal sclerosis. Front Neurol 9. https://doi.org/10.3389/fneur.2018.00820
Farid N, Girard HM, Kemmotsu N et al (2012) Temporal lobe epilepsy: quantitative MR volumetry in detection of hippocampal atrophy. Radiology 264. https://doi.org/10.1148/radiol.12112638
Mettenburg JM, Branstetter BF, Wiley CA, Lee P, Richardson RM (2019) Improved detection of subtle mesial temporal sclerosis: validation of a commercially available software for automated segmentation of hippocampal volume. AJNR Am J Neuroradiol 40:440–445. https://doi.org/10.3174/ajnr.A5966
Hennessy MJ, Elwes RD, Binnie CD, Polkey CE (2000) Failed surgery for epilepsy: a study of persistence and recurrence of seizures following temporal resection. Brain 123:2445–2466. https://doi.org/10.1093/brain/123.12.2445

Acknowledgments

We would like to thank our radiographers for acquiring high-quality clinical data. We are grateful to the Epilepsy Society for supporting the Epilepsy Society MRI scanner.

Funding

This work is supported by the National Institute for Health Research (NIHR) University College London Hospitals Biomedical Research Centre and the Engineering and Physical Sciences Research Council (EPSRC). GPW was supported by the MRC (G0802012, MR/M00841X/1). BDB was supported by the EPSRC-funded UCL CDT in Medical Imaging (EP/L016478/1).

Author information

Olivia Goodkin and Hugh Pemberton contributed equally to this work.

Authors and Affiliations

Centre for Medical Image Computing (CMIC), University College London, London, UK
Olivia Goodkin, Hugh G. Pemberton, Sjoerd B. Vos, Ferran Prados & Frederik Barkhof
Neuroradiological Academic Unit, UCL Queen Square Institute of Neurology, University College London, London, UK
Olivia Goodkin, Hugh G. Pemberton, Sjoerd B. Vos, James Moggridge, Sotirios Bisdas, John S. Thornton, Tarek A. Yousry & Frederik Barkhof
Epilepsy Society MRI Unit, Chalfont St Peter, UK
Sjoerd B. Vos, Philippa Bartlett, Gavin P. Winston & John S. Duncan
Universitat Oberta de Catalunya, Barcelona, Spain
Ferran Prados
Clinical, Educational and Health Psychology, University College London, London, UK
Ravi K. Das
Lysholm Department of Neuroradiology, National Hospital for Neurology and Neurosurgery, UCLH NHS Foundation Trust, London, UK
James Moggridge, Thomas Campion, Kirsten Pearce, Sotirios Bisdas, John S. Thornton, Tarek A. Yousry & Frederik Barkhof
Department of Medical Physics and Bioengineering, University College London, London, UK
Bianca De Blasi & Sebastien Ourselin
Department of Clinical and Experimental Epilepsy, University College London, London, UK
Philippa Bartlett, Gavin P. Winston & John S. Duncan
Wellcome Trust Centre for Neuroimaging, UCL Queen Square Institute of Neurology, University College London, London, UK
Elaine Williams
Department of Biomedical Imaging and Image Guided Therapy, Medical University of Vienna, Vienna, Austria
Lukas Haider
NMR Research Unit, Department of Neuroinflammation, UCL Queen Square Institute of Neurology, University College London, London, UK
Lukas Haider
Radiology Department, Hospital Clínic de Barcelona and Magnetic Resonance Image Core Facility, Institut d’Investigacions Biomèdiques August Pi I Sunyer (IDIBAPS), Barcelona, Spain
Nuria Bargalló
Radiology & Nuclear Medicine, VU University Medical Center, Amsterdam, The Netherlands
Esther Sanchez & Frederik Barkhof
Digital Services, University College London Hospital, London, UK
Mark White
School of Biomedical Engineering and Imaging Sciences, King’s College London, London, UK
Sebastien Ourselin & Jorge Cardoso
Department of Medicine, Division of Neurology, Queen’s University, Kingston, Ontario, Canada
Gavin P. Winston

Corresponding author

Correspondence to Olivia Goodkin.

Ethics declarations

Guarantor

The scientific guarantor of this publication is Professor Frederik Barkhof.

Conflict of interest

The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Statistics and biometry

One of the authors has significant statistical expertise.

Informed consent

Written informed consent was waived by the Institutional Review Board.

Ethical approval

The clinically acquired data used was considered a service improvement by the National Hospital for Neurology and Neurosurgery and the Institute of Neurology Joint Research Ethics Committee. Informed written consent was obtained from control subjects.

Study subjects or cohorts overlap

Study subjects or cohorts have been previously reported in Vos et al, Epilepsia 2019.

Methodology

• retrospective

• experimental

• performed at one institution

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Goodkin, O., Pemberton, H.G., Vos, S.B. et al. Clinical evaluation of automated quantitative MRI reports for assessment of hippocampal sclerosis. Eur Radiol 31, 34–44 (2021). https://doi.org/10.1007/s00330-020-07075-2

Received: 13 February 2020
Revised: 07 May 2020
Accepted: 15 July 2020
Published: 04 August 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s00330-020-07075-2
