Introduction

Gliomas are the commonest primary brain tumour and remain a leading cause of solid cancer-related death in the under 40s [1]. They encompass a broad, heterogeneous group of tumours with different cellular origins and variable biological behaviour. Classification of gliomas is therefore essential to guide therapy, anticipate treatment response and predict prognosis. Historically, the WHO glioma classification was based on a histological definition of predominant cellular lineage and grade, essentially dividing gliomas into more aggressive high-grade and less aggressive low-grade tumours [2]. The advent of biomolecular characterisation has led to the identification of key genetic markers which also influence tumour behaviour. The isocitrate dehydrogenase (IDH) gene has received particular recognition and has contributed to a major revision in the 2016 World Health Organization classification [3]. Despite these advances, classification remains reliant on invasive tissue sampling. This carries considerable risks, foremost among them permanent neurological deficit, which also largely precludes repeat sampling to check for high-grade transformation. Tissue sampling is additionally fallible: biopsies and incomplete resections may provide a non-representative sample owing to intra-tumoral heterogeneity. Thus, there is a clear need for a non-invasive imaging marker of tumour type, grade and genetic status which would inform management and estimate prognosis.

Diffusion-weighted imaging (DWI) and diffusion tensor imaging (DTI) have a well-established role in the radiological assessment of brain tumours. The utility of these diffusion-based sequences in staging remains suboptimal, however [4]. Both DWI and DTI assume that the diffusion of water molecules involves random, Brownian motion. Under this assumption, the probability distribution function (PDF), which describes the chance of a proton diffusing between two points in a given time, is thought to follow a Gaussian distribution [5]. The apparent diffusion coefficient (ADC) is based on the standard deviation of this PDF, and DTI extends this by deriving the ADC in a direction-dependent manner [5, 6]. Although the Gaussian model in DWI and DTI holds true for pure liquids, it overlooks the in vivo effect of the complex cytoarchitecture of organic tissue, which is formed of various compartments, cell types and intracellular constituents. The true PDF therefore exhibits non-Gaussian behaviour, and the way it deviates from a Gaussian PDF can be assessed using the dimensionless statistical measure called kurtosis. Diffusion kurtosis imaging (DKI) is a novel extension of DTI that provides the degree of directional, non-Gaussian diffusion, i.e. the diffusion kurtosis tensor [7, 8]. Although the actual physiological basis of DKI remains unclear, the notion is that microstructural differences between gliomas of different grades will result in different DKI parameters, e.g. mean kurtosis (MK), and will therefore potentially provide a more accurate, non-invasive biomarker for glioma staging [5, 9]. Early research is encouraging. Two prior meta-analyses which looked at the diagnostic accuracy of DKI for glioma grading reported a pooled area under the curve (AUC) of 0.94 and 0.96 for MK [10, 11]. We attempted to consolidate the preliminary meta-analytic evidence on this topic through an updated systematic review and meta-analysis.
These earlier studies also analysed only the ability of DKI to differentiate glioma grade, with no probing of DKI's role in differentiating glioma from other intra-axial tumours. This is an important question as it assesses the real-world applicability of DKI as a non-invasive imaging tool in tumours of unknown lineage that have not been sampled. To address these issues, our systematic review and meta-analysis scrutinises two research questions. Firstly, we further assess the diagnostic accuracy of DKI in differentiating low-grade glioma (LGG) from high-grade glioma (HGG) by broadening the inclusion criteria and including recent studies that adhere to the up-to-date WHO 2016 glioma classification and include IDH genotyping. To answer this first question, we specifically look at the mean difference in MK between HGG and LGG and the overall diagnostic accuracy of DKI. Secondly, for the first time, we also review the role of DKI in differentiating gliomas from other intra-axial tumours.
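For readers less familiar with the technique, the departure from Gaussian diffusion described earlier in this Introduction is commonly expressed through the signal model below. This is a sketch of the standard DKI formulation rather than the acquisition model of any particular included study; the symbols $S_0$, $D_{\mathrm{app}}$, $K_{\mathrm{app}}$ and the zero-mean displacement $x$ are notation introduced here for illustration.

```latex
% Excess kurtosis of the (zero-mean) displacement distribution along one direction
K = \frac{\langle x^{4} \rangle}{\langle x^{2} \rangle^{2}} - 3

% Diffusion-weighted signal as a function of b value (terms beyond b^2 neglected)
\ln S(b) \approx \ln S_{0} - b\,D_{\mathrm{app}} + \frac{1}{6}\,b^{2}\,D_{\mathrm{app}}^{2}\,K_{\mathrm{app}}
```

For purely Gaussian diffusion $K = 0$, and the expression reduces to the mono-exponential decay assumed by DWI and DTI. Mean kurtosis (MK) is the average of $K_{\mathrm{app}}$ over the diffusion directions.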

Materials and methods

A comprehensive review protocol was set up according to the Preferred Reporting Items for Systematic Review and Meta-analysis Protocols (PRISMA-P) statement and guidance from the Joanna Briggs Institute Reviewers’ Manual for the systematic review of studies of diagnostic test accuracy [12, 13]. The protocol was registered in the international prospective register of systematic reviews, PROSPERO (registration number: CRD42018099192) and details have been published previously [14]. For the development of this full systematic review, the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) checklist was used [15]. A detailed PRISMA checklist is available in Online Resource 1.

Search strategy for identification of studies

A systematic literature search in four databases (PubMed, Medline via Ovid, Scopus and Embase) was conducted on the 12th of July 2018, with the help of librarian KB. The search syntax used the keywords ‘glioma’ and ‘diffusional kurtosis’ as both MeSH (Medical Subject Headings) terms and free text words without language restrictions. Reference lists of the included articles were also searched for keywords. A detailed search strategy is given in Online Resource 2.

Inclusion and exclusion criteria

Studies were eligible for inclusion if they assessed DKI in (i) the diagnosis of primary or recurrent glioma using either the WHO 2007 or the WHO 2016 classification; or (ii) the differentiation of gliomas from other brain tumours. Exclusion criteria comprised paediatric age groups, non-original research articles (reviews, commentaries, errata, books, editorials and conference abstracts), animal studies, non-imaging studies, non-MRI studies, non-kurtosis MRI studies, non-neoplastic conditions, studies of non-glial tumours only, non-cerebral tumours and studies written in languages other than English, French or German. Non-relevant studies were excluded following a reading of eligible articles in full text.

Methodological quality (risk of bias) assessment

Risk of bias and applicability concerns were assessed independently by two authors (GA and SES) using the QUADAS-2 tool (the revised tool for quality assessment of diagnostic accuracy studies) [16]. Any disagreements were resolved by consensus. If agreement could not be reached, a third reviewer (PMM) was available as an adjudicator. Quality assessment was performed separately for each of the two review questions. Risk of bias was assessed in four domains: patient selection, index test, reference standard, and flow and timing. Applicability concerns were assessed in three domains: patient selection, index test, and reference standard. The risk in each domain was judged to be high, low, or unclear. Risk of bias across studies (i.e. ‘publication bias’) was not assessed as there remains no universally accepted method and the number of studies was small [17]. For the risk of bias domains, certain adaptations were outlined: (1) In patient selection, prospective studies were deemed low risk whilst retrospective studies were judged high risk. (2) For the index test, studies were judged low risk if the radiologist was blinded to the reference standard during image analysis and high risk if unblinded. (3) The reference standard was defined as histopathological assessment. (4) In flow and timing, unclear risk was recorded if the interval between the index test and reference standard was not given or if patients were removed from the study without a defined reason.

Data extraction

Data extraction was conducted by GA and SES using pre-designed standardized sheets. Extracted data included the name of the first author, publication year, study type, patient population demographics, acquisition techniques, processing and post-processing software, reference standard, WHO classification scheme and diagnostic test accuracy results (true and false positive and negative values).

Overlapping datasets

Studies sharing authors were checked for overlapping patient cohorts through direct contact with the studies' author(s). Where cohorts overlapped across different studies, the most recently published study from that group was included in the analysis.

Data synthesis and analyses

The mean and standard deviation of the mean kurtosis (MK) values were used to estimate the mean difference between the low-grade glioma (LGG) and high-grade glioma (HGG) groups, using a random-effects meta-analysis model fitted with the restricted maximum likelihood method. To analyse heterogeneity and the robustness of the results, we performed subgroup analyses by stratifying the included studies according to acquisition technique: repetition time (TR), number of b values (the b value being the degree of diffusion weighting), maximum b value and number of diffusion directions. As measures of the degree of heterogeneity between the studies, we used the Q and I² statistics.
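The pooling step described above can be sketched as follows. This is a minimal illustration, not the analysis code used in this review: it swaps in the closed-form DerSimonian-Laird estimator for the between-study variance instead of restricted maximum likelihood, and the per-study MK summaries, function names and variable names are all hypothetical.

```python
import math


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Raw mean difference (group 1 minus group 2) and its sampling variance."""
    return m1 - m2, sd1 ** 2 / n1 + sd2 ** 2 / n2


def random_effects(effects, variances):
    """Pool study effects with the DerSimonian-Laird estimator (not REML)."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I^2 in percent
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical per-study MK summaries: (HGG mean, SD, n, LGG mean, SD, n)
studies = [
    (0.70, 0.10, 30, 0.48, 0.08, 20),
    (0.65, 0.12, 25, 0.50, 0.10, 18),
    (0.75, 0.09, 40, 0.45, 0.07, 22),
]
pairs = [mean_difference(*s) for s in studies]
result = random_effects([p[0] for p in pairs], [p[1] for p in pairs])
print(result)
```

A restricted maximum likelihood fit generally gives a somewhat different between-study variance, so pooled estimates from this sketch should not be expected to match the reported values.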

For analysis of diagnostic test accuracy (DTA), a bivariate random-effects meta-analysis using the restricted maximum likelihood method was used to calculate the summary receiver operating characteristic (ROC) curve and its area under the curve (AUC). Coupled plots showing points of sensitivity and false-positive rate with their 95% confidence intervals (CI) were also produced. Bivariate meta-regression was implemented to assess study-level characteristics: echo time (TE), TR, maximum b value, number of b values and number of diffusion directions. Statistical analysis was performed with R (version 3.5.1).
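The per-study quantities that feed such an analysis can be sketched as below. This is an illustrative calculation only: it covers the study-level sensitivity and false-positive rate, not the bivariate random-effects pooling itself; the Wilson score interval is an assumed choice of interval method, and the 2×2 counts and function names are hypothetical.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed interval method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_summary(tp, fp, fn, tn):
    """Sensitivity and false-positive rate, with 95% intervals, from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return {
        "sensitivity": sensitivity,
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        "fpr": fpr,
        "fpr_ci": wilson_ci(fp, fp + tn),
        "specificity": 1 - fpr,
    }


# Hypothetical counts for one study, with HGG as the positive class
summary = accuracy_summary(tp=26, fp=3, fn=4, tn=17)
print(summary)
```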

Results

Search results and included studies

A systematic search of the four databases led to the identification of 216 studies. After checking for duplicates, 88 studies remained. Of these, 65 were excluded based on the pre-defined inclusion and exclusion criteria by reading titles and abstracts. The remaining 23 articles were selected for full-text evaluation. Following a detailed assessment, 3 studies were excluded owing to a lack of analytical data for DKI and 1 study was excluded because it did not answer either of the two pre-defined review questions. Finally, 19 studies were deemed eligible for inclusion in the systematic review [9, 18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]. For the primary question, investigating the role of DKI in glioma grading, 17 studies were selected [9, 20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]. For the secondary question, assessing the technique’s potential in differentiating glioma from other intra-axial tumours, the remaining 2 studies were selected [18, 19]. The results of the selection process are presented in Fig. 1. The characteristics of the studies, including histologic types of glioma, details of the DKI acquisition technique (e.g. TR/TE, b values and diffusion encoding directions), and DKI processing and post-processing (e.g. software used and extracted parameters), are summarised in Table 1.
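The selection counts reported above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names are introduced here for illustration.

```python
# Counts as reported in the study selection paragraph
identified = 216
after_duplicates = 88
excluded_title_abstract = 65
excluded_full_text = 3 + 1  # no analytical DKI data + did not address either question

duplicates_removed = identified - after_duplicates
full_text_assessed = after_duplicates - excluded_title_abstract
included = full_text_assessed - excluded_full_text

grading_studies = 17          # first review question
differentiation_studies = 2   # second review question

print(duplicates_removed, full_text_assessed, included)
```

The totals are internally consistent: 23 articles reach full-text assessment, 19 are included, and the 17 and 2 studies assigned to the two review questions sum to 19.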

Fig. 1 Flowchart for selection of the studies included in the meta-analysis

Table 1 Summary of findings for the studies included in the systematic literature review

Quality assessment

First question: role of DKI in glioma grading

Seventeen studies addressed the first question and the results of quality assessment are summarized in Fig. 2 and Table 2 [9, 20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35]. For risk of bias, the quality was variable across each domain. In the patient selection domain, 10 were judged to be at low risk [9, 20,21,22, 26, 30,31,32,33,34,35] and 7 were considered high risk [23,24,25, 27,28,29]. In the index test domain, 9 studies were considered low risk whilst the remaining 8 were unclear in risk. For the reference standard domain, all studies were deemed low risk. Regarding flow and timing, 11 studies had low risk, 2 studies had high risk and 4 studies had unclear risk. For applicability, all studies across all domains were considered low risk.

Fig. 2 Quality assessment of included diagnostic accuracy studies

Table 2 Detailed quality assessment of included diagnostic accuracy studies considering grading of gliomas

Second question: role of DKI in differentiating gliomas from other brain tumours

The two included studies addressing this question had low applicability concerns and a low risk of bias for the reference standard [18, 19]. However, in one study the risk of bias was high in patient selection and unclear in flow and timing [19], whilst the other study had an unclear risk of bias for the index test, as blinding of the neuroradiologist to pathology was not mentioned [18].

Overlapping datasets

Six studies included in the systematic review were from the same study group and used overlapping patient cohorts and datasets. This was confirmed after contacting the studies' authors. By consensus, our team decided to include only the most recently published work in the meta-analysis, to avoid counting the same patients more than once.

Meta-analyses

For the first question, assessing the role of DKI in differentiating HGGs from LGGs, 12 of the 19 selected studies were included in the meta-analysis after accounting for overlapping datasets [20,21,22,23,24,25,26,27,28,29,30,31]; one further study, which evaluated the role of DKI in low-grade glioma only, was also excluded [32]. As previously outlined for the first question, two separate meta-analyses were performed: one assessing the mean difference in mean kurtosis (MK) between high-grade gliomas (HGGs) and low-grade gliomas (LGGs) and the other looking at the overall diagnostic accuracy of DKI. A separate meta-analysis for the second question was not feasible due to the limited number of studies and associated data.

Meta-analysis of mean difference

All 12 selected studies provided sufficient data to assess the difference in mean kurtosis (MK) [20,21,22,23,24,25,26,27,28,29,30,31]. The random-effects model showed a significant difference in MK between HGGs and LGGs (pooled mean difference 0.22, 95% CI 0.19–0.25; p = 0.0014). Forest plots of the mean difference in MK between LGG and HGG are shown in Fig. 3. Although a moderate degree [33] of heterogeneity was detected between the studies (I² = 73.8%), the robustness of our results was verified by sensitivity analysis of multiple study characteristics, which showed no statistically significant difference between these features (see Online Resource 3). Owing to the degree of heterogeneity, it was not possible to define an MK cutoff value for differentiating HGGs from LGGs. Allowing for this, based on the range of cutoff values used across the individual studies, the optimal cutoff value appears to lie between 0.5 and 0.6.
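As a reading aid, the relationship between the I² statistic reported above and Cochran's Q can be written out as follows. This assumes the usual Q-based definition of I² with k the number of studies. Software that derives I² from the estimated between-study variance would give a different figure, so the back-calculated Q is illustrative only and is not a reported result.

```latex
% Q-based definition of I^2 for k studies
I^{2} = \max\!\left(0,\ \frac{Q - (k - 1)}{Q}\right) \times 100\%

% Illustrative back-calculation with k = 12 and I^2 = 73.8%
Q = \frac{k - 1}{1 - I^{2}} = \frac{11}{1 - 0.738} \approx 42.0
```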

Fig. 3 Forest plot of mean difference in MK between LGG and HGG with a random-effects meta-analysis model

Meta-analysis of diagnostic test accuracy

For diagnostic test accuracy (DTA) of DKI in differentiating HGGs from LGGs, 9 of the 12 selected studies were eligible for a bivariate random-effects meta-analysis [20,21,22,23,24,25, 29,30,31]. The pooled sensitivity was 0.87 (95% CI 0.78–0.92) and the pooled specificity was 0.85 (95% CI 0.76–0.91). Forest plots of the sensitivity and specificity of the studies are shown in Figs. 4 and 5, respectively. The summary receiver operating characteristic (ROC) curve is shown in Fig. 6, with an area under the curve (AUC) of 0.92. DTA analysis revealed that the DKI false-positive rate was 0.15 (95% CI 0.09–0.24). The bivariate meta-regression model showed that study-level characteristics such as TE, TR, maximum b value, number of b values and number of diffusion directions had no significant effect on diagnostic accuracy (p = 0.155 to p = 0.893).
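The false-positive rate quoted above is the complement of the pooled specificity, and its interval follows by reflecting the specificity bounds. The short check below uses only the figures reported in this section.

```latex
\mathrm{FPR} = 1 - \mathrm{specificity} = 1 - 0.85 = 0.15

\mathrm{CI}_{\mathrm{FPR}} = (1 - 0.91,\ 1 - 0.76) = (0.09,\ 0.24)
```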

Fig. 4 Pooled sensitivity of diffusion kurtosis imaging in the grading of CNS gliomas

Fig. 5 Pooled specificity of diffusion kurtosis imaging in the grading of CNS gliomas

Fig. 6 Summary receiver operating characteristic curve

Discussion

To date, there have been only two published meta-analyses which have looked at the diagnostic test accuracy (DTA) of DKI in glioma discrimination, and both demonstrated very promising initial results [10, 11]. Our systematic review and meta-analysis extends this work by increasing the number of electronic databases searched, removing language restrictions and broadening the eligibility criteria. Importantly, regarding the DTA of DKI, in addition to studies which followed the WHO 2007 classification, we also included studies using the current WHO 2016 classification, the latter incorporating the IDH genotype. Our systematic review included a total of 19 studies, and two meta-analyses were performed to examine the role of DKI in glioma stratification. For the mean difference analysis of mean kurtosis (MK) across 12 studies, we demonstrated a statistically significant mean difference in MK between high-grade gliomas and low-grade gliomas of 0.22 (95% CI 0.19–0.25). These results are comparable with those of Delgado et al., who found, across 10 studies, a significant MK mean difference between high- and low-grade gliomas of 0.17 (95% CI 0.11–0.22) [10]. Our second meta-analysis, which included 9 studies and assessed the overall diagnostic test accuracy of DKI in grading gliomas, further confirmed its high diagnostic potential, with 87% sensitivity, 85% specificity and a pooled area under the curve of 0.92. This is similar to the findings of Delgado et al., who, across a smaller group of 5 studies, reported a pooled sensitivity and specificity of 85% and 92% respectively and a pooled area under the curve of 0.94.

Looking at heterogeneity, the subgroup analysis of Delgado et al. found the type of astrocytoma, the maximum b value and the repetition time to be significant study-level characteristics moderating the diagnostic performance of DKI. Our larger study similarly showed moderate heterogeneity in study scanning parameters (such as TE, TR, maximum b value and number of diffusion directions). However, across a wider range of studies and using a bivariate meta-regression model, we found no significant impact of the different study technical characteristics on diagnostic accuracy. This finding is particularly encouraging as it suggests generalisable clinical utility. Nevertheless, optimisation and standardisation of these parameters and of post-processing techniques are still required to enable multi-centre quantitative studies, a key step in generating higher-level evidence. It was not possible to determine a definite MK cutoff value for differentiating HGGs from LGGs. Despite this limitation, the range of thresholds used in the individual studies suggests that the optimal cutoff value for MK lies between 0.5 and 0.6.

Of the 19 studies in the systematic review, 4 looked into DKI's role in stratifying IDH mutation status as per the 2016 WHO classification [9, 20, 34, 35]. These studies reported significantly different MK values between IDH wild-type and IDH-mutant gliomas, suggesting a potential role for MK as a surrogate marker of IDH mutation status. Unfortunately, all 4 studies were from the same institute and based on an overlapping dataset, preventing a subgroup meta-analysis [9, 20, 34, 35]. Recently, however, after the date on which the search for this systematic review was performed, two studies have been published comparing the diagnostic performance of DKI and DTI in predicting IDH mutation status in gliomas [36, 37]. Both groups considered MK and mean diffusivity (MD) as the main DKI and DTI parameters respectively and concluded that MK can identify IDH mutation status with higher diagnostic value than MD [36, 37].

For the second question, looking at the role of DKI in differentiating glioma from non-gliomatous CNS tumours, our literature search identified only two studies, which did not allow us to draw any conclusive results [18, 19]. These studies nevertheless showed encouraging results that need to be reproduced in the future. Tan et al. analysed MK values in the solid tumour parts and the periphery of high-grade astrocytomas (HGAs) and solitary metastatic lesions, concluding that MK values in the periphery differed significantly between the two entities. MK values were also more sensitive than diffusion tensor imaging (DTI) metrics [19]. Using DKI, Pang et al. aimed to differentiate between HGGs and primary CNS lymphomas (PCNSLs) [18]. They reported significantly higher MK in PCNSLs than in HGGs, which could perhaps be explained by the hypercellular nature of the lymphoma microenvironment.

Our study has a few possible limitations. Several of the studies are of limited sample size; none of the included studies reported individual patient data and it is not possible to account for differences in post-processing techniques like tumour ROI. Finally, the bivariate meta-regression analysis lacked a multivariate assessment.

Conclusion

Our work further confirms that DKI has very good diagnostic performance in stratifying high- and low-grade gliomas. The consistent accuracy across different studies with varied acquisition and post-processing techniques also implies that DKI may be generalisable and clinically useful across different institutions and populations. Optimisation and standardisation of DKI techniques are still needed, however, to ensure consistency and parity. We also show its potential role as a surrogate marker of IDH mutation status, although this requires further investigation. Finally, this study highlights the need to further explore the role of DKI in characterising non-glioma tumours.