The diagnostic role of diffusional kurtosis imaging in glioma grading and differentiation of gliomas from other intra-axial brain tumours: a systematic review with critical appraisal and meta-analysis

Purpose We aim to illustrate the diagnostic performance of diffusional kurtosis imaging (DKI) in the diagnosis of gliomas.
Methods A review protocol was developed according to the PRISMA-P checklist, registered in the international prospective register of systematic reviews (PROSPERO) and published. A literature search of 4 databases was performed using the keywords ‘glioma’ and ‘diffusional kurtosis’. After applying robust inclusion and exclusion criteria, the included articles were independently evaluated with the QUADAS-2 tool and data were extracted. Reported sensitivities and specificities were used to construct 2 × 2 tables and paired forest plots using the Review Manager (RevMan®) software. A random-effects model was fitted using the hierarchical summary receiver operating characteristic approach.
Results A total of 216 hits were retrieved. After removal of duplicates and application of the inclusion criteria, 23 articles were eligible for full-text reading. Ultimately, 19 studies were eligible for final inclusion. The quality assessment revealed 9 studies with low risk of bias in the 4 domains. Using a bivariate random-effects model for data synthesis, the summary ROC curve showed a pooled area under the curve (AUC) of 0.92 and an estimated sensitivity of 0.87 (95% CI 0.78–0.92) for differentiating high- from low-grade gliomas. The mean difference in mean kurtosis (MK) between HGG and LGG was 0.22 (95% CI 0.19–0.25; p value = 0.0014), with moderate heterogeneity (I² = 73.8%).
Conclusion DKI shows good diagnostic accuracy in the differentiation of high- and low-grade gliomas, further supporting its potential role in clinical practice. Further exploration of DKI in differentiating IDH status and in characterising non-glioma CNS tumours is, however, needed.
Electronic supplementary material The online version of this article (10.1007/s00234-020-02425-9) contains supplementary material, which is available to authorized users.


Introduction
Gliomas are the commonest primary brain tumour and remain a leading cause of solid cancer-related deaths in the under 40s [1]. They encompass a broad, heterogeneous group of tumours with different cellular origins and variable biological behaviour. Classification of gliomas is therefore essential to guide therapy, anticipate treatment response and predict prognosis. Historically, the old WHO glioma classification was based on a histological definition of predominant cellular lineage and grade, essentially dividing gliomas into more aggressive high-grade and less aggressive low-grade tumours [2]. The advent of biomolecular characterisation has led to the identification of key genetic markers which also influence tumour behaviour. The isocitrate dehydrogenase (IDH) gene has received particular recognition and has contributed to a major revision in the latest 2016 World Health Organization classification [3]. Despite these advances, classification remains reliant on invasive tissue sampling. This carries considerable risks, foremost among them permanent neurological deficit, which also largely precludes repeat sampling to check for high-grade transformation. Tissue sampling is additionally fallible, with biopsies and incomplete resections potentially providing a non-representative sample due to intratumoral heterogeneity. Thus, there is a clear need for a non-invasive imaging marker of tumour type, grade and genetic status which would inform management and estimate prognosis.
Diffusion-weighted imaging (DWI) and diffusion tensor imaging (DTI) have a well-established role in the radiological assessment of brain tumours. The utility of these diffusion-based sequences in staging remains suboptimal, however [4]. Both DWI and DTI assume that the diffusion of water molecules involves random, Brownian motion. Under this assumption, the probability distribution function (PDF), i.e. the chance of a proton diffusing between two points in a given time, is thought to follow a Gaussian distribution [5]. The apparent diffusion coefficient (ADC) is based on the standard deviation of this PDF, and DTI extends this by deriving the ADC in a direction-dependent manner [5,6]. Although the Gaussian model in DWI and DTI holds true for pure liquids, it overlooks the in vivo effect of the complex cytoarchitecture of organic tissue, which is formed of various compartments, cell types and intracellular constituents. The true PDF therefore exhibits non-Gaussian behaviour, and the way it deviates from a Gaussian PDF can be assessed using the dimensionless statistical measure called kurtosis. Diffusional kurtosis imaging (DKI) is a novel extension of DTI that quantifies the degree of directional, non-Gaussian diffusion, i.e. the diffusion kurtosis tensor [7,8]. Although the actual physiological basis of DKI remains unclear, the notion is that microstructural variances between gliomas of different grades will result in different DKI parameters, e.g. mean kurtosis (MK), and will therefore potentially provide a more accurate, non-invasive biomarker for glioma staging [5,9]. Early research is encouraging. Two prior meta-analyses which looked at the diagnostic accuracy of DKI for glioma grading reported pooled areas under the curve (AUC) of 0.94 and 0.96 for MK [10,11]. We sought to consolidate the evidence from these preliminary meta-analyses through an updated systematic review and meta-analysis.
These earlier studies also analysed only the ability of DKI to differentiate glioma grade, without probing DKI's role in differentiating glioma from other intra-axial tumours. This is an important question as it assesses the real-world applicability of DKI as a non-invasive imaging tool in tumours of unknown lineage that have not been sampled. To address these issues, our systematic review and meta-analysis will scrutinise two research questions. Firstly, we will further assess the diagnostic accuracy of DKI in differentiating low-grade glioma (LGG) from high-grade glioma (HGG) by broadening the inclusion criteria and including recent studies that adhere to the up-to-date WHO 2016 glioma classification and include IDH genotyping. To answer this first question, we will specifically look at the mean difference in MK between HGG and LGG and the overall diagnostic accuracy of DKI. Secondly, for the first time, we will also review the role of DKI in differentiating gliomas from other intra-axial tumours.
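As a minimal illustration of why compartmentalised tissue yields non-zero kurtosis, the sketch below computes the excess kurtosis of a displacement distribution made of two zero-mean Gaussian compartments. The two-compartment mixture, the function name and the example values are assumptions chosen for illustration only; they are not taken from the included studies and are not a tissue model.

```python
def excess_kurtosis_two_compartments(fraction_a, sigma_a, sigma_b):
    """Excess kurtosis of a mixture of two zero-mean Gaussians.

    fraction_a : weight of compartment A (compartment B gets 1 - fraction_a)
    sigma_a/b  : standard deviation of displacement in each compartment
    All names and values here are hypothetical, for illustration only.
    """
    fraction_b = 1.0 - fraction_a
    # Variance of the mixture (both components have zero mean).
    variance = fraction_a * sigma_a**2 + fraction_b * sigma_b**2
    # Fourth central moment: each Gaussian contributes 3 * sigma^4.
    fourth_moment = 3.0 * (fraction_a * sigma_a**4 + fraction_b * sigma_b**4)
    # Excess kurtosis is 0 for a single Gaussian.
    return fourth_moment / variance**2 - 3.0


single = excess_kurtosis_two_compartments(1.0, 1.0, 1.0)  # one compartment
mixed = excess_kurtosis_two_compartments(0.5, 1.0, 3.0)   # two compartments
print(single, mixed)
```

A single Gaussian compartment gives an excess kurtosis of zero, whereas mixing compartments with different diffusion widths gives a positive value, which is the kind of deviation that DKI parameters such as MK are intended to capture.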

Materials and methods
A comprehensive review protocol was set up according to the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) statement and guidance from the Joanna Briggs Institute Reviewers' Manual for the systematic review of studies of diagnostic test accuracy [12,13]. The protocol was registered in the international prospective register of systematic reviews, PROSPERO (registration number: CRD42018099192), and details have been published previously [14]. For the development of this full systematic review, the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) checklist was used [15]. A detailed PRISMA checklist is available in Online Resource 1.

Search strategy for identification of studies
A systematic literature search in four databases (PubMed, Medline via Ovid, Scopus and Embase) was conducted on the 12th of July 2018, with the help of librarian KB. The search syntax used the keywords 'glioma' and 'diffusional kurtosis' as both MeSH (Medical Subject Headings) terms and free text words without language restrictions. Reference lists of the included articles were also searched for keywords. A detailed search strategy is given in Online Resource 2.

Inclusion and exclusion criteria
Studies were eligible for inclusion if they assessed DKI in (i) the diagnosis of primary or recurrent glioma using either the WHO 2007 or the WHO 2016 classification; or (ii) the differentiation of gliomas from other brain tumours. Exclusion criteria comprised paediatric age groups, non-original research articles (reviews, commentaries, errata, books, editorials and conference abstracts), animal studies, non-imaging studies, non-MRI studies, non-kurtosis MRI studies, non-neoplastic conditions, non-glial tumours only, non-cerebral tumours and studies written in languages other than English, French or German. Non-relevant studies were excluded following a reading of eligible articles in full text.

Methodological quality (risk of bias) assessment
Risk of bias and applicability concerns were assessed independently by two authors (GA and SES) using the QUADAS-2 tool (revised tool of quality assessment of diagnostic accuracy studies) [16]. Any disagreements were resolved by consensus. If an agreement could not be reached, a third reviewer was available as an adjudicator (PMM). Quality assessment was performed separately for each of the two review questions. Risk of bias was assessed in four domains: patient selection, index test, reference standard, and flow and timing. Applicability concerns were assessed in three domains: patient selection, index test, and reference standard. The risk in each domain was judged to be high, low, or unclear. Risk of bias across studies (i.e. 'publication bias') was not assessed as there remains no universally accepted method and the number of studies was small [17]. For the risk of bias domains, certain adaptations were outlined: (1) In patient selection, prospective studies were deemed low risk whilst retrospective studies were judged high risk. (2) For index test, studies were judged low risk if the radiologist was blinded to the reference standard during image analysis and high risk if unblinded. (3) Reference standard was defined as histopathological assessment. (4) In flow and timing, unclear risk was recorded if the interval between the index test and reference standard was not given or if patients were removed from the study without a defined reason.
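The adaptations described above can be written as simple decision rules. The following is only a sketch: the function names and argument encodings are hypothetical, and the fallback choices (returning 'unclear' when the design or blinding is not reported, and 'low' for flow and timing when neither trigger applies) are assumptions rather than rules stated in the protocol.

```python
def patient_selection_risk(design):
    """design: 'prospective' or 'retrospective' (hypothetical encoding)."""
    rules = {"prospective": "low", "retrospective": "high"}
    # Assumption: a design that is not reported is treated as 'unclear'.
    return rules.get(design, "unclear")


def index_test_risk(blinded):
    """blinded: True, False, or None when blinding was not reported."""
    if blinded is None:
        return "unclear"  # assumption for unreported blinding
    return "low" if blinded else "high"


def flow_and_timing_risk(interval_reported, unexplained_withdrawals):
    """Unclear if the test-to-reference interval is missing or patients
    were removed without a defined reason; 'low' otherwise (assumption)."""
    if not interval_reported or unexplained_withdrawals:
        return "unclear"
    return "low"


# Hypothetical study record, for illustration only.
print(patient_selection_risk("retrospective"),
      index_test_risk(None),
      flow_and_timing_risk(interval_reported=False,
                           unexplained_withdrawals=False))
```

Encoding the rules this way makes it easy for two assessors to apply identical criteria and then compare their domain-level judgements.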

Data extraction
Data extraction was conducted by GA and SES using predesigned standardized sheets. Extracted data included the name of the first author, publication year, study type, patient population demographics, acquisition techniques, processing and post-processing software, reference standard, WHO classification scheme and diagnostic test accuracy results (true and false positive and negative values).

Overlapping datasets
Studies sharing the same authors were checked for overlapping patient cohorts through direct contact with the study authors. Where cohorts overlapped across different studies, the most recently published study from that group was included in the analysis.
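The selection rule for overlapping cohorts can be sketched as follows. The record schema ('id', 'group', 'year') and the example entries are hypothetical; in practice, cohort overlap was established by contacting the authors rather than from metadata.

```python
from collections import defaultdict


def keep_most_recent_per_group(studies):
    """Keep one study per research group: the most recently published.

    studies: list of dicts with 'id', 'group' and 'year' keys
    (a hypothetical schema used only for this illustration).
    """
    by_group = defaultdict(list)
    for study in studies:
        by_group[study["group"]].append(study)
    # For each group, pick the record with the latest publication year.
    return [max(members, key=lambda s: s["year"])
            for members in by_group.values()]


# Hypothetical records, not the studies included in this review.
example = [
    {"id": "A1", "group": "centre_A", "year": 2014},
    {"id": "A2", "group": "centre_A", "year": 2017},
    {"id": "B1", "group": "centre_B", "year": 2016},
]
print([s["id"] for s in keep_most_recent_per_group(example)])
```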

Data synthesis and analyses
The mean and standard deviation of the mean kurtosis (MK) value were used to describe the mean differences between the low-grade glioma (LGG) and high-grade glioma (HGG) groups with a random-effects meta-analysis model using the restricted maximum likelihood method. To analyse the heterogeneity and the robustness of the results, we performed subgroup analyses by stratifying the included studies according to acquisition technique: the repetition time (TR), the number of b values (i.e. the degree of diffusion weighting), the maximum b value and the number of diffusion directions. As a measure of the degree of heterogeneity between the studies, we used the Q and I² statistics.
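To make the pooling step concrete, the sketch below computes a random-effects pooled mean difference together with the Q and I² statistics. It deliberately uses the closed-form DerSimonian-Laird estimator of between-study variance, not the restricted maximum likelihood method used in this review, because the former needs no iterative optimisation. The function name and the input values are hypothetical.

```python
import math


def random_effects_mean_difference(studies):
    """Pool mean differences with a DerSimonian-Laird random-effects model.

    studies: list of tuples
        (mean_hgg, sd_hgg, n_hgg, mean_lgg, sd_lgg, n_lgg)
    Returns the pooled difference, its 95% CI, Q, I2 (%) and tau2.
    """
    effects = [m1 - m2 for m1, _, _, m2, _, _ in studies]
    variances = [s1**2 / n1 + s2**2 / n2 for _, s1, n1, _, s2, n2 in studies]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # I2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled,
            "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
            "Q": q, "I2": i2, "tau2": tau2}


# Hypothetical MK summaries, not data from the included studies.
result = random_effects_mean_difference([
    (0.70, 0.10, 50, 0.60, 0.10, 50),
    (1.10, 0.10, 50, 0.60, 0.10, 50),
])
print(result)
```

With a restricted maximum likelihood estimator only the calculation of tau2 would change; the weighting, Q and I² steps have the same form.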
For analysis of diagnostic test accuracy (DTA), a bivariate random-effects meta-analysis using the restricted maximum likelihood method was used to calculate the summary receiver operating characteristic (ROC) curve and its area under the curve (AUC). Coupled plots showing points of sensitivity and false-positive rate with their 95% confidence intervals (CI) were also calculated. Bivariate meta-regression was implemented to assess study-level characteristics: echo time (TE), TR, maximum b value, number of b values and number of diffusion directions. Statistical analysis was performed with R (version 3.5.1).
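The per-study points on such coupled plots come from each study's 2 × 2 table. The sketch below derives sensitivity and false-positive rate with 95% intervals from the four cell counts. The Wilson score interval is an assumption made for illustration, since the interval method used by the software is not stated here, and the counts are hypothetical. The bivariate model itself is not reproduced.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z**2 / total
    centre = (p + z**2 / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total
                         + z**2 / (4.0 * total**2)) / denom
    return centre - half, centre + half


def sensitivity_and_fpr(tp, fp, fn, tn):
    """Return (estimate, lower, upper) for sensitivity and FPR."""
    sens = tp / (tp + fn)   # diseased correctly classified
    fpr = fp / (fp + tn)    # non-diseased incorrectly classified
    return {"sensitivity": (sens, *wilson_interval(tp, tp + fn)),
            "fpr": (fpr, *wilson_interval(fp, fp + tn))}


# Hypothetical 2 x 2 counts, not taken from any included study.
print(sensitivity_and_fpr(tp=18, fp=3, fn=2, tn=17))
```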

Search results and included studies
A systematic search of the four databases led to the identification of 216 studies. After checking for duplicates, 88 studies remained. Of these, 65 were excluded based on the predefined inclusion and exclusion criteria by reading titles and abstracts. The remaining 23 articles were selected for full-text evaluation. Following a detailed assessment, 3 studies were excluded owing to lack of analytical data for DKI and 1 study was excluded because it did not answer either of the two predefined review questions. Finally, 19 studies were deemed to be eligible for inclusion in the systematic review [9,18-35]. For the primary question, investigating the role of DKI in glioma grading, 17 studies were selected [9,20-35]. For the secondary question, assessing the technique's potential in differentiating glioma from other intra-axial tumours, the remaining 2 studies were selected [18,19]. The results of the selection process are presented in Fig. 1. The characteristics of the studies, including histologic types of glioma, details of the DKI acquisition technique (e.g. TR/TE, b values and diffusion encoding directions) and DKI processing and post-processing (e.g. software used and extracted parameters), are presented in Table 1.
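The selection counts reported above can be checked with a few lines of arithmetic. The variable names are illustrative; every number is taken from the paragraph above.

```python
# Counts as reported in the search results paragraph.
identified = 216
after_duplicates = 88
excluded_on_title_abstract = 65
excluded_no_dki_data = 3
excluded_off_question = 1

duplicates_removed = identified - after_duplicates
full_text = after_duplicates - excluded_on_title_abstract
included = full_text - excluded_no_dki_data - excluded_off_question

# Split of the included studies across the two review questions.
first_question, second_question = 17, 2

print(duplicates_removed, full_text, included)
```

The arithmetic gives 23 full-text articles and 19 included studies, and the split of 17 plus 2 studies across the two questions accounts for all 19.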
Second question: role of DKI in differentiating gliomas from other brain tumours
The two included studies addressing this question had low applicability concerns and a low risk of reference standard bias [18,19]. However, in one study the risk of bias was high in patient selection and unclear in flow and timing [19], whilst the other study had an unclear risk of bias for the index test because blinding of the neuroradiologist to pathology was not mentioned [18].

Overlapping datasets
Six studies included in the systematic review were from the same study group and used overlapping patient cohorts and datasets. This was confirmed by contacting the authors of these studies. By consensus, our team decided to include only the most recently published work in the meta-analysis to avoid repetition bias.

Meta-analyses
For the first question, assessing the role of DKI in differentiating HGGs from LGGs, 12 of the 19 selected studies were included in the meta-analysis after accounting for overlapping datasets [20-31]; one further study was excluded because it evaluated the role of DKI in low-grade glioma only [32]. As previously outlined for the first question, two separate meta-analyses were performed: one assessing the mean difference of mean kurtosis (MK) between high-grade gliomas (HGGs) and low-grade gliomas (LGGs) and the other looking at the overall diagnostic accuracy of DKI. A separate meta-analysis for the second question was not feasible due to the limited number of studies and associated data.
Abbreviations used in Table 1: LGG, low-grade glioma; HGG, high-grade glioma; HGA, high-grade astrocytoma; LGA, low-grade astrocytoma; AS, astrocytoma; OD, oligodendroglioma; IDH mut AS, isocitrate dehydrogenase mutant astrocytoma; IDH wt GBM, isocitrate dehydrogenase wild-type glioblastoma; LOH, loss of heterozygosity; ms, milliseconds; ROI, region of interest; T1 + C, T1-weighted images with contrast; ADC, apparent diffusion coefficient; cNAWM, contralateral normal appearing white matter; MK, mean kurtosis; Kr, radial kurtosis; Ka, axial kurtosis; KFA, kurtosis fractional anisotropy.

Meta-analysis of mean difference
All 12 selected studies provided sufficient data to assess the mean kurtosis (MK) difference [20-31]. The random-effects model showed a significant difference in MK between HGGs and LGGs (pooled mean difference 0.22, 95% CI 0.19-0.25; p value = 0.0014). Forest plots of the mean difference in MK between LGG and HGG are shown in Fig. 3. Although a moderate degree [33] of heterogeneity was detected between the studies (I² = 73.8%), the robustness of our results was verified by sensitivity analysis of multiple study characteristics, which showed no statistically significant difference between these features (see Online Resource 3). Owing to the degree of heterogeneity, it was not possible to define an MK cutoff value for differentiating HGGs from LGGs. Allowing for this, based on the range of cutoff values used across the studies, the optimal cutoff value appears to lie between 0.5 and 0.6.

Meta-analysis of diagnostic test accuracy
For diagnostic test accuracy (DTA) of DKI in differentiating HGGs from LGGs, 9 of the 12 selected studies were eligible for a bivariate random-effects meta-analysis [20-25,29-31]. The pooled sensitivity was 0.87 (95% CI 0.78-0.92) and the pooled specificity was 0.85 (95% CI 0.76-0.91). Forest plots of the sensitivity and specificity of the studies are shown in Figs. 4 and 5, respectively. The summary receiver operating characteristic (ROC) curve is shown in Fig. 6, with an area under the curve (AUC) of 0.92. DTA analysis revealed that the DKI false-positive rate was 0.15 (95% CI 0.09-0.24). The bivariate meta-regression model showed that study-level characteristics such as TE, TR, maximum b value, number of b values and number of diffusion directions had no significant effect on diagnostic accuracy (p = 0.155 to p = 0.893).
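The reported false-positive rate is the complement of the pooled specificity, with the interval bounds swapped. The short sketch below makes that relationship explicit using the values quoted above; the function name is illustrative.

```python
def fpr_from_specificity(spec, spec_lower, spec_upper):
    """False-positive rate and its interval as 1 - specificity.

    The lower FPR bound comes from the upper specificity bound
    and vice versa.
    """
    return 1.0 - spec, 1.0 - spec_upper, 1.0 - spec_lower


# Pooled specificity 0.85 (95% CI 0.76-0.91), as reported above.
fpr, fpr_lower, fpr_upper = fpr_from_specificity(0.85, 0.76, 0.91)
print(round(fpr, 2), round(fpr_lower, 2), round(fpr_upper, 2))
```

Rounded to two decimals, this reproduces the reported false-positive rate of 0.15 (95% CI 0.09-0.24).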

Discussion
To date, there have been only two published meta-analyses which have looked at the diagnostic test accuracy (DTA) of DKI in glioma discrimination, and both demonstrated very promising initial results [10,11]. Our systematic review and meta-analysis extend this work by increasing the number of electronic databases searched, removing language restrictions and broadening the eligibility criteria. Importantly, regarding the DTA of DKI, in addition to studies that follow the WHO 2007 classification we also included studies using the current WHO 2016 classification, the latter incorporating the IDH genotype. Our systematic review included a total of 19 studies, and two meta-analyses were performed to examine the role of DKI in glioma stratification. In the mean difference analysis of mean kurtosis (MK) across 12 studies, we demonstrated a statistically significant mean difference in MK between high-grade and low-grade gliomas of 0.22 (95% CI 0.19-0.25). These results are comparable with those of Delgado et al., who found across 10 studies a significant MK mean difference between high- and low-grade gliomas of 0.17 (95% CI 0.11-0.22) [10]. Our second meta-analysis, which included 9 studies and assessed the overall diagnostic test accuracy of DKI in grading gliomas, further confirmed its high diagnostic potential, with 87% sensitivity, 85% specificity and a pooled area under the curve of 0.92. This is similar to the findings of the Delgado group, who across a smaller set of 5 studies reported a pooled sensitivity and specificity of 85% and 92%, respectively, and a pooled area under the curve of 0.94. Looking at heterogeneity, the subgroup analysis of Delgado et al. found the type of astrocytoma, the maximum b value and the repetition time to be significant study-level characteristics moderating the diagnostic performance of DKI. Our larger study similarly showed moderate heterogeneity in study scanning parameters (such as TE, TR, max b value and number of diffusion directions).
However, across a wider range of studies and by using a bivariate meta-regression model, we found no significant impact of the different study technical characteristics on diagnostic accuracy. Four of the included studies evaluated DKI in relation to IDH status [9,20,34,35]. These studies reported significantly different MK values between IDH wild-type and IDH-mutant gliomas, suggesting its potential role as a surrogate marker for IDH phenotyping. Unfortunately, all 4 studies were from the same institute and based on an overlapping dataset, preventing a subgroup meta-analysis [9,20,34,35]. Recently, however, after the date on which the search for this systematic review was performed, two studies have been published comparing the diagnostic performance of DKI and DTI in predicting the IDH mutation status in gliomas [36,37]. Both groups considered MK and mean diffusivity (MD) as the main DKI and DTI parameters, respectively, and concluded that MK can identify IDH mutation status with higher diagnostic value than MD [36,37].
For the second question, looking at the role of DKI in differentiating glioma from non-gliomatous CNS tumours, our literature search identified only two studies, which did not allow us to draw any conclusive results [18,19]. These studies did, however, show encouraging results that need to be reproduced in the future. Yan Tan et al. analysed MK values in the solid tumour parts and the periphery of high-grade astrocytomas (HGAs) and solitary metastatic lesions, concluding that MK values in the periphery differed significantly between the two entities. MK values were also more sensitive than diffusion tensor imaging (DTI) metrics [19]. Using DKI, Pang et al. aimed to differentiate between HGGs and primary CNS lymphomas (PCNSLs) [18]. They reported significantly higher MK in PCNSLs than in HGGs, which could perhaps be explained by the hypercellular nature of the lymphoma microenvironment.
Our study has a few possible limitations. Several of the studies are of limited sample size; none of the included studies reported individual patient data and it is not possible to account for differences in post-processing techniques like

Conclusion
Our work further confirms that DKI has a very good diagnostic performance in stratifying high- and low-grade gliomas. The consistent accuracy across different studies with varied acquisition and post-processing techniques importantly also implies that DKI is a technique that may be generalisable and clinically useful across different institutions and populations. Optimisation and standardisation of DKI techniques are, however, still needed to ensure consistency and parity. We also show its potential role as a surrogate marker for IDH phenotyping, although this requires further investigation. Finally, this study highlights the need to further explore the role of DKI in characterising non-glioma tumours.