Background

There is an assumption that medical images contain additional data that is not apparent to the human eye, and the field of radiomics aims to extract this information through (semi-)automated analysis, without the need to change the image acquisition protocols [1, 2]. Texture analysis, measuring image heterogeneity, is an example of additional information that is contained within medical images. Although first-order statistics, based on global measures from voxel intensity histograms, are not a true measure of image texture, they are often reported due to their clinical relevance in a number of studies. Second-order statistics consider the relationship between pairs of voxels providing a measure of local texture features. High-order statistics consider the relationship between three or more voxels and provide a measure of both local and regional texture features. Fractal analysis is a further method that can be used to quantify texture information on the basis of repeating geometric patterns (self-similarity) and roughness [3–5].

Whilst a number of factors have been described that influence the measurement, accuracy and reproducibility of texture features [4, 6, 7], to our knowledge, there have been no published data on how texture features change with time post-injection of 18F-FDG in soft tissue tumours. As there is a growing interest in using texture features in the clinical environment, these data are essential to inform on the design of clinical and research protocols to enable intra/inter-patient scans to be compared reliably and multi-centre trials to be carried out.

Standardised uptake value (SUV) parameters have been shown to change with time post-injection of 18F-FDG [8–11], and we hypothesised that texture parameters may also change with time. Therefore, the aim of this study was to investigate the effect of time post-injection of 18F-FDG on the measurement of texture features in a cohort of patients with neurofibromatosis-1 (NF1) in whom malignant transformation of neurofibromas to malignant peripheral nerve sheath tumours (MPNST) was suspected clinically and in whom 18F-FDG PET data had been acquired at two separate time points post-injection. As a purely technical study, we did not aim to assess the ability of texture features to discriminate benign tumours from MPNSTs, an analysis that will be the subject of a separate study.

Methods

Patients

This retrospective study included 54 patients with NF1 attending our national neurofibromatosis service (30 male, 24 female, mean age 35.1 years). All patients had symptomatic plexiform neurofibromas clinically suspected of malignant transformation and were referred for further investigation with 18F-FDG PET/CT. An institutional review board waiver was obtained for this retrospective analysis. All patients either had histological confirmation of the tumours or were followed clinically for at least 5 years.

18F-FDG PET/CT scan acquisition and analysis

Adult patients were injected with 350 (±10%) MBq of 18F-FDG, and in children, the injected activity was scaled by body weight (weight/70 × 350 MBq). All patients had blood-glucose levels below 10 mmol/l at the time of injection. Imaging was performed at two time points post-injection of 18F-FDG: an initial early acquisition at 101.5 ± 15.0 min and a later acquisition at 251.7 ± 18.4 min, as per the standard clinical protocol of our department for characterisation of masses in patients with NF1 [8]. The early scan was from the cerebellum to mid-thigh with additional images acquired if the tumour was below the mid-thigh or above the cerebellum. The late scan involved acquiring a local view of the symptomatic tumour only. The acquisition time for both the early and late scans was 5 min per bed position.

Scans were performed on one of two scanners (Discovery VCT or DST, GE Healthcare, Chicago, USA) which are cross-calibrated to within 3% [12]. All images, from both scanners, were reconstructed with a voxel size of 4.7 mm and slice thickness of 3.27 mm using the ordered-subset expectation maximisation algorithm (2 iterations, 20 subsets). They were subsequently post-filtered using a 3D Gaussian kernel with a full-width at half maximum of 6 mm. Low-dose CT was acquired at 120 kVp and 65 mAs for the purposes of anatomical localisation and attenuation correction without administration of oral or intravenous contrast agent.

The reconstructed PET datasets were imported into in-house texture analysis software implemented in MATLAB (Release 2013b, The MathWorks, Inc., Natick, MA, USA). Many of the tumours, particularly those that were classified as benign, showed only very low-grade 18F-FDG uptake, and it was therefore not possible to implement automated segmentation (e.g., threshold defined by a percentage of maximum standardised uptake value (SUVmax) or a fuzzy locally adapted Bayesian (FLAB) method) nor was it possible to reliably define the region of interest (ROI) by hand on the PET scan. All tumours, at both time points, were therefore segmented manually on the CT images where the tumour edges were easily defined. ROI definition was carried out by an experienced operator trained in both radiology and nuclear medicine. The ROIs drawn on the CT scans were automatically mapped onto the PET scan (Fig. 1). Statistical and textural features that were calculated from tumour volumes of interest included 37 first-order, 25 second-order, 31 high-order and 6 fractal features as listed in Table 1. First-order ROI features were decay corrected from the time of injection. Voxel values within the tumour volume of interest (VOI) were resampled to yield 64 discrete equally spaced bins. Seven 3D direction vectors and 2 distances were considered resulting in 14 matrices. The 2 distances were used to capture relationships between voxels at larger distances and 7 directions to optimise computational time. The texture descriptors were obtained from each matrix followed by averaging the values calculated separately in each matrix. Fractal features were computed using a differential box-counting method.
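The discretisation and co-occurrence steps described above can be sketched as follows. This is a minimal pure-Python illustration and not the in-house MATLAB software: the particular seven direction vectors, the two distances (1 and 2 voxels), the min-max binning range, the symmetric normalisation and all function names are assumptions, since the text does not specify them.

```python
import math

N_BINS = 64
# Assumed direction vectors and distances: the text states only "7 directions, 2 distances".
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
DISTANCES = (1, 2)
OFFSETS = [(d * dx, d * dy, d * dz) for d in DISTANCES for dx, dy, dz in DIRECTIONS]


def discretise(voi, n_bins=N_BINS):
    """Resample VOI intensities into n_bins equally spaced bins (indices 0..n_bins-1).

    voi maps (x, y, z) voxel positions inside the tumour to intensity values.
    """
    lo, hi = min(voi.values()), max(voi.values())
    if hi == lo:  # flat VOI: every voxel falls in the first bin
        return {pos: 0 for pos in voi}
    width = (hi - lo) / n_bins
    return {pos: min(int((v - lo) / width), n_bins - 1) for pos, v in voi.items()}


def glcm(binned, offset, n_bins=N_BINS):
    """Symmetric, normalised grey-level co-occurrence matrix for one offset.

    Returns None when no voxel pair inside the VOI is separated by this offset.
    """
    dx, dy, dz = offset
    counts = [[0] * n_bins for _ in range(n_bins)]
    pairs = 0
    for (x, y, z), i in binned.items():
        j = binned.get((x + dx, y + dy, z + dz))
        if j is None:  # neighbour lies outside the VOI
            continue
        counts[i][j] += 1
        counts[j][i] += 1
        pairs += 2
    if pairs == 0:
        return None
    return [[c / pairs for c in row] for row in counts]


def energy(p):
    """GLCM energy (uniformity): sum of squared matrix entries."""
    return sum(c * c for row in p for c in row)


def entropy(p):
    """GLCM entropy in bits."""
    return -sum(c * math.log2(c) for row in p for c in row if c > 0)


def mean_feature(voi, feature):
    """Compute a feature on each of the 14 matrices, then average the values."""
    binned = discretise(voi)
    matrices = [glcm(binned, off) for off in OFFSETS]
    values = [feature(m) for m in matrices if m is not None]
    return sum(values) / len(values)


# Toy 4 x 4 x 4 "tumour" with a smooth intensity gradient (illustrative values only).
toy_voi = {(x, y, z): 1.0 + 0.5 * x + 0.25 * y + 0.1 * z
           for x in range(4) for y in range(4) for z in range(4)}
print(len(OFFSETS), mean_feature(toy_voi, energy), mean_feature(toy_voi, entropy))
```

Storing the VOI as a dictionary keyed by voxel position means that pairs crossing the tumour boundary are skipped automatically, so only voxels inside the segmented volume contribute to each matrix.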

Fig. 1

Axial 18F-FDG PET and CT scans. The tumour is indicated by the arrow (above). A region of interest is shown on the same PET and CT slice of the tumour (below)

Table 1 Early and late scan median values (and range) of all parameters

To assess inter-observer variability, a random subset of 16 patients had VOIs defined on early and late scans by a separate operator blinded to the initial observer measurements and clinical data.

Statistical methods

All statistical analyses were performed using IBM® SPSS predictive analytics software, v22.0.0.0. As data were not normally distributed, non-parametric tests were performed. For each texture feature, the values obtained at the initial and later time points were compared using the related-samples Wilcoxon signed-rank test, and correlations were assessed with Spearman's rank correlation. Comparisons were performed for all 54 tumours and for the subsets of the 30 benign and the 24 malignant tumours. A significance level of p < 0.05 was used. Inter-observer variation was assessed with intraclass correlation coefficients (ICC).
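As an illustration of the paired comparison described above, the following sketch computes a Wilcoxon signed-rank test and a Spearman correlation for one feature measured at two time points. It is not the SPSS procedure: the function names and the example values are hypothetical, and the p value uses a plain normal approximation without tie or continuity correction, which is an assumption and may differ slightly from the SPSS output.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def wilcoxon_signed_rank(early, late):
    """Return (W+, W-, two-sided p) for paired samples.

    Zero differences are dropped; p comes from a normal approximation
    (assumption: no tie or continuity correction).
    """
    diffs = [l - e for e, l in zip(early, late) if l != e]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p


# Hypothetical feature values for eight tumours (not study data).
early = [21, 34, 18, 40, 29, 31, 25, 38]
late = [19, 30, 17, 35, 26, 32, 22, 33]
print(wilcoxon_signed_rank(early, late), spearman_rho(early, late))
```

In this toy example the feature falls in seven of the eight tumours yet the early and late values remain strongly rank-correlated, which mirrors the situation where a feature changes systematically with time while preserving the ordering of tumours.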

Results

Good inter-observer agreement was found for measurement of all parameters with mean ICC scores for individual patient scans of 0.93 and 0.96 for early and late scanning time points, respectively, and there was no significant difference between the segmented tumour volumes obtained from the early and late scans (median 35.6 vs 34.9 cm³, respectively; p = 0.069). High correlation was observed between early and late scan data for most texture features (mean r value = 0.66 ± 0.21; Table 1).

Several first-, second-, high-order statistical and fractal features were significantly different between early and late scans (p < 0.05). In summary, overall 25/37 (68%) first-order, 9/25 (36%) second-order, 13/31 (42%) high-order and 3/6 (50%) fractal features showed significant changes, i.e. 50/99 (50%) parameters in total (Table 1). For the 30 benign tumours, 22/37 first-order, 7/25 second-order, 8/31 high-order and 2/6 fractal features changed significantly (p < 0.05) between the early and late scans. The corresponding numbers for the 24 malignant tumours were 11/37, 6/25, 8/31 and 0/6 (p < 0.05) (Table 2). Fifteen of the texture features that changed significantly did so for both benign and malignant tumours whilst the majority of features changed only for benign (n = 24) or malignant (n = 10) tumours.

Table 2 Number of texture features that significantly changed (p < 0.05), either increasing (↑) or decreasing (↓) between early and late scanning time points

Overall, more texture features decreased (27/39 benign; 13/25 malignant) than increased (12/39 benign; 12/25 malignant) with time. This pattern was true for first-order, second-order and high-order features in benign tumours and for second-order and high-order features in malignant tumours. However, in malignant tumours, more first-order features increased (n = 9) than decreased (n = 2) (Table 2).

Discussion

Previous researchers have demonstrated changes in measured SUV parameters post-injection of 18F-FDG and that this may even be of benefit for differentiating benign and malignant lesions [8–11]. However, to our knowledge, this is the first study that has investigated how global first-order and loco-regional texture features change with time post-injection of 18F-FDG.

Our study has demonstrated that a significant number of statistical first-, second- and high-order and model-based fractal features change with time post-injection of 18F-FDG in benign and malignant PNSTs. These findings suggest that neither global nor loco-regional uptake of 18F-FDG has stabilised in either benign or malignant tumours by 101.5 ± 15.0 min after injection. We observed an expected high correlation between early and late scan texture features; the differences would therefore be unlikely to impact significantly on the ability to discriminate between benign and malignant tumours, but would be of greater importance in studies where serial texture features are calculated as response measures.

The finding that a greater proportion of the global first-order features changed than second-order, high-order or model-based texture features suggests that global changes predominate over regional or local changes in 18F-FDG distribution. However, a significant proportion of second-order, high-order and fractal texture features also changed, showing that regional and local redistribution of 18F-FDG also occurs with time.

For first-order features, as expected, SUVmax increased with time for malignant but not benign tumours, as previously described for a number of malignant tumours [8–11].

Overall, SUVmean decreased with time, predominantly due to a decrease in uptake in benign tumours. First-order entropy and standard deviation, reflecting the global tumour randomness and distribution of voxel intensities, increased with time across the whole group of tumours but not in either of the benign or malignant groups alone.

For second-order features, which reflect the relationships between pairs of voxel intensities and their spatial distribution, 8 of the 9 features that changed (out of 25 calculated) showed a reduction. GLCM energy, a measure of uniformity, increased, and therefore overall the changes in these local texture features implied a reduction in heterogeneity with time. Second-order features showed a decrease in heterogeneity in both benign and malignant tumours, suggesting that there is a change in relative 18F-FDG distribution in the tumours between the two time points, causing a change in local tumour texture features.

With high-order texture features, there were increases and decreases in a number of local and regional features in both benign and malignant tumours but with no consistent pattern. This suggests that considering texture features as showing heterogeneity or homogeneity in a binary manner is probably an oversimplification of what each feature represents mathematically. Similarly, a consistent pattern of change was not seen with model-based fractal features.

A number of technical factors are known to affect the measurement and reproducibility of texture features including matrix size, reconstruction parameters, bin width and tumour volume [4, 6, 7]. Our findings demonstrate additionally the importance of quoting post-injection 18F-FDG scanning times when discussing texture features and the importance of consistent post-injection 18F-FDG scanning times when comparing global and texture features of patient tumours in inter- and intra-patient longitudinal studies.

There are some limitations to our study. We only considered PNSTs in NF1 patients and, as such, it is not possible to generalise these findings to other tumour types; future research should investigate how texture features change over time in other cancers. Scans were acquired at 101.5 ± 15.0 and 251.7 ± 18.4 min post-injection in this study as per the clinical protocol in our department [8]. Therefore, we cannot comment on the detail of the kinetics of change between these time points or on the magnitude of change compared to scans acquired at 60 min post-injection, which is a more commonly used clinical protocol elsewhere. Whilst two scanners were used to acquire the data in this study, the acquisition and reconstruction parameters were identical and quantitative differences between the scanners are known to be minimal [12]; it is therefore unlikely that this introduced a significant bias in the results. All the tumours included in this project were segmented manually on the CT component of the PET/CT scan, and therefore ROIs are subject to more variability than semi-automated methods such as threshold-based or FLAB segmentation. This was unavoidable as low-grade activity in many of the tumours meant that automated methods and direct ROI placement on the PET images proved impossible. However, even on non-contrast-enhanced CT scans as used in our study, the edges of benign and malignant neurofibromas are usually well demarcated (Fig. 1), which facilitated ROI definition, and VOI definition proved straightforward with good inter-observer reproducibility. This method also has the advantage of minimising differences in segmentation volumes due to changes in 18F-FDG distribution, ensuring the whole tumour is included at both time points. The ROIs that were drawn on the CT scan were mapped onto the PET scan. Although all scans were checked qualitatively by an experienced operator to ensure there was no mis-registration of the ROIs, we otherwise made the assumption of accurate co-registration with no patient movement between CT and PET acquisitions.

As the later scans had fewer counts following radioactive decay of 18F-FDG, we cannot exclude image noise as an element that may have contributed to differences in texture features. However, more texture features reduced with time (i.e., became more homogeneous) and so it is unlikely that this is a dominant effect. Lastly, the literature suggests that many texture features may be redundant due to collinearity between features and that only a small number of features should be used based on robustness to technical factors and reproducibility [4, 6]. However, as an initial study of the phenomenon of change in texture feature quantification with time, we preferred to report on multiple features with and without known collinearity to document these findings as broadly as possible.
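To give a sense of the count loss from physical decay mentioned above, the following sketch estimates the fraction of 18F activity remaining at the mean late time point relative to the mean early time point. It assumes a fluorine-18 half-life of 109.77 min and considers physical decay only, ignoring biological redistribution and clearance; the function names are hypothetical, and the noise estimate assumes simple Poisson counting statistics with equal acquisition times.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18 in minutes


def remaining_fraction(minutes, half_life=F18_HALF_LIFE_MIN):
    """Fraction of the initial activity left after the given time."""
    return 0.5 ** (minutes / half_life)


def decay_correct(measured, minutes_since_injection, half_life=F18_HALF_LIFE_MIN):
    """Scale a measured value back to the time of injection."""
    return measured / remaining_fraction(minutes_since_injection, half_life)


# Mean acquisition times reported in the Methods (minutes post-injection).
early_min, late_min = 101.5, 251.7

# Activity at the late scan relative to the early scan (physical decay only).
count_ratio = remaining_fraction(late_min) / remaining_fraction(early_min)

# Assumption: Poisson statistics, so relative noise scales as 1 / sqrt(counts).
relative_noise_increase = 1 / math.sqrt(count_ratio)

print(f"late/early activity ratio: {count_ratio:.2f}")
print(f"relative noise increase:   {relative_noise_increase:.2f}")
```

Under these assumptions the late scan retains roughly 39% of the activity present at the early scan, corresponding to an increase in relative statistical noise of about 1.6 times for the same 5 min bed position.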

Conclusions

This study has demonstrated that many 18F-FDG PET texture features differ significantly between early and late scan acquisition time points. As such, it is important to scan patients at consistent times when measuring texture features in longitudinal patient studies, especially in multi-centre patient trials.