Introduction

Conventional radiography is considered the initial imaging modality of choice for the diagnosis of bone tumors and tumor-like lesions [1,2,3]. Radiography allows accurate visualization of osseous destruction patterns and of periosteal response patterns [2]. This enables an assessment of the biological activity of bone lesions, with which lesions can be categorized as aggressive or non-aggressive [4]. Further features, such as matrix mineralization or the tumor architecture, may additionally help to establish a specific diagnosis, making conventional radiographs crucial for the diagnostic work-up of bone tumors and tumor-like lesions and the subsequent therapy [5]. Magnetic resonance imaging may help with narrowing the differential diagnosis or with characterizing an indeterminate lesion by demonstrating extraosseous tissue components or the composition of the tumor. However, radiography is considered the primary imaging method of choice because it visualizes certain features of bone lesions (e.g., periosteal reaction, osseous destruction pattern, matrix mineralization) while combining high resolution, cost-effectiveness, and accessibility [3, 6].

To standardize the assessment of bone lesions on radiographs, methods enabling the computer-aided extraction of imaging features can be used. For this purpose, radiomics has been successfully used to distinguish between benign and malignant lesions [7,8,9]. Radiomics relies on the extraction of a multitude of imaging features to derive an image-based signature that characterizes a tumor [10]. The radiomic signatures can then be used as input for machine learning models to classify the tumor [11]. Machine learning models include statistical models, decision tree models, support vector machines, and artificial neural networks (ANN) [10]. Decision-tree-based methods such as random forest classifiers (RFC) and statistical methods such as logistic regression or Gaussian naive Bayes classifiers (GNB) are widely used for classification tasks [12]. A previous study demonstrated the use of radiographic imaging features and demographic information to build a GNB model for the classification of bone tumors [13]. We therefore hypothesized that combining radiomics extracted from radiographs with demographic information may allow for reliable characterization of bone lesions. This may be particularly helpful in the clinical diagnostic routine, since the assessment of certain bone lesions requires expertise in musculoskeletal tumor imaging, which is difficult to acquire outside of a specialized center due to the rarity of these lesions.

The aim of this proof-of-concept study was therefore to develop and validate machine learning models using radiomics derived from radiographs and demographic information to distinguish between benign and malignant bone lesions on radiographs and compare the performance to radiologists on an external test set.

Materials and methods

Patient selection and dataset

The local institutional review boards approved this retrospective multi-center study (Technical University Munich and University of Freiburg). The study was performed in accordance with national (as specified in Drs. 7301-18) and international guidelines (as specified in the European Medicines Agency guidelines for good clinical practice E6). Informed consent was waived for this retrospective anonymized study. Radiographs of all eligible patients with primary bone tumors obtained at the primary institution between January 1, 2000, and December 31, 2019, were selected for this study, forming a consecutive series. The imaging protocols were in accordance with those previously described [1]. Patients included in this study (n = 880, average age 33.1 years ± 19.4, 395 women) were diagnosed by histopathology, which was considered the standard of reference, with malignant tumors (chondrosarcoma, n = 87; osteosarcoma, n = 34; Ewing’s sarcoma, n = 32; plasma cell myeloma, n = 28; B cell non-Hodgkin’s lymphoma (NHL), n = 36; chordoma, n = 6) and benign tumors (osteochondroma, n = 228; enchondroma, n = 153; chondroblastoma, n = 19; osteoid osteoma, n = 19; giant cell tumor of bone, n = 44; non-ossifying fibroma, n = 34; hemangioma, n = 12; aneurysmal bone cyst, n = 82; simple bone cyst, n = 24; fibrous dysplasia, n = 52). Chondrosarcomas included atypical cartilaginous tumors (grade 1, n = 8) as well as grade 2 (n = 48) and grade 3 (n = 31) chondrosarcomas. Patients were excluded (n = 51) due to poor image quality caused by artifacts that did not allow for any analysis or radiologic assessment of the tumor. Metastases were excluded as they were not included in the database of the university’s musculoskeletal tumor center as primary bone tumors. All included cases were reviewed by two radiologists independently (A.S.G., a musculoskeletal fellowship-trained radiologist with 8 years of experience, and S.C.F., a radiologist with 4 years of experience) to ensure that the tumor was correctly depicted on the radiograph with sufficient quality to locate and segment it. The dataset was split randomly into 70%/15%/15% for training, validation, and internal testing.
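The 70%/15%/15% split could, for instance, be implemented as a stratified two-step split with scikit-learn; the case identifiers and labels below are placeholders, and the stratification is an assumption, since the exact splitting procedure is not specified in the text:

```python
from sklearn.model_selection import train_test_split

# Placeholder case identifiers and binary labels (True = malignant);
# real labels would come from the histopathological diagnoses.
case_ids = list(range(880))
labels = [i % 4 == 0 for i in case_ids]

# First split off 70% for training; then divide the remaining 30%
# equally into validation and internal test sets (15% each).
train_ids, rest_ids, train_y, rest_y = train_test_split(
    case_ids, labels, test_size=0.30, stratify=labels, random_state=42)
val_ids, test_ids, _, _ = train_test_split(
    rest_ids, rest_y, test_size=0.50, stratify=rest_y, random_state=42)

print(len(train_ids), len(val_ids), len(test_ids))  # 616 132 132
```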

Additionally, an external test set comprising 96 patients from a different institution (Freiburg University Hospital) was used for further independent testing. Likewise, the external cohort was selected from the database of another university’s musculoskeletal tumor center, forming a consecutive series.

Since increasing model performance with a higher number of training cases was expected, all eligible patients with primary bone tumors from the main institution were included to obtain as many samples as possible for the development dataset. A sample size of 80–100 cases was chosen for the external validation, similar to comparable research studies [14, 15].

Table 1 gives an overview of the patients included in this study and tumor types.

Table 1 Subject characteristics*

The imaging data were extracted from Digital Imaging and Communications in Medicine (DICOM) files as portable network graphics (PNG) files; PNG was chosen to ensure lossless further image processing. Segmentations of the tumors were performed by one radiologist (S.C.F.), blinded to the histopathological and clinical data, using open-source software (3D Slicer, version 4.7; www.slicer.org) and were reviewed by A.S.G. [16]. To measure the intrareader reliability of the tumor segmentations, 45 patients were randomly selected from the dataset and an additional segmentation was performed 3 months after the initial segmentation. The intrareader reliability as measured by the Dice score was 0.92 ± 0.13.
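The lossless DICOM-to-PNG export could be sketched as follows; the min-max rescaling to 8-bit grayscale and the use of the pydicom and Pillow libraries are assumptions for illustration, not the authors' exact pipeline (a production pipeline would typically also apply the DICOM VOI LUT windowing):

```python
import numpy as np


def to_uint8(pixels: np.ndarray) -> np.ndarray:
    """Min-max rescale raw DICOM pixel values to 8-bit grayscale."""
    pixels = pixels.astype(np.float64)
    pixels -= pixels.min()
    if pixels.max() > 0:
        pixels /= pixels.max()
    return (pixels * 255).astype(np.uint8)


def dicom_to_png(dicom_path: str, png_path: str) -> None:
    """Export a DICOM radiograph as a PNG (PNG compression is lossless)."""
    import pydicom              # assumed available
    from PIL import Image       # assumed available
    ds = pydicom.dcmread(dicom_path)
    Image.fromarray(to_uint8(ds.pixel_array)).save(png_path)
```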

To compare the model performance with that of radiologists, two radiology residents (C.E.v.S., Y.L.) and two radiologists specialized in musculoskeletal tumor imaging (A.S.G., P.J.M.) classified the radiographs of the external test set.

Radiomic feature extraction and machine learning model development

Image processing, feature extraction, machine learning model development, and validation were performed on an Intel Core i9-9900K CPU (8 cores/16 threads) at 3.60 GHz (Intel) with 32 GB DDR4 SDRAM running Linux (Ubuntu 18.04, Canonical) and were implemented in Python 3.7.7 (open source, Python Software Foundation). Radiomic features were extracted as defined in the pyRadiomics library (version 3.0, https://www.radiomics.io/pyradiomics.html) [7]. The image of the DICOM file was extracted as a PNG for further preprocessing. The extracted features were then used as input for the machine learning models. Clinical information, namely the location of the tumor (torso/head, upper extremity, lower extremity), age, and sex, was also used as input.
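A minimal sketch of this input-preparation step is shown below. The pyRadiomics `RadiomicsFeatureExtractor` call follows the library's documented API (`force2D` is needed because radiographs are single 2D projections), while the one-hot encoding of the clinical variables is a hypothetical choice, as the exact encoding is not specified in the text:

```python
import numpy as np

# Hypothetical encoding of the clinical input: one-hot tumor location
# plus age and sex, to be appended to the radiomic feature vector.
LOCATIONS = ("torso/head", "upper extremity", "lower extremity")


def clinical_vector(location: str, age: float, sex: str) -> np.ndarray:
    """Encode location (one-hot), age (years), and sex (1 = female)."""
    loc = [1.0 if location == name else 0.0 for name in LOCATIONS]
    return np.array(loc + [age, 1.0 if sex == "f" else 0.0])


def extract_radiomic_features(image_png: str, mask_png: str) -> dict:
    """Extract radiomic features with pyRadiomics (assumed installed)."""
    from radiomics import featureextractor
    extractor = featureextractor.RadiomicsFeatureExtractor(force2D=True)
    return extractor.execute(image_png, mask_png)
```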

First, an RFC was trained using 200 estimators and a maximum depth of 3, as defined previously [17]. This model enabled a detailed analysis of the relevant radiomic features that drove the classification. The 10 most important features were selected from the RFC model. Additionally, a GNB and an ANN with 3 fully connected layers of 200, 100, and 100 neurons were trained using scikit-learn 0.22.2 (scikit-learn.org) and the fastai library [18]. Figure 1 shows an overview of the image processing and analysis steps that enable the radiomic analysis and machine learning model development. The exact description of the machine learning methods with the model and training parameters used can be found as code online (https://github.com/NikonPic/bonetumor-radiomics).
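The three architectures can be sketched with scikit-learn as follows. The `MLPClassifier` is only a stand-in for the fastai ANN actually used in the study, and the data are synthetic placeholders; the RFC hyperparameters (200 estimators, depth 3) and layer widths (200, 100, 100) follow the text:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

# Synthetic feature matrix for illustration, e.g. 10 radiomic + 3 clinical columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))
y = rng.integers(0, 2, size=200)

rfc = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=0)
gnb = GaussianNB()
ann = MLPClassifier(hidden_layer_sizes=(200, 100, 100),  # stand-in for the fastai ANN
                    max_iter=500, random_state=0)

for model in (rfc, gnb, ann):
    model.fit(X, y)

# The RFC exposes per-feature importances; the study selected the 10 highest.
top10 = np.argsort(rfc.feature_importances_)[::-1][:10]
```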

Fig. 1
figure 1

Overview of the utilized pipeline. The image and binary mask are fed to the pyRadiomics model to extract all relevant radiomic features. The extracted features and the clinical information are then passed to an ANN to distinguish between benign and malignant tumors

Model evaluation and statistical analysis

Machine learning models were developed on the training set and validated on the validation set. The best-performing models on the validation set were chosen for final evaluation on the test sets. The model performance reported in this study was observed on the test sets using confusion matrices, accuracy, positive and negative predictive value, precision, recall, F1-score, receiver-operating characteristic (ROC) curves with area under the curve (AUC) analysis, and 95% confidence intervals (CI) computed with the Clopper-Pearson method using scikit-learn 0.22.2 (scikit-learn.org), as previously defined [19]. Sensitivity was calculated as the number of true positives (correctly identified malignant bone tumors) divided by the sum of true positives and false negatives (malignant bone tumors incorrectly classified as benign). Specificity was calculated as the number of true negatives (correctly identified benign bone tumors) divided by the sum of true negatives and false positives (benign bone tumors incorrectly classified as malignant). McNemar’s test was used for statistical comparison, and p < 0.05 was considered statistically significant. Assuming a difference in accuracy of 7.5% between the model and the residents as well as the musculoskeletal radiologists, with a desired confidence level of 95% using McNemar’s test, resulted in a required test set sample size of at least 80. Additionally, the softmax of the output of the ANN was calculated as an estimate of the certainty of the prediction. Standard deviations and confidence intervals of the AUCs were calculated with pROC (1.16.1) using the DeLong method in R (3.6.1) [20]. Statistical analyses were performed by B.J.S. Model training, evaluation, and visualization were performed by C.E.v.S. (8 years of experience in data analysis) and N.J.W. (computer scientist with 8 years of experience in statistics and data analysis).
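The definitions above can be reproduced along these lines; the SciPy-based exact Clopper-Pearson interval and exact McNemar test are illustrative stand-ins for the scikit-learn, pROC, and R routines used in the study. The example confusion counts (TP = 24, FN = 8, TN = 83, FP = 18) correspond to the combined ANN's reported internal test set results:

```python
from scipy.stats import beta, binomtest


def sens_spec(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact (Clopper-Pearson) confidence interval for a proportion k/n."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi


def mcnemar_exact(b: int, c: int) -> float:
    """Exact McNemar test p-value from the two discordant cell counts."""
    return binomtest(b, n=b + c, p=0.5).pvalue


# Combined ANN, internal test set: 107/133 correct, 24/32 malignant found.
sens, spec = sens_spec(tp=24, fn=8, tn=83, fp=18)
acc_lo, acc_hi = clopper_pearson(107, 133)
```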

Results

Radiomic feature evaluation and demographic information

Overall, more than 200 radiomic features were analyzed. Of all radiomic features and demographic information, ‘age’ and ‘LLH_firstorder_TotalEnergy’ showed the highest relevance according to their feature importance in the RFC, as demonstrated in Fig. 2. To further investigate the discriminatory power of individual features, ANNs were trained for 10 epochs each, using only a single one of these ten most relevant features as input. These models showed moderate classification performance, with AUCs from 0.49 to 0.65 and accuracies from 52 to 62% depending on the feature used, highlighting that a single radiomic feature or a single demographic variable was not sufficient to accurately distinguish benign from malignant primary bone tumors; rather, a combination of radiomic and/or demographic information was needed. Interestingly, of the extracted radiomic features, those that focused on the intensity of individual and neighboring pixels, as well as those reflecting inhomogeneity, were more relevant. Detailed information on the individual radiomic feature performance can be found in Table 2.
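The single-feature experiments can be illustrated as follows, using logistic regression on synthetic data as a simple stand-in for the briefly trained single-input ANNs; the feature distributions are invented purely for illustration (an informative 'age'-like feature versus pure noise):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 300
y = rng.integers(0, 2, size=n)

# Synthetic single features: one weakly separates the classes, one does not.
features = {
    "age_like": rng.normal(loc=30 + 10 * y, scale=15),
    "noise": rng.normal(size=n),
}

# Train one single-input classifier per feature and record its AUC.
aucs = {}
for name, x in features.items():
    clf = LogisticRegression().fit(x.reshape(-1, 1), y)
    aucs[name] = roc_auc_score(y, clf.predict_proba(x.reshape(-1, 1))[:, 1])
```

A single weakly informative feature yields only a modest AUC, mirroring the 0.49 to 0.65 range reported above.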

Fig. 2
figure 2

Visualization of the 10 most important features and their relative importance in the random forest classifier

Table 2 Performance on the 10 most significant radiomic and demographic features alone

Machine learning model evaluation of combined radiomics and demographic information

Using the available demographic information alone, the best performing model was an RFC with an AUC of 0.75, 76% accuracy (101/133, 95% CI: 0.68, 0.83), 41% sensitivity (13/32, 95% CI: 0.24, 0.59), and 87% specificity (88/101, 95% CI: 0.79, 0.93). In comparison, using the selected radiomic features alone, the best performing model was an ANN with an AUC of 0.71, 75% accuracy (100/133, 95% CI: 0.67, 0.82), 66% sensitivity (21/32; 95% CI: 0.47, 0.81), and 78% specificity (79/101; 95% CI: 0.69, 0.86).

Combining radiomic features and demographic information as input to an ANN resulted in a notable increase in performance, with an AUC of 0.79, 80% accuracy (107/133, 95% CI: 0.73, 0.87), 75% sensitivity (24/32, 95% CI: 0.57, 0.89), and 82% specificity (83/101, 95% CI: 0.73, 0.89), as well as positive and negative predictive values of 57% (24/42, 95% CI: 0.41, 0.72) and 91% (83/91, 95% CI: 0.83, 0.96), respectively. This model achieved higher accuracy than the models based on demographic information or radiomic features alone, with increases in accuracy of 4% and 5% (p = 0.041 and p = 0.023), respectively.

Table 3 shows the classification performances of all developed models. RFC, GNB, and ANN were used as architectures. For each architecture, models were developed that used radiomic features only, demographic information only, or a combination of both radiomic features and demographic information.

Table 3 The classification performances of the models on the internal test set using radiomic features or demographic information alone, as well as combining both radiomic features and demographic information. As model architectures, the following three were used: A random forest classifier (RFC), a Gaussian naïve Bayes classifier (GNB), and an artificial neural network (ANN)*

Machine learning model evaluation on the external test set and comparison with radiologists

On the external test set, the best performing ANN achieved an AUC of 0.90, an accuracy of 75% (72/96, 95% CI: 0.65, 0.83), a sensitivity of 90% (28/31, 95% CI: 0.74, 0.98), and a specificity of 68% (44/65, 95% CI: 0.55, 0.79), as well as positive and negative predictive values of 57% (28/49, 95% CI: 0.42, 0.71) and 94% (44/47, 95% CI: 0.82, 0.99), respectively.

The first radiology resident achieved 72% accuracy (68/96, 95% CI: 0.61, 0.80), 61% sensitivity (19/31, 95% CI: 0.42, 0.78), and 75% specificity (49/65, 95% CI: 0.63, 0.85). In comparison, the model showed similar accuracy (p = 0.134) at significantly better sensitivity (p < 0.01) and similar specificity (p = 0.074).

The second radiology resident achieved 65% accuracy (62/96, 95% CI: 0.54, 0.74), 35% sensitivity (11/31, 95% CI: 0.19, 0.55), and 78% specificity (51/65, 95% CI: 0.67, 0.88). In comparison, the model showed higher accuracy (p < 0.01) with better sensitivity (p < 0.01) at lower specificity (p = 0.023).

The first radiologist specialized in musculoskeletal tumor imaging achieved 84% accuracy (81/96, 95% CI: 0.76, 0.91), 90% sensitivity (28/31, 95% CI: 0.74, 0.98), and 82% specificity (53/65, 95% CI: 0.70, 0.90). In comparison, the model showed lower accuracy (p < 0.01) at similar sensitivity (p = 1) and lower specificity (p < 0.01).

The second radiologist specialized in musculoskeletal tumor imaging achieved 83% accuracy (80/96, 95% CI: 0.74, 0.90), 81% sensitivity (25/31, 95% CI: 0.63, 0.93), and 85% specificity (55/65, 95% CI: 0.74, 0.92). In comparison, the model showed lower accuracy (p = 0.013) at similar sensitivity (p = 0.248) and lower specificity (p < 0.01).

The prevalence of malignant bone tumors was higher in the external test set compared to the internal test set, possibly leading to differences in the performance measures of the ANN between the internal and external test set.

Figure 3A shows the ROC on the internal test set for the best performing model, an ANN using both radiomic features and demographic information, as well as the ANNs based on demographic information or radiomic features alone. Figure 3B shows the performance of the best-performing model on the external test set. Figure 4A and B show the confusion matrices for the best-performing model — an ANN combining both radiomic and demographic information on the internal and external test set, respectively.

Fig. 3
figure 3

A shows the receiver operating characteristics (ROC) on the internal test set for three artificial neural networks (ANN). One ANN was based on demographic information alone (red). Another ANN was based on radiomic features alone (yellow). A third ANN was based on the combination of demographic information and radiomic features (blue). The ANN based on both demographic information and radiomic features displayed the highest discriminatory power. B shows the ROC on the external test set for the ANN combining demographic and radiomic features

Fig. 4
figure 4

A shows the confusion matrix of the overall best performing model, an artificial neural network (ANN) combining both radiomic and demographic information, on the internal test set. B shows the confusion matrix of the same model on the external test set obtained from another institution for further, independent testing

Examples of correct and incorrect classifications by the best performing model

Cases of correct and incorrect classifications by the best performing ANN combining both radiomic features and demographic information were reviewed to further investigate the functioning of the model. When reviewing correct classifications, we could identify cases that showed patterns of malignancy as demonstrated in Fig. 5 A and B or showed the typical appearance of benign lesions as shown in Fig. 5C and D. Those cases also showed high certainty of prediction of the ANN with 86% and 93% certainty, respectively. Figure 6A and B show a case of a benign tumor that was misclassified as a malignant tumor with a low prediction certainty of 54%. This may have occurred due to the pathological fracture of the benign tumor. Figure 6C and D show a case of correct classification of a malignant tumor with a low to moderate prediction certainty of 67%.
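The prediction certainty referred to above is the softmax of the ANN's output, with the winning class's probability serving as the certainty estimate. A minimal sketch, with hypothetical logit values:

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the output logits (benign, malignant)."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


# Hypothetical raw ANN outputs for one case; the maximum probability
# is reported as the certainty of the prediction.
probs = softmax(np.array([0.3, 2.1]))   # (benign, malignant) logits
certainty = probs.max()                 # roughly 0.86 here
```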

Fig. 5
figure 5

A and B Example of a malignant tumor in the tibia of a 33-year-old male with a chondrosarcoma. A shows the radiograph and B shows the segmentation for the radiomics extraction. The artificial neural network model combining both demographic and radiomic information correctly predicted a malignant tumor with a certainty of 86%. C and D Example of a benign tumor in the proximal tibia of a 15-year-old male with a non-ossifying fibroma. C shows the radiograph and D shows the segmentation for the radiomics extraction. The artificial neural network model combining both demographic and radiomic information correctly predicted a benign tumor with a certainty of 93%

Fig. 6
figure 6

A and B Example of a misclassified tumor from a 41-year-old female with an enchondroma and a pathological fracture through the tumor. A shows the radiograph and B shows the segmentation for the radiomics extraction. The artificial neural network model combining both demographic and radiomic information incorrectly classified this tumor as malignant with a certainty of 54%. C and D Example of a malignant tumor from a 45-year-old patient diagnosed with a chondrosarcoma. C shows the radiograph and D shows the segmentation for the radiomics extraction. The artificial neural network model combining both demographic and radiomic information correctly predicted a malignant tumor with a certainty of 67%

Discussion

In this study, machine learning models based on radiomics and demographic information were developed and validated to distinguish between benign and malignant bone lesions on radiographs and compared to radiologists on an external test set. Overall, machine learning models using the combination of radiomics and demographic information showed a higher diagnostic accuracy than machine learning models using radiomics or demographic information only. The best model was based on an ANN that used both radiomics and demographic information. On an external test set, this model demonstrated lower accuracy compared to radiologists specialized in musculoskeletal tumor imaging, while accuracy was higher or similar compared to radiology residents.

Interestingly, when evaluating individual radiomic features only, features that reflect large differences in the densities of neighboring pixels and inhomogeneity showed the highest discriminatory power indicating malignancy. This is in line with other studies assessing the ability of radiomic features derived from magnetic resonance imaging, computed tomography (CT), and positron emission tomography (PET)-CT to distinguish between benign and malignant lesions in different types of diseases [9, 21, 22]. Such features may reflect a moth-eaten appearance or a markedly inhomogeneous destruction pattern and may therefore be detected more often in malignant than in benign bone tumors. Of the evaluated demographic features, age showed the highest discriminatory power, which is in accordance with previous studies [1, 20].

Moreover, previous studies used a combination of radiographic features and demographic information to assess bone tumors on radiographs [13, 23, 24]. Kahn et al used Bayesian networks to differentiate among 5 benign and 5 malignant lesion types, achieving 68% accuracy [24]. Do et al used a naive Bayesian model on 710 cases to differentiate primary and secondary bone tumors, achieving 62% accuracy for the primary diagnosis across 10 distinct diagnoses [13]. Yet, these approaches used only radiologist-defined semantic features assessed on radiographs as input for the models, thus depending on the quality of the radiologists’ readings. In contrast, in this study, radiomic features containing first-, second-, and higher-order statistics were combined with patient data as input for sophisticated machine learning models [10]. Additionally, it should be noted that the sample sizes of all the above-mentioned previous studies on radiographic feature assessment of bone tumors were smaller than in this study, and their performances were not evaluated on a separate hold-out test set or an external test set, in contrast to current best practices as followed in this study [3, 25, 26].

Due to the varying settings in which patients with bone lesions present, a quantitative method for image analysis may help ensure high-quality bone tumor diagnostics in the shortest time. Automated quantitative evaluation techniques for conventional radiographs obtained during the routine clinical diagnostic workup of patients with bone lesions are therefore needed, since these are independent of the treating physicians’ experience in evaluating conventional radiographs. In this proof-of-concept study, we were able to develop a machine learning model, using both radiomic features extracted from radiographs and demographic information, with an accuracy higher than or similar to that of radiology residents. Such a model, implemented into the clinical routine pipeline, may therefore support inexperienced or moderately experienced radiologists or physicians in enhancing the quality of their diagnostic decision-making and, consequently, the further management or referral of these patients. More specifically, such a model may help with ‘ruling in’ malignant lesions, particularly when the treating physician or the radiologist has limited experience. The patient could then be referred to a specialized center, where a biopsy may be performed to secure the diagnosis.

This study has limitations. First, the radiomic analysis of the tumor was performed only on a single radiograph, without considering additional available projections. Second, the applied technique relied on manual segmentations of the bone tumor. However, automated segmentations of bone tumors on radiographs may be developed in the future. Third, the ANN is limited by the information contained in a radiograph and may be improved with additional information obtained from magnetic resonance imaging. Fourth, the demographic information included the location of the tumor, age, and sex; however, the medical history and clinical symptoms such as pain level and duration are also important, and their use may be explored in future studies. Also, the developed models can currently only differentiate between benign and malignant lesions and not between the different tumor subtypes. However, a multitude of bone tumors, particularly malignant tumors, cannot be differentiated further by radiography alone, as indicated by the low accuracies in the previous studies mentioned above. In particular, some radiographic features of benign and malignant bone tumors may overlap, such as in low-grade chondrosarcoma showing a benign growth pattern or in giant cell tumor of bone sometimes demonstrating an aggressive growth pattern and periosteal reaction. Moreover, the dataset included only patients with histopathological diagnoses of the osseous lesions, since histopathology was considered the standard of reference in our study. This may have created a selection bias that we cannot account for, since for certain bone lesions, such as non-ossifying fibromas or fibrous dysplasia, histopathology is usually only obtained when bone stability seems endangered or the lesion itself appears ‘atypical’. Finally, bone metastases were not included in the current study, although they make up a large part of malignant bone lesions.

This study is therefore considered a proof-of-concept study. In future studies, the developed machine learning models need to be tested, optimized, and further evaluated on larger datasets that also include bone metastases and that comprise conventional radiographs for which the final diagnosis of the bone lesion is based on clinical and imaging consensus as well as on histopathology.

In conclusion, a machine learning model using both radiomic features and demographic information was developed that showed high accuracy and discriminatory power for distinguishing between benign and malignant bone tumors on radiographs of patients who underwent biopsy. The best model was based on an ANN that used both radiomics and demographic information, resulting in an accuracy higher than or similar to that of radiology residents. Such a model may enhance diagnostic decision-making, especially for radiologists or physicians with limited experience, and may therefore improve the diagnostic work-up of bone tumors.