Artificial intelligence for X-ray scaphoid fracture detection: a systematic review and diagnostic test accuracy meta-analysis

Kraus, Matan; Anteby, Roi; Konen, Eli; Eshed, Iris; Klang, Eyal

doi:10.1007/s00330-023-10473-x

Review
Open access
Published: 15 December 2023

Volume 34, pages 4341–4351, (2024)

European Radiology

Matan Kraus ORCID: orcid.org/0000-0002-0789-2828^1,2,
Roi Anteby^2,3,
Eli Konen^1,2,
Iris Eshed^1,2 &
…
Eyal Klang^1,2

Abstract

Objectives

Scaphoid fractures are usually diagnosed using X-rays, a low-sensitivity modality. Artificial intelligence (AI) using Convolutional Neural Networks (CNNs) has been explored for diagnosing scaphoid fractures in X-rays. The aim of this systematic review and meta-analysis is to evaluate the use of AI for detecting scaphoid fractures on X-rays and analyze its accuracy and usefulness.

Materials and methods

This study followed the guidelines of Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) and PRISMA-Diagnostic Test Accuracy. A literature search was conducted in the PubMed database for original articles published until July 2023. The risk of bias and applicability were evaluated using the QUADAS-2 tool. A bivariate diagnostic random-effects meta-analysis was conducted, and the results were analyzed using the Summary Receiver Operating Characteristic (SROC) curve.

Results

Ten studies met the inclusion criteria and were all retrospective. The AI’s diagnostic performance for detecting scaphoid fractures ranged from AUC 0.77 to 0.96. Seven studies were included in the meta-analysis, with a total of 3373 images. The meta-analysis pooled sensitivity and specificity were 0.80 and 0.89, respectively. The meta-analysis overall AUC was 0.88. The QUADAS-2 tool found high risk of bias and concerns about applicability in 9 out of 10 studies.

Conclusions

The current results of AI’s diagnostic performance for detecting scaphoid fractures in X-rays show promise. The results show high overall sensitivity and specificity and a high SROC result. Further research is needed to compare AI’s diagnostic performance to human diagnostic performance in a clinical setting.

Clinical relevance statement

Scaphoid fractures are prone to being missed because they are assessed with a low-sensitivity modality and have a high occult fracture rate. AI systems can help clinicians and radiologists make an early diagnosis and avoid missed injuries.

Key Points

• Scaphoid fractures are common and some can be easily missed in X-rays.

• Artificial intelligence (AI) systems demonstrate high diagnostic performance for the diagnosis of scaphoid fractures in X-rays.

• AI systems can be beneficial in diagnosing both obvious and occult scaphoid fractures.

Introduction

Scaphoid fractures are the most common type of carpal bone fracture, accounting for 82–89% of all carpal fractures [1]. Early detection and immobilization are crucial for the successful management of scaphoid fractures [2,3,4], since a neglected, untreated scaphoid fracture may lead to complications such as avascular necrosis, carpal instability, and early osteoarthritis [4,5,6,7,8].

X-rays are commonly used to identify scaphoid fractures due to their accessibility and cost-effectiveness. However, the sensitivity of wrist radiographs for diagnosing scaphoid fractures is relatively low (66–81%), with an occult scaphoid fracture rate of up to 50% [9,10,11,12,13]. Interpreting wrist radiographs for scaphoid fracture is challenging and is a frequent cause of delayed or incorrect diagnosis in emergency departments. Radiographs are often evaluated by non-specialized physicians or junior orthopedic residents, which may reduce accuracy and lead to missed fractures [14]. Moreover, possibly due to less experienced personnel or fatigue, misdiagnosis is more frequent during evening and overnight hours, compounding the persistent shortage of emergency radiologists available to handle the workload [15].

While advanced imaging modalities such as CT or MRI are more sensitive than radiographs for the detection of occult scaphoid fractures, these modalities have inherent disadvantages, such as increased radiation exposure with CT and the high cost of MRI [2, 16, 17].

Therefore, there are both clinical and economic advantages to improving the sensitivity and detection rate of scaphoid fractures on plain radiographs. Convolutional Neural Networks (CNNs) are prominent artificial intelligence (AI) deep learning algorithms for image analysis [18, 19]. CNNs are specifically designed to handle image data by exploiting repeating patterns. They have been used to process a variety of medical images, such as radiology images, skin lesions, retinal scans, endoscopic images, and histopathologic specimens [19,20,21,22,23,24,25,26,27,28,29].

AI algorithms have proven to be effective in identifying various acute pathologies that are frequently encountered in emergency departments, for example, pulmonary embolism, intra-abdominal free gas, intra-cranial hemorrhage, and femoral neck fractures [15]. Indeed, AI methods have been explored for improving sensitivity in the detection of scaphoid fractures on wrist radiographs.

The aim of this systematic review and meta-analysis is to comprehensively evaluate the existing data on the use of AI for detecting scaphoid fractures on wrist radiographs. The review analyzes the usefulness and accuracy of current AI systems in this field and explores their potential future applications.

Materials and methods

The systematic review followed the guidelines of Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) and PRISMA-Diagnostic Test Accuracy [30, 31]. A literature search was conducted in the PubMed database for relevant studies published until July 2023, using search terms: (“Scaphoid Bone”[Mesh] or “Wrist”[Mesh] or “scaphoid” or “scaphoid fracture” or “Wrist Fractures”[Mesh]) and (“deep learning”[Mesh] or “artificial intelligence”[Mesh] or “convolutional neural network” or “CNN” or “Neural Networks, Computer”[Mesh]). The bibliographies of the included studies were also searched for additional relevant studies.

Inclusion criteria for studies were (1) use of AI methods for scaphoid fracture diagnosis on wrist radiographs, (2) reporting of the area under the receiver operating characteristic (ROC) curve and/or accuracy, (3) publication in English, and (4) original articles.

Data from all included studies was collected using a standardized data extraction sheet, including publication year, journal name and affiliation, study design, study period, number and views of wrist radiographs, number of patients and images, fracture type (visible or occult), standard of reference for diagnosis, number of pipeline steps, AI name, CNN architecture, ROI labeling, data input proportion, diagnostic accuracy/AUC, sensitivity, specificity, and true/false positive/negative. The Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) was used to assess bias and applicability [32].

Statistical analysis

The results of the systematic review and meta-analysis on AI for the detection of scaphoid fractures in wrist radiographs were analyzed based on the contingency table of true positives, false negatives, true negatives, and false positives constructed for each study.
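As an illustration of how per-study metrics follow from such a contingency table, the minimal Python sketch below derives sensitivity, specificity, and accuracy from the four cell counts. The class name `ContingencyTable` and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table for one study (hypothetical helper, not from the review)."""

    tp: int  # fractures correctly flagged
    fn: int  # fractures missed
    tn: int  # non-fractures correctly cleared
    fp: int  # non-fractures wrongly flagged

    @property
    def sensitivity(self) -> float:
        # True positive rate: TP / (TP + FN)
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True negative rate: TN / (TN + FP)
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Correct predictions divided by all predictions
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Illustrative counts only; they do not come from any included study.
example = ContingencyTable(tp=80, fn=20, tn=178, fp=22)
print(f"sensitivity={example.sensitivity:.2f}")
print(f"specificity={example.specificity:.2f}")
print(f"accuracy={example.accuracy:.2f}")
```

Keeping the four counts together makes it straightforward to feed the same table into a pooling procedure later, which is why the sketch stores counts rather than precomputed rates.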

A bivariate diagnostic random-effects meta-analysis was conducted and the findings were assessed with the Summary Receiver Operating Characteristic (SROC) curve [33]. The SROC curve considered the sensitivity and specificity values for a given test, taking into account the various cut-off points in each independent study. The area under the curve (AUC) was calculated for the fitted SROC. AUC is an overall summary of diagnostic accuracy, commonly used when reporting the performance of AI classification models. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1-specificity). The higher the AUC, the better the model distinguishes between scaphoid fracture and non-fracture plain radiographs. AUC results are considered excellent for values between 0.9 and 1, good for values between 0.8 and 0.9, and fair for values between 0.7 and 0.8. An AUC less than 0.7 is considered poor. Accuracy is defined as the number of correct predictions divided by the total number of predictions ((true positives + true negatives)/total dataset). The across-study heterogeneity was assessed using the I² statistic. The analyses were performed using the R 4.0.5 software and the “mada” and “meta” packages.
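To make the AUC and its qualitative bands concrete, the following Python sketch computes an empirical AUC for a single classifier and maps it to the labels used in this review. It is an illustration only and is not the bivariate SROC model fitted in R. The function names, the toy scores, and the choice to assign a boundary value (for example exactly 0.9) to the higher band are assumptions.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Empirical AUC: probability that a random fracture case receives a
    higher score than a random non-fracture case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def grade_auc(auc: float) -> str:
    """Map an AUC to the qualitative bands described in the text.

    Boundary values are assigned to the higher band (an assumption).
    """
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "fair"
    return "poor"


# Toy data: 1 = fracture, 0 = no fracture; scores are made up.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC={toy_auc:.3f} ({grade_auc(toy_auc)})")
```

The pairwise formulation is equivalent to the area under the empirical ROC curve and avoids sweeping thresholds explicitly, at the cost of quadratic run time, which is acceptable for small illustrative inputs.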

Results

Included studies and dataset sizes

The bibliographic search retrieved 303 studies, of which nine studies met the inclusion criteria and one more was retrieved by manual searching [2, 16, 34,35,36,37,38,39,40,41]. A flow diagram of the study selection process is presented in Fig. 1, and the characteristics of the included studies are summarized in Table 1. The studies were all retrospective and published between 2020 and 2023.

Table 1 Included study characteristics

The ten included studies ranged in sample size, with the number of images used varying from 356 to 11,838. Table S1 (Supplementary Material) provides a summary of the full numerical data of the studies, including the reference standard (“ground truth”) for diagnosis.

Brief summary of the included studies

Langerhuizen et al [34] developed a single-step CNN model that consisted of only the scaphoid fracture detection step; scaphoid segmentation was done manually. The model showed fair diagnostic accuracy, with an AUC of 0.77, and its diagnostic performance was compared to that of five orthopedic surgeons. The clinicians had better specificity, but the CNN performed similarly in terms of accuracy and sensitivity. However, the algorithm struggled to identify obvious fractures and produced some false positive suggestions.

Ozkaya et al [16] also developed a single-step fracture detection CNN, while scaphoid segmentation was done manually. The CNN’s diagnostic performance was compared with an emergency department physician and two hand surgeons. The CNN had a good AUC of 0.84. The CNN’s performance was comparable to that of a less experienced orthopedic specialist and outperformed the emergency department physician.

Yoon et al [35] developed a 3-step CNN model. First, the scaphoids were localized by a segmentation CNN. Two CNNs were then used consecutively: one to detect scaphoid fractures, followed by another that examined the cases the first CNN classified as negative, in order to detect occult fractures. The fracture detection CNN achieved excellent diagnostic performance with an AUC of 0.955. The “occult fracture detection” CNN had a good AUC of 0.81. The full model successfully identified 90.9% of occult fractures.

Hendrix et al [36] developed a 2-step CNN model and compared its diagnostic performance to that of eleven radiologists with varying levels of experience. The first CNN localized the scaphoids and automatically passed the region of interest to the fracture detection CNN. The CNN had a good AUC of 0.87 and demonstrated a performance level comparable to that of the radiologists.

Tung et al [37] developed a 2-step CNN model: scaphoid segmentation CNN followed by scaphoid fracture detection CNN. Ten different CNNs for fracture detection were used and compared, with an AUC ranging from 0.86 to 0.95.

Yang et al [38] developed a 2-step CNN model: scaphoid segmentation CNN and scaphoid fracture detection CNN, which achieved an excellent diagnostic performance with an AUC of 0.917.

Li et al [2] developed a 2-step CNN model, comprising scaphoid segmentation and scaphoid fracture detection, and compared its diagnostic performance to that of four hand surgeons. The fracture detection CNN achieved excellent diagnostic performance with an AUC of 0.92. They concluded that the CNN’s ability was comparable to the majority decision of the surgeons and that it has the potential to achieve expert-level performance.

Hendrix et al [39] developed a 2-step CNN model: scaphoid segmentation and laterality classification CNN, and scaphoid fracture detection CNN, and assessed the diagnostic performance of the AI system. They also conducted an observer study in order to clinically validate the AI system performance, and compared the diagnostic performance of five musculoskeletal-expert radiologists with and without AI assistance. The CNN had a good AUC of 0.88, and it was comparable to the average performance (AUC 0.87) of the five radiologists. The AI assistance improved the agreement between radiologists in 5/10 pairs, and it reduced reading time for 4/10 radiologists. However, the algorithm did not improve the diagnostic performance of the majority of radiologists.

Cohen et al [40] used a commercially available AI system and assessed its diagnostic performance in detecting different wrist fractures, including scaphoid fractures. They also conducted an observer study to clinically validate the AI system, using the initial radiology reports (IRRs) made by 41 radiologists with varying levels of experience who were not specialized in musculoskeletal imaging. The IRRs alone, the AI results alone, and the combination of the two were then compared. A combined IRR + AI observation was considered positive when a fracture was detected by either the AI or the IRR, regardless of the other’s result. The sensitivity for scaphoid fracture detection was similar for AI (84%) and IRR (80%). Analysis of the full cohort, meaning all fracture locations, showed that the combination of IRR + AI had greater sensitivity than either AI alone or IRR alone.
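The "positive if either is positive" rule used for the IRR + AI combination can be sketched in a few lines of Python. The function names and the toy reads below are hypothetical and do not reproduce data from Cohen et al; the sketch only shows why such a rule cannot lower sensitivity relative to either reader alone.

```python
from typing import List, Sequence


def combine_either_positive(irr: Sequence[bool], ai: Sequence[bool]) -> List[bool]:
    """Combined call is positive when the IRR or the AI (or both) is positive."""
    if len(irr) != len(ai):
        raise ValueError("IRR and AI must cover the same cases")
    return [r or a for r, a in zip(irr, ai)]


def sensitivity(calls: Sequence[bool], truth: Sequence[bool]) -> float:
    """Fraction of true fractures that were called positive."""
    on_fractures = [c for c, t in zip(calls, truth) if t]
    return sum(on_fractures) / len(on_fractures)


def specificity(calls: Sequence[bool], truth: Sequence[bool]) -> float:
    """Fraction of non-fractures that were called negative."""
    on_normals = [c for c, t in zip(calls, truth) if not t]
    return sum(not c for c in on_normals) / len(on_normals)


# Toy reads for eight hypothetical cases (first four are true fractures).
truth = [True, True, True, True, False, False, False, False]
irr = [True, True, False, False, False, False, True, False]
ai = [True, False, True, False, False, True, False, False]
combined = combine_either_positive(irr, ai)

for name, calls in (("IRR", irr), ("AI", ai), ("IRR + AI", combined)):
    print(name, sensitivity(calls, truth), specificity(calls, truth))
```

Because every case flagged by one reader stays flagged in the combination, sensitivity can only stay the same or rise, while false positives from both readers accumulate, so specificity can only stay the same or fall.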

Lee et al [41] developed an AI model to detect three common types of wrist fractures: distal radius, ulnar styloid process, and scaphoid fractures. The model consisted of two CNNs: scaphoid segmentation and scaphoid fracture detection, which operated simultaneously and were integrated into a final assessment. Two novice radiologists also diagnosed the fracture sites, both with and without the assistance of the AI model. The diagnostic performance of the AI model was evaluated and compared to the novice radiologists. The AI model had a good AUC of 0.808 for scaphoid fracture detection. When novice radiologists were assisted by the AI model, the AUC for detecting scaphoid fractures significantly increased from 0.75 to 0.85 and from 0.71 to 0.80.

Radiograph characteristics

Two studies [34, 39] used a full wrist series of 4 projections (either anterior–posterior (AP), posterior-anterior (PA), lateral and oblique or AP/PA, ulnar-deviated AP/PA, lateral and oblique). Two studies [2, 35] used 2 projections (PA or scaphoid view), one study [41] used 3 projections of wrist radiographs (AP, lateral, oblique), one study [36] used several hand, wrist, and scaphoid projections, one study [37] used 2 projections (AP and lateral), and one study [16] used 1 projection (AP). Two studies [38, 40] did not specify which projections were used. Exclusion criteria were specified in seven studies [2, 34,35,36, 39,40,41] and included poor radiographic quality, old fractures, immobilization devices, hardware, chronic hand disorders, arthritis, and tumors. The other three studies [16, 37, 38] did not specify any exclusion criteria.

CNN pipeline structure and AI characteristics

One study [40] used commercially available software, while the other nine studies [2, 16, 34,35,36,37,38,39, 41] developed CNNs using radiographs for AI training and testing.

Data on the AI characteristics is shown in Table S2 (Supplementary Material). Nine different CNN architecture types were used in the AI models that were developed, along with various augmentation techniques. Eight studies [2, 16, 34,35,36, 38, 39, 41] utilized a single CNN architecture, which differed in each study. One study [37] compared 10 different CNN architecture types.

The most common CNN models used among the studies were VGG16 [34, 37], ResNet-50 [16, 37], DenseNet-121 [36, 37], ResNet-152 [37, 38], and Inception-V3 [37, 39], each used in two studies. Three studies [2, 35, 36] used the GRAD-CAM function to create a heatmap based on the input image and highlight the fracture area. Another heatmap was used in one study [41], and a bounding box to highlight the “zone-of-interest” was integral to the commercially available AI system that was used in a single study [40].
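For readers unfamiliar with how a GRAD-CAM heatmap is formed, the sketch below follows the standard published formulation: each feature-map channel is weighted by the spatial average of its gradient with respect to the class score, the weighted maps are summed, and negative values are clipped. The included studies do not describe their implementations, so this pure-Python version, its function name, and the tiny 2 × 2 inputs are illustrative assumptions; real systems would obtain activations and gradients from a deep learning framework and upsample the result to the radiograph size.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(feature_maps: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Coarse heatmap from K feature maps and their gradients.

    feature_maps[k][i][j]: activation of channel k at location (i, j).
    gradients[k][i][j]: gradient of the class score w.r.t. that activation.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Channel weight = spatial average of that channel's gradient.
    weights = [
        sum(sum(row) for row in grad) / (height * width) for grad in gradients
    ]
    heatmap = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(weights, feature_maps):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * fmap[i][j]
    # ReLU: keep only locations that support the class of interest.
    heatmap = [[max(0.0, value) for value in row] for row in heatmap]
    # Scale to [0, 1] for display, unless the map is all zeros.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Made-up activations and gradients for two channels on a 2x2 grid.
maps = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heat = grad_cam(maps, grads)
print(heat)
```

In this toy input the second channel has a negative weight, so the locations it activates are suppressed and the heatmap peaks where the first channel is strongest.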

Two studies [16, 34] used a 1-step pipeline model with a CNN solely for fracture detection. Scaphoid segmentation was done manually, by cropping and resizing the scaphoid to fit in a rectangular ROI, which then served as input to the fracture detection CNN.

Six studies [2, 36,37,38,39, 41] used a 2-step pipeline model: a scaphoid segmentation CNN followed by a fracture detection CNN.

An example of such a CNN pipeline is illustrated in Fig. 2 [36].

One study [35] used a 3-step pipeline model, with a scaphoid segmentation CNN followed by a fracture detection CNN. The negative results from the first two steps were re-evaluated in a third CNN, designed to diagnose occult fractures missed in previous pipeline steps. The entire pipeline was then tested separately on new images. One study used a commercial system, in which scaphoid segmentation and fracture detection were integral [40].
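The control flow of these multi-step pipelines can be sketched as follows. This is a schematic under stated assumptions, not any study's implementation: the models are passed in as plain callables, the stand-in "models" are trivial placeholders for trained CNNs, the 0.5 threshold is arbitrary, and whether the occult detector receives the same cropped region as the first detector is assumed for simplicity.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]        # grayscale radiograph as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the scaphoid region of interest out of the full radiograph."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def run_pipeline(
    image: Image,
    localize: Callable[[Image], Box],
    detect: Callable[[Image], float],
    detect_occult: Optional[Callable[[Image], float]] = None,
    threshold: float = 0.5,
) -> str:
    """Step 1 localizes the scaphoid, step 2 scores the cropped region, and
    the optional step 3 re-examines cases that step 2 called negative."""
    roi = crop(image, localize(image))
    if detect(roi) >= threshold:
        return "fracture"
    if detect_occult is not None and detect_occult(roi) >= threshold:
        return "suspected occult fracture"
    return "no fracture"


# Stand-in "models" for demonstration only; real systems use trained CNNs.
def fixed_box(_image: Image) -> Box:
    return (1, 1, 3, 3)


def mean_intensity(roi: Image) -> float:
    pixels = [p for row in roi for p in row]
    return sum(pixels) / len(pixels)


radiograph = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.6, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(run_pipeline(radiograph, fixed_box, mean_intensity))
```

Omitting `detect_occult` gives the 2-step arrangement, and replacing `localize` with a manually supplied box corresponds to the 1-step arrangement with manual segmentation.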

AI performance for scaphoid fracture detection

Data on AI compared to human performance is shown in Table S3 (Supplementary Material). The AI performance of scaphoid fracture detection varied from AUC 0.77 to 0.96. Accuracy of fracture detection ranged from 72.0 to 90.3%. Sensitivity and specificity for each CNN are shown in forest plots (Figs. 3 and 4).

Seven studies reported deep learning algorithm performance, with a combined total of 3373 images. The pooled sensitivity and specificity were 0.80 (95% CI 0.75–0.84) and 0.89 (95% CI 0.82–0.94) respectively (Figs. 3 and 4).

Tung et al [37] compared 10 different CNNs applied to the same dataset and found AUCs ranging from 0.86 to 0.95. The other eight studies [2, 16, 34,35,36, 38, 39, 41] also reported high AUC, from 0.77 to 0.96.

The AUCs were fair for 1 CNN (VGG16 [34]), good for 8 CNNs (ResNet50 [16], DenseNet121 [36], VGG16 + VGG19 + ResNet152 + DenseNet169 [37], Inception-V3 [39], NasNet [41]), and excellent for 10 CNNs (EfficientNetB3 [35], ResNet50 + ResNet101 + DenseNet121 + DenseNet201 + InceptionV3 + EfficientNetB0 [37], ResNet152 [38], MobileNetV3 [2]).

Table 2 shows that the same CNN architecture yielded similar, but not identical, results when reported by different authors.

Table 2 Comparison of AUC results in identical CNN architectures according to different studies

Hendrix et al [39] assessed their algorithm’s performance using multiple input configurations for various X-ray projections including PA, ulnar-deviated PA, oblique, and lateral. The algorithm’s fracture detection performance improved when PA views were supplemented with ulnar-deviated PA views (AUC, 0.79 to 0.84), oblique views (AUC, 0.79 to 0.85), and all available views (AUC, 0.79 to 0.88). However, there was no significant improvement with the addition of lateral views.

The Summary Receiver Operating Characteristic (SROC) curve plot of deep learning models for the diagnosis of scaphoid fractures on wrist radiographs is presented in Fig. 5. The area under the SROC curve is 0.88, pooled sensitivity is 80%, and specificity is 89%.
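As a worked illustration of what these pooled estimates would imply in practice, the Python sketch below converts a sensitivity and specificity into expected counts for a cohort and into likelihood ratios. The prevalence of 10% and the cohort size of 1000 radiographs are hypothetical inputs chosen for the arithmetic, not figures from the review, and the function names are illustrative.

```python
from typing import Dict


def expected_counts(sensitivity: float, specificity: float,
                    prevalence: float, n_images: int) -> Dict[str, float]:
    """Expected 2x2 cell counts for a cohort of n_images radiographs."""
    fractured = n_images * prevalence
    intact = n_images - fractured
    tp = sensitivity * fractured
    tn = specificity * intact
    return {"tp": tp, "fn": fractured - tp, "tn": tn, "fp": intact - tn}


def likelihood_ratios(sensitivity: float, specificity: float) -> Dict[str, float]:
    """Positive and negative likelihood ratios of a binary test."""
    return {
        "lr_pos": sensitivity / (1.0 - specificity),
        "lr_neg": (1.0 - sensitivity) / specificity,
    }


POOLED_SENSITIVITY = 0.80  # pooled estimate reported above
POOLED_SPECIFICITY = 0.89  # pooled estimate reported above
ASSUMED_PREVALENCE = 0.10  # hypothetical; not a figure from the review

counts = expected_counts(POOLED_SENSITIVITY, POOLED_SPECIFICITY,
                         ASSUMED_PREVALENCE, n_images=1000)
ratios = likelihood_ratios(POOLED_SENSITIVITY, POOLED_SPECIFICITY)
for name, value in {**counts, **ratios}.items():
    print(f"{name}: {value:.2f}")
```

Changing `ASSUMED_PREVALENCE` shows how strongly the balance of missed fractures and false alarms depends on the case mix, which is one reason pooled accuracy figures need validation in the intended clinical setting.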

Diagnosis of occult fractures

Langerhuizen et al [34] reported that 5 out of 6 occult scaphoid fractures missed by human observers were detected by AI. In contrast, Ozkaya et al [16] reported the same accuracy rate for AI and humans, with the same 7 occult fractures missed by both. Yoon et al [35] developed two AI models, the first detecting both apparent and occult scaphoid fractures, and the second applied to the images classified as negative by the first model in order to detect occult fractures. The occult fracture model showed an AUC of 0.81 and the entire pipeline correctly identified 90.9% of the occult fractures.

Quality assessment using the QUADAS-2 tool is summarized in Table 3.

Table 3 Methodological analysis of the included studies based on the QUADAS-2 tool

Discussion

The potential role of AI in the detection of radiographic scaphoid fractures is twofold. Firstly, it can help reduce missed scaphoid fractures, and secondly, it can aid in diagnosing occult scaphoid fractures. The current review and meta-analysis show that the overall diagnostic performance of AI in detecting scaphoid fractures is good and approaches excellent. AI performance in this field demonstrates promising results in detecting both obvious and occult fractures. While studies comparing AI performance to human controls are lacking, existing data suggest that current AI systems perform at a level comparable to that of human experts.

AI features and diagnostic accuracy

The relatively limited number of studies evaluating the function of AI in the detection of scaphoid fracture on wrist radiographs and their recent publication (2020 or later) reflect the fact that this field is new, active, and evolving. CNN systems have evolved from the 1-step pipeline models, reliant on manual scaphoid segmentation, to 2-step pipeline models with integral CNN-based scaphoid segmentation [2, 16, 34,35,36,37,38,39,40,41].

The preliminary results show great promise: 10/19 and 8/19 CNNs show excellent and good AUC performance, respectively. That is, these models were able to accurately detect scaphoid fractures on plain radiographs after training on relatively small datasets. The differing scores of the CNNs may stem from differences in layer architecture and augmentation techniques. Differences between results for the same CNN in different studies might stem from heterogeneous input data, such as the number of images, the proportions of data in the training/validation/test subsets, and image quality. The radiographic projections used in the AI systems also vary among studies, leading to diverse training and testing cohorts and different results.

In a recent review [42] that included 14 studies and evaluated the diagnostic performance of AI for detecting femoral neck fracture, a mean AUC of 0.969 was reported. The AUC score is somewhat higher compared to the current pooled SROC score. Yet, it is reasonable when taking into account the lower prevalence (2–10%) of occult proximal femur fractures [43] in comparison with scaphoid fractures.

CNN performance compared to human performance

Results of the studies published so far have been promising, with AI systems showing a fair to excellent performance as measured in AUC [2, 16, 34,35,36,37,38,39, 41].

Only three studies [16, 34, 36] compared the AI diagnostic performance to human controls, and their results varied. Langerhuizen et al [34] found no significant difference in diagnostic accuracy between the CNN and five orthopedic surgeons. Similarly, Hendrix et al [36] found no significant difference in AUC between the CNN and eleven radiologists. On the other hand, Ozkaya et al [16] found that an experienced orthopedic specialist performed better and had a higher AUC than the CNN.

Overall, the pooled sensitivity and specificity of the AI systems were 0.80 and 0.89, respectively. This suggests that AI systems have promising diagnostic performance. However, the human performance comparison groups and the information comprising the datasets in the various studies have a high degree of heterogeneity, making it difficult to draw clear conclusions on the relative performance of AI and human observers.

Three studies [39,40,41] conducted clinical validation studies, to compare human and AI diagnostic performance, and shed light on the potential benefits of utilizing AI in these contexts. Hendrix et al [39] found that while the AI system reduced reading time for certain radiologists, it did not lead to an improvement in the diagnostic performance for the majority of them. On the other hand, Cohen et al [40] reported that the implementation of the AI system increased sensitivity in fracture detection. Furthermore, Lee et al [41] demonstrated that the AI system significantly enhanced the diagnostic performance of two novice radiologists in identifying fractures. These studies collectively highlight the potential advantages of integrating AI technology in diagnostic processes, showcasing improvements in reading time and sensitivity for certain radiologists while also providing substantial enhancements in diagnostic capabilities for less experienced medical professionals.

The GRAD-CAM function was designed to integrate the AI output with current medical systems and allow clinicians to directly evaluate the AI output by highlighting the area of suspected fracture (heatmap). This function, other heatmaps, or any feature that delineates a “zone-of-interest” may help direct clinical suspicion to an occult fracture, facilitating early diagnosis and treatment. Indeed, the GRAD-CAM function or similar features were utilized in five studies to highlight the area in the scaphoid most likely to be fractured, according to the CNN [2, 35, 36, 40, 41]. The same function was previously utilized in AI systems developed to diagnose femoral neck fractures [42]. Cheng et al [44] reported that integrating this technology into the clinical flow is feasible and improves the diagnostic accuracy of physicians, especially novice clinicians. Sato et al [45] reported that using GRAD-CAM improved the accuracy, sensitivity, and specificity of resident-level physicians in the diagnosis of femoral hip fractures.

There are several limitations in the current review. The literature search was restricted to the English language. The available datasets for training CNNs are relatively small and heterogeneous and do not fully represent the complexity of real-life clinical cases. Furthermore, the CNN architectures differed between the AI systems, which also limits the ability to compare them. Additionally, none of the studies evaluated the performance of their algorithms in a clinical setting, where factors such as low-quality images or the presence of casts may affect diagnosis. Moreover, few studies compared the diagnostic performance of the AI systems to that of clinicians or radiologists, and only recent studies performed clinical validation of the AI systems.

The use of the QUADAS-2 tool in the included studies revealed a high risk of bias and concerns about applicability in 9 out of 10 studies.

These limitations highlight the need for further research and development in this field, including testing the algorithms in more diverse and representative image sets and evaluating their performance in real-life clinical settings.

In the current literature, there is a paucity of data regarding the use of AI in other modalities such as CT or MRI for detecting scaphoid fractures. This may be attributed to the fact that these modalities are typically deployed as confirmatory tools for suspected occult fractures, rather than being the initial or most commonly used modality for diagnosing and managing scaphoid fractures. Hence, the emphasis on AI development and research has been more pronounced in the context of X-ray imaging.

In conclusion, the current diagnostic performance of AI for detecting scaphoid fractures on wrist radiographs shows promising results, with a high pooled sensitivity and specificity and a high SROC result. AI systems cannot yet replace the human role in scaphoid fracture detection, but they may complement and augment the diagnostic performance of physicians. For novice healthcare practitioners, AI systems can improve the detection rate of fractures, especially in cases of occult fractures. For more experienced clinicians, AI systems may serve as a powerful diagnostic aid, particularly when used in conjunction with heatmaps. Further research is required to establish a comparison between AI and human diagnostic performance in the clinical setting.

Still, existing AI systems can already help non-expert clinicians diagnose both obvious and occult fractures, and can aid experts in facilitating management when used in conjunction with heatmaps.

Abbreviations

AI: Artificial intelligence
AP: Anterior-posterior
AUC: Area under the ROC curve
CNN: Convolutional Neural Network
CT: Computed Tomography
GRAD-CAM: Gradient-weighted Class Activation Map
IRR: Initial radiology reports
MRI: Magnetic resonance imaging
MSK: Musculoskeletal
PA: Posterior-anterior
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-analyses
QUADAS: Quality Assessment of Diagnostic Accuracy Studies
SROC: Summary Receiver Operating Characteristic

References

Rhemrev SJ, Ootes D, Beeres FJ et al (2011) Current methods of diagnosis and treatment of scaphoid fractures. Int J Emerg Med 4:4. https://doi.org/10.1186/1865-1380-4-4
Li T, Yin Y, Yi Z et al (2022) Evaluation of a convolutional neural network to identify scaphoid fractures on radiographs. J Hand Surg Eur 17531934221127092. https://doi.org/10.1177/17531934221127092
Steinmann SP, Adams JE (2006) Scaphoid fractures and nonunions: diagnosis and treatment. J Orthop Sci 11:424–431. https://doi.org/10.1007/s00776-006-1025-x
Roolker W, Maas M, Broekhuizen AH (1999) Diagnosis and treatment of scaphoid fractures, can non-union be prevented? Arch Orthop Trauma Surg 119:428–431. https://doi.org/10.1007/s004020050014
Prosser GH, Isbister ES (2003) The presentation of scaphoid non-union. Injury 34:65–67. https://doi.org/10.1016/s0020-1383(02)00162-6
Neviaser RJ (1986) On resection of the proximal carpal row. Clin Orthop Relat Res 202:12–15
Sabbagh MD, Morsy M, Moran SL (2019) Diagnosis and management of acute scaphoid fractures. Hand Clin 35:259–269. https://doi.org/10.1016/j.hcl.2019.03.002
Shetty S, Sidharthan S, Jacob J, Ramesh B (2011) “Clinical scaphoid fracture”: is it time to abolish this phrase? Ann R Coll Surg Engl 93:146–148. https://doi.org/10.1308/147870811X560886
Balci A, Basara I, Çekdemir EY et al (2015) Wrist fractures: sensitivity of radiography, prevalence, and patterns in MDCT. Emerg Radiol 22:251–256. https://doi.org/10.1007/s10140-014-1278-1
Welling RD, Jacobson JA, Jamadar DA et al (2008) MDCT and radiography of wrist fractures: radiographic sensitivity and fracture patterns. AJR Am J Roentgenol 190:10–16. https://doi.org/10.2214/AJR.07.2699
de Zwart AD, Beeres FJP, Rhemrev SJ et al (2016) Comparison of MRI, CT and bone scintigraphy for suspected scaphoid fractures. Eur J Trauma Emerg Surg 42:725–731. https://doi.org/10.1007/s00068-015-0594-9
Tiel-van Buul MM, van Beek EJ, Broekhuizen AH et al (1993) Radiography and scintigraphy of suspected scaphoid fracture. A long-term study in 160 patients. J Bone Joint Surg Br 75:61–65. https://doi.org/10.1302/0301-620X.75B1.8421037
Gibney B, Smith M, Moughty A et al (2019) Incorporating cone-beam CT into the diagnostic algorithm for suspected radiocarpal fractures: a new standard of care? AJR Am J Roentgenol 213:1117–1123. https://doi.org/10.2214/AJR.19.21478
Gäbler C, Kukla C, Breitenseher MJ et al (2001) Diagnosis of occult scaphoid fractures and other wrist injuries. Are repeated clinical examinations and plain radiographs still state of the art? Langenbecks Arch Surg 386:150–154. https://doi.org/10.1007/s004230000195
Katzman BD, van der Pol CB, Soyer P, Patlas MN (2023) Artificial intelligence in emergency radiology: a review of applications and possibilities. Diagn Interv Imaging 104:6–10. https://doi.org/10.1016/j.diii.2022.07.005
Ozkaya E, Topal FE, Bulut T et al (2022) Evaluation of an artificial intelligence system for diagnosing scaphoid fracture on direct radiography. Eur J Trauma Emerg Surg 48:585–592. https://doi.org/10.1007/s00068-020-01468-0
Wijetunga AR, Tsang VH, Giuffre B (2019) The utility of cross-sectional imaging in the management of suspected scaphoid fractures. J Med Radiat Sci 66:30–37. https://doi.org/10.1002/jmrs.302
Klang E (2018) Deep learning and medical imaging. J Thorac Dis 10:1325–1328. https://doi.org/10.21037/jtd.2018.02.76
Soffer S, Klang E, Shimon O et al (2020) Deep learning for wireless capsule endoscopy: a systematic review and meta-analysis. Gastrointest Endosc 92:831-839.e8. https://doi.org/10.1016/j.gie.2020.04.039
Soffer S, Ben-Cohen A, Shimon O et al (2019) Convolutional neural networks for radiologic images: a radiologist’s guide. Radiology 290:590–606. https://doi.org/10.1148/radiol.2018180547
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539
Litjens G, Kooi T, Bejnordi BE et al (2017) A survey on deep learning in medical image analysis. Med Image Anal 42:60–88. https://doi.org/10.1016/j.media.2017.07.005
McBee MP, Awan OA, Colucci AT et al (2018) Deep learning in radiology. Acad Radiol 25:1472–1480. https://doi.org/10.1016/j.acra.2018.02.018
Klang E, Barash Y, Margalit RY et al (2020) Deep learning algorithms for automated detection of Crohn’s disease ulcers by video capsule endoscopy. Gastrointest Endosc 91:606-613.e2. https://doi.org/10.1016/j.gie.2019.11.012
Barash Y, Klang E (2019) Automated quantitative assessment of oncological disease progression using deep learning. Ann Transl Med 7:S379. https://doi.org/10.21037/atm.2019.12.101
Christopher M, Belghith A, Bowd C et al (2018) Performance of deep learning architectures and transfer learning for detecting glaucomatous optic neuropathy in fundus photographs. Sci Rep 8:16685. https://doi.org/10.1038/s41598-018-35044-9
Hosseinzadeh Kassani S, Hosseinzadeh Kassani P (2019) A comparative study of deep learning architectures on melanoma detection. Tissue Cell 58:76–83. https://doi.org/10.1016/j.tice.2019.04.009
Anteby R, Horesh N, Soffer S et al (2021) Deep learning visual analysis in laparoscopic surgery: a systematic review and diagnostic test accuracy meta-analysis. Surg Endosc 35:1521–1533. https://doi.org/10.1007/s00464-020-08168-1
Anteby R, Klang E, Horesh N et al (2021) Deep learning for noninvasive liver fibrosis classification: a systematic review. Liver Int 41:2269–2278. https://doi.org/10.1111/liv.14966
Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 151:264–269, W64. https://doi.org/10.7326/0003-4819-151-4-200908180-00135
McInnes MDF, Moher D, Thombs BD et al (2018) Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: the PRISMA-DTA statement. JAMA 319:388–396. https://doi.org/10.1001/jama.2017.19163
Whiting PF, Rutjes AWS, Westwood ME et al (2011) QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 155:529–536. https://doi.org/10.7326/0003-4819-155-8-201110180-00009
Reitsma JB, Glas AS, Rutjes AWS et al (2005) Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 58:982–990. https://doi.org/10.1016/j.jclinepi.2005.02.022
Langerhuizen DWG, Bulstra AEJ, Janssen SJ et al (2020) Is deep learning on par with human observers for detection of radiographically visible and occult fractures of the scaphoid? Clin Orthop Relat Res 478:2653–2659. https://doi.org/10.1097/CORR.0000000000001318
Yoon AP, Lee Y-L, Kane RL et al (2021) Development and validation of a deep learning model using convolutional neural networks to identify scaphoid fractures in radiographs. JAMA Netw Open 4:e216096. https://doi.org/10.1001/jamanetworkopen.2021.6096
Hendrix N, Scholten E, Vernhout B et al (2021) Development and validation of a convolutional neural network for automated detection of scaphoid fractures on conventional radiographs. Radiol Artif Intell 3:e200260. https://doi.org/10.1148/ryai.2021200260
Tung Y-C, Su J-H, Liao Y-W et al (2021) High-performance scaphoid fracture recognition via effectiveness assessment of artificial neural networks. Appl Sci 11:8485. https://doi.org/10.3390/app11188485
Yang T-H, Horng M-H, Li R-S, Sun Y-N (2022) Scaphoid fracture detection by using convolutional neural network. Diagnostics (Basel) 12:895. https://doi.org/10.3390/diagnostics12040895
Hendrix N, Hendrix W, van Dijke K et al (2023) Musculoskeletal radiologist-level performance by using deep learning for detection of scaphoid fractures on conventional multi-view radiographs of hand and wrist. Eur Radiol 33:1575–1588. https://doi.org/10.1007/s00330-022-09205-4
Cohen M, Puntonet J, Sanchez J et al (2023) Artificial intelligence vs. radiologist: accuracy of wrist fracture detection on radiographs. Eur Radiol 33:3974–3983. https://doi.org/10.1007/s00330-022-09349-3
Lee K-C, Choi IC, Kang CH et al (2023) Clinical validation of an artificial intelligence model for detecting distal radius, ulnar styloid, and scaphoid fractures on conventional wrist radiographs. Diagnostics (Basel) 13:1657. https://doi.org/10.3390/diagnostics13091657
Cha Y, Kim J-T, Park C-H et al (2022) Artificial intelligence and machine learning on diagnosis and classification of hip fracture: systematic review. J Orthop Surg Res 17:520. https://doi.org/10.1186/s13018-022-03408-7
Deleanu B, Prejbeanu R, Tsiridis E et al (2015) Occult fractures of the proximal femur: imaging diagnosis and management of 82 cases in a regional trauma center. World J Emerg Surg 10:55. https://doi.org/10.1186/s13017-015-0049-y
Cheng C-T, Ho T-Y, Lee T-Y et al (2019) Application of a deep learning algorithm for detection and visualization of hip fractures on plain pelvic radiographs. Eur Radiol 29:5469–5477. https://doi.org/10.1007/s00330-019-06167-y
Sato Y, Takegami Y, Asamoto T et al (2021) Artificial intelligence improves the accuracy of residents in the diagnosis of hip fractures: a multicenter study. BMC Musculoskelet Disord 22:407. https://doi.org/10.1186/s12891-021-04260-2

Funding

The authors state that this work has not received any funding.

Author information

Authors and Affiliations

Department of Diagnostic Imaging, Sheba Medical Center, 2 Sheba Road, 5262000, Ramat Gan, Israel
Matan Kraus, Eli Konen, Iris Eshed & Eyal Klang
Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
Matan Kraus, Roi Anteby, Eli Konen, Iris Eshed & Eyal Klang
Department of General Surgery, Sheba Medical Center, 2 Sheba Road, 5262000, Ramat Gan, Israel
Roi Anteby

Authors

Matan Kraus
Roi Anteby
Eli Konen
Iris Eshed
Eyal Klang

Corresponding author

Correspondence to Matan Kraus.

Ethics declarations

Guarantor

The scientific guarantor of this publication is Dr. Eyal Klang.

Conflict of interest

The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Statistics and biometry

One of the authors has significant statistical expertise.

Informed consent

Written informed consent was not required for this study because it is a systematic review and meta-analysis.

Ethical approval

Institutional Review Board approval was not required because this study is a systematic review and meta-analysis.

Study subjects or cohorts overlap

Some study subjects or cohorts have been previously reported in the studies that were included in the systematic review and meta-analysis.

Methodology

• Retrospective

• Performed at one institution

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 281 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kraus, M., Anteby, R., Konen, E. et al. Artificial intelligence for X-ray scaphoid fracture detection: a systematic review and diagnostic test accuracy meta-analysis. Eur Radiol 34, 4341–4351 (2024). https://doi.org/10.1007/s00330-023-10473-x

Received: 27 July 2023
Revised: 22 September 2023
Accepted: 28 September 2023
Published: 15 December 2023
Issue Date: July 2024
DOI: https://doi.org/10.1007/s00330-023-10473-x
