These results indicate that a DCNN can be trained to identify hip fractures on PXRs with high sensitivity (98%) and accuracy (91%) (F1 score, 0.91). Hip fractures are encountered daily, yet subtle or nondisplaced fractures are difficult to identify rapidly because of the limitations of both the human eye and PXRs [14, 15]. With the assistance of the DCNN, hip fractures can be detected immediately with a low false-negative rate (0.02), a performance noninferior to that of experts. This DCNN could help primary physicians reduce the misdiagnosis rate and prevent subsequent missed diagnoses. Hip fractures are a promising target for deep learning approaches because near-perfect ground truth labels are available: because the hip is weight-bearing, patients with clinically “silent” fractures rapidly develop severe pain and immobility, so fractures missed initially are eventually revealed and the final diagnosis is reliable. Early detection and surgery are critical for patient survival and the preservation of hip function. Delayed management of hip fractures results in a poor prognosis and even an increased risk of death years later [35, 36, 37, 38]. Therefore, detecting hip fractures as early as possible is critical for long-term mortality and overall medical outcomes.
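The reported metrics all derive from a 2 × 2 confusion matrix. The sketch below, using hypothetical counts that are not the study's data, shows how sensitivity, accuracy, F1 score, and false-negative rate relate; in particular, the false-negative rate is simply 1 − sensitivity, so a sensitivity of 98% corresponds to a false-negative rate of 0.02. The function name `binary_metrics` is illustrative.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # recall / true-positive rate
    specificity = tn / (tn + fp)                  # true-negative rate
    precision = tp / (tp + fp)                    # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    fnr = fn / (tp + fn)                          # false-negative rate = 1 - sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "fnr": fnr,
    }


# Hypothetical counts for illustration only (not taken from the study).
m = binary_metrics(tp=45, fp=10, tn=40, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Note that F1 depends on precision, which in turn depends on the ratio of fracture to nonfracture images in the test set, whereas sensitivity and the false-negative rate do not.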
DCNNs learn features automatically through iterative training with little hand programming, and their predictions can be highly accurate. Artificial intelligence and the automation of bony fracture detection have been discussed previously [25, 26, 28]; Gale et al. also described a DCNN method for predicting hip fractures. However, that model is region based and requires a localization network to identify the femoral neck first. Our study shows that hip fractures can be detected and diagnosed on PXRs by inputting the whole radiograph into a DCNN, without first identifying a specific region. The deep learning algorithm also achieved an accuracy comparable to that of radiologists and orthopedic surgeons. Moreover, this study extends current work in several respects.
First, we applied transfer learning to develop our algorithm. We pretrained the model on 25,505 unlabeled limb radiographs instead of ImageNet images because we believe that pretraining on similar images reduces the required sample size and training time. This study did not compare the performance of an ImageNet-pretrained model with that of the limb-pretrained model. We then used 2804 frontal PXRs as training material. After evaluation, accuracy increased from 79% (trained from scratch) to 91% (pretrained on the limb dataset), as we expected, indicating that the pretraining material also affected the final accuracy.
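Transfer learning, as described above, amounts to initializing the target model with weights learned on a source dataset and then training on the target data. The NumPy sketch below illustrates only the weight-transfer and fine-tuning steps on a toy two-layer model. It is not the study's network: the random "pretrained" weights and synthetic data are placeholders, the names `transfer_weights` and `fine_tune_head` are hypothetical, and freezing the backbone while training only the output head is an assumption made for brevity (the source does not say which layers were updated).

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def transfer_weights(pretrained, rng):
    """Copy a pretrained feature extractor and attach a freshly initialized head."""
    backbone = pretrained["backbone"].copy()               # reuse features learned on source images
    head = rng.normal(0.0, 0.01, size=backbone.shape[1])   # new fracture vs. no-fracture output layer
    return {"backbone": backbone, "head": head}


def predict_proba(model, X):
    """Probability of the positive (fracture) class for each row of X."""
    return sigmoid(relu(X @ model["backbone"]) @ model["head"])


def bce(p, y, eps=1e-12):
    """Mean binary cross-entropy."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def fine_tune_head(model, X, y, lr=0.01, epochs=200):
    """Gradient descent on binary cross-entropy, updating only the head (backbone frozen)."""
    features = relu(X @ model["backbone"])
    for _ in range(epochs):
        p = sigmoid(features @ model["head"])
        model["head"] -= lr * features.T @ (p - y) / len(y)
    return model


# Toy stand-ins: random "pretrained" weights and a small synthetic target task.
rng = np.random.default_rng(0)
pretrained = {"backbone": rng.normal(size=(16, 8))}
X = rng.normal(size=(40, 16))
y = (X[:, 0] > 0).astype(float)

model = transfer_weights(pretrained, rng)
loss_before = bce(predict_proba(model, X), y)
fine_tune_head(model, X, y)
loss_after = bce(predict_proba(model, X), y)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Training from scratch would correspond to drawing the backbone weights randomly as well; the only difference lies in where the starting point comes from.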
Unlike previous approaches, our process did not require extensive preprocessing, lesion segmentation, or extraction of domain-specific visual features. Our system needs no handcrafted features and is trained end to end directly from image-level labels and raw pixels. The fully automated system takes PXRs as input and detects the presence of hip fractures. These results demonstrate that deep neural networks can be trained on sizable datasets without pixel-level labels and without specifying lesion-based features. Our research shows that, despite the challenges specific to radiographic data, large, clean datasets are sufficient to achieve high-level automated performance with deep learning systems. This approach saves the time otherwise required for segmentation and pixel-level labeling.
One challenge in using DCNNs to analyze medical images is their “black box” nature: the model may base its answer on a part of the image other than the true lesion site. Feature visualization has therefore become a way to understand the underlying mechanism of DCNNs. In this study, we used whole PXRs for training and testing. We then applied Grad-CAM to visualize the class-discriminative regions, that is, the fracture sites recognized by the DCNN in the PXRs (Fig. 3). In contrast, the Grad-CAM images of normal PXRs showed no heatmap (Fig. 4). After visualization by Grad-CAM, 95.9% of the class-discriminative regions contained the fracture site, providing evidence that the model indeed recognized the hip fracture. Understanding what a DCNN uses to make predictions is a very active research topic in medicine, and such visualizations may help convince physicians to accept DCNN results because the network can show where it found the lesion.
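Grad-CAM weights each feature map A^k of a convolutional layer by the spatially averaged gradient of the class score y^c with respect to that map, sums the weighted maps, and applies a ReLU. The NumPy sketch below assumes the activations and gradients have already been extracted from the network (for example, with framework hooks, which are not shown). The `peak_in_box` check is a hypothetical criterion for “the class-discriminative region contains the fracture site”; the source does not state the exact rule that was used, and all function names here are illustrative.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM heatmap from one convolutional layer.

    activations: array (K, H, W), feature maps A^k for one image
    gradients:   array (K, H, W), d y^c / d A^k for the target class c
    Returns an (H, W) map scaled to [0, 1] (all zeros if nothing is positive).
    """
    alpha = gradients.mean(axis=(1, 2))             # alpha_k^c: global-average-pooled gradients
    cam = np.tensordot(alpha, activations, axes=1)  # sum_k alpha_k^c * A^k -> (H, W)
    cam = np.maximum(cam, 0.0)                      # ReLU: keep only positive evidence for class c
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def upsample(cam, factor):
    """Nearest-neighbour upsampling of the coarse map toward image resolution."""
    return np.repeat(np.repeat(cam, factor, axis=0), factor, axis=1)


def peak_in_box(cam, box):
    """Hypothetical hit criterion: is the heatmap maximum inside (row0, col0, row1, col1)?"""
    r, c = np.unravel_index(np.argmax(cam), cam.shape)
    row0, col0, row1, col1 = box
    return bool(row0 <= r <= row1 and col0 <= c <= col1)


# Toy example: channel 0 fires at (1, 2) and supports the class; channel 1 opposes it.
A = np.zeros((2, 4, 4))
A[0, 1, 2] = 5.0
A[1, 3, 0] = 4.0
G = np.stack([np.ones((4, 4)), -np.ones((4, 4))])
heatmap = upsample(grad_cam(A, G), factor=2)
print(peak_in_box(heatmap, (2, 4, 3, 5)))  # prints True
```

Because of the final ReLU, an image with no positive evidence for the fracture class yields an all-zero map, which is consistent with the empty heatmaps described for normal PXRs.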
Most deep learning studies of medical images use cropped images to avoid the “black box” problem and to improve the accuracy of final validation [29, 32]. Once the image is cropped to the features needed for recognition, the DCNN detects the lesion more easily and quickly. In this study, instead of cropping images, we reduced the image matrix size to 512 × 512 pixels. We prefer to input whole PXR images because this approach may be more intuitive and is more familiar to physicians. Because the DCNN was integrated into the clinical pathway, where doctors prefer to review whole images rather than cropped ones, whole-image input suits this setting. Reducing the image size also lowers computational requirements and shortens training time while yielding acceptable results. Building on this model, we have also applied the algorithm to other types of fractures on PXRs. Based on the observations from this study, similar high-performance algorithms for medical imaging might in future be developed with less training time and fewer resources.
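The whole-image preprocessing step can be sketched as follows. The source states only the target matrix size (512 × 512); the nearest-neighbour sampling, the min-max intensity scaling, the function names, and the hypothetical 2000 × 2500 input size are assumptions for illustration. The pixel-count ratio shows why downsampling reduces computation per image.

```python
import numpy as np


def resize_nearest(image, size=512):
    """Nearest-neighbour resize of a 2-D grayscale array to size x size."""
    h, w = image.shape
    rows = (np.arange(size) * h) // size   # source row for each output row
    cols = (np.arange(size) * w) // size   # source column for each output column
    return image[rows[:, None], cols[None, :]]


def preprocess(image, size=512):
    """Downsample the whole radiograph (no cropping) and min-max scale to [0, 1]."""
    small = resize_nearest(image.astype(np.float64), size)
    lo, hi = small.min(), small.max()
    return (small - lo) / (hi - lo) if hi > lo else np.zeros_like(small)


# Hypothetical 2000 x 2500 radiograph with 12-bit pixel values.
raw = np.random.default_rng(0).integers(0, 4096, size=(2000, 2500))
x = preprocess(raw)
reduction = raw.size / x.size   # fewer pixels per image -> less computation per forward pass
print(x.shape, round(reduction, 1))
```

Resizing directly to a square changes the aspect ratio of the radiograph; padding to a square before resizing is a common alternative when preserving geometry matters.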
This study has several limitations. One fundamental limitation arises from the nature of DCNNs: the network was given only an image and its associated diagnosis, without explicit definitions of features. Because the DCNN “learned” the features that were most predictive, the algorithm might use features previously unknown to or ignored by humans. Although this study showed good visualization for identifying fractures, the exact features being used are still unknown. The heatmaps might, for example, reflect differences between fractured and nonfractured sites in femoral alignment or in soft-tissue contrast caused by edema or hemorrhage. In this study, two images did not show activation at the correct fracture location: in one, the activation was located on the contralateral hip, and in the other, a stronger signal appeared over the iliac bone. Such errors reflect a limitation of DCNNs, because it is difficult to explain why the network activates at the wrong site. Using the whole image as input is also the most challenging aspect of our approach, and we plan to address this issue with a larger number of training samples. In addition, the algorithm was trained specifically to discriminate between healthy bone and fractures against the background of the bony architecture on radiographs, and it might be unable to identify other pathological findings. The detection of other lesions on PXRs, which is relevant to routine diagnosis, was beyond the scope of this study. Although only PXRs were used as input, we observed differences in age, gender, and ISS between the fracture and nonfracture groups, which might introduce selection bias. Furthermore, hip fracture images were extracted from the registry according to the final diagnosis, and the fracture site was identified on the images during validation of the DCNN; therefore, the exact rate of occult fractures is difficult to determine.
In addition, a proportion of images was manually excluded in this study, which will be problematic when incorporating this process into the clinical workflow. Finally, although the results of this study are promising, integrating this automatic detection algorithm into clinical practice to increase the detection rate of hip fractures presents another challenge. To address this, we used a web-based system that retrieves the PXR from the hospital PACS; the DCNN then determines whether a fracture is present and localizes the fracture region with Grad-CAM. A prospective randomized study should be conducted to evaluate the clinical impact of the DCNN on diagnostic accuracy and its economic value for identifying hip fractures.
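The web-based workflow described above can be sketched as a single request handler. Everything in this sketch is hypothetical: the function and parameter names, the 0.5 decision threshold, and the stub components. Retrieval from PACS (for example, through DICOM services) is outside the sketch, and the source does not describe how the deployed system is implemented. Generating a heatmap only for positive predictions is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ScreeningResult:
    fracture: bool
    probability: float
    heatmap: Optional[Any]   # Grad-CAM map, produced only for positive cases


def screen_radiograph(
    image: Any,
    preprocess: Callable[[Any], Any],
    classify: Callable[[Any], float],
    explain: Callable[[Any], Any],
    threshold: float = 0.5,
) -> ScreeningResult:
    """Hypothetical request handler: preprocess -> classify -> localize if positive."""
    x = preprocess(image)
    probability = float(classify(x))
    if probability >= threshold:
        return ScreeningResult(True, probability, explain(x))
    return ScreeningResult(False, probability, None)


# Stub components standing in for the retrieved PXR, the DCNN, and Grad-CAM.
result = screen_radiograph(
    image=[[0, 1], [1, 0]],
    preprocess=lambda img: img,
    classify=lambda x: 0.93,
    explain=lambda x: [[0.0, 1.0], [0.2, 0.0]],
)
print(result.fracture, result.probability)
```

Passing the components in as callables keeps the decision logic independent of any particular deep learning framework or PACS interface.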
In conclusion, for identifying hip fractures on PXRs, the DCNN-based algorithm achieved excellent performance, with high accuracy and a low false-negative rate, and is useful as a screening tool. Furthermore, our algorithm can localize the fracture site with high accuracy and could thus help clinicians identify more occult hip fractures and manage these patients early, avoiding additional medical costs and reduced quality of life.