Reporting errors in plain radiographs for lower limb trauma—a systematic review and meta-analysis

Introduction Plain radiographs are a globally ubiquitous means of investigating injuries to the musculoskeletal system. Despite this, initial interpretation remains a challenge, and inaccuracies give rise to adverse sequelae for patients and healthcare providers alike. This study sought to address the limited existing meta-analytic research on the initial reporting of radiographs for skeletal trauma, specifically the diagnostic accuracy of reporting for the most commonly injured region of the appendicular skeleton, the lower limb.

Method A prospectively registered systematic review and meta-analysis was performed using published research from the major clinical-science databases. Studies identified as appropriate for inclusion underwent methodological quality and risk of bias analysis. Meta-analysis was then performed to establish summary estimates of sensitivity and specificity, including covariates by anatomical site, using HSROC and bivariate models.

Results A total of 3887 articles were screened, with 10 identified as suitable for meta-analysis based on the eligibility criteria. Summary sensitivity and specificity across the studies were 93.5% and 89.7% respectively. Compared with other anatomical subdivisions, interpretation of ankle radiographs yielded the highest sensitivity and specificity, at 98.1% and 94.6% respectively, and a diagnostic odds ratio of 929.97.

Conclusion Interpretation of lower limb skeletal radiographs achieves a reasonably high degree of sensitivity and specificity. However, more than one in twenty true positives are missed on initial radiographic interpretation, and safety-netting systems need to be established to address this. Virtual fracture clinic reviews and teleradiology services, in conjunction with novel technology, will likely be crucial in these circumstances.


Introduction
In February of 1896, at the physics laboratory of Dartmouth College, Edwin Brant Frost used what were then known as roentgen rays to capture an image of the healing ulna of his patient, Edward McCarthy [1]. The profound clinical applications of this novel technology were not lost on early observers; a year later, Silvanus P. Thompson (President of the Roentgen Society) said: 'Excepting only the introduction into surgery by Lord Lister of antiseptics, and the discovery of anaesthetics, no discovery in the present century has done so much for operative surgery as this of the roentgen rays' [1]. Over the following 120-plus years of clinical practice, plain radiographs have remained foundational to the investigation of musculoskeletal injuries. The WHO estimates that 3.6 billion investigations using ionising radiation are performed globally each year, the majority of which are simple X-rays [2]. In the UK, more than 60% of emergency department attendances have a primary diagnosis relating to the musculoskeletal (MSK) system [3]. Overall, 38.7% of all patients receive at least one plain radiograph, rising to over 50% in MSK injuries [4].
Despite their ubiquity, skeletal radiographs are challenging to interpret, and errors can significantly harm both patients and care providers. Interpretation in a trauma setting is especially fraught, with high patient turnover and often junior staff. Consequently, emergency departments are recognised as 'high risk' for diagnostic error [5]. Research reviewing UK medicolegal claims in skeletal radiology between 1995 and 2006 showed that the 'great majority followed missed diagnoses of fractures following trauma' [6].
Existing research has shown variable levels of performance in the initial interpretation of skeletal radiographs for trauma. Across all radiographs in the emergency department setting, an error rate of approximately 3% has been shown [7]. In the upper limb, estimates suggest incorrect assessment is made in around 8.5% of cases [7,8].
So far, there have been limited attempts to produce summary rates of reporting error in plain radiographs of lower limb trauma, despite a body of individual studies assessing this both across the lower limb as a whole and at specific anatomical sites.
The aims of this study were to conduct a systematic review and meta-analysis of the existing literature to establish the sensitivity, specificity and diagnostic odds ratio of the initial interpretation of lower limb radiographs, including those of its anatomical subdivisions: foot, ankle, knee and femur.

Review protocol and search strategy
This systematic review was prospectively registered with the PROSPERO database; the review protocol can be found under registration number CRD42020197973.
In April of 2020, the PubMed MEDLINE, Embase, Cochrane Database of Systematic Reviews (CDSR) and Cochrane Central Register of Controlled Trials (CENTRAL) databases were scrutinised from 1990 to the present, using a search strategy developed with the aid of Imperial College Library Services. The full electronic search strategy is detailed in Fig. 1.

Eligibility criteria
In accordance with the objectives of this study, the authors developed the following eligibility criteria to identify papers containing pertinent data for inclusion:
- Written in the English language
- Conducted during or after 1990
- Original research published in peer-reviewed academic journals (editorial letters, opinion pieces and expert reviews were excluded)
- Reporting an initial assessment of plain radiographs of the lower limb, performed by identified members or grades of staff, and compared with a definitive assessment of findings
- Investigating subjects with a confirmed or suspected trauma and orthopaedic injury, as characterised by the WHO ICD-11
- Radiographs included for review being of skeletally mature subjects
- Conducted in active healthcare settings where diagnostic services are provided to a patient population
- Outcomes reported with respect to accuracy, specificity or sensitivity of radiograph reporting
- Outcomes reported with respect to specific anatomical site or regional anatomy

Study selection
An initial sample of 200 search results was reviewed for inclusion by the six reviewing authors (TY, CF, KR, GM, HJ, WH). Applying the eligibility criteria to title and abstract, each author sorted these 200 results into 'reject' or 'further review' categories. Inter-reader reliability was then assessed to establish the degree of agreement amongst the authors on which articles merited further review. Fleiss' multirater kappa was 0.640 (p < .005), conventionally taken to represent substantial agreement [9]. Each author then individually assessed an equal share of the remaining results by title and abstract, again categorising them as 'reject' or 'further review'. These were combined with the reviewed results of the initial sample and further categorised by the anatomical region to which they related: lower limb, upper limb, pelvis, spine and thorax, or skull and facial. Where an article included data pertinent to more than one anatomical region, it was duplicated and a copy assigned to each region.
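The agreement statistic reported above can be computed directly from a table of screening decisions. The following is a minimal sketch of Fleiss' kappa in Python, assuming every abstract is rated by all six reviewers into one of two categories; the ratings matrix is invented for illustration and is not the study's screening data, and the sketch does not compute the accompanying p-value.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number of raters.

    counts[i][j] is the number of raters who placed item i in category j.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fall into each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_categories)]
    # Observed agreement among rater pairs for each item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]

    p_observed = sum(p_i) / n_items
    p_chance = sum(p * p for p in p_j)
    if p_chance == 1:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical screening of five abstracts by six reviewers;
# columns are ['reject', 'further review'].
ratings = [[6, 0], [0, 6], [5, 1], [1, 5], [6, 0]]
print(round(fleiss_kappa(ratings), 3))  # 0.722
```

On the conventional Landis and Koch scale used above, values between 0.61 and 0.80 are read as substantial agreement, which is where both the toy value and the reported 0.640 fall.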
TY and CF then reviewed the full text of all potentially eligible results categorised as lower limb against the aforementioned eligibility criteria. Where disparity arose, it was resolved by means of further review and joint assessment.

Data collection and assessment
A bespoke data extraction tool was developed by the authors; this was applied to all included studies. Variables recorded were radiograph reporting population, male/female % of radiograph subjects, recruitment methods to study, anatomical site identified, reporting accuracy/error rate %, specificity %, sensitivity % and qualitative outcome statement.
Methodological quality was assessed using the MINORS tool [10] and risk of bias using a modified Cochrane RoB2 tool [11]. Where the authors' initial assessments of a study diverged, a consensus evaluation was reached.

Summary and synthesis
The radiograph reporting populations, reporting accuracy and sensitivity/specificity were identified as the principal summary measures. Meta-analysis was then performed to produce summary estimates of sensitivity and specificity, including covariates by anatomical site, using HSROC and bivariate models.
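Each summary measure is derived from the per-study 2x2 table comparing initial interpretation with the definitive assessment. A minimal Python sketch of those definitions follows; the class name TwoByTwo and the counts are hypothetical and are not the authors' data extraction tool.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Initial interpretation compared against the reference standard."""
    tp: int  # injury present, reported as abnormal
    fp: int  # no injury, reported as abnormal
    fn: int  # injury present, reported as normal (a missed injury)
    tn: int  # no injury, reported as normal

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def diagnostic_odds_ratio(self, correction: float = 0.0) -> float:
        # DOR = (TP/FN) / (FP/TN). It is undefined when FP or FN is zero,
        # so a continuity correction (e.g. 0.5) can be added to every cell.
        tp, fp, fn, tn = (x + correction for x in (self.tp, self.fp, self.fn, self.tn))
        return (tp * tn) / (fp * fn)


# Hypothetical counts for one study, not taken from the review.
study = TwoByTwo(tp=90, fp=10, fn=10, tn=90)
print(study.sensitivity(), study.specificity(), study.diagnostic_odds_ratio())
# 0.9 0.9 81.0
```

Accuracy alone can hide a poor sensitivity when most radiographs are normal, which is one reason sensitivity and specificity are reported separately.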

Study selection and characteristics
After the removal of duplicates, a total of 3887 papers were identified for screening. Following abstract review, 89 articles progressed to full-text review. A total of 23 articles were included in the qualitative synthesis, of which 10 yielded data suitable for meta-analysis [12-21]. These 10 articles examined an aggregate of 3902 sets of radiographs, producing a total of 4709 radiograph interpretation episodes for meta-analysis (see Fig. 2).
Most of the studies compared plain film radiography with an alternative form of imaging (n = 6). The remainder examined inter-reader diagnostic performance on plain films (n = 1), the value of additional radiographic views for diagnostic performance (n = 2), or both (n = 1). The seniority of the initial reporters studied ranged from postgraduate surgical and radiology trainees to senior orthopaedic surgeons, radiologists and emergency physicians.
There was some variation across the ten articles included in the meta-analysis, specifically in the definition of a 'positive' and 'negative' radiographic finding. One article [14] defined positive and negative findings as the presence or absence of any bony or soft tissue pathology, including soft tissue injury, fractures, dislocations, osteomyelitis and osteoporosis. The other nine articles defined positive and negative findings as the presence or absence of a bony fracture [12,13,15-21]. However, two of these nine articles went further and required radiograph interpreters to correctly classify any fracture identified for their findings to be regarded as a true positive. Utukuri [12] required interpreters to specify whether a calcaneal fracture was intra- or extra-articular. For proximal femur fractures, Riaz et al. [18] required interpreters to correctly specify the location and degree of fracture displacement.

Individual study results
Across all lower limb studies, sensitivity ranged from 0.59 to 0.97 and specificity from 0.66 to 1.00. Utukuri [12] found the highest sensitivity in initial interpretation, at 0.97 for radiographs of the foot. Ricci [21] found the lowest specificity, at only 0.65 for lower limb radiographs (see Table 2).

Synthesis of results
Meta-analysis was conducted using a bivariate model, together with a hierarchical summary receiver operating characteristic (HSROC) curve for diagnostic performance across all lower limb plain radiographs (see Fig. 3). The summary estimate of sensitivity across the included studies was 93.5%, with a specificity of 89.7% and hence a false positive rate of 10.3%. Covariate analysis was also performed to assess sensitivity and specificity by lower limb anatomical subdivision; this was possible for all subdivisions except the knee, for which only a single included study was found (see Table 3).
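The bivariate model pools logit-transformed sensitivity and specificity jointly, with random effects and a between-study correlation; fitting it properly needs a dedicated routine (for example, the reitsma() function in the R package mada). As a much simpler illustration of the underlying idea only, the sketch below swaps in a univariate DerSimonian-Laird random-effects pooling of one proportion at a time on the logit scale. It ignores the correlation between sensitivity and specificity, is not the authors' method, and the function name and counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_logit_proportion(events: Sequence[int],
                          totals: Sequence[int]) -> Tuple[float, float]:
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    Returns (pooled proportion, between-study variance tau^2).
    """
    if not events or len(events) != len(totals):
        raise ValueError("need matching, non-empty events and totals")
    ys, vs = [], []
    for x, n in zip(events, totals):
        # A 0.5 continuity correction is added to every study here for
        # simplicity; it keeps the logit finite for 0% or 100% studies.
        x_c, n_c = x + 0.5, n + 1.0
        ys.append(logit(x_c / n_c))
        vs.append(1 / x_c + 1 / (n_c - x_c))  # approximate variance of the logit
    w = [1 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return inv_logit(y_re), tau2


# Hypothetical studies: true positives and injured totals, so this pools sensitivity.
pooled_se, tau2 = pool_logit_proportion(events=[45, 88, 30], totals=[50, 95, 40])
print(f"pooled sensitivity {pooled_se:.3f}, tau^2 {tau2:.3f}")
```

Passing true negatives and uninjured totals instead would pool specificity in the same way; the bivariate model's advantage is that it estimates both together and so respects their trade-off across studies.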
Summary sensitivity and specificity were both found to be highest for ankle radiographs, 98.1% and 94.6% respectively. Similarly, the initial interpretation of ankle radiographs had the highest diagnostic odds ratio (929.97).
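For reference, the diagnostic odds ratio can be written in terms of sensitivity (Se) and specificity (Sp). Substituting the rounded ankle estimates gives roughly 905; because the DOR is very sensitive to rounding when Se and Sp are close to 1 (the rounding intervals of the published percentages alone span roughly 870 to 940), this appears consistent with the reported 929.97.

```latex
\mathrm{DOR} = \frac{\mathrm{Se}/(1-\mathrm{Se})}{(1-\mathrm{Sp})/\mathrm{Sp}}
             = \frac{\mathrm{Se}\,\mathrm{Sp}}{(1-\mathrm{Se})(1-\mathrm{Sp})}
\qquad
\frac{0.981 \times 0.946}{0.019 \times 0.054} = \frac{0.928026}{0.001026} \approx 905
```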

Risk of bias assessment
All studies included in the meta-analysis were assessed using a modified Cochrane risk of bias tool; this qualitative tool assesses a study's risk of bias against seven separate criteria. One study was considered at high risk of bias because it scored in more than four categories. Four studies were considered at moderate risk of bias because they scored in three or more categories, or scored particularly strongly in one of two categories. Five studies scored in two or fewer categories and so were considered at low risk of bias (see Table 4).
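The thresholds above can be read as a simple decision rule. The sketch below encodes one interpretation of it; representing 'scoring particularly strongly' as a single boolean strong_concern flag is an assumption made for illustration, and the function name is hypothetical.

```python
def classify_risk_of_bias(flagged_domains: int, strong_concern: bool = False,
                          total_domains: int = 7) -> str:
    """Map a study's risk-of-bias profile to 'high', 'moderate' or 'low'.

    flagged_domains: number of the seven domains in which the study scored.
    strong_concern: hypothetical flag for a particularly strong concern in a
    single domain, treated here as enough on its own for 'moderate'.
    """
    if not 0 <= flagged_domains <= total_domains:
        raise ValueError("flagged_domains must be between 0 and total_domains")
    if strong_concern and flagged_domains == 0:
        raise ValueError("a strong concern implies at least one flagged domain")
    if flagged_domains > 4:
        return "high"
    if flagged_domains >= 3 or strong_concern:
        return "moderate"
    return "low"


print(classify_risk_of_bias(5), classify_risk_of_bias(3), classify_risk_of_bias(2))
# high moderate low
```

Writing the rule out this way makes the gap between 'more than four' (high) and 'three or more' (moderate) explicit: studies scoring in exactly three or four categories fall in the moderate band.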

Methodological quality
The methodological quality of the ten articles included in the meta-analysis was assessed using the MINORS (methodological index for non-randomised studies) tool developed by Slim et al. [10]. Scores ranged from 13 to 22 out of a possible 24 points, and articles generally scored highly (average score 16.9).
Nine (90%) of the studies lacked a prospective calculation of the study size, and seven (70%) lacked an unbiased assessment of the study endpoint (see Table 5). Conversely, the studies
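MINORS scores each item 0 (not reported), 1 (reported but inadequate) or 2 (reported and adequate). The first eight items apply to every study and four further items apply to comparative studies, giving maxima of 16 and 24 respectively. A minimal Python sketch of the tally; the function name and the example scores are hypothetical.

```python
from typing import Sequence, Tuple

# Items 1-8 apply to all studies; items 9-12 only to comparative studies.
# Item 5 is "unbiased assessment of the study endpoint" and item 8 is
# "prospective calculation of the study size".
NON_COMPARATIVE_ITEMS = 8
COMPARATIVE_ITEMS = 12


def minors_score(item_scores: Sequence[int]) -> Tuple[int, int]:
    """Return (total, maximum possible) for one study's MINORS item scores."""
    if len(item_scores) not in (NON_COMPARATIVE_ITEMS, COMPARATIVE_ITEMS):
        raise ValueError("expected 8 (non-comparative) or 12 (comparative) scores")
    if any(score not in (0, 1, 2) for score in item_scores):
        raise ValueError("each item must be scored 0, 1 or 2")
    return sum(item_scores), 2 * len(item_scores)


# Hypothetical comparative study scoring 0 on items 5 and 8 and 2 elsewhere.
scores = [2, 2, 2, 2, 0, 2, 2, 0, 2, 2, 2, 2]
print(minors_score(scores))  # (20, 24)
```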

Key findings
This study finds that initial interpreters of lower limb plain radiographs for trauma achieve a relatively high degree of sensitivity (93.5%). It is difficult to quantify the rate of missed findings that healthcare systems can justifiably accept. False negatives are likely to be the most deleterious type of interpretation error, as borne out by the evidence on litigation for missed fractures both in the UK [6,22] and abroad [23,24].
A false negative rate on initial interpretation of more than one in twenty (6.5%) means that busy accident and emergency or trauma settings are likely to miss substantial numbers of injuries. This appears to support the need for safety-netting measures to mitigate the risk of reporting errors. In particular, virtual fracture clinic review [25] and out-of-hours teleradiology services [26] have been widely adopted across the UK and Europe. Alongside these existing methods, the development of novel technologies to supplement interpretation (such as artificial intelligence algorithms [27]) is evidence of a broadly accepted clinical need to improve this reporting.
The summary specificity of reporting (89.7%) was 3.8 percentage points lower than the summary sensitivity, suggesting that initial interpreters were less able to correctly identify radiographs without an injury as negative. This finding was commented upon by Utukuri et al. [12] and is also supported by a wider evidence base showing that increasing the seniority of interpreters benefits specificity more than sensitivity [28,29]. Because greater experience does comparatively little to reduce missed findings, this implies that some interpretation errors, particularly false negatives, reflect a limitation of plain radiography as a modality and so are not easily preventable. These findings may also explain the conclusions of the qualitative synthesis, which highlighted the importance of corroborating radiograph interpretation with examination and clinical judgement to prevent fractures being 'missed' [7,30-32].
Of the compared anatomical subdivisions, the diagnostic odds ratio was highest for ankle radiographs, followed by the foot and then the femur. The cause of this was not explored in this study; however, the frequency with which ankle injuries present to emergency and trauma care settings may mean that initial interpreters are more practised in reviewing these radiographs. The ankle is both the most commonly injured joint and the most frequently operated upon [33], with an estimated incidence of ankle fractures as high as 187 per 100,000 people per annum [34].
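To make the 'more than one in twenty' figure concrete, the expected number of missed injuries is the number of truly injured patients multiplied by one minus sensitivity. A minimal sketch, using the pooled sensitivity of 93.5% from this review but an assumed workload of 10,000 radiographs and an assumed injury prevalence of 30%, neither of which comes from the included studies; the function name is hypothetical.

```python
def expected_missed_injuries(n_radiographs: int, prevalence: float,
                             sensitivity: float) -> float:
    """Expected false negatives: injured patients whose radiographs are reported as normal."""
    if not (0.0 <= prevalence <= 1.0 and 0.0 <= sensitivity <= 1.0):
        raise ValueError("prevalence and sensitivity must be proportions")
    injured = n_radiographs * prevalence
    return injured * (1.0 - sensitivity)


# Pooled sensitivity from this review; the workload and prevalence are hypothetical.
miss_rate = 1 - 0.935
print(f"miss rate {miss_rate:.1%}, about 1 in {1 / miss_rate:.0f}")
# miss rate 6.5%, about 1 in 15
print(round(expected_missed_injuries(10_000, 0.30, 0.935)))  # 195
```

The miss rate depends only on sensitivity, but the absolute number of missed injuries scales with both workload and prevalence, which is why high-volume trauma settings are most exposed.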

Limitations
A generally favourable assessment of risk of bias and methodological quality was made for the included studies. However, weaknesses were noted in the lack of prospective sample size calculation and of an unbiased assessment of the study endpoint. The extent to which these factors influence the results is uncertain; however, several studies appear underpowered [12,19,20].
During study selection, a number of papers with large sample sizes were identified that lacked sufficient characterisation of data for inclusion in the meta-analysis. Whilst these are targeted for future analysis, they emphasise the importance of reporting diagnostic accuracy in line with STARD 2015 [35] or similar relevant guidelines.
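A prospective sample size calculation for a diagnostic accuracy study is commonly based on the approach attributed to Buderer: enough injured subjects to estimate sensitivity to a chosen precision, and enough uninjured subjects to estimate specificity, each scaled by the expected prevalence. The sketch below illustrates this under assumed planning values (the pooled estimates from this review, a precision of plus or minus 5 percentage points and a 30% prevalence); the function names are hypothetical.

```python
import math
from statistics import NormalDist


def _n_for_proportion(p: float, precision: float, confidence: float) -> float:
    # Subjects needed so that a proportion p is estimated to within +/- precision.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * z * p * (1 - p) / precision ** 2


def diagnostic_sample_size(sensitivity: float, specificity: float, precision: float,
                           prevalence: float, confidence: float = 0.95) -> int:
    """Total radiographs needed to estimate both sensitivity and specificity."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    # Sensitivity is estimated only from injured subjects and specificity only
    # from uninjured ones, so each requirement is scaled by the relevant fraction.
    n_for_se = _n_for_proportion(sensitivity, precision, confidence) / prevalence
    n_for_sp = _n_for_proportion(specificity, precision, confidence) / (1 - prevalence)
    return math.ceil(max(n_for_se, n_for_sp))


# Hypothetical planning values: pooled estimates, +/-5 points, 30% prevalence.
print(diagnostic_sample_size(0.935, 0.897, 0.05, 0.30))  # 312
```

Halving the target precision roughly quadruples the required sample, which is why studies planned without such a calculation can easily end up underpowered.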

Conclusions
This study suggests that the initial interpretation of plain lower limb skeletal radiographs for trauma is performed with a relatively high degree of sensitivity and specificity. However, this still means that more than one in twenty true positives are missed on primary review. Systems that provide safety netting against such misses are therefore paramount, as is the development of novel means to improve the accuracy of initial interpretation.
There is also evidence of statistically significant variation in the accuracy of interpretation across anatomical subdivisions; radiographs of the ankle had the highest diagnostic odds ratio. The cause of this is uncertain and may reflect inherent difficulties in certain radiographic views or anatomy, or simply greater interpreter familiarity with some radiographs. Further research is warranted to explore these factors.
Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.