Background

Avascular necrosis of the femoral head (ANFH), also called osteonecrosis of the femoral head, is a pathologic process that first appears in the weight-bearing area of the femoral head. Mechanical stress there can injure the trabecular bone structure (microfracture) and impair the repair process, and without timely management the femoral head collapses and deforms. ANFH has many etiological factors, but it results from interruption of the blood supply to the bone, which leads to ischemic necrosis. ANFH can be divided into traumatic and non-traumatic forms, and non-traumatic ANFH is further divided into steroid-induced, alcohol-induced, and other subtypes. Timely treatment of early ANFH can promote recovery. In the late stage, however, the disease results in collapse of the femoral head, loss of hip function, and a very poor outcome that affects quality of life. Therefore, the early diagnosis of ANFH is of great significance [1,2,3].

Several methods have been proposed for the early diagnosis of ANFH, including MRI, SPECT, CT, X-ray, DSA, and laser Doppler, each with different characteristics. MRI is non-invasive, rapid, and highly sensitive, and it is commonly used by clinicians [4,5,6]. Furthermore, many studies have used MRI to diagnose early ANFH. Therefore, in this paper, we performed a systematic review and meta-analysis of all qualified studies to evaluate the diagnostic accuracy of MRI in early ANFH.

Methods

Search strategy

The following electronic databases were searched from their inception to December 2017: the Cochrane Library, PubMed, and Embase. We looked for all qualified trials that analyzed the diagnostic accuracy of MRI for early osteonecrosis of the femoral head. Other related articles and reference materials were also screened to identify additional eligible studies. Two investigators searched the literature independently, and a third investigator was consulted to resolve disagreements.

Study selection

Studies that met the following criteria were included in our review: (1) the design was a prospective cohort study or cross-sectional study; (2) the participants were patients with suspected early osteonecrosis of the femoral head and no other serious diseases; (3) the study reported the numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN); and (4) the publication was available in English or Chinese.

Studies that met any of the following criteria were excluded from our review: (1) duplicate publications, or publications with shared content and results; (2) case reports, theoretical research, conference reports, systematic reviews, meta-analyses, expert comments, and economic analyses; (3) studies whose outcomes were not relevant; and (4) studies in which two or more of TP, FP, FN, and TN were zero.

Data extraction and quality assessment

Two independent investigators extracted the following data based on predefined criteria, and differences were settled by discussion with a third reviewer. The data extracted from each included study consisted of two parts: basic information and main outcomes. The basic information comprised the author name, the sample size, the percentage of male patients, and the age. The main outcomes were the clinical results. A 2 × 2 contingency table was constructed for each selected study, in which the gold standard and MRI results were each classified as positive or negative, giving the numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). For studies in which a single cell of the 2 × 2 contingency table had a value of 0, 0.5 was added to all cells for calculation. Sensitivity, specificity, and the positive and negative likelihood ratios were calculated, and the diagnostic odds ratio (DOR) was used as the overall measure of diagnostic accuracy. A DOR of 1 indicates a test without discriminatory power, and the higher the DOR, the better the test discriminates between patients with and without the disease. The studies were assessed by two reviewers independently, and any differences were resolved by discussion.
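The calculations described above can be illustrated with a minimal sketch. The function name, the data class, and the example counts are hypothetical and are not taken from any included study; the sketch only assumes the standard definitions of each measure and the 0.5 continuity correction described in the text.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticMetrics:
    sensitivity: float
    specificity: float
    positive_lr: float
    negative_lr: float
    dor: float


def diagnostic_metrics(tp: float, fp: float, fn: float, tn: float) -> DiagnosticMetrics:
    """Compute accuracy measures from one 2 x 2 table (MRI vs. gold standard)."""
    # Continuity correction as described in the text: if a cell is 0,
    # add 0.5 to every cell so that all ratios remain defined.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    dor = (tp * tn) / (fp * fn)  # equals positive_lr / negative_lr
    return DiagnosticMetrics(sensitivity, specificity, positive_lr, negative_lr, dor)


# Hypothetical study (illustrative counts only): 45 TP, 5 FP, 5 FN, 45 TN.
example = diagnostic_metrics(45, 5, 5, 45)
```

For these illustrative counts, sensitivity and specificity are both 0.9 and the DOR is 81.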

Statistical analysis

All statistical analyses were performed in STATA 10.0 (TX, USA). The chi-squared test and the I² statistic were used to assess heterogeneity among study results and to choose the analysis model (fixed-effects or random-effects). When the chi-squared P value was ≤ 0.05 and I² was > 50%, heterogeneity was defined as high and a random-effects model was used. When the chi-squared P value was > 0.05 and I² was ≤ 50%, heterogeneity was defined as acceptable and a fixed-effects model was used. For further assessment of heterogeneity, diagnostic threshold analysis was performed based on the Spearman correlation between the logit of sensitivity and the logit of [1-specificity]. When a threshold effect is present, sensitivity and specificity across studies are negatively correlated (equivalently, sensitivity and [1-specificity] are positively correlated). Therefore, a strong positive correlation between sensitivity and [1-specificity] suggests the presence of a threshold effect. When heterogeneity caused by a threshold effect was observed, a summary receiver operating characteristic (SROC) curve was plotted, because the global sensitivity and specificity values can be overestimated in that situation. In such cases, analysis of the individual points in the ROC plane, together with the SROC curve, is recommended. Deeks’ funnel plot asymmetry test was used to identify publication bias.
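The model-selection rule above can be sketched as follows. This assumes the usual definition of I² from Cochran's Q with k − 1 degrees of freedom; the function names and example values are hypothetical, and the sketch is not the STATA procedure the authors ran.

```python
from typing import Optional


def i_squared(q: float, n_studies: int) -> float:
    """I^2 (in percent) from Cochran's Q and the number of studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # I^2 is truncated at 0 when Q is smaller than its degrees of freedom.
    return max(0.0, (q - df) / q) * 100.0


def choose_model(p_value: float, i2_percent: float) -> Optional[str]:
    """Apply the decision rule exactly as stated in the text.

    Returns None when the two criteria disagree, because the text does not
    say how such mixed cases are handled (assumption of this sketch).
    """
    if p_value <= 0.05 and i2_percent > 50.0:
        return "random-effects"
    if p_value > 0.05 and i2_percent <= 50.0:
        return "fixed-effects"
    return None


# Hypothetical example: Q = 100 across 21 studies, P = 0.001.
i2 = i_squared(100.0, 21)
model = choose_model(0.001, i2)
```

Here I² is 80% and the rule selects the random-effects model. A result with P ≤ 0.05 but I² ≤ 50% falls outside both stated conditions, so the sketch returns `None` rather than guessing.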

Results

Characteristics of included studies

A total of 2092 articles were retrieved from the database searches. After screening of titles and abstracts, 1986 articles were excluded, leaving 106 articles for further selection. During full-text screening, 63 articles were excluded for the following reasons: unqualified outcomes [7], theoretical research or review [8], and no clinical outcome [9]. Finally, 43 studies [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50] with 3133 hips were included in the meta-analysis. The selection process is presented in Fig. 1. The main characteristics of the included studies, including number of hips, age, and gender, are summarized in Table 1.

Fig. 1 Flow diagram of the literature search and selection process

Table 1 Basic characteristics of the included studies

Diagnostic accuracy

All the included studies reported the accuracy of MRI for early osteonecrosis of the femoral head. Based on the Spearman correlation (R = −0.209, P = 0.589) between the logit of sensitivity and the logit of [1-specificity], there was no evidence of a threshold effect.
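The threshold check reported above can be sketched in a few lines. The per-study values below are hypothetical and chosen only to show what a perfect threshold effect looks like; they are not data from the included studies, and the helper names are illustrative.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_threshold(sens, spec):
    """Spearman correlation between logit(sensitivity) and logit(1 - specificity)."""
    x = [logit(s) for s in sens]
    y = [logit(1.0 - s) for s in spec]
    # Spearman's coefficient is Pearson's coefficient computed on ranks.
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-study estimates: sensitivity falls as specificity rises.
sens = [0.95, 0.90, 0.85, 0.80]
spec = [0.80, 0.85, 0.90, 0.95]
rho = spearman_threshold(sens, spec)
```

In this constructed example the coefficient is +1, the pattern a threshold effect would produce. A weak or negative coefficient, as reported above, does not suggest one.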

Based on the chi-squared test (Q = 166.45, P < 0.001) and the I² statistic (I² = 74.6%), heterogeneity was high, so we chose the random-effects model to analyze the sensitivity. The global sensitivity was 93.0% (95% CI 92.0–94.0%, Fig. 2). Based on the chi-squared test (Q = 144.43, P < 0.001) and the I² statistic (I² = 70.9%), heterogeneity was also high. Therefore, we chose the random-effects model to analyze the specificity, and the global specificity was 91.0% (95% CI 89.0–93.0%, Fig. 3).

Fig. 2 Forest plot showing the sensitivity values of MRI of early osteonecrosis of the femoral head

Fig. 3 Forest plot showing the specificity values of MRI of early osteonecrosis of the femoral head

Based on the chi-squared test (Q = 125.33, P < 0.001) and the I² statistic (I² = 66.5%), heterogeneity was high, so we chose the random-effects model to analyze the positive likelihood ratio. The global positive likelihood ratio was 2.74 (95% CI 1.98–3.79, Fig. 4). Therefore, a positive MRI result was 2.74 times as likely in patients who actually had early osteonecrosis of the femoral head as in those who did not. Based on the chi-squared test (Q = 69.58, P = 0.005) and the I² statistic (I² = 39.6%), heterogeneity was considered low, so we chose the fixed-effects model to analyze the negative likelihood ratio. The global negative likelihood ratio was 0.18 (95% CI 0.14–0.23, Fig. 5), which is close to zero. Specifically, a negative MRI result was only 0.18 times as likely in patients with early osteonecrosis of the femoral head as in those without it.
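One way to read these likelihood ratios is to convert a pre-test probability into a post-test probability. The sketch below uses the pooled likelihood ratios reported above, but the 30% pre-test probability is a purely hypothetical assumption chosen for illustration, and the function name is illustrative.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via a likelihood ratio."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PRE_TEST = 0.30  # hypothetical pre-test probability (assumption, not from the text)
after_positive = post_test_probability(PRE_TEST, 2.74)  # pooled positive likelihood ratio
after_negative = post_test_probability(PRE_TEST, 0.18)  # pooled negative likelihood ratio
```

Under that assumed pre-test probability, a positive MRI raises the probability of disease to about 0.54, and a negative MRI lowers it to about 0.07.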

Fig. 4 Forest plot showing the positive likelihood ratio of MRI

Fig. 5 Forest plot showing the negative likelihood ratio of MRI

Based on the chi-squared test (Q = 59.71, P = 0.037) and the I² statistic (I² = 29.7%), heterogeneity was low, so we chose the fixed-effects model to analyze the DOR. The global DOR was 27.27 (95% CI 17.02–43.67, Fig. 6); that is, the odds of a positive MRI result were 27.27-fold higher among individuals with early osteonecrosis of the femoral head than among those without the disease. The area under the SROC curve was 93.38% (95% CI 90.87–95.89%, Fig. 7), indicating high accuracy.
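The text does not state which fixed-effects estimator was used to pool the DOR. As an illustration only, the sketch below assumes inverse-variance weighting of the log DOR, which is one common fixed-effects approach. The study tables are hypothetical, and the function name is illustrative.

```python
import math


def pooled_dor_fixed(tables):
    """Inverse-variance fixed-effect pooling of log diagnostic odds ratios.

    tables: iterable of (tp, fp, fn, tn) tuples with all cells > 0.
    Returns (pooled DOR, lower 95% limit, upper 95% limit).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for tp, fp, fn, tn in tables:
        log_dor = math.log((tp * tn) / (fp * fn))
        # Large-sample variance of a log odds ratio.
        variance = 1.0 / tp + 1.0 / fp + 1.0 / fn + 1.0 / tn
        weight = 1.0 / variance
        weighted_sum += weight * log_dor
        weight_total += weight
    pooled_log = weighted_sum / weight_total
    se = math.sqrt(1.0 / weight_total)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - 1.96 * se),
        math.exp(pooled_log + 1.96 * se),
    )


# Hypothetical 2 x 2 tables (tp, fp, fn, tn); not data from the included studies.
hypothetical_tables = [(45, 5, 5, 45), (30, 4, 6, 40)]
dor, lower, upper = pooled_dor_fixed(hypothetical_tables)
```

Larger studies have smaller variances and therefore contribute more weight to the pooled estimate. A random-effects model would add a between-study variance term to each study's variance before weighting.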

Fig. 6 Forest plot showing the diagnostic odds ratio of MRI of early osteonecrosis of the femoral head

Fig. 7 Summary ROC plots for diagnostic accuracy of MRI of early osteonecrosis of the femoral head

Conclusions

Several systematic reviews and meta-analyses have been published on the diagnostic accuracy of MRI for early osteonecrosis of the femoral head. Li et al. [51] found that the sensitivity and specificity of MRI were 95% (95% CI 94–96%) and 77% (95% CI 70–83%), respectively. Moreover, the DOR was 31.89 (95% CI 17.32–58.70), and the area under the SROC curve was 0.9166. MRI was associated with high diagnostic accuracy in patients with suspected early ANFH. Song et al. [52], who included 21 articles, reported that MRI was more effective than CT in diagnosing ANFH, with a statistically significant difference between them (OR, 0.13; 95% CI 0.03–0.51). Su et al. [53], who included 8 studies of 515 patients, found that the difference in ANFH positive rate between CT and MRI was statistically significant (OR, 0.12; 95% CI 0.04–0.33), as was the difference in early-stage positive rate (OR, 0.45; 95% CI 0.26–0.78). Therefore, MRI appears to be a promising diagnostic tool for avascular necrosis of the femoral head.

However, this analysis had several limitations: (1) the inclusion and exclusion criteria for patients differed among studies; (2) data on patients’ previous diseases and treatments were unavailable; (3) all the included studies were published in English or Chinese, which may be a source of bias; (4) the proficiency of technicians varied among studies; and (5) pooled data were used for analysis and individual patient data were unavailable, which prevented a more comprehensive analysis.

In summary, this systematic review and meta-analysis indicates that MRI is associated with high accuracy for detecting ANFH. More high-quality studies and randomized controlled trials with large samples are warranted for further evaluation.