Introduction

As a serious complication after knee and hip arthroplasty, periprosthetic joint infection (PJI) is regarded as the main contributor to joint arthroplasty failure (20.4%) [1] and to revision after joint arthroplasty [2]. Studies revealed that the number of patients who received arthroplasty as a result of PJI is five times more than that of patients who did not receive an arthroplasty for a PJI [3, 4]. PJI is also associated with increased morbidity and mortality, and it therefore significantly increases the economic burden on the healthcare system [5, 6]. Early and accurate detection of PJI after knee and hip arthroplasty has become an important approach to minimizing the risk caused by PJI. Unfortunately, early and accurate detection of PJI remains a challenge because no optimal diagnostic method is available [7].

The current standard for the diagnosis of PJI is based on diagnostic criteria, and a series of tests have been offered to comprehensively detect PJI after hip and/or knee arthroplasty [8]. In recent years, diagnostic tests based on biomarkers, such as C-reactive protein (CRP) [9], α-defensin [10], leukocyte esterase [11], and interleukin-6 (IL-6) [12], have shown good diagnostic value in the early and accurate detection of PJI after hip and/or knee arthroplasty, and these biomarkers are components of the diagnostic criteria. Nevertheless, a meta-analysis indicated inadequate overall diagnostic accuracy [13]. Among the existing biomarkers, the role of IL-6 in detecting PJI has been extensively investigated, and the clinical significance of IL-6 in distinguishing between infected and aseptic failed total joint replacements has also been suggested [14].

Currently, two meta-analyses [15, 16] have evaluated the diagnostic accuracy of IL-6 in detecting PJI and indicated its excellent diagnostic value. However, different sources of IL-6 were speculated to be associated with different diagnostic accuracies; a previous meta-analysis was therefore conducted to determine the difference between serum and synovial fluid IL-6 in detecting PJI after hip, knee, and/or shoulder replacement. It remains unclear whether these findings apply to patients receiving only hip and/or knee replacements. Moreover, numerous studies have continued to address this issue since the previous meta-analysis because a definitive conclusion has not yet been reached for this specific population. We therefore performed this systematic review and meta-analysis of diagnostic test accuracy (DTA) studies to further evaluate the diagnostic accuracy of serum and synovial fluid IL-6 in detecting PJI after hip and/or knee arthroplasty by combining the available studies.

Materials and methods

We performed this diagnostic meta-analysis according to the methods recommended by the Cochrane handbook [17] and reported it in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) extension for Diagnostic Test Accuracy (DTA) [18]. Neither ethical approval nor patient informed consent was required because this is a meta-analysis of previously published studies. We did not register a formal protocol on any public platform.

Study retrieval

We systematically searched three databases (PubMed, EMBASE, and the Cochrane Library) to identify relevant studies from their inception through 31 December 2021. The search was performed by two qualified independent investigators (** and **). We used the following terms and their analogs to develop the search query with Boolean operators: “periprosthetic joint infection”, “interleukin-6”, and “joint prosthesis”. The sensitivity of the search query was adjusted according to the unique requirements of each database. Additionally, we screened the reference lists of eligible studies and published reviews to avoid missing relevant studies. No restriction on publication language or status was applied to literature retrieval. A third senior investigator (**) was consulted to resolve any disagreements during study retrieval. Details of the search query are summarized in Additional file 1: Table S1.

Table 1 Basic characteristics of 30 studies included in this diagnostic meta-analysis

Selection criteria

We selected eligible studies according to the following criteria [16, 19]: (a) eligible patients were diagnosed with PJI according to recognized diagnostic criteria after hip and/or knee replacement; (b) a serum or synovial fluid sample was obtained for diagnostic investigation; and (c) studies reported the numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), or the sensitivity and specificity. We excluded ineligible studies according to the following criteria: (a) studies investigated the diagnostic value of IL-6 in detecting PJI after shoulder and/or elbow replacement; (b) ineligible study design, such as narrative reviews, animal studies, or case reports; (c) no control group was designed or a direct comparison between samples was not available; (d) repeated reports on the same topic published by the same group but with insufficient data and relatively poor quality; and (e) essential data were not accessible in the original studies and additional data could not be obtained by contacting the leading author.

Study selection

Two independent qualified investigators selected eligible studies based on the selection criteria as follows: (a) all records identified from the three electronic databases were imported into EndNote X9 to build a literature database; (b) after removal of duplicate records, the titles and abstracts of the retained records were screened; and (c) the eligibility of the remaining studies was finally evaluated by full-text screening. A third senior investigator was consulted to resolve any disagreements during study selection.

Data extraction

Two independent qualified investigators extracted the essential data from the original studies, including the name of the first author, publication year, country, study design, the number of patients included in the final analysis, the number of PJI and aseptic cases, the diagnostic criteria for PJI, the infected joint (knee, hip, or mixed), the cut-off value, and the diagnostic results, including the numbers of TP, FP, FN, and TN, or the sensitivity and specificity. We contacted the leading author to collect essential data if necessary. A third senior investigator was consulted to resolve any disagreement during data extraction.
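When a study reports only sensitivity and specificity, the 2×2 counts can be back-calculated from the numbers of PJI and aseptic cases. The following is a minimal sketch of that step, not the procedure used in this meta-analysis; the function name and the example study (40 PJI cases, 60 aseptic cases) are hypothetical.

```python
def reconstruct_counts(sensitivity, specificity, n_infected, n_aseptic):
    """Back-calculate (TP, FP, FN, TN) from reported accuracy measures.

    Counts are rounded to the nearest integer, so they may differ by one
    from the true values when the published figures were themselves rounded.
    """
    tp = round(sensitivity * n_infected)   # infected cases correctly flagged
    fn = n_infected - tp                   # infected cases missed
    tn = round(specificity * n_aseptic)    # aseptic cases correctly cleared
    fp = n_aseptic - tn                    # aseptic cases wrongly flagged
    return tp, fp, fn, tn


# Hypothetical study: sensitivity 0.80, specificity 0.90
print(reconstruct_counts(0.80, 0.90, 40, 60))
```

Rounding is the main source of error here, which is one reason to prefer directly reported counts when a study provides them.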

Quality assessment

Two independent qualified investigators (** and **) evaluated the risk of bias and concerns about applicability of the included studies using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool [20]. This tool assesses the methodological quality of each study in four domains: patient selection, index test, reference standard, and flow and timing. We assessed the risk of bias for all domains and applicability for the first three domains. We rated each domain as “low,” “unclear,” or “high” risk. A third senior investigator (**) was consulted to resolve any disagreement about quality assessment.

Statistical analysis

We first calculated the TP, FP, FN, and TN based on the available information extracted from the original studies. Quantitative indicators with corresponding 95% confidence intervals (CIs) were then estimated to evaluate the diagnostic value of serum and synovial fluid IL-6 for the detection of PJI after knee or hip arthroplasty, including the pooled sensitivity and specificity, the positive likelihood ratio, the negative likelihood ratio, the diagnostic odds ratio (DOR), and the area under the summary receiver operating characteristic (SROC) curve (AUC) [21, 22]. We evaluated statistical heterogeneity across studies with the χ² test and the I² statistic, and I² ≥ 50% suggested the presence of substantial heterogeneity [23]. Regardless of the heterogeneity result, we used the random-effects model for data synthesis because variations between studies should not be ignored in real settings. Additionally, we performed subgroup analyses to further investigate the influence of various characteristics on the diagnostic accuracy of the serum and synovial fluid IL-6 tests for the diagnosis of PJI. Finally, we created Deeks' funnel plot to evaluate the risk of publication bias [24]. Data analysis was performed using STATA 14.0 (StataCorp, College Station, TX, USA) with the “midas” module.
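To make the indicators above concrete, the following sketch computes them for a single 2×2 table and derives I² from Cochran's Q. It is illustrative only: the pooled estimates in this study come from the “midas” module in STATA, not from this code, and the function names, the 0.5 continuity correction, and the example values are assumptions.

```python
def diagnostic_metrics(tp, fp, fn, tn, correction=0.5):
    """Per-study accuracy measures from a 2x2 table.

    If any cell is zero, `correction` is added to every cell so that the
    ratios stay finite (an assumed convention, not taken from the source).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                         # equals (tp * tn) / (fp * fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "plr": plr, "nlr": nlr, "dor": dor}


def i_squared(q, k):
    """I-squared (%) from Cochran's Q and the number of studies k."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical study with TP=32, FP=6, FN=8, TN=54
for name, value in diagnostic_metrics(32, 6, 8, 54).items():
    print(f"{name}: {value:.2f}")

# Hypothetical Q of 40 across 11 studies
print(f"I2 = {i_squared(40.0, 11):.0f}%")
```

A per-study calculation like this is useful for checking extracted data; pooling across studies requires a random-effects model that accounts for the correlation between sensitivity and specificity.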

Results

Literature retrieval

We identified 515 records from the three databases. A total of 133 duplicate records were removed using EndNote software. We excluded 332 ineligible studies after screening the titles and abstracts. Among the 50 studies retained for further eligibility evaluation, 20 were excluded because of ineligible participants (n = 7), ineligible study design (n = 3), insufficient data (n = 5), or ineligible test (n = 5). Finally, 30 studies [12, 14, 25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45] were judged to meet the selection criteria, including 23 reports for serum IL-6 and 14 reports for synovial fluid IL-6. The flow diagram of the study selection is displayed in Fig. 1.
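The counts reported in this paragraph can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Counts taken from the literature retrieval paragraph
identified = 515
duplicates = 133
excluded_title_abstract = 332
full_text_exclusions = {
    "ineligible participants": 7,
    "ineligible study design": 3,
    "insufficient data": 5,
    "ineligible test": 5,
}

screened = identified - duplicates                      # records screened
full_text = screened - excluded_title_abstract          # full texts assessed
included = full_text - sum(full_text_exclusions.values())

print(screened, full_text, included)
```

The three printed values correspond to the records screened, the full texts assessed, and the studies finally included.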

Fig. 1 The PRISMA flowchart of study selection

Basic characteristics of eligible studies

Table 1 summarizes the basic characteristics of the 30 eligible studies. In total, 3218 patients were included, involving 1223 patients with PJI and 1995 patients with aseptic loosening. All studies were published between 2005 and 2021. Among the included studies, 9 studies [12, 29, 44,45,46,47,48,49,50] were performed in China, 7 studies [27, 30,31,32, 36, 51] in the USA, 5 studies [14, 26, 35, 37, 39] in Germany, 2 studies [25, 33] in Egypt, 2 studies [38, 41] in Austria, and the remaining studies in Argentina [28], Poland [40], Turkey [34], the UK [52], Slovenia [42], and Sweden [43], respectively. Four studies [12, 28, 43, 52] enrolled patients receiving hip replacement, whereas the remaining studies [14, 25,26,27, 29,30,31,32,33,34,35,36,37,38,39,40,41,42, 44,45,46,47,48,49, 51, 52] included patients undergoing both hip and knee replacements. Twenty studies [12, 14, 25, 26, 29, 31, 34,35,36,37,38, 41, 44,45,46,47,48,49,50,51] clearly reported using the Musculoskeletal Infection Society (MSIS) criteria for the diagnosis of PJI. The methodological quality of the included studies was moderate, as shown in Fig. 2.

Fig. 2 Quality assessment of the included studies

Diagnostic accuracy of serum and synovial fluid IL-6

A total of 23 reports evaluated the diagnostic accuracy of serum IL-6 for detecting PJI after hip and/or knee replacement. As shown in Fig. 3a, the pooled sensitivity and specificity were 0.76 (95%CI 0.69–0.81) and 0.88 (95%CI 0.82–0.92), respectively. Serum IL-6 reached a relatively high diagnostic accuracy, with an AUC of 0.88 (95%CI 0.85–0.91) (Fig. 4a). For synovial fluid IL-6, 14 reports were available for data analysis. As shown in Fig. 3b, the pooled sensitivity and specificity were 0.87 (95%CI 0.75–0.93) and 0.90 (95%CI 0.85–0.93), respectively. Synovial fluid IL-6 achieved a higher diagnostic accuracy, with an AUC of 0.94 (95%CI 0.92–0.96) (Fig. 4b). This indicated that the diagnostic performance of synovial fluid IL-6 for PJI was superior to that of serum IL-6.

Fig. 3 Forest plot of the pooled sensitivity and specificity of serum IL-6 (a) and synovial fluid IL-6 (b) for diagnosis of periprosthetic joint infection

Fig. 4 SROC curve of serum IL-6 (a) and synovial fluid IL-6 (b) for diagnosis of periprosthetic joint infection

Evaluation of the clinical utility

As shown in Fig. 5, serum IL-6 achieved a positive likelihood ratio of 6.2 (95%CI 4.3–9.0) and a negative likelihood ratio of 0.28 (95%CI 0.22–0.35) for detecting PJI, whereas synovial fluid IL-6 achieved a positive likelihood ratio of 8.5 (95%CI 5.3–13.6) and a negative likelihood ratio of 0.15 (95%CI 0.08–0.29). Moreover, the DORs of serum and synovial fluid IL-6 were 22 (95%CI 14–36) and 57 (95%CI 21–156), respectively. We assumed a pre-test probability of 50% to estimate the post-test probability. Following a negative result, the post-test probability of PJI was 22% for the serum IL-6 test and 13% for the synovial fluid IL-6 test, consistent with the respective negative likelihood ratios. This suggested that the synovial fluid IL-6 test had higher clinical utility than the serum IL-6 test.
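The post-test probabilities read from a Fagan nomogram follow from converting the pre-test probability to odds, multiplying by the likelihood ratio, and converting back. The sketch below applies this to the pooled likelihood ratios reported above; the function name is illustrative. The reported 22% and 13% correspond to the negative likelihood ratios, while the values for a positive result are computed here and are not stated in the text.

```python
def post_test_probability(pre_test, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PRE_TEST = 0.50  # pre-test probability used in this study

# Pooled likelihood ratios reported above: (positive LR, negative LR)
tests = {"serum IL-6": (6.2, 0.28), "synovial fluid IL-6": (8.5, 0.15)}

for name, (plr, nlr) in tests.items():
    after_positive = post_test_probability(PRE_TEST, plr)
    after_negative = post_test_probability(PRE_TEST, nlr)
    print(f"{name}: {after_positive:.0%} after a positive result, "
          f"{after_negative:.0%} after a negative result")
```

A lower probability after a negative result means the test is better at ruling out PJI, which is the sense in which the synovial fluid test performs better at this pre-test probability.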

Fig. 5 Fagan's nomogram of the post-test probability of IL-6 for diagnosis of periprosthetic joint infection based on a pre-test probability of 50% in serum IL-6 (a) and synovial fluid IL-6 (b)

Subgroup analysis

Substantial heterogeneity was observed for both serum and synovial fluid IL-6 tests. Subgroup analyses were therefore performed to check the robustness of pooled results based on pre-designed criteria, including exclusion of chronic inflammatory diseases or not, the part of affected joints (mixed joint replacements or hip replacement), the diagnostic criteria for PJI (MSIS criteria or others), study design (prospective or retrospective), the number of patients included for analysis (60 for serum IL-6 and 80 for synovial fluid IL-6), and the cut-off criteria (10 pg/ml for serum IL-6 and 2300 pg/ml for synovial fluid IL-6). As shown in Additional file 2: Table S2, subgroup analysis confirmed the robustness of pooled results. However, it is noted that the number of patients included for analysis and the cut-off criteria for PJI might have an impact on the diagnostic accuracy.

Publication bias

Deeks' funnel plot suggested that the studies were symmetrically located on both sides of the regression line. Furthermore, the asymmetry test for Deeks' plot indicated no evidence of publication bias for either serum or synovial fluid IL-6, with P values of 0.27 and 0.61, respectively (Fig. 6).
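Deeks' asymmetry test regresses the log diagnostic odds ratio of each study on the inverse square root of its effective sample size (ESS), weighted by ESS; a slope close to zero indicates a symmetric funnel. The sketch below estimates only the intercept and slope of that weighted regression and does not compute the P value. It is not the “midas” implementation, and the function name, the continuity correction, and the example data are assumptions.

```python
import math


def deeks_regression(studies):
    """Weighted least squares of ln(DOR) on 1/sqrt(ESS), weights = ESS.

    `studies` is a list of (tp, fp, fn, tn) tuples with at least two
    different effective sample sizes. Returns (intercept, slope).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        n_infected, n_aseptic = tp + fn, fp + tn
        ess = 4 * n_infected * n_aseptic / (n_infected + n_aseptic)
        # Assumed convention: add 0.5 to all cells when any cell is zero
        if 0 in (tp, fp, fn, tn):
            tp, fp, fn, tn = (v + 0.5 for v in (tp, fp, fn, tn))
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)
    total_w = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical studies of different sizes (tp, fp, fn, tn)
example = [(20, 10, 10, 20), (45, 15, 15, 60), (70, 50, 30, 90)]
intercept, slope = deeks_regression(example)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

Testing whether the slope differs from zero additionally requires its standard error and a t distribution, which a statistics package would supply.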

Fig. 6 Deeks' funnel plot of serum IL-6 (a) and synovial fluid IL-6 (b)

Discussion

IL-6 has been suggested to have relatively high diagnostic value in detecting PJI, and the difference between serum and synovial fluid IL-6 tests in detecting PJI after hip, knee, and shoulder replacements has been initially evaluated. However, no systematic review and meta-analysis had evaluated the diagnostic accuracy of serum and synovial fluid IL-6 tests in detecting PJI among patients who underwent only hip and knee replacements. In this systematic review and meta-analysis of 30 DTA studies, we evaluated the diagnostic value of both serum and synovial fluid IL-6 tests for detecting PJI after hip and knee replacements, and found that synovial fluid IL-6 might be preferentially prescribed for this purpose owing to its higher sensitivity, specificity, DOR, and diagnostic accuracy. Serum IL-6 might also be considered for detecting PJI after hip and knee replacements because its specificity is comparable to that of synovial fluid IL-6, the volume of synovial fluid is limited, and the acquisition of synovial fluid is invasive. Moreover, more studies are required to further determine the optimal cut-off value of both serum and synovial fluid IL-6 tests because a negative association between the diagnostic accuracy of IL-6 and the cut-off criteria has been revealed.

Up to now, numerous systematic reviews and meta-analyses of DTA studies have evaluated the diagnostic value of IL-6 in detecting PJI; however, a definitive conclusion has not yet been reached because of several limitations, which are explained below. In 2017, Lee et al. [13] found that the synovial fluid IL-6 test was associated with higher diagnostic accuracy, with an AUC, sensitivity, and specificity of 0.95, 0.81, and 0.94, respectively. In the same year, another meta-analysis reported a similar diagnostic accuracy, with an AUC of 0.956 [53]. Although these findings are consistent with ours, they were calculated from only 5 [13] or 7 [53] eligible studies, whereas 14 reports were included in the present study to generate more robust and reliable results. In 2020, a meta-analysis by Li et al. [54] found that serum IL-6 achieved a pooled sensitivity, specificity, and DOR of 0.87, 0.83, and 36.27, respectively, as well as an AUC of 0.92. Although that diagnostic accuracy was higher than in our study, the small number of eligible studies impaired the reliability of the pooled results, whereas our study included 23 eligible reports for the estimation of diagnostic accuracy.

In 2017, Xie et al. performed a systematic review and meta-analysis of 17 DTA studies to investigate the relative diagnostic values of serum and synovial fluid IL-6 tests for PJI after hip, knee, and shoulder replacements [19], suggesting that the synovial fluid IL-6 test had a higher AUC (0.96 vs. 0.83) and DOR (101 vs. 20) as well as comparable specificity (0.90 vs. 0.89) relative to the serum IL-6 test. Based on these results, the authors concluded that serum IL-6 may be regularly prescribed for detecting PJI owing to its relatively high specificity, although it was less sensitive than the synovial fluid IL-6 test. Yoon et al. [15] included 16 eligible DTA studies for data analysis and found that IL-6 achieved an AUC of 0.93 in detecting PJI after hip, knee, shoulder, and/or elbow replacement, with a pooled sensitivity and specificity of 0.83 and 0.91, respectively. Meanwhile, Tian et al. performed a diagnostic meta-analysis, and the pooled results suggested a higher diagnostic accuracy (0.91), corresponding to a pooled sensitivity of 0.80 and a pooled specificity of 0.89 [16].

Compared with previous systematic reviews and meta-analyses of DTA studies, our study minimizes the variation in patient characteristics because only patients receiving hip and/or knee replacement were considered. Moreover, this was the first meta-analysis to determine the diagnostic value of serum and synovial fluid IL-6 in detecting PJI after hip and/or knee replacement by combining the greatest number of eligible studies. For example, our study confirmed that the synovial fluid IL-6 test was associated with a higher DOR than the serum IL-6 test, which was directionally consistent with the previous meta-analysis. Specifically, a pooled DOR of 57 with a corresponding 95%CI of 21 to 156 was generated in our study, while the previous systematic review and meta-analysis of DTA studies reported a pooled DOR of 101, with a 95%CI of 28 to 358.

Our systematic review and meta-analysis of 30 DTA studies had some limitations. First, although we selected a random-effects model to calculate results, substantial heterogeneity was present for both serum and synovial fluid IL-6 tests, which might lower the reliability of our findings. However, we believe that the findings of the present study can still be considered in clinical practice because subgroup analysis confirmed the robustness of our results. Second, the formal protocol of this systematic review and meta-analysis of DTA studies was not registered on any platform. However, we strictly performed data analysis and reported pooled results according to the recommended criteria, which enhances the rigor and reliability of this study. Third, we included the greatest number of eligible studies to update the diagnostic accuracy of serum and synovial fluid IL-6 based on a systematic literature retrieval; however, the risk of missing eligible studies could not be avoided because other databases such as Web of Science and China National Knowledge Infrastructure (CNKI) were not searched. Fourth, we performed a series of subgroup analyses to investigate the contribution of some important factors to statistical heterogeneity; however, statistical heterogeneity was not significantly reduced after these subgroup analyses. Therefore, our findings should be interpreted with caution, as further analysis could not be performed based on other potential factors. Finally, distinguishing the diagnostic values of serum and synovial fluid IL-6 for different types of PJI is important because the chronicity of PJI calls for different diagnostic criteria. However, most of the eligible studies did not provide details on PJI type, so further analysis by PJI type was not possible.

Conclusion

The synovial fluid IL-6 test has higher diagnostic accuracy than the serum IL-6 test for the detection of PJI among patients undergoing hip and/or knee replacement. Although the serum IL-6 test is less sensitive than the synovial fluid IL-6 test, it can also be considered for patients with prosthetic failure because its specificity is comparable to that of the synovial fluid IL-6 test. Our findings should be interpreted with caution because of significant statistical heterogeneity. Moreover, more studies should be performed to determine the optimal cut-off value because the cut-off value was negatively associated with the diagnostic accuracy of IL-6 in patients undergoing hip and/or knee replacement.