Introduction

After primary TKA, prosthetic joint infection (PJI) develops in as many as 2% of patients; this risk is as great as 5% after revision surgery [3, 26]. Accurate diagnosis of periprosthetic infection remains a clinical challenge, particularly in subacute or chronic infections. The evaluation of suspected PJI is characterized by a multimodality workup including microbiologic, laboratory (elevated erythrocyte sedimentation rate, C-reactive protein [CRP]), synovial marker, and histologic tests [35, 57]. Recently, promising results have been reported for synovial biomarker tests, including the alpha defensin immunoassay and synovial fluid CRP tests [5, 54]. However, these tests are not yet widely available and their utility has been confirmed in only a few studies [54]. In addition to these diagnostic tests, various imaging techniques, including radiography, ultrasound, CT, MRI, scintigraphy (bone, leukocyte, bone marrow, or antigranulocyte), and positron emission tomography (PET), can be used in the assessment of suspected periprosthetic knee infection [10, 11, 29, 31, 57], especially in the case of a challenging diagnosis of a chronic or low-grade infection [4548].

A delay in diagnosing and treating a periprosthetic knee infection can have a critical effect on whether the prosthesis loosens or can be retained and on joint function. Timely identification of a periprosthetic infection is essential to allow initiation of appropriate medical and surgical therapies [49], and various imaging modalities can contribute to it when other tests are inconclusive. However, studies investigating periprosthetic knee infection have reported inconsistent diagnostic accuracies [10, 11, 22]. Consequently, the choice of the most accurate imaging technique remains controversial [11, 31]. To our knowledge, there has been no meta-analysis comparing the most commonly used imaging modalities to evaluate TKA PJI.

The aim of this systematic review and meta-analysis was to compare the diagnostic accuracy of different imaging modalities used for diagnosing periprosthetic knee infection.

Materials and Methods

Search Criteria and Strategy

The imaging modalities that were reviewed for the assessment of periprosthetic knee infection were radiography, ultrasound, CT, MRI, scintigraphy (including bone, antigranulocyte, leukocyte, and bone marrow scintigraphy), and PET.

In June 2015, a computer-aided search of the PubMed and Embase® databases was conducted; it was updated in January 2016 (Appendix 1. Supplemental material is available with the online version of CORR ®). The search was restricted to primary studies written in English. For each database, a specific search strategy was developed (Fig. 1) with a medical informatics specialist. Reference lists of the identified studies and relevant reviews were hand-searched for supplementary eligible studies. The search was performed according to the PRISMA Statement (Appendix 2. Supplemental material is available with the online version of CORR ®) [24].

Fig. 1

The flowchart shows the search strategy we used for this study.

Study Selection

The following inclusion criteria were used for eligible studies: (1) radiography, ultrasound, CT, MRI, scintigraphy, or PET was used to identify suspected periprosthetic knee infections; (2) a valid reference standard was applied, consisting of positive intraoperative culture, whether or not combined with histopathologic evidence of acute inflammation of the periprosthetic tissue obtained at surgical débridement or prosthesis removal, and/or the presence of a sinus tract that communicates with the prosthesis [8, 13, 29], and/or a clinical followup of at least 6 months; and (3) adequate details were provided to reconstruct a two-by-two contingency table to determine the results of the index tests. Exclusion criteria were (1) animal studies; (2) non-English studies; (3) studies that did not differentiate between various joint replacements; and (4) case reports. Potential overlap of patient populations was assessed by comparing the patient demographics when more than one study from the same author or institution was selected. The study with the largest number of patients was selected when an overlap of patient populations between studies was observed.

The titles were screened for eligibility by one reviewer (SJV) and then processed for abstract assessment. The titles and abstracts were independently screened and assessed in an unblinded standardized manner for eligibility by two reviewers (SJV, RJAS). The final decision regarding inclusion was based on the full article. Disagreement in the evaluation of three studies was resolved with consensus by a third reviewer (OPPT). A priori, no differentiation was made for the type of knee implant, the interpretation criteria used for the index test, or the time between surgery and imaging.

Studies Included

The search strategy identified 3708 studies from MEDLINE and 2864 studies from Embase®. The source population was formed by the total of 6572 studies (including duplicates). In 1933 studies, overlap was found between the retrieved studies from Embase® and MEDLINE. Of the initial 6572 studies, 6433 were excluded after analyzing the information provided in the title and abstract. The full articles of the remaining 139 studies were reviewed for eligibility (Appendix 2. Supplemental material is available with the online version of CORR ®). No other studies were extracted from the reference list of these studies. A total of 116 studies were excluded because the study was not a clinical diagnostic study (32%), did not describe periprosthetic knee infection (12%), was not written in English (17%), did not specify the definition of positivity regarding the index test or applied an insufficient reference standard for periprosthetic knee infection (7%), did not differentiate regarding different prosthetic joint replacements (15%), did not provide data to reproduce two-by-two contingency tables (16%), or the study revealed a potential overlap of the patient population (1%). Eventually, 23 studies were included in this review.
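The study-flow counts reported above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and are not taken from the study, and only the numbers quoted in this section are used.

```python
# Study-flow counts as reported in the text (variable names are illustrative).
retrieved = {"MEDLINE": 3708, "Embase": 2864}
total_retrieved = sum(retrieved.values())            # includes duplicates

excluded_on_title_abstract = 6433
full_text_reviewed = total_retrieved - excluded_on_title_abstract

excluded_on_full_text = 116
included = full_text_reviewed - excluded_on_full_text

# Reported percentages for the reasons for full-text exclusion.
exclusion_percentages = [32, 12, 17, 7, 15, 16, 1]

print(total_retrieved, full_text_reviewed, included)
print(sum(exclusion_percentages))
```

The totals reproduce the reported 6572 retrieved records, 139 full-text articles, and 23 included studies, and the exclusion percentages add up to 100%.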

Description of Study Characteristics

Of the 23 studies included for meta-analysis, six used bone scintigraphy [12, 20, 27, 34, 40, 50], four used combined bone and leukocyte scintigraphy [20, 34, 41, 50], six used leukocyte scintigraphy [18, 28, 34, 37, 38, 50], seven used combined leukocyte and bone marrow scintigraphy [2, 7, 9, 16, 17, 21, 34], five used antigranulocyte scintigraphy [14, 15, 40, 44, 52], and five used fluorodeoxyglucose (FDG)-PET [2, 21, 23, 50, 56]. Altogether, a total of 1027 diagnostic images, 404 (39%) with and 623 (61%) without periprosthetic knee infection, were evaluated in 1502 patients with 763 knee prostheses, of which 288 (38%) were infected (Table 1). Of the studies not included for meta-analysis, two studies used ciprofloxacin scintigraphy and one used IgG scintigraphy. No studies were included that used radiographs, ultrasound, CT, MRI, or combined bone and gallium scintigraphy. The two reviewers (SJV, RJAS) independently extracted relevant data from the included studies, including demographic, implant, and index test characteristics (Table 2). Imaging procedures, image interpretation, and the effects of time after surgery as determined by the publication data (to form subgroups when possible) were analyzed in detail, as were data regarding diagnostic performance indices (eg, sensitivity and specificity).

Table 1 Characteristics of the included studies
Table 2 Characteristics of the reference test(s) and implants

Methodologic Quality Assessment

The criteria list of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) for evaluating internal and external validity of diagnostic studies, recommended by the Cochrane Screening and Diagnostic Tests Methods Group (http://methods.cochrane.org/sdt/handbook-dta-reviews), was used for grading the methodologic quality of the selected studies [53]. Evaluation was performed independently by two reviewers (SJV, RJAS). The internal and external validity criteria were used to describe the methodologic limitations of the studies. Studies, however, were not excluded from the systematic review on the basis of quality.

The external validity showed low concerns regarding applicability in more than 85% of the included studies (Fig. 2). The internal validity of the included studies showed more concerns regarding the risk of bias. Approximately 50% of the included studies did not provide sufficient information regarding patient selection, reference standard, and flow and timing.

Fig. 2A–B

The methodologic quality of the included studies using QUADAS-2 shows the proportions of studies with high, low, or unclear (A) risk of bias and (B) concerns regarding applicability.

Quantitative Analysis (Meta-analysis)

For the diagnostic modalities, true-positive, false-positive, true-negative, and false-negative results were derived from a two-by-two contingency table. When multiple sets of interpretation criteria were used for the same index test, the criteria with the highest diagnostic accuracy were selected. When studies reported results for more than one observer, the first reader’s findings were included. The statistical heterogeneity of the diagnostic odds ratio (DOR) of each imaging index test across studies was tested using the chi-square test (QDOR) for independence with k-1 degrees of freedom (k = number of studies) [6]. The Spearman rank correlation coefficient ρ value of the DOR was used in case of heterogeneity to measure the correlation between sensitivity and specificity. A ρ of 0.40 or less suggests that the variation between studies may be explained by different cutoff points, or diagnostic thresholds, on a summary receiver operating characteristic curve [6, 25]. The symmetry of funnel plots was visually interpreted to evaluate possible publication bias.
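The heterogeneity test described above can be sketched as follows. This is a minimal illustration in Python that assumes the Q statistic is Cochran's Q computed on the log DOR with inverse-variance weights and that 0.5 is added to every cell when a table contains a zero; both are common conventions rather than details confirmed by the text. The function names and the two-by-two counts are hypothetical and are not data from the included studies.

```python
import math

def log_dor(tp, fp, fn, tn, correction=0.5):
    """Return (log DOR, variance of log DOR) for one two-by-two table.

    Assumption of this sketch: a continuity correction is added to every
    cell when any cell is zero.
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    estimate = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return estimate, variance

def cochran_q(tables):
    """Cochran's Q for the log DORs of k studies, with k - 1 degrees of freedom."""
    stats = [log_dor(*table) for table in tables]
    weights = [1 / var for _, var in stats]
    pooled = sum(w * est for w, (est, _) in zip(weights, stats)) / sum(weights)
    q = sum(w * (est - pooled) ** 2 for w, (est, _) in zip(weights, stats))
    return q, len(tables) - 1, math.exp(pooled)

# Hypothetical (TP, FP, FN, TN) counts, for illustration only.
example_tables = [(18, 4, 2, 26), (12, 3, 3, 20), (25, 6, 4, 30)]
q, dof, pooled_dor = cochran_q(example_tables)
print(f"Q = {q:.2f} on {dof} degrees of freedom; pooled DOR = {pooled_dor:.1f}")
```

A Q value that is small relative to the chi-square distribution with k-1 degrees of freedom indicates no statistical evidence of heterogeneity among the study-level DORs.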

For all included studies, the test of homogeneity for the DOR indicated no statistical heterogeneity. For the studies that evaluated bone scintigraphy (six studies, n = 216 knee prostheses), combined bone and leukocyte scintigraphy (four studies, n = 114 knee prostheses), leukocyte scintigraphy (six studies, n = 238 knee prostheses), combined leukocyte and bone marrow scintigraphy (seven studies, n = 144 knee prostheses), antigranulocyte scintigraphy (five studies, n = 136 knee prostheses), and FDG-PET (five studies, n = 179 knee prostheses), the QDOR was 8.09 (5 degrees of freedom [DOF]), 2.90 (3 DOF), 3.52 (5 DOF), 4.69 (6 DOF), 4.70 (4 DOF), and 6.92 (4 DOF), respectively. The funnel plots did not suggest the presence of positive-outcome bias (data not shown).
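The reported Q values can be converted to approximate p values to see why none indicates heterogeneity. The sketch below, in Python, implements the upper tail of the chi-square distribution for integer degrees of freedom using only the standard library; the helper name `chi2_sf` and the dictionary labels are illustrative and do not come from the study.

```python
import math

def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer dof."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) times a finite sum of (x/2)^i / i!.
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: complementary error function plus a finite sum of
    # exp(-x/2) * (x/2)^(j - 1/2) / Gamma(j + 1/2).
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (j + 0.5)
    return total

# Q statistics and degrees of freedom as reported in the text.
reported = {
    "bone scintigraphy": (8.09, 5),
    "bone and leukocyte scintigraphy": (2.90, 3),
    "leukocyte scintigraphy": (3.52, 5),
    "leukocyte and bone marrow scintigraphy": (4.69, 6),
    "antigranulocyte scintigraphy": (4.70, 4),
    "FDG-PET": (6.92, 4),
}
p_values = {name: chi2_sf(q, dof) for name, (q, dof) in reported.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.2f}")
```

Each reported Q lies below the 5% critical value for its degrees of freedom, so every p value exceeds 0.05, which agrees with the statement that no statistical heterogeneity was found.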

The sensitivity and specificity were pooled independently and were weighted by the inverse of the variance with use of Meta-DiSc software (Available at: http://www.hrc.es/investigacion/metadisc_en.htm) [55]. The logit-transformed sensitivity, specificity, and corresponding 95% CI of the index tests were compared with use of z-test statistics. A probability less than 0.05 was considered significant (Table 3). In the comparison of two imaging modalities, confidence intervals for two means can overlap and yet the two means can be statistically different from one another at a probability less than 0.05 [1, 19, 36]. The z-test was used to statistically analyze these differences. A secondary analysis was performed to evaluate possible influence of the methodologic quality on the sensitivity and specificity.
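The z-test comparison on the logit scale can be illustrated with a short Python sketch. It assumes that the standard error of each pooled estimate can be recovered from its reported 95% CI by treating the interval as symmetric on the logit scale; this is an approximation made for illustration and may not match how the intervals were computed, so the resulting p values are only indicative. The function names are hypothetical, and the inputs are the pooled sensitivities of leukocyte scintigraphy and FDG-PET reported in the Results.

```python
import math

def logit(p):
    """Log odds of a proportion p (0 < p < 1)."""
    return math.log(p / (1 - p))

def se_from_ci(lower, upper, z_crit=1.96):
    """Approximate logit-scale standard error recovered from a 95% CI.

    Assumption of this sketch: the interval is symmetric on the logit scale.
    """
    return (logit(upper) - logit(lower)) / (2 * z_crit)

def z_test(est1, ci1, est2, ci2):
    """Two-sided z-test for the difference between two logit-transformed proportions."""
    se1, se2 = se_from_ci(*ci1), se_from_ci(*ci2)
    z = (logit(est1) - logit(est2)) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Pooled sensitivities and 95% CIs as reported in the Results.
z, p = z_test(0.88, (0.81, 0.93), 0.70, (0.56, 0.81))
print(f"leukocyte scintigraphy vs FDG-PET sensitivity: z = {z:.2f}, p = {p:.3f}")
```

In this example the two intervals meet at 0.81, yet the test still yields a p value below 0.05, which illustrates the point made above that overlapping confidence intervals do not rule out a statistically significant difference.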

Table 3 Comparison of imaging techniques in diagnosing periprosthetic knee infection using the z-test

Results

Bone scintigraphy was less specific (Table 3) than all other modalities tested (56%; 95% CI, 0.47–0.64; p < 0.001), and leukocyte scintigraphy (77%; 95% CI, 0.69–0.85) was less specific than antigranulocyte scintigraphy (95%; 95% CI, 0.88–0.98; p < 0.001) or combined leukocyte and bone marrow scintigraphy (93%; 95% CI, 0.86–0.97; p < 0.001). FDG-PET (84%; 95% CI, 0.76–0.90) was more specific than bone scintigraphy (56%; 95% CI, 0.47–0.64; p < 0.001), and less specific than antigranulocyte scintigraphy (95%; 95% CI, 0.88–0.98; p = 0.02) and combined leukocyte and bone marrow scintigraphy (93%; 95% CI, 0.86–0.97; p < 0.001).

Leukocyte scintigraphy (88%; 95% CI, 0.81–0.93; p = 0.01) and antigranulocyte scintigraphy (90%; 95% CI, 0.78–0.96; p = 0.02) were more sensitive than FDG-PET (70%; 95% CI, 0.56–0.81). However, because of broad overlapping of confidence intervals, no differences in sensitivity were observed among the other modalities, including combined bone scintigraphy (93%; 95% CI, 0.85–0.98) or combined leukocyte and bone marrow scintigraphy (80%; 95% CI, 0.66–0.91); p > 0.05 for all paired comparisons (Table 3).

The secondary analysis, in which studies with a high risk of bias were excluded, showed a higher sensitivity for FDG-PET (93%; 95% CI, 0.80–0.98) that was not different from that of leukocyte scintigraphy (86%; 95% CI, 0.76–0.93; p = 0.39) or antigranulocyte scintigraphy (91%; 95% CI, 0.78–0.98; p = 0.18). Combined leukocyte and bone marrow scintigraphy was highly specific (92%; 95% CI, 0.84–0.97) and more specific than bone scintigraphy (55%; 95% CI, 0.45–0.64; p ≤ 0.001) and leukocyte scintigraphy (71%; 95% CI, 0.56–0.84; p = 0.01). However, antigranulocyte scintigraphy (98%; 95% CI, 0.92–0.99) was more specific than all other compared imaging modalities; p < 0.05 for all paired comparisons.

Discussion

In the assessment of suspected periprosthetic knee infection, various diagnostic tests including blood tests, synovial fluid microbiologic analyses, and synovial fluid marker tests (such as alpha defensin and synovial fluid CRP), can be used. However, accurate diagnosis of periprosthetic knee infection remains challenging, especially in chronic or low-grade infections, and inconsistent diagnostic accuracies with various tests across studies have been published [10, 11, 22]. Because of that, imaging tests remain important, but studies do not agree on which imaging technique is the most accurate [11, 31]. Our meta-analysis revealed that in diagnosing periprosthetic knee infection, antigranulocyte scintigraphy and combined leukocyte and bone marrow scintigraphy were highly specific imaging techniques (Fig. 3).

Fig. 3A–B

The graphs show the pooled estimates and corresponding 95% CIs for (A) sensitivity and (B) specificity for all index tests. The size of the circles is proportionate to the number of patients investigated by each technique. BS = bone scintigraphy; BS/LS = bone and leukocyte scintigraphy; LS = leukocyte scintigraphy; LS/BMS = leukocyte and bone marrow scintigraphy; AGS = antigranulocyte scintigraphy; FDG-PET = fluorodeoxyglucose positron emission tomography.

Although the included studies showed statistical homogeneity of data, the reliability of the pooled estimates depends on the methodologic quality of the included studies. There are several limitations of this meta-analysis to consider. Collecting large sample sizes of patients with suspected periprosthetic knee infection is difficult; the total number of infected TKAs included in this meta-analysis was only 288. Consequently, several studies showed wide confidence intervals because of the small numbers of patients who were evaluated using each diagnostic modality. This means that there may have been differences in sensitivity or specificity between certain modalities that we did not detect. Future comparative studies might help resolve this issue. Studies were not excluded on the basis of methodologic quality. Our secondary analysis, with exclusion of studies that showed a high risk of bias, suggested that FDG-PET might be more sensitive than the primary analysis showed; indeed, it may be comparably sensitive to leukocyte scintigraphy and antigranulocyte scintigraphy. The methodologic quality of the included studies did not substantially influence the sensitivity and specificity of other imaging modalities (data not shown). However, there were important concerns regarding the flow and timing of the included studies. Most studies insufficiently described important variables, including types of implants, use of antibiotics, imaging time after surgery, improvement of imaging techniques, and inter- and intraobserver variability. Consequently, analysis of the effect of these variables on the accuracy of imaging was not possible, although they could influence the diagnostic performance of the imaging modalities we studied. In addition, the long period evaluated here (1990 to 2015) saw the introduction of numerous new diagnostic tests (such as alpha defensin and synovial fluid CRP) and new diagnostic standards [4], which might have changed the apparent performance of the imaging modalities we studied and how they might be used in practice. The differentiation between acute and chronic infection influences the decision to evaluate a suspected infection with additional imaging, and should be investigated in additional studies.

Another important limitation of the included studies is the lack of uniform criteria for diagnosis of a periprosthetic infection. We could not restrict inclusion to studies using the Musculoskeletal Infection Society criteria [35] because many of the included studies were performed before the development of these criteria. Although a valid reference standard with microbiologic confirmation was a stringent inclusion criterion in this meta-analysis, there is a risk of false-positive diagnosis of infection, which potentially could decrease specificity. When a diagnosis of no infection was considered, clinical followup sometimes was used to monitor the final diagnosis. Only studies with a clinical followup of at least 6 months were included. For obvious reasons, surgery with microbiologic evaluation could not be performed in all patients (patients believed to be without infection did not always undergo surgery). However, this could result in more false-negative results and potentially decrease the reported specificity when an infection is found after the final diagnosis, especially in the case of a low-grade infection.

Our meta-analysis defined test performance for the various imaging modalities when used in isolation. However, multiple diagnostic tests, including aspiration results and laboratory tests, can contribute to diagnosing periprosthetic infection, which could influence the diagnostic performance of the evaluated imaging techniques, and generally should improve their performance. Over the years, important developments have been described in the diagnosis of periprosthetic infection, including the introduction of alpha defensin and synovial fluid tests [5, 54]. When the diagnostic evaluation using synovial fluid markers clearly indicates infection, there is little or no need for additional nuclear imaging tests. However, if those tests cannot be obtained or are inconclusive, nuclear imaging can be used in concert with other elements of diagnostic evaluation, including microbiologic analysis and blood testing, to arrive at a more-precise diagnosis than is possible with imaging or laboratory testing alone. Nuclear imaging seldom is used in isolation, and probably should not be used that way [57].

With bone scintigraphy, postoperative tracer uptake during the first years after implantation (Table 4) can be caused by various factors, and the technique therefore lacks the specificity needed to differentiate between aseptic and septic loosening [10, 32]. Our results (Table 5) confirmed the reputation of high sensitivity and low specificity of this technique [30, 31, 42, 43]. Unfortunately, subgroup analysis of imaging time after implantation could not be performed owing to insufficient data. In clinical practice, imaging often is used to rule out an infection. Bone scintigraphy is widely available and a sensitive tool for evaluation of painful knee prostheses (Fig. 4). However, when confirmation of infection is needed, a positive bone scintigraphy outcome usually leads to a second, more-specific, investigation.

Table 4 Study characteristics of bone scintigraphy for detection of periprosthetic knee infection
Table 5 Diagnostic accuracy of bone scintigraphy for detection of periprosthetic knee infection
Fig. 4A–B

The pooled (A) sensitivity and (B) specificity of bone scintigraphy in the assessment of periprosthetic knee infection with 95% CIs are presented.

Leukocyte scintigraphy is assumed to be a more-specific imaging modality and has a long history of use in detection of infections [11, 51]. However, our meta-analysis showed that this technique alone may not be the preferred modality for confirming periprosthetic knee infection, given that it has only moderate specificity (77%) (Table 6). We found that leukocyte scans are very sensitive (88%) (Fig. 5). However, in contrast to bone scintigraphy, leukocyte scintigraphy is a time-consuming procedure with higher costs and therefore may not be the preferred imaging technique to rule out periprosthetic knee infection. The explanation for the moderate specificity may be that labeled leukocytes (Table 7) accumulate not only in infections, but also physiologically in the bone marrow [33]. To reduce the consequent number of false-positive results, leukocyte scintigraphy can be combined with bone marrow scintigraphy (Table 8), which has been proposed as the preferred imaging modality for diagnosing prosthetic joint infections [10, 11, 22, 32]. The current results for knee prostheses confirmed an increased specificity of 93% versus 77% when combining leukocyte with bone marrow scintigraphy (Table 9). Another assessed option to improve specificity (Table 10) was combining leukocyte with bone scintigraphy (Table 11). As expected, specificity did not improve (Fig. 6) [10]. More recently, antigranulocyte scintigraphy was introduced as a less time-consuming alternative to leukocyte scintigraphy, with the advantage of in vivo labeling of leukocytes and considerable potential in the detection of infection (Table 12) [10, 11]. We found antigranulocyte scintigraphy (Table 13) to be more specific than leukocyte scintigraphy and FDG-PET (Table 14). However, its role in the assessment of periprosthetic infection is not yet fully established [10]. An important drawback is that neither antigranulocyte scintigraphy nor leukocyte scintigraphy is widely available or widely used in clinical practice [10].

Table 6 Diagnostic accuracy of leukocyte scintigraphy for detection of periprosthetic knee infection
Fig. 5A–B

The pooled (A) sensitivity and (B) specificity of leukocyte scintigraphy in the assessment of periprosthetic knee infection with 95% CIs are shown.

Table 7 Study characteristics of leukocyte scintigraphy for detection of periprosthetic knee infection
Table 8 Study characteristics of combined leukocyte and bone marrow scintigraphy
Table 9 Diagnostic accuracy of combined leukocyte and bone marrow scintigraphy for detection of periprosthetic knee infection
Table 10 Diagnostic accuracy of combined bone and leukocyte scintigraphy for detection of periprosthetic knee infection
Table 11 Study characteristics of combined bone and leukocyte scintigraphy for detection of periprosthetic knee infection
Fig. 6A–B

The pooled (A) sensitivity and (B) specificity of combined bone and leukocyte scintigraphy in the assessment of periprosthetic knee infection with 95% CIs are presented.

Table 12 Study characteristics of antigranulocyte scintigraphy for detection of periprosthetic knee infection
Table 13 Diagnostic accuracy of antigranulocyte scintigraphy for detection of periprosthetic knee infection
Table 14 Diagnostic accuracy of fluorodeoxyglucose-positron emission tomography for detection of periprosthetic knee infection

FDG-PET is increasingly used and has been proposed as having potential in the diagnosis of PJI, especially regarding hip arthroplasty [10, 39, 51, 58]. Although this technique offers advantages such as time efficiency, increased resolution, and the use of low-dose CT, our results revealed that this technique was less specific in diagnosing periprosthetic knee infection than combined leukocyte and bone marrow scintigraphy and antigranulocyte scintigraphy (Fig. 7). Some investigations concluded that uptake patterns rather than intensity in the bone-prosthesis interface are specific in diagnosing periprosthetic infection (Table 15) [2, 56]. In particular, the sensitivity of 70% is only moderate (Fig. 8) and was lower than the sensitivity of leukocyte or antigranulocyte scintigraphy (Fig. 9). However, our secondary analysis revealed that FDG-PET was highly sensitive (93%) when low-quality studies were excluded [21, 23], making it no less sensitive than the other imaging techniques evaluated. This should be considered further in well-designed studies. The specificity was not higher than that of combined leukocyte and bone marrow scintigraphy and antigranulocyte scintigraphy. An important drawback of FDG-PET is the high cost compared with other imaging modalities. Therefore, FDG-PET may not be the preferred imaging modality in the evaluation of a suspected infected knee prosthesis.

Fig. 7A–B

The pooled (A) sensitivity and (B) specificity of combined leukocyte and bone marrow scintigraphy in the assessment of periprosthetic knee infection with 95% CIs are presented.

Table 15 Study characteristics of fluorodeoxyglucose positron emission tomography for detection of periprosthetic knee infection
Fig. 8A–B

The pooled (A) sensitivity and (B) specificity of FDG-PET in the assessment of periprosthetic knee infection with 95% CIs are shown. FDG-PET = fluorodeoxyglucose-positron emission tomography.

Fig. 9A–B

The pooled (A) sensitivity and (B) specificity of antigranulocyte scintigraphy in the assessment of periprosthetic knee infection with 95% CIs are shown.

This meta-analysis revealed that, based on current evidence, antigranulocyte scintigraphy and combined leukocyte and bone marrow scintigraphy were highly specific in confirming periprosthetic knee infection. However, the time-consuming procedures and limited availability are important drawbacks of these techniques. Bone scintigraphy was highly sensitive but lacked specificity in differentiating among the various conditions that cause painful knee prostheses. FDG-PET may not be the preferred imaging modality because it is more expensive and not more effective in confirming infected knee prostheses. In practice, other tests should be used in concert with the evaluated imaging modalities to arrive at more-sensitive and more-specific diagnostic decisions than are possible with imaging or laboratory testing alone. Future, larger prospective studies should assess the utility of imaging in the diagnostic algorithm of a suspected periprosthetic knee infection, providing more data to evaluate important variables, including the differentiation between acute and chronic infections.