Systematic Review on Reproducibility of Nuclear Imaging in the Assessment of Painful Hip and Knee Replacements

Nuclear imaging is used in the evaluation of painful arthroplasties for diagnosing loosening or periprosthetic joint infection (PJI). The purpose of this systematic review is to describe the reproducibility of the various nuclear imaging modalities used in the assessment of painful hip and knee arthroplasties. A systematic review of the literature was conducted with a comprehensive search of MEDLINE to identify clinical studies investigating the intra- and inter-observer agreement of nuclear imaging modalities in diagnosing PJI and mechanical loosening. A total of 3,000 studies, published between 2004 and 2020, were reviewed and 16 studies met the inclusion criteria. Quality assessment identified concerns with regard to the internal validity in approximately 40% of the included studies. A meta-analysis could not be performed because of insufficient available clinical data. The intra-observer agreement was poorly investigated. The included studies reported an inter-observer agreement of “slight” to “excellent” for bone scintigraphy, “moderate” to “substantial” for leukocyte scintigraphy, “substantial” to “almost perfect” for combined leukocyte and bone marrow scintigraphy, and “fair” to “substantial” for anti-granulocyte scintigraphy. Hybrid imaging with SPECT/CT and FDG-PET/CT demonstrated “substantial” and “almost perfect” inter-observer agreement for symptomatic hip prostheses, respectively. The reproducibility of nuclear imaging is underreported in clinical studies investigating painful knee and hip arthroplasties. Moreover, the included studies demonstrated methodological concerns with a high risk of bias. The available evidence demonstrated a wide range of inter-observer agreement using scintigraphy. Hybrid imaging with SPECT/CT and FDG-PET/CT may improve the accuracy of interpretation and reproducibility. However, the literature provides limited data to support this assumption.


Introduction
The incidence of complications after total hip and knee replacements is low, but due to the increased frequency of implantations, complications are commonly encountered and require an appropriate diagnostic algorithm. Among the variety of complications (including heterotopic ossification, pseudotumor, inflammatory disorders, hardware failure, and foreign body granulomatosis causing osteolysis), the two major complications are mechanical (aseptic) loosening and periprosthetic joint infection (PJI). PJI is considered one of the most devastating prostheses-related complications. However, reliable differentiation between PJI and aseptic loosening is often difficult since both may be accompanied by similar symptoms.
The reported diagnostic accuracies of the nuclear imaging modalities in the assessment of symptomatic arthroplasties reflect the agreement of the interpretation of the scan (index test) with the reference standard. However, understanding the variables that influence the interpretation of a scan is important, because they determine whether a result is a (false) positive or a (false) negative. These variables are related to several domains such as the index test, the reference standard, and observer-related factors. Important variables related to the subject (the painful arthroplasty) are the time interval between imaging and surgery, location and type of implant, and severity of the infection. For example, the interpretation of a low-grade PJI could be more challenging for the observer and lead to more false negative results compared with diagnosing an acute PJI. The type of imaging modality (index test), the imaging protocol performed, and the diagnostic criteria used could also affect the interpretation and outcome. For example, a variety of diagnostic criteria has been reported for the various nuclear imaging modalities, which could lead to different diagnostic accuracy outcomes for the investigated imaging modality [4,5]. Finally, in the interpretation of the scans, the background of the observer, such as the level of expertise and experience, may also affect the diagnostic outcome. These factors could not only affect the reported diagnostic accuracy but may also introduce variations in intra- and inter-observer agreement. These variations are important and may compromise the reproducibility, and therefore the clinical application, of the imaging modalities in the assessment of symptomatic arthroplasties.
The (statistical) reliability of an imaging modality is defined by the degree of agreement when different observers use the same imaging technique, classification, and procedure to assess the same subject. Each of the nuclear imaging modalities is characterized by specific benefits and drawbacks in diagnosing PJI or aseptic loosening. For example, scintigraphy is potentially hampered by non-specific tracer uptake, which could lead to unsatisfactory inter-observer reliability. In the literature, it has been suggested that the interpretation of nuclear imaging could be improved by more recently introduced hybrid techniques such as SPECT/CT and PET/CT [9,10]. For other pathologies, including infectious bone diseases, the improved diagnostic accuracy and reproducibility of hybrid imaging have been well established [11]. However, in the assessment of symptomatic arthroplasties, this improvement in terms of reproducibility has not been well defined in the literature.
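To make the notion of chance-corrected agreement concrete, the following minimal sketch computes Cohen's kappa for two observers reading the same set of scans. It is an illustration only: the included studies do not all state which agreement statistic or variant they used, and the function name `cohen_kappa` and the ten example readings are hypothetical, not data from any included study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers who rated the same scans."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same non-empty set of scans")
    n = len(ratings_a)
    # Observed agreement: fraction of scans given the same label by both observers.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the observers' marginal label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical readings of 10 scans (1 = positive for PJI, 0 = negative).
observer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
observer_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
kappa = cohen_kappa(observer_1, observer_2)
print(round(kappa, 2))
```

In this hypothetical example the observers agree on 8 of 10 scans (80%), yet kappa is only about 0.58, because roughly half of that agreement would be expected by chance given how often each observer calls a scan positive.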
The level of reproducibility and the variables that could influence the individual interpretation of images are not always clear and are therefore a suitable subject for a systematic literature investigation. To the best of our knowledge, no systematic review that investigated this topic has been published before. The objective of this study is to systematically review the reproducibility of currently used nuclear imaging modalities in the evaluation of symptomatic hip and knee arthroplasties.

Search Criteria
The imaging modalities that were reviewed for the assessment of painful arthroplasties were planar scintigraphy, PET, SPECT, and combined techniques.

Search Strategy
A computer-aided search of MEDLINE (via PubMed) was conducted in June 2019 and updated in April 2020. The search was restricted to primary studies that were written in English. A specific search strategy was developed (Fig. 1) with an informatics specialist. Reference lists of the identified studies and relevant reviews were hand-searched for supplementary eligible studies. The search was performed according to the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) Statement [12].

Study Selection
The following inclusion criteria were used to identify eligible clinical studies: (1) Planar scintigraphy, SPECT, or PET had been used to identify suspected PJI or loosening of hip and/or knee arthroplasties. (2) The study used a valid reference standard for PJI (criteria by the Musculoskeletal Infection Society and/or the criteria of the Infectious Diseases Society of America) [1,13] or aseptic loosening with preoperative findings, or a clinical follow-up interval of at least 6 months when surgery was not performed. (3) The study reported the calculated intra- and inter-observer agreement with multiple observers. Exclusion criteria were (1) nonhuman studies, (2) non-English language studies, or (3) studies in which differentiation between osteomyelitis without arthroplasties and PJI was not possible. There was no restriction regarding the date of publication.
The titles were screened for eligibility by 1 reviewer (S.V.) and were then processed for abstract assessment. The titles and abstracts were independently screened and assessed for eligibility in an unblinded standardized manner by 2 reviewers (S.V. and J.K.). Studies considered to be of dubious eligibility were rejected. The final decision on inclusion was made on the basis of the full text of the article.

Methodological Quality Assessment
The criteria list of QUADAS2 for evaluating internal and external validity of diagnostic studies recommended by the Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests was used for grading the methodological quality of the selected studies [14]. Evaluation was performed by the two reviewers (S.V. and J.K.) independently. The internal and external validity were evaluated and the criteria of QUADAS2 were used for the determination of the methodological limitations for descriptive purposes. All studies that met the inclusion criteria were included and, a priori, no studies were excluded from the systematic review on the basis of quality. A quantitative analysis could not be performed due to insufficient available data.

Included Studies
The search strategy identified 2,994 studies from MEDLINE. There was no overlap found between the retrieved studies. No additional studies were identified through other sources. Of the initial 2,994 studies, 2,761 were excluded after analyzing the information provided in the title and abstract. The full articles of the remaining 233 studies were reviewed for eligibility. Six studies were extracted from the reference lists of these studies (n = 239). There was no disagreement between the reviewers regarding the definitive inclusion of the studies. The main reason for exclusion was the absence of intra- and inter-observer agreement calculation in the use of imaging for diagnosing PJI or aseptic loosening (n = 179). Other reasons for exclusion were the absence of a valid reference standard (n = 5), no differentiation between osteomyelitis and periprosthetic complications (n = 3), or no inclusion of symptomatic arthroplasties (n = 36). Meta-analysis was not performed due to insufficient available clinical data. In general, no differentiation a priori was made for the type of implant, the interpretation criteria used for the index test, or the time period between surgery and imaging (Table 1).
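The study flow reported above can be checked with simple arithmetic. The short sketch below only restates the counts given in the text; the variable names are illustrative and not part of the original review.

```python
# Counts as reported in the text of this review.
identified = 2994
excluded_on_title_abstract = 2761
from_reference_lists = 6
full_text_exclusions = {
    "no intra-/inter-observer agreement calculation": 179,
    "no valid reference standard": 5,
    "no differentiation between osteomyelitis and periprosthetic complications": 3,
    "no symptomatic arthroplasties included": 36,
}

# Full texts from the database search, then with the hand-searched additions.
full_text_from_search = identified - excluded_on_title_abstract
full_text_assessed = full_text_from_search + from_reference_lists
# Studies remaining after all full-text exclusions.
included = full_text_assessed - sum(full_text_exclusions.values())
# All studies reviewed at any stage (database hits plus reference lists).
total_reviewed = identified + from_reference_lists

print(full_text_from_search, full_text_assessed, included, total_reviewed)
# prints: 233 239 16 3000
```

The result agrees with the 233 full-text articles, the 239 articles after hand-searching, the 16 included studies, and the total of 3,000 reviewed studies stated in the abstract.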

Description of Study Characteristics
A total of 16 studies, published between 2004 and 2020, were included for systematic review. Two studies reported the inter-observer agreement for nuclear imaging in the assessment of aseptic loosening of hip and knee arthroplasties [17,18]. In 4 studies, the results of reproducibility were reported without differentiation between PJI and aseptic loosening using bone scintigraphy [7,19-21]. Of those studies, one reported the reproducibility of bone scintigraphy combined with SPECT/CT [7]. The intra- or inter-observer agreement of nuclear imaging in the assessment of PJI was investigated by 10 studies. Of those studies, 2 studies used leukocyte scintigraphy [22,23], 1 study used combined leukocyte and bone scintigraphy [24], 3 studies used combined leukocyte and bone marrow scintigraphy [24-26], 3 studies used anti-granulocyte scintigraphy [15,16,27], 1 study used combined anti-granulocyte and bone marrow scintigraphy [27], and 3 studies used FDG-PET/CT [19,28,29]. The characteristics of the studies are detailed in Table 2.

Qualitative Analysis
The external validity showed high concerns with regard to applicability to clinical practice in approximately 40% of the included studies for the domain "patient selection." The internal validity of the included studies showed more concerns with regard to the risk of bias. Approximately 40% of the included studies did not provide sufficient information regarding the domains "index test" and "patient selection" (Fig. 2). More than 70% of the studies did not provide detailed data regarding the domain "flow and timing." The details of the internal and external validity are shown in Fig. 3.

Bone Scintigraphy and Combined Techniques
Of the 6 studies that investigated the use of bone scintigraphy, one study reported "excellent" (ICC = 0.89) intra-observer agreement of two observers in the assessment of the symptomatic hip [21]. In the assessment of aseptic loosening of THA, a "fair" (ICC = 0.43) and "good" (ICC = 0.67) inter-observer agreement was reported for the acetabular and femoral component, respectively [17,18]. No significant differences between the cemented and uncemented prostheses were found. In 4 studies, the reproducibility was reported without differentiation between PJI and aseptic loosening. For the symptomatic hip, two studies reported inter-observer agreement: one a kappa value of 0.11 (slight) [19] and the other an ICC of 0.81 (excellent) [21]. One study did not differentiate between hip and knee replacements and reported a kappa of 0.92, 0.81, and 0.80 for the blood flow, blood pool, and bone phase, respectively (overall kappa was not reported) [20]. One study combined bone scintigraphy with SPECT and reported a "fair" (κ = 0.27) and "moderate" (κ = 0.60) inter-observer agreement in the assessment of THA and TKA, respectively [7]. The same study found a "moderate" (κ = 0.58) and "substantial" (κ = 0.80) inter-observer agreement when bone scintigraphy was combined with SPECT/CT in the assessment of THA and TKA, respectively.

Leukocyte Scintigraphy and Combined Techniques
One study [23] reported the inter-observer agreement of leukocyte scintigraphy in diagnosing PJI and found "moderate" to "substantial" agreement between three observers (κ = 0.55, 0.60, and 0.74). There was no concordance between the observers in 24% of the hip group (n = 55) and 33% of the knee group (n = 40); this difference was not significant. Combined leukocyte and bone marrow scintigraphy demonstrated "almost perfect" inter-observer agreement between three pairs of observers in one study that assessed TKA and THA together [24]. For the assessment of hip arthroplasty, a kappa value of 0.74 (substantial) was reported by another study [26]. None of the 5 studies that investigated the use of leukocyte scintigraphy reported the intra-observer agreement.

Anti-granulocyte Scintigraphy and Combined Techniques
The 3 studies [15,16,27] that investigated the use of anti-granulocyte scintigraphy in the evaluation of PJI reported "excellent" inter-observer agreement for TKA/UKA (κ > 0.75), "fair to moderate" agreement for THA/TKA together (κ = 0.31-0.52), and "fair to excellent" agreement depending on the grade of infection (mild to severe). One study [27] investigated anti-granulocyte scintigraphy combined with bone marrow scintigraphy and reported kappa values of 0.55-0.72 without differentiation between THA and TKA. None of the 3 studies reported the intra-observer agreement of anti-granulocyte scintigraphy in the assessment of PJI.

FDG-PET/CT
Of the 3 studies that investigated the use of PET/CT in the evaluation of PJI, two studies reported the inter-observer agreement for the tracer 18F-FDG. Stumpe et al. reported "moderate" inter-observer agreement (κ = 0.47) [19]. Verberne et al. reported the inter-observer agreement for 4 different criteria for PJI, with kappa values ranging from 0.77 to 0.85 [29]. One study reported "substantial" (κ = 0.67) inter-observer agreement when 18F-PET/CT was combined with 18F-fluoride. None of the studies reported the intra-observer agreement for the use of PET/CT in the assessment of PJI.

Table 1 The measurement of observer agreement for categorical data [15]
κ < 0: "poor"
0 to 0.20: "slight"
0.21 to 0.40: "fair"
0.41 to 0.60: "moderate"
0.61 to 0.80: "substantial"
above 0.81: "almost perfect"
ICC correlation [
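The kappa categories of Table 1 can be expressed as a small lookup, which makes it easy to check how a reported value is labeled. This is a minimal sketch: the function name `agreement_category` is hypothetical, and the handling of values that fall in the gaps between the listed ranges (for example between 0.80 and 0.81) is an assumption, since the table does not define them. The ICC categories are not covered.

```python
def agreement_category(kappa):
    """Map a kappa value to the agreement label listed in Table 1."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0:
        return "poor"
    # Upper bounds of each range; values in the gaps between ranges are
    # assigned to the next range up (an assumption, not stated in Table 1).
    upper_bounds = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in upper_bounds:
        if kappa <= upper:
            return label
    return "almost perfect"


# Kappa values reported in this review.
for value in (0.11, 0.27, 0.47, 0.67, 0.85):
    print(value, agreement_category(value))
```

Note that several included studies use other wording, such as "excellent" for κ > 0.75 or for ICC values, so labels quoted from individual studies do not always map onto this scale.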

Discussion
Various nuclear imaging modalities can contribute to the challenging diagnosis of, and differentiation between, PJI and mechanical loosening of hip and knee replacements. The reproducibility of these imaging modalities is an important factor for clinical usability and could be a major limitation in the evaluation of symptomatic joint replacements; it is discussed in the sections below.

Bone Scintigraphy and SPECT/CT
Bone scintigraphy is known as a highly sensitive imaging modality for diagnosing aseptic loosening. This modality is widely available and can exclude any form of loosening. 99mTc-labeled bisphosphonates are the most commonly used radiopharmaceuticals to depict osteoblastic activity. In patients with mechanical loosening of the hip or knee replacement, negative perfusion and blood pool but positive osseous phases are often interpreted as loosening. In the literature, one research group [17,18] reported the diagnostic accuracy of bone scintigraphy for the acetabular and femoral component of THA and reported "fair" and "good" inter-observer agreement, respectively. No substantial differences between cemented and uncemented prostheses were found. However, the results of these studies have not yet been confirmed by other studies. None of the included studies reported the reproducibility of bone scintigraphy for diagnosing mechanical loosening of TKA.
In the assessment of PJI, the increased osteoblastic activity on bone scintigraphy results in a highly sensitive modality. However, in the first years after implantation (1-2 years for hip prostheses, 4-5 years for knee prostheses), specificity is dramatically lower and differentiation between infection and mechanical loosening is not accurate [4,5]. A bone scan is defined as positive for infection when there is uptake in all three phases (positive perfusion, blood pool, and osseous phase), and this criterion was used by all four studies that investigated bone scintigraphy in the assessment of painful prostheses. However, a wide variety in the level of reproducibility was reported ("slight" to "almost perfect" inter-observer agreement) [19-21]. This wide range of observer agreement may be explained by the low-resolution images of scintigraphy, which could lead to unsatisfactory inter-observer reliability.
The introduction of hybrid imaging with SPECT/CT could improve the diagnostic accuracy and reproducibility of low-resolution scintigraphy. Adding only SPECT may not improve the reproducibility of bone scintigraphy alone [7]. This may be the result of the missing CT portion, which results in less anatomical correlation and therefore less accurate interpretation of the images. For other pathologies, it has been reported that adding SPECT/CT to planar images results in higher diagnostic accuracy because of better resolution, direct (anatomic) correlation of functional and morphological abnormalities, and better distinction of bone infection from soft tissue infection through improved assessment of the extent of the infection [11]. A previous study showed that when SPECT/CT was added, inter-observer agreement regarding the location of infectious foci increased from κ = 0.68 to κ = 1.0 in the assessment of osteomyelitis [11]. In agreement with the first clinical studies, 4 studies demonstrated "excellent" intra- and inter-observer agreement when SPECT/CT was added in the interpretation of symptomatic hip and knee arthroplasties [30-33]. However, these studies did not correlate the nuclear imaging results with surgical findings or clinical follow-up to establish whether the prosthesis was loose or not. Because of this major methodological concern, no conclusions can be drawn regarding clinical applicability, and these studies were therefore excluded from this review. In contrast to these studies, the one study that did correlate the SPECT/CT findings with a valid reference standard reported "moderate" to "substantial" agreement for the hip (κ = 0.58) and knee (κ = 0.80) [7]. Unfortunately, these findings have not yet been confirmed by other studies. Although it is plausible that adding SPECT/CT improves the reliability of the interpretation of scintigraphy in the assessment of symptomatic joint replacement, there is limited clinical data in the literature to support this assumption.
The clinical impact of SPECT/CT and its improvement in terms of reproducibility would be a suitable subject for further investigation in the field of symptomatic arthroplasties.

Leukocyte Scintigraphy and Combined Techniques
Leukocyte scintigraphy has a long history in the evaluation of PJI, as it can be used as a more specific imaging modality to differentiate between aseptic loosening and PJI. When the appropriate criteria are used, leukocyte scintigraphy may be a specific and accurate imaging modality for diagnosing PJI [4,5]. Only one study reported inter-observer agreement, which was "moderate" to "substantial" for three different pairs of readers (κ = 0.55-0.74). In 26 (27.4%) of the 95 investigated knee and hip arthroplasties, there was no concordance between the three observers. Concordance differed between the hip (76%) and knee (67%) group, but the difference was not significant. The diagnostic accuracy of leukocyte scintigraphy is potentially hampered by the fact that labeled leukocytes not only accumulate in infections but also accumulate physiologically in the bone marrow. In order to reduce the number of false-positive results, leukocyte scintigraphy is often combined with bone marrow scintigraphy, which has been proposed as the imaging modality of choice for diagnosing PJI [34]. Besides the improved sensitivity and specificity compared with leukocyte scintigraphy alone, this combined technique demonstrated "substantial" to "almost perfect" inter-observer agreement in two studies [24,26]. An explanation for the good reproducibility of LS-BMS scans may lie in the fact that comparison of images improves interpretation, which was also found by Sousa et al. [27]. It is important to note that different diagnostic accuracies were found between knee and hip arthroplasties in previous studies [4,5]. The one study that investigated the reproducibility for THA alone reported "substantial" inter-observer agreement [26], while "almost perfect" concordance was found in another study with a combined hip and knee group [24]. For further studies, it is important to separately analyze the reproducibility for both arthroplasties.

Anti-granulocyte Scintigraphy and Combined Techniques
Anti-granulocyte scintigraphy was introduced as an alternative for leukocyte scintigraphy with the advantage of in vivo labeling of leukocytes and was proposed as a promising diagnostic tool for the detection of PJI. Image acquisition and interpretation are the same as for leukocyte scintigraphy. In recent meta-analyses, less satisfying diagnostic accuracies of anti-granulocyte scintigraphy in the assessment of THA compared with TKA were found [4,5]. The included studies in this review did not report inter-observer agreement in the assessment of THA alone. For TKA, a wide range of kappa values from "fair" to "substantial" agreement (κ = 0.31 to > 0.75) was reported. Interestingly, for the combined knee and hip group [27], only "fair" to "moderate" inter-observer agreement was found, while another study reported "excellent" reproducibility (κ > 0.75) for TKA alone [16]. Hence, one may suggest that the variation in diagnostic accuracy outcomes between knee and hip arthroplasties also applies to the level of reproducibility, since interpretation is less accurate for THA. However, there is no available evidence to support this assumption because symptomatic THA has not been separately analyzed. Another important result was found in a study that reported important differences between the inter-observer agreement for mild PJI (κ = 0.25) versus severe PJI (κ > 0.75) [15]. This could imply that the degree of infection directly influences the interpretation and reproducibility of the anti-granulocyte scintigraphy scan in the assessment of PJI of the knee. Unfortunately, none of the other included studies differentiated by the degree of infection. In accordance with combined leukocyte and bone marrow scintigraphy, one study found higher levels of inter-observer agreement for combined anti-granulocyte scintigraphy and bone marrow scintigraphy compared with anti-granulocyte scintigraphy alone [27]. This supports the conclusion that the combination of these imaging modalities may improve interpretation and reproducibility.

S Acetabulum: a moderate increased uptake in > two zones or intense uptake in at least one zone (scale 0-5)
Stumpe et al. [1], 2004, BS, THA; M, S; PJI: focally or diffusely increased periprosthetic uptake in all three phases. Loosening: increased radionuclide uptake limited to the third phase
Temmerman et al., 2006, BS, THA(f); S; Femoral: a mild or moderate increase in uptake of contrast in more than two zones or intense uptake in at least one zone
Granados et al. [2], 2015, BS, THA/TKA; M, H; Analyzing images of each phase (blood flow, blood pool, and bone phase) of the BS independently, 2 categories (either positive or negative for periprosthetic infection) were considered. The GL was considered positive for infection when there was any extra-medullary periprosthetic uptake
Yoldas et al. [3], 2016, BS, THA; MSIS; Increased uptake in all three phases
Arican et al. [4], 2015

OA observer agreement, LS leukocyte scintigraphy, BMS bone marrow scintigraphy, CFS ciprofloxacin. 99mTc-DCP, 99mTc-dicarboxi-diphosphonate; 99mTc-HMPAO, 99mTc-hexamethylpropyleneamine oxime
(1)*Three experienced nuclear medicine physicians, level of experience not reported. Level of experience and expertise was not defined

FDG-PET/CT
FDG-PET is assumed to have considerable potential in the assessment of periprosthetic infection. Important advantages of FDG-PET/CT, compared with planar scintigraphy, are its increased resolution, anatomic correlation with coregistered CT, and the availability of simple visual interpretation criteria for the images. In the assessment of PJI of the hip, in contrast to PJI of the knee [5], FDG-PET/CT is considered a promising imaging modality with assumed simple visual interpretation of uptake patterns. However, the visual interpretation of uptake patterns could potentially be hampered by inter-observer variation. One of the first studies on the clinical application of FDG-PET/CT stated that the simplicity of the diagnostic criteria should reduce potential inter-observer variation and increase the accuracy of the scan [10]. However, this has not been investigated in the literature. Moreover, a variety of diagnostic criteria has been used. Based on the best available evidence, it is assumed that extended uptake at the middle portion of the femoral bone-prosthesis interface is the most accurate criterion for PJI [29]. This criterion demonstrated "almost perfect" inter-observer agreement (κ = 0.85) in one study [29]. An alternative criterion, increased uptake along the bone-prosthesis interface compared with the physiological uptake in the bladder, demonstrated only "moderate" inter-observer agreement [19]. Unfortunately, the literature search revealed no additional clinical data to support these findings, and more clinical investigation is needed in order to validate these results.

Limitations and Concerns
This systematic review depicts an overview of the currently available literature on the reproducibility of the various nuclear imaging modalities in the assessment of painful arthroplasties. This review demonstrated that the level of evidence is hampered by serious limitations.
Table 7 The intra- and inter-observer agreement of anti-granulocyte scintigraphy and combined techniques in diagnosing PJI
(2) = two nuclear medicine physicians, one orthopedic surgeon; level of experience was not reported
Four-point scale as follows: 1, sulesomab uptake was similar to that in the bone marrow; 2, sulesomab uptake was increased minimally over that in bone marrow; 3, sulesomab uptake was distinctly higher than uptake in bone marrow; 4, sulesomab uptake was two or more times greater than uptake in bone marrow
TKP/UKP; M, S, H; 1, uptake was similar to that in the bone marrow; 2, uptake was increased minimally over that in bone marrow; 3, uptake was distinctly higher than uptake in bone marrow; 4, uptake was two or more times greater than uptake in bone marrow Total
M intra-operative cultures, H intra-operative histological findings, S surgery/intra-operative findings

There are several concerns regarding the methodological quality of the included studies. Most studies investigated the diagnostic accuracy, and intra- and inter-observer agreement was not the main research question. Ideally, the degree of agreement is investigated between multiple and different observers using the same imaging technique, classification, and procedure to assess the same subject. An important factor that influences the reproducibility of the images is the level of experience of the observers. Whether images are interpreted by nuclear medicine physicians with or without experience in musculoskeletal imaging, and their level of experience, may directly influence the diagnostic accuracy. However, the level of experience and expertise was poorly described or not described at all by 12 of the 16 included studies. Moreover, the number of observers is important to calculate the inter-observer agreement properly and avoid the possibility of agreement occurring by chance. Only 5 of the 16 included studies reported the results of more than 2 readers (Tables 3-10).
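When more than two readers interpret each scan, a multi-rater statistic such as Fleiss' kappa is one common way to correct the observed agreement for chance. The sketch below is illustrative only: the included studies do not consistently state which statistic they used, and the function name `fleiss_kappa` and the example counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of observers assigning scan i to category j."""
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("at least one scan is required")
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every scan must be rated by the same number (>= 2) of observers")
    n_categories = len(counts[0])
    # Per-scan agreement: proportion of observer pairs that agree on that scan.
    per_scan = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_scan) / n_subjects
    # Chance agreement from the overall proportion of ratings in each category.
    category_share = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_chance = sum(share * share for share in category_share)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical example: 3 observers, 5 scans, categories [positive, negative].
example_counts = [[3, 0], [0, 3], [2, 1], [3, 0], [0, 3]]
kappa_multi = fleiss_kappa(example_counts)
print(round(kappa_multi, 2))
```

In this hypothetical example the three observers disagree on only one of five scans, and the resulting kappa is about 0.73.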
Recent meta-analyses demonstrated different diagnostic accuracies of nuclear imaging between hip and knee arthroplasties. It is assumed that the inter-observer agreement could vary between the location of the prostheses, due to different uptake patterns and physiology [4,5,21]. In the interpretation of nuclear images, the type of implant (knee or hip) could be of influence to reproducibility, as demonstrated by several studies [7,16,27]. Therefore, the reported reproducibility concerning both arthroplasties without differentiation of type and location should be interpreted with caution. Unfortunately, 5 of the included studies did not differentiate between hip and knee arthroplasties. The degree of infection (chronic versus acute PJI) and the time between the implantation (last surgery) and performed imaging are also important factors in the interpretation of scans and were not described in most studies.
In the field of nuclear orthopedic imaging for painful prostheses, concerns regarding the methodological quality and level of evidence have previously been described [35]. This systematic review shows that the interpretation and reproducibility of nuclear imaging in the assessment of symptomatic joint replacements, an important aspect of clinical usability, have thus far been underreported. Moreover, it is important to note that most studies reported the consensus results between two readers, and therefore, analyzing the intra- and inter-observer agreement was not possible. Further investigation should report implant characteristics, time interval between imaging and surgery, and degree of infection, and separately analyze hip and knee arthroplasties.

Table 9 Intra- and inter-observer agreement of FDG-PET/CT for diagnosing PJI
Study; Year; Imaging; Tracer; Doses; Prostheses; Observers; Intra-observer agreement; Inter-observer agreement
Stumpe  NR κ 0.79 1
1 = 4 different diagnostic criteria were investigated for inter-observer agreement, see Table 10
*Level of experience and expertise was not defined.
(1) = two nuclear medicine physicians, > 4-year-experience.
(2) = one senior nuclear medicine physician, one nuclear medicine trainee, level of experience not reported
Scale 0-5. uptake was low and comparable with that in inactive muscles and fat; a score of 2 that FDG uptake was moderate, clearly noticeable, and distinctly higher than the uptake in inactive muscles and fat; a score of 3 that FDG uptake was strong but was distinctly less than the physiologic uptake in the bladder; and a score of 4 that FDG uptake was very strong and was comparable with physiologic urinary uptake in the bladder

Key Points of the Best Available Evidence According to This Systematic Review
The literature provides insufficient data regarding intra-observer agreement when using nuclear imaging in the assessment of symptomatic joint replacements. Bone scintigraphy demonstrated a wide range of inter-observer agreement. The comparison of images, as for combined leukocyte and bone marrow scintigraphy, could improve the level of inter-observer agreement. The infection grade (mild versus severe) could directly influence the accuracy of interpretation. The interpretation and level of reproducibility for nuclear imaging could be different between hip and knee arthroplasties.
Although it is plausible that hybrid imaging improves interpretation for symptomatic joint replacements and could improve the level of inter-observer agreement, there is limited data to support this assumption. The results of this systematic review should be interpreted with caution because of methodological concerns of the included studies with high risk of bias (Tables 11 and 12). Important variables that may influence the level of reproducibility, such as type of implant, time interval between imaging and surgery, level of expertise of the observers, and degree of infection, were underreported. Further studies investigating the diagnostic performance of nuclear imaging in symptomatic orthopedic implants should report intra- and inter-observer agreement and separately analyze hip and knee arthroplasties.