To investigate whether artificial intelligence (AI) can differentiate septic from non-septic total hip arthroplasty (THA) failure based on preoperative MRI features.
Materials and methods
We included 173 patients (98 females, age: 67 ± 12 years) who underwent first-time THA revision surgery after preoperative pelvis MRI. We divided the patients into a training/validation/internal-testing cohort (n = 117) and a temporally independent external-testing cohort (n = 56). MRI features were used to train, validate, and test a machine learning algorithm based on a support vector machine (SVM) to predict THA infection in the training/validation/internal-testing cohort with a nested fivefold cross-validation approach. Machine learning performance was then evaluated on independent data from the external-testing cohort.
MRI features were significantly more frequently observed in THA infection (P < 0.001), except bone destruction, periarticular soft-tissue mass, and fibrous membrane (P ≥ 0.005). Considering all MRI features in the training/validation/internal-testing cohort, the SVM classifier reached 92% sensitivity, 62% specificity, 79% PPV, 83% NPV, 82% accuracy, and 81% AUC in predicting THA infection, with bone edema, extracapsular edema, and synovitis being the best predictors. When tested on the external-testing cohort, the classifier showed 92% sensitivity, 79% specificity, 89% PPV, 83% NPV, 88% accuracy, and 89% AUC in predicting THA infection. Based solely on the presence of periprosthetic bone marrow edema on MRI, the SVM classifier showed 81% sensitivity, 76% specificity, 66% PPV, 88% NPV, 80% accuracy, and 74% AUC in predicting THA infection in the training/validation/internal-testing cohort, and 68% sensitivity, 89% specificity, 93% PPV, 60% NPV, 75% accuracy, and 79% AUC in the external-testing cohort.
AI using SVM classifier showed promising results in predicting THA infection based on MRI features. This model might support radiologists in identifying THA infection.
Hip osteoarthritis is a frequent cause of hip pain and a common indication for total hip arthroplasty (THA), which is one of the most common procedures in orthopedic surgery, with the number of implants growing over time due to population aging. THA failure may occur due to prosthetic joint infection or non-septic causes, such as aseptic loosening, dislocation, adverse reaction to metal debris (ARMD), bone fracture, or implant rupture. Several diagnostic tools exist, including clinical tests, laboratory exams on blood samples and joint fluid, and imaging examinations [2,3,4,5,6]. These tests have some limitations, particularly in detecting THA infection and the responsible microorganism, but all of them are part of the comprehensive preoperative evaluation routinely performed in patients with THA failure. Indeed, no single preoperative diagnostic test is both highly sensitive and highly specific. Several papers have recently been published on the use of magnetic resonance imaging (MRI) in differentiating septic from non-septic THA failure [8,9,10,11]. These studies investigated the diagnostic performance of different conventional imaging features (e.g., bone edema, synovitis, collections, and bone destruction) and of loco-regional lymphadenopathies, reporting very good results. Nevertheless, the interpretation of MRI findings in patients with THA can be challenging for radiologists without strong experience in musculoskeletal imaging and for other physicians, both because these examinations are performed only occasionally, generally in referral centers, and because of artifacts related to the prosthesis itself. In this regard, as in several other settings, artificial intelligence (AI) and machine learning may support radiologists [12, 13]. AI has several potential applications in assisting the musculoskeletal radiologist in the assessment of orthopedic implants.
These include characterization of prostheses, identification of specific implant models, and assessment of prosthetic positioning and complications. Most published works concern the use of AI-based analysis of radiographic images to automate postoperative evaluations of joint arthroplasty [14, 15]. For instance, AI-based algorithms reported good accuracy in predicting the dislocation risk of THA (AUC 76.67) or in detecting THA loosening (accuracy 88.3%). Unfortunately, most of these studies have critical methodological limitations. Thus, current evidence is not sufficient to support the use of AI-based diagnostic algorithms applied to medical images for the evaluation of THA complications in daily clinical practice. The aim of our study was to investigate whether AI can assist radiologists with MRI diagnosis of THA infection.
Institutional Review Board of Ospedale San Raffaele, Milano, Italy, approved this retrospective study and waived the need for informed consent (Protocol RETRORAD). After matching imaging, laboratory, and surgical data, our database was completely anonymized to delete any connections between data and patients’ identity according to the General Data Protection Regulation for Research Hospitals.
This study assessed MRI examinations performed in a consecutive series of patients managed at IRCCS Istituto Ortopedico Galeazzi, Milan, Italy, from January 2015 to January 2022. These patients had a painful THA requiring revision surgery and all underwent preoperative pelvis MRI. We gathered all imaging, surgical, and clinical data of these patients by cross-referencing the database of our Radiology Department with those of our Hip Department and Laboratory of Clinical Chemistry and Microbiology. The inclusion criteria were: (i) first-time revision procedure for unilateral failed THA; (ii) availability of intraoperative microbiological tests; and (iii) preoperative unenhanced pelvis MRI performed up to 1 month before revision surgery with the same imaging protocol. The exclusion criteria were: (i) nondiagnostic MRI due to prosthesis-related artifacts; and (ii) inflammatory arthritis, tumors, inflammatory bowel disease, or autoimmune disease. Overall, 173 patients were included in this study (98 females, 75 males; mean age: 67 ± 12 years, range 28–94). They were divided into training/validation/internal-testing and external-testing cohorts as detailed below in the machine learning section.
Intraoperative samples of periprosthetic material collected during revision surgery (membranes, bones, fluid, THA components) were sent to our laboratory for culture analysis. Agar plates and enrichment broths were incubated for 48 h and 15 days, respectively, and checked daily for microbial growth. In case of broth turbidity, an aliquot was plated onto blood agar. Colonies grown on agar plates were identified by biochemical testing performed on a Vitek 2 analyzer (BioMerieux, Marcy L’Etoile, France). The final diagnosis of THA infection was established according to the International Consensus Meeting Criteria, which include growth of the same microorganism in two or more intraoperative culture samples. Preoperative blood levels of C-reactive protein (CRP), erythrocyte sedimentation rate (ESR), and synovial fluid analysis were used, together with clinical data, imaging findings, and culture analysis from preoperative joint aspiration and intraoperative samples, to reach the final diagnosis.
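The culture rule mentioned above (the same microorganism growing in two or more intraoperative samples) can be illustrated with a minimal sketch. It covers only this single rule, not the full consensus definition, and the function name, the name normalization, and the sample data are hypothetical.

```python
from collections import Counter

def culture_criterion_met(isolates, min_matching=2):
    """Return True if at least `min_matching` samples grew the same organism.

    `isolates` holds one entry per intraoperative sample: the organism name,
    or None when the culture showed no growth (hypothetical data layout).
    """
    # Normalize names so trivial differences in case or spacing do not split counts.
    counts = Counter(name.strip().lower() for name in isolates if name)
    return any(n >= min_matching for n in counts.values())

# Hypothetical example: four samples, two of which grew the same organism.
samples = [
    "Staphylococcus epidermidis",
    None,
    "staphylococcus epidermidis ",
    "Cutibacterium acnes",
]
print(culture_criterion_met(samples))
```

In practice, organism identity would come from the laboratory identification system rather than from free-text matching, so the string normalization here is only a stand-in.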
MRI protocol and images interpretation
All MRI scans were performed on the same 1.5 T unit (Avanto, Siemens Medical Solutions, Erlangen, Germany) at our institution. A combination of the table-integrated coil and abdominal coil was used. Our metal artifact reduction sequence (MARS) protocol included coronal T1-weighted, coronal STIR, axial T1-weighted, axial T2-weighted, and sagittal T2-weighted images: coronal T1-weighted (repetition time/echo time, 500/9.1 ms; number of excitations, 2; slice thickness, 3 mm; turbo factor, 6; flip angle, 180°; voxel size, 1.3 × 0.9 × 3 mm; bandwidth, 302), coronal STIR (5180/81 ms; number of excitations, 2; slice thickness, 3 mm; inversion time, 160 ms; turbo factor, 28; flip angle, 122°; voxel size, 1.4 × 1.1 × 3 mm; bandwidth, 395), axial T2-weighted (5000/68 ms; number of excitations, 2; slice thickness, 3.2 mm; turbo factor, 27; flip angle, 150°; voxel size, 1.4 × 1.0 × 3.2 mm; bandwidth, 383), axial T1-weighted (593/9.1 ms; number of excitations, 3; slice thickness, 3.2 mm; turbo factor, 6; flip angle, 180°; voxel size, 1.2 × 0.9 × 3.2 mm; bandwidth, 302), sagittal T2-weighted (3550/85 ms; number of excitations, 2; slice thickness, 3.5 mm; turbo factor, 26; flip angle, 150°; voxel size, 1.0 × 1.0 × 3.3 mm; bandwidth, 381). The field of view was adapted to the patient’s body build to include the whole pelvis and all prostheses. All MRI scans were reviewed by a radiologist with 9 years of experience in musculoskeletal imaging, blinded to clinical and microbiologic data, who provided a yes/no answer for each MRI feature. As in a previous study on pelvis MRI in patients with THA, the MRI features of THA failure assessed were: periprosthetic bone destruction, periprosthetic soft-tissue mass, effusion, synovitis, lamellated synovitis, extracapsular edema, fibrous periprosthetic membrane, bone edema, and extracapsular collection/sinus tract.
Machine learning analysis
The machine learning analysis was performed using the TRACE4© platform (DeepTrace Technologies, Milan, Italy). Our study population was divided into training/validation/internal-testing and external-testing cohorts based on MRI performed before and after September 2019, respectively. The training/validation/internal-testing cohort consisted of 117 patients (38 with THA infection and 79 without THA infection). The external-testing cohort consisted of 56 patients (19 with THA infection and 37 without THA infection). All imaging features previously assessed by the musculoskeletal radiologist were used to train, validate, and test a machine learning algorithm based on a support vector machine (SVM, with Gaussian kernel) to predict THA infection in the training/validation/internal-testing cohort. A nested fivefold validation approach (10 ensembles, 250 trained models) was employed. Thereafter, machine learning performance was evaluated on temporally independent data from the external-testing cohort.
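The cohort split and resampling scheme described above can be sketched as follows. This is a plain-Python illustration, not the TRACE4 implementation, which is not described in detail here. It assumes that the 250 trained models correspond to 10 repetitions × 5 outer folds × 5 inner folds, and that the temporal cutoff falls on 1 September 2019 (the text only states "September 2019"). All function and field names are hypothetical.

```python
import random
from datetime import date

CUTOFF = date(2019, 9, 1)  # assumption: the exact day is not given in the text

def temporal_split(records):
    """Split records into development and external-testing sets by MRI date."""
    dev = [r for r in records if r["mri_date"] < CUTOFF]
    ext = [r for r in records if r["mri_date"] >= CUTOFF]
    return dev, ext

def stratified_folds(labels, k, rng):
    """Assign sample positions to k folds, spreading each class across folds."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def nested_cv_plan(labels, outer_k=5, inner_k=5, repeats=10, seed=0):
    """List (repeat, outer, inner, train, validation, test) index sets."""
    rng = random.Random(seed)
    plan = []
    for rep in range(repeats):
        outer = stratified_folds(labels, outer_k, rng)
        for o, test_idx in enumerate(outer):
            # Everything outside the held-out outer fold is used for the inner loop.
            rest = [i for f, fold in enumerate(outer) if f != o for i in fold]
            inner = stratified_folds([labels[i] for i in rest], inner_k, rng)
            for v, val_pos in enumerate(inner):
                val_idx = [rest[p] for p in val_pos]
                val_set = set(val_idx)
                train_idx = [i for i in rest if i not in val_set]
                plan.append((rep, o, v, train_idx, val_idx, test_idx))
    return plan

# Class balance of the development cohort as reported: 38 infected, 79 non-infected.
labels = [1] * 38 + [0] * 79
plan = nested_cv_plan(labels)
print(len(plan))  # 10 * 5 * 5 = 250 train/validation/test partitions

# Hypothetical records illustrating the temporal split.
dev, ext = temporal_split([
    {"id": "a", "mri_date": date(2018, 5, 1)},
    {"id": "b", "mri_date": date(2020, 1, 15)},
])
print(len(dev), len(ext))
```

Each entry in `plan` would correspond to one trained model: fitted on `train_idx`, tuned on `val_idx`, and scored on the held-out `test_idx`. The external-testing cohort stays outside this loop entirely.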
Chi-square statistics were used to compare each MRI feature between patients with infected and non-infected THA. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, area under the curve (AUC), and odds ratios (OR) were calculated for each MRI feature. We also assessed the diagnostic performance of a combination of conventional MRI features with the highest OR. Continuous variables were reported as mean ± standard deviation. Discrete variables were summarized as median and interquartile range. Bonferroni correction for multiple comparisons was applied and statistical significance was set at P < 0.003. SPSS software (v. 26, IBM, Armonk, NY) was used for statistical analysis. AUC, accuracy, sensitivity, specificity, PPV, and NPV of the machine learning classifier were calculated in the training/validation/internal-testing and external-testing cohorts.
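For reference, the per-feature measures listed above follow from a 2 × 2 table of feature presence against final diagnosis. The sketch below uses the standard definitions. The counts are hypothetical and are not taken from this study, and the function name is illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic measures from a 2 x 2 table.

    tp: feature present, infection confirmed
    fp: feature present, no infection
    fn: feature absent, infection confirmed
    tn: feature absent, no infection
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        # The odds ratio is undefined when an off-diagonal cell is empty.
        "odds_ratio": (tp * tn) / (fp * fn) if fp and fn else None,
    }

# Hypothetical counts for one MRI feature.
metrics = diagnostic_metrics(tp=40, fp=10, fn=20, tn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With small cohorts, confidence intervals around each estimate would normally accompany these point values; they are omitted here to keep the sketch short.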
Conventional MRI features were significantly more frequently observed in patients with THA infection compared to patients without infection (all with P < 0.001), except bone destruction (P = 0.155), periarticular soft-tissue mass (P = 0.005), and fibrous membrane (P = 0.081). The values of sensitivity for detecting THA infection ranged from 5.4% (periarticular soft-tissue mass) to 80.7% (bone edema), specificity from 51.7% (bone destruction) to 97.4% (lamellated synovitis), PPV from 10.3% (periarticular soft-tissue mass) to 87.9% (synovitis), NPV from 62.5% (periarticular soft-tissue mass) to 89% (bone edema), accuracy from 46.8% (bone destruction) to 81.5% (synovitis), and OR from 0.198 (periarticular soft tissue mass) to 29 (synovitis). Data on diagnostic performance of all MRI features are reported in Table 1. Figure 1 shows a representative case from our study population.
When all MRI features were considered in the training/validation/internal-testing cohort, the SVM classifier reached 92% sensitivity, 62% specificity, 79% PPV, 83% NPV, 82% accuracy, and 81% AUC in predicting THA infection, with bone edema, extracapsular edema, and synovitis being the best predictors, ranked by descending importance. When tested on the external-testing cohort, the classifier showed 92% sensitivity, 79% specificity, 89% PPV, 83% NPV, 88% accuracy, and 89% AUC in predicting THA infection.
Based solely on the presence of periprosthetic bone marrow edema on MRI, the SVM classifier showed 81% sensitivity, 76% specificity, 66% PPV, 88% NPV, 80% accuracy, and 74% AUC in predicting THA infection in the training/validation/internal-testing cohort, and 68% sensitivity, 89% specificity, 93% PPV, 60% NPV, 75% accuracy, and 79% AUC in the external-testing cohort.
The main finding of our study is the validation of a diagnostic machine learning classifier to support physicians in the interpretation of imaging findings when dealing with preoperative MRI of patients with failed THA.
Few papers have investigated the diagnostic performance of preoperative MRI in predicting infection in patients with failed arthroplasty of the hip [8,9,10, 19] and of the knee [20, 21]. Our results, in line with previous studies, highlight the accuracy of some MRI features in predicting THA infection, such as bone edema, synovitis, lamellated synovitis, and extracapsular edema, with accuracy of 75.1%–81.5%. In more detail, the most specific findings were synovitis, lamellated synovitis, and extracapsular edema (specificity of 96.6%, 97.4%, and 87.1%, respectively), while bone edema reached the highest values of sensitivity (80.7%) and NPV (89%). On the other hand, bone destruction, periarticular soft-tissue mass, and periprosthetic fibrous membrane are imaging findings only occasionally observed in THA infection, being more commonly seen in aseptic THA failure. In line with these data, the AI-based SVM ranked bone edema, extracapsular edema, and synovitis as the best predictors of THA infection. Based on all MRI features, the classifier showed 82% accuracy, but with a marked imbalance between sensitivity (92%) and specificity (62%). However, a substantial increase in specificity (79%), PPV (89%), accuracy (88%), and AUC (from 81 to 89%) was observed in the external-testing cohort. Bone edema was selected by the AI-based SVM as the best predictor, showing 80% accuracy in the training/validation/internal-testing cohort and 75% accuracy in the external-testing cohort. Again, a substantial improvement in specificity (from 76 to 89%) and PPV (from 66 to 93%) was observed in the external-testing cohort. These results suggest that AI-based machine learning classifiers might be useful supportive diagnostic tools in challenging conditions like the differentiation of septic and aseptic THA failure. Indeed, AI tools applied to imaging after arthroplasty may improve reporting and decrease the error rate by reducing cognitive load and fatigue for radiologists.
Unfortunately, a comparison with previously published data is not possible since, to our knowledge, no other studies have investigated the diagnostic performance of an AI-based machine learning classifier built upon MRI features in this setting. Moreover, while AI models have proven effective in detecting implant complications such as dislocations and loosening on radiographs, little attention has been paid to the potentially substantial clinical impact of AI-based supportive tools in the diagnosis of THA infection.
Some limitations of our study must be pointed out. First, the number of patients included in our series was relatively limited. It is well known that a large amount of data is essential to build robust machine learning classifiers [23,24,25,26]; thus, larger studies may show even higher diagnostic performance of AI-based predictive models in failed THA. It should be noted, however, that collecting preoperative pelvis MRI in patients with failed THA eligible for revision surgery is not easy. Second, we cannot exclude that our evaluation was affected by susceptibility artifacts, despite the use of metal artifact reduction sequences, particularly for imaging features of the periprosthetic bone (bone edema, bone destruction, fibrous membrane), which may be missed or misinterpreted. New technologies included in the most recent MRI scanners reduce the risk of misinterpreting images and may improve the diagnostic performance of both radiologists and AI-based supportive classifiers. Last, we did not include in our analysis the evaluation of locoregional lymphadenopathies, which some authors have recently reported as potential imaging biomarkers of infected THA, with even higher diagnostic performance than that reported for conventional MRI features [8, 27].
In conclusion, AI using SVM classifier showed promising results in predicting THA infection based on MRI features assessed preoperatively. This model might represent an adjunctive tool to support radiologists in identifying THA infection and could form the basis for further trials in this little explored field. Future studies are warranted to build AI-based models that combine imaging, clinical, and laboratory data to improve the accuracy of preoperative evaluation of patients undergoing revision surgery for failed arthroplasty.
Kelmer G, Stone AH, Turcotte J, King PJ (2021) Reasons for revision: primary total hip arthroplasty mechanisms of failure. J Am Acad Orthop Surg 29:78–87. https://doi.org/10.5435/JAAOS-D-19-00860
Talbot BS, Weinberg EP (2016) MR imaging with metal-suppression sequences for evaluation of total joint arthroplasty. Radiographics 36:209–225. https://doi.org/10.1148/RG.2016150075
Sconfienza LM, Albano D, Messina C et al (2021) Ultrasound-guided periprosthetic biopsy in failed total hip arthroplasty: a novel approach to test infection in patients with dry joints. J Arthroplasty 36:2962–2967. https://doi.org/10.1016/J.ARTH.2021.03.029
Kwon YM, Mahajan J, Tirumala V et al (2020) Sensitivity and specificity of serum and synovial fluid markers in diagnosis of infection in head-neck taper corrosion of metal-on-polyethylene total hip arthroplasty. J Arthroplasty 35:3737–3742. https://doi.org/10.1016/J.ARTH.2020.06.058
Lee YJ, Sadigh S, Mankad K et al (2016) The imaging of osteomyelitis. Quant Imaging Med Surg 6:184–198
Sconfienza LM, Adriaensen M, Alcala-Galiano A et al (2022) Clinical indications for image-guided interventional procedures in the musculoskeletal system: a Delphi-based consensus paper from the European Society of Musculoskeletal Radiology (ESSR)-part IV, hip. Eur Radiol 32:551–560. https://doi.org/10.1007/S00330-021-07997-5
Signore A, Sconfienza LM, Borens O et al (2019) Consensus document for the diagnosis of prosthetic joint infections: a joint paper by the EANM, EBJIS, and ESR (with ESCMID endorsement). Eur J Nucl Med Mol Imaging 46:971–988. https://doi.org/10.1007/S00259-019-4263-9
Albano D, Messina C, Zagra L et al (2021) Failed total hip arthroplasty: diagnostic performance of conventional MRI features and locoregional lymphadenopathy to identify infected implants. J Magn Reson Imaging 53:201–210. https://doi.org/10.1002/JMRI.27314
Galley J, Sutter R, Stern C et al (2020) Diagnosis of periprosthetic hip joint infection using MRI with metal artifact reduction at 1.5 T. Radiology 296:98–108. https://doi.org/10.1148/RADIOL.2020191901
Schwaiger BJ, Gassert FT, Suren C et al (2020) Diagnostic accuracy of MRI with metal artifact reduction for the detection of periprosthetic joint infection and aseptic loosening of total hip arthroplasty. Eur J Radiol. https://doi.org/10.1016/J.EJRAD.2020.109253
Albano D, Messina C, Sconfienza LM (2021) MRI to diagnose total hip arthroplasty infection: steps toward an accurate diagnosis. Radiology 299:E283–E284. https://doi.org/10.1148/RADIOL.2021204146
Coppola F, Faggioni L, Regge D et al (2021) Artificial intelligence: radiologists’ expectations and opinions gleaned from a nationwide online survey. Radiol Med 126:63–71. https://doi.org/10.1007/S11547-020-01205-Y
Schiaffino S, Codari M, Cozzi A et al (2021) Machine learning to predict in-hospital mortality in COVID-19 patients using computed tomography-derived pulmonary and vascular features. J Pers Med 11:501. https://doi.org/10.3390/JPM11060501
Gurung B, Liu P, Harris PDR et al (2022) Artificial intelligence for image analysis in total hip and total knee arthroplasty: a scoping review. Bone Joint J 104-B:929–937
Yi PH, Mutasa S, Fritz J (2022) AI MSK clinical applications: orthopedic implants. Skeletal Radiol 51:305–313. https://doi.org/10.1007/S00256-021-03879-5
Parvizi J, Tan TL, Goswami K et al (2018) The 2018 definition of periprosthetic hip and knee infection: an evidence-based and validated criteria. J Arthroplasty 33:1309-1314.e2. https://doi.org/10.1016/J.ARTH.2018.02.078
Bellelli A, Silvestri E, Barile A et al (2019) Position paper on magnetic resonance imaging protocols in the musculoskeletal system (excluding the spine) by the Italian College of Musculoskeletal Radiology. Radiol Med 124:522–538. https://doi.org/10.1007/s11547-019-00992-3
Trace4©, DeepTrace-Technologies. Available online: http://www.deeptracetech.com/temp/TechnicalSheet__TRACE4.pdf Accessed 3 November 2022
Busch A, Jäger M, Beck S et al (2022) Metal Artefact Reduction Sequences (MARS) in Magnetic Resonance Imaging (MRI) after Total Hip Arthroplasty (THA): a non-invasive approach for preoperative differentiation between periprosthetic joint infection (PJI) and aseptic complications? BMC Musculoskelet Disord 23:620. https://doi.org/10.1186/s12891-022-05560-x
Plodkowski AJ, Hayter CL, Miller TT et al (2013) Lamellated hyperintense synovitis: potential MR imaging sign of an infected knee arthroplasty. Radiology 266:256–260. https://doi.org/10.1148/RADIOL.12120042
Li AE, Sneag DB, Greditzer HG et al (2016) Total knee arthroplasty: diagnostic accuracy of patterns of synovitis at MR imaging. Radiology 281:499–506. https://doi.org/10.1148/RADIOL.2016152828
Shah RF, Bini SA, Martinez AM et al (2020) Incremental inputs improve the automated detection of implant loosening using machine-learning algorithms. Bone Joint J 102-B:101–106
Gitto S, Bologna M, Corino VDA et al (2022) Diffusion-weighted MRI radiomics of spine bone tumors: feature stability and machine learning-based classification performance. Radiol Med 127:518–525. https://doi.org/10.1007/S11547-022-01468-7
Albano D, Cuocolo R, Patti C et al (2022) Whole-body MRI radiomics model to predict relapsed/refractory Hodgkin Lymphoma: a preliminary study. Magn Reson Imaging 86:55–60. https://doi.org/10.1016/J.MRI.2021.11.005
Yao F, Bian S, Zhu D et al (2022) Machine learning-based radiomics for multiple primary prostate cancer biological characteristics prediction with 18F-PSMA-1007 PET: comparison among different volume segmentation thresholds. Radiol Med 127:1170–1178. https://doi.org/10.1007/S11547-022-01541-1
van der Lubbe MFJA, Vaidyanathan A, de Wit M et al (2022) A non-invasive, automated diagnosis of Menière’s disease using radiomics and machine learning on conventional magnetic resonance imaging: a multicentric, case-controlled feasibility study. Radiol Med 127:72–82. https://doi.org/10.1007/S11547-021-01425-W
Isern-Kebschull J, Tomas X, García-Díez AI et al (2019) Accuracy of computed tomography-guided joint aspiration and computed tomography findings for prediction of infected hip prosthesis. J Arthroplasty 34:1776–1782. https://doi.org/10.1016/J.ARTH.2019.04.018
Open access funding provided by Università degli Studi di Milano within the CRUI-CARE Agreement. This study was supported and funded by the Italian Ministry of Health—"Ricerca Corrente." S. Gitto was supported by Umberto Veronesi Foundation.
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of Ospedale San Raffaele, Milano, Italy (Protocol name: RETRORAD, approved by Comitato Etico OSR).
Consent to participate
Informed consent was waived by the Institutional Review Board for this retrospective study.
Albano, D., Gitto, S., Messina, C. et al. MRI-based artificial intelligence to predict infection following total hip arthroplasty failure. Radiol med 128, 340–346 (2023). https://doi.org/10.1007/s11547-023-01608-7