Feasibility and reliability of the Spondyloarthritis Research Consortium of Canada sacroiliac joint inflammation score in children
Published methods for quantification of magnetic resonance imaging (MRI) evidence of inflammation in the sacroiliac joint lack validation in pediatric populations. We evaluated the reliability and construct validity of the Spondyloarthritis Research Consortium of Canada (SPARCC) sacroiliac joint inflammation score (SIS) in children with suspected or confirmed juvenile spondyloarthritis (JSpA).
The SPARCC SIS measures the presence, depth, and intensity of bone marrow inflammation on MRI through the cartilaginous part of the joint. Six readers blinded to clinical details except age, participated in two reading exercises, each preceded by a calibration exercise. Inter-observer reliability was assessed using intraclass correlation coefficients (ICCs) and for pre-specified acceptable reliability the inraclass correlation coefficient (ICC) was > 0.8.
The SPARCC SIS had face validity and was feasible to score in pediatric cases in both reading exercises. Cases were mostly male (64%) and the median age at the time of imaging was 14.9 years. After calibration, the median ICC across all readers for the SIS total score was 0.81 (IQR 0.71–0.89). SPARCC SIS had weak correlation with disease activity (DA) as measured by the JSpADA (r = − 0.12) but discriminated significantly between those with and without elevated C-reactive protein (p = 0.03).
The SPARCC SIS was feasible to score and had acceptable reliability in children. The ICC improved with additional calibration and reading exercises, for both experienced and inexperienced readers.
KeywordsImaging Spondyloarthritis Ankylosing spondylitis Juvenile idiopathic arthritis Outcomes
Assessment of SpA International Society
Digital Imaging and Communication in Medicine
Intraclass correlation coefficients
Jjuvenile SpA disease activity index
magnetic resonance imaging
sacroiliac joint inflammation score
Spondyloarthritis Research Consortium of Canada
Short-tau inversion recovery
Tumor necrosis factor
Children with spondyloarthritis (SpA) and axial arthritis are at risk of progression to ankylosing spondylitis, an inflammatory disease that causes joint fusion and leads to permanent functional impairment. While there are similarities between juvenile and adult SpA, there are distinct phenotypic differences that warrant specific focus and clinical trials in juvenile disease [1, 2, 3, 4, 5]. In order to assess the efficacy and effectiveness of medications for SpA and axial disease in critically needed clinical trials, we need an objective measure to quantify response in the pediatric sacroiliac joint.
There are five Food and Drug Administration (FDA)-approved anti-TNF drugs for ankylosing spondylitis in adults and many more phase III trials underway. There has only been one trial of biologic agents in children with axial arthritis . This trial had several limitations leading to findings that were not statistically significant for the primary outcome. Noteworthy limitations were that imaging was not part of the protocol and the primary outcome was achievement of an Assessment of SpA International Society (ASAS) working group 40 response criteria , which has never been validated in children. Despite the negative results, most pediatric rheumatologists believe TNF inhibitors are effective. This trial experience underscores the importance of identifying clinically meaningful and responsive measures of inflammation in the pediatric sacroiliac joint.
Validated measures to objectively assess inflammation in the adult sacroiliac joint include the modified Berlin magnetic resonance imaging (MRI) method  and the SPondylo Arthritis Research Consortium of Canada (SPARCC) Sacroiliac joint Inflammation Score (SIS) . The modified Berlin method grades the proportion of bone marrow edema in each sacroiliac joint according to the percentage of each joint quadrant (grade 1, < 33%; grade 2, ≥ 33–66%; grade 3, > 66%) affected by edema so that the total score for both SIJs ranges from 0 to 24. The SPARCC SIS measures the presence, depth, and intensity of bone marrow inflammation in each sacroiliac joint quadrant in consecutive semi-coronal slices through the joint. The SPARCC SIS has been used successfully in clinical trials of biologic agents in adults with SpA and axial disease [10, 11, 12, 13, 14]. Improvements in inflammation on MRI were significantly correlated with change in several clinical measures including composite disease activity measures and C-reactive protein levels .
Identification of useful measures of inflammation in the pediatric sacroiliac joint will translate into improved care algorithms and increased opportunities for labeling of emerging targeted medications for these children. We evaluated the reliability and construct validity of the SPARCC SIS in children with suspected or confirmed juvenile SpA.
This study protocols were reviewed and approved by the Children’s Hospital of Philadelphia Committee for the Protection of Human Subjects (16–012641 and 17–013883). Waivers of consent and HIPAA authorization were granted as the procedures represented minimal risk to the subjects and did not adversely affect the rights and welfare of the subjects.
Cases were children with suspected or confirmed SpA who had imaging performed at the Children’s Hospital of Philadelphia (Philadelphia, PA, USA) or Stollery Children’s Hospital (Edmonton, Alberta, Canada). All images were de-identified prior to scoring and demographics were abstracted from the clinical chart. The imaging protocols at the two institutions varied slightly and minor modifications of the imaging protocols over the time period were noted. The SPARCC SIS scores, however, only require two specific sequences, which were uniformly performed on all included studies with the T1-weighted and short-tau inversion recovery (STIR) coronal oblique sequences. One of the advantages of the SPARCC tool is that as long as these two sequences are present, subtle differences in MRI procedure amongst institutions do not impact its applicability.
Scoring system used to evaluate six consecutive semi-coronal magnetic resonance imaging slices (total score range 0–72)
Score range per slice
Bone marrow edema
Hyperintense signal on short-tau inversion recovery
Score 4 quadrants in each sacroiliac joint 0/1, range per slice is 0–8
Bone marrow intensity
Hyperintensity of marrow edema using the presacral veins as reference
Score each sacroiliac joint 0/1, range per slice is 0–2
Bone marrow depth
Homogeneous increase of marrow signal ≥1 cm from the articular surface within either sacroiliac joint
Score each sacroiliac joint 0/1, range per slice is 0–2
All training and scoring exercises were performed within a web-based environment (CaREArthritis.com) by six readers (three pediatric radiologists, one pediatric rheumatologist, one adult radiologist and one adult rheumatologist). Two of the readers were SPARCC SIS developers (WM and RL). Before completing reading exercise 1, all readers reviewed a pediatric PowerPoint training module that included scoring instructions and examples of bone marrow edema and bright subchondral signal on STIR scans that may be easily confused with inflammation. The reference for normal bone marrow signal intensity in children was the iliac crests, edges of the vertebral bodies, tri-radiate cartilage and ischial pubic synchondrosis. Each reader, blinded to clinical details except age, subsequently used the online viewing and scoring system to score 30 digital imaging and communication in medicine (DICOM)-based studies that included semi-coronal T1-weighted and STIR sequences. The data for reading exercises 1 and 2 were based on data from a single time point.
After reading exercise 1, all readers, except WM and RL (SPARCC SIS developers), completed an interactive calibration module comprising 40 adult cases, each with studies of T1-weighted and STIR scans from baseline and 2 years after initiation of TNF inhibitor therapy (https://www.carearthritis.com/mriportal/sparccsij/index/). Each study was scored in pairs, blinded to time point. Readers received instantaneous feedback on concordance/discordance of scoring with the expert readers (WM and RL) after scoring each individual semi-coronal slice in the first 20 cases. Feedback on the next 20 cases was provided after scoring the entire case. The intraclass correlation coefficient (ICC) for concordance with the expert readers was calculated after the first 20 cases were completed, again after the next 10 cases, and finally after all 40 cases had been scored. The a priori score deemed acceptable for this calibration was an ICC >0.8 for the total SPARCC SIS score. After the interactive training module, we conducted a second reading exercise based on DICOM images comprising T1-weighted and STIR scans from 30 pediatric studies, which were a mix of 15 studies from the first exercise and 15 new studies.
Face and content validity were established through discussion during the creation of the pediatric PowerPoint training module and initial calibration exercise. Face and content validity is demonstrated if the criteria are sensible on face value to physicians who utilize them in their field. Feasibility was assessed by asking the raters to estimate the time it took to score each case and to rate the ease with which it was performed.
Inter-observer reliability was assessed using ICC. The ICCs are presented as agreement for all readers together, SPARCC developers (n = 2), pediatric radiologists (n = 3) and rheumatologists (n = 2). We applied the pre-specified ICC target from the interactive adult calibration module to the interpretation of the results from the two reading exercises (> 0.8 for the total score).
Construct validity was tested using Spearman’s rho, measuring correlation between the mean SPARCC SIS developers’ status scores from reading exercise 2 and disease activity as measured by the juvenile SpA disease activity (JSpADA) index (range 0–8)  and physician global disease activity assessment (visual analogue scale, range 0–10). The JSpADA index is a disease activity measure that is scored 0 to 8, with higher scores indicating more active disease. The components of the score are weighted equally and consist of the following measures: arthritis, enthesitis, patient pain assessment, inflammatory markers, morning stiffness, clinical sacroiliitis, uveitis and back mobility. Discrimination was tested by comparing the mean SPARCC SIS developers’ scores with laboratory values (C-reactive protein ≥1 or <1 mg/dl and erythrocyte sedimentation rate ≥21 or <21 mm/h) and patients with and without inflammatory back pain. Laboratory values were included if available +/− 60 days from the date of the MRI.
Frequency (%)/median (IQR)
Active joint count
Tender enthesis count
Patient-reported paina (VAS; range 0–10)
Patient-reported disease activity (VAS; range 0–10)
Physician global disease activity (VAS; range 0–10)
CRP (≥1.0 mg/dL)
ESR (≥21 mm/h)
Feasibility and interobserver reliability
The SPARCC SIS had acceptable face validity and was feasible to score. The time required to assess a case for the presence, depth and intensity of lesions for a single case less than 15 min. All readers felt the online scoring platform was easy to use.
In between the first and second reading exercises, the four readers who were not SPARCC SIS developers (PW, JJ, DB and NC) completed an interactive calibration module comprising 40 adult cases. Three of the four readers rated the scoring platform for the calibration module as easy to use and the time to score each case ranged from 4 to 15 min. All four readers achieved an ICC status score >0.82 (range 0.82–0.99).
In exercise 2, 133 (74%) of the studies had a SPARCC SIS ≥2. Median SPARCC SIS was 7.5 (IQR 1–24.5). The ICC amongst all six readers surpassed the pre-specified target of 0.8. When assessed by different rater backgrounds, only the SPARCC developers achieved the pre-specified target of > 0.8 for the total score. The ICCs amongst pediatric radiologists and rheumatologists were 0.77 and 0.79, respectively (Fig. 2).
Correlation between disease activity, as measured by the JSpADA index and physician global disease activity, and the SPARCC SIS developers’ mean scores for all unique patients was weak (r = − 0.12, p = 0.64 and r = 0.09, p = 0.65, respectively). The SPARCC SIS discriminated well between those children with and without elevated C-reactive protein (p = 0.03) but not between children with and without inflammatory back pain (p = 0.92).
Identification of reliable, objective, and quantifiable measures of response in children with axial disease is critical. Without measures to objectively quantify the severity of disease in the pediatric sacroiliac joint, assessment of interval change and response to medication in efficacy or effectiveness studies is not possible. This is the first evaluation of the feasibility and reliability of the SPARCC SIS in children. The SPARCC SIS was feasible to score and achieved the pre-specified target for acceptable reliability (ICC ≥0.8) in children. The ICC improved and surpassed the pre-specified acceptable reliability threshold with additional calibration and reading exercises, even for readers with limited experience. The SPARCC SIS did not have convergent validity with clinical measures of disease activity, highlighting that imaging and clinical evaluations provide complimentary but non-overlapping information. The SPARCC SIS did discriminate between those children with and without elevated C-reactive protein but did not discriminate between children with and without back pain.
Our findings indicate that readers with various levels of imaging experience can have their reading calibrated to acceptable levels of reliability using a relatively simple calibration module that provides iterative real-time feedback on the concordance/discordance of their scoring with the expert readers at the level of individual sacroiliac joint quadrants. This is promising as this means that radiologists (both adult and pediatric) from geographically isolated institutions can have their reading calibrated for the purposes of both efficacy and effectiveness studies. In the first reading exercise both the pediatric radiologists (n = 3) and SPARCC developers (n = 2) attained an ICC >0.8 yet the overall ICC was lower (0.63). The lower total group ICC was likely secondary to inclusion of an inexperienced reader (PW) and to systematic differences in the way in which the pediatric radiologists and the SPARCC developers approach evaluation of these studies. The subsequent calibration exercises were meant to reduce these betwee-group differences and heighten overall agreement or reliability, which they did. Interestingly, the ICC in the pediatric radiology group dropped slightly in the second reading exercise, which might signal that each of the pediatric radiologists adjusted his/her interpretations slightly differently and that perhaps with additional training the total ICC can improve even more. It would be ideal in the future to develop an interactive calibration module based solely on pediatric imaging studies.
In adults, the SPARCC SIS has been shown to significantly correlate with the magnitude of C-reactive protein and other measures of clinical disease activity  that have not been validated in children. We did not see correlation with disease activity measures, perhaps secondary to the relatively small sample size or the lack of validated specific measures of axial disease activity to use as reference standards. One measure specifically designed to assess clinical disease activity in juvenile SpA is the JSpADA index , but this is an assessment of overall SpA disease activity and axial disease activity accounts for only two of eight variables contributing to the score. No other validated juvenile arthritis disease activity tools assess axial disease activity. The SPARCC SIS did discriminate between those with and without elevated C-reactive protein. This is not entirely surprising given the results from the aforementioned adult studies  and the pediatric studies demonstrating increased predictive probability of sacroiliitis with elevated C-reactive protein . Discrimination was not uniform across multiple clinical attributes as it did not discriminate those with and without back pain. This latter finding may be a problem unique to pediatric SpA in which back pain has been shown to be poorly correlated with the presence of axial arthritis in multiple studies [4, 16].
There are several limitations to our study that should be considered. First, the number of pediatric studies was limited, largely due to missing semi-coronal sequences which are essential for SPARCC scoring. The limited number of studies necessitated re-evaluation during the second exercise of some cases used in the first reading exercise. We do not believe using the same studies more than once impacted our results because the cases were not discussed amongst the readers after the first reading exercise and the two reading exercises were separated by approximately 15 months. The limited number of studies also means we could not assess the minimal detectable difference or change in scores; this will be a focus of future work. Second, the studies utilized in the reading and calibration exercises were primarily from only two North American hospitals. Data collection from only two hospitals frequently leads to overrepresentation of regional demographics and a cohort that may not be typical of the entire disease population; however, this limitation is mitigated by the fact that both of these institutions are large referral centers, so the studies included are likely to be representative of cases evaluated in a wide variety of geographic areas. Third, some physician and patient-reported assessments were missing, which is expected in a retrospective study. Fourth, specific clinical measures of axial disease activity were not available as they are neither validated nor routinely collected in pediatric clinical care. Although these latter two issues limited the assessment of construct and discriminative validity, they do not affect the feasibility or reliability assessment. These relatively minor limitations are to be expected in the first systematic assessment of the feasibility and reliability of a new tool. Fifth, there was no gold standard by which to assess construct validity of the tool. The closest constructs to measure clinical (not radiographic) disease activity were the juvenile SpA disease activity index and the physician global assessment of disease activity. At most, we hypothesized there would be modest correlation. The absence a current gold standard should not preclude assessment of the feasibility and reliability utility of the tool.
We have demonstrated the feasibility and reliability of the SPARCC sacroiliac joint inflammation scoring methodology in children with established or suspected spondyloarthritis. This scoring system is based upon dichotomous scoring of lesions on consecutive slices through the cartilaginous part of the sacroiliac joint. We have not only established feasibility but also demonstrated that inexperienced readers can have their reading calibrated using standardized definitions, DICOM reference cases and an interactive calibration module. Further work is needed to assess the responsiveness and prognostic significance of the SPARCC SIS in evaluating MRI lesions in children.
Dr Jaremko and Dr Lambert are supported by Medical Imaging Consultants, Edmonton.
Dr Weiss’ work was supported by the Rheumatology Research Foundation. Dr Jaremko is supported by the Capital Health Chair in Diagnostic Imaging.
Availability of data and materials
The datasets during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Dr. Weiss has served as a consultant for Lilly.
PW designed the study, participated in the reading exercise, analyzed the results and drafted the manuscript. WPM contributed to the design of the study, participated in the reading exercise and interpretation of the results and helped revise the manuscript. RGL participated in the reading exercise, interpretation of the results and manuscript revision. JLJ participated in the reading exercise, interpretation of the results and manuscript revision. DMB participated in the reading exercise, interpretation of the results and manuscript revision. JP was responsible for feasibility and coordination of data acquisition and participated in data analysis. TGB participated in study design, coordination, data analysis, interpretation and drafting of the manuscript. RX participated in data analysis and drafting of the manuscript. NAC participated in the reading exercise, interpretation of the results and manuscript revision. All authors read and approved the final manuscript.
Ethics approval and consent to participate
This study’s protocols were reviewed and approved by the Children’s Hospital of Philadelphia Committee for the Protection of Human Subjects (16–012641 and 17–013883). Waivers of consent and HIPAA authorization were granted as the procedures represented minimal risk to the subjects and did not adversely affect the rights and welfare of the subjects.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 6.Horneff G, Fitter S, Foeldvari I, Minden K, Kuemmerle-Deschner J, Tzaribacev N, Thon A, Borte M, Ganser G, Trauzeddel R, et al. Double-blind, placebo-controlled randomized trial with adalimumab for treatment of juvenile onset ankylosing spondylitis (JoAS): significant short term improvement. Arthritis Res Ther. 2012;14(5):R230.CrossRefPubMedPubMedCentralGoogle Scholar
- 7.Lukas C, Landewe R, Sieper J, Dougados M, Davis J, Braun J, van der Linden S, van der Heijde D. Assessment of SpondyloArthritis international S. Development of an ASAS-endorsed disease activity score (ASDAS) in patients with ankylosing spondylitis. Ann Rheum Dis. 2009;68(1):18–24.CrossRefPubMedGoogle Scholar
- 10.Pedersen SJ, Poddubnyy D, Sorensen IJ, Loft AG, Hindrup JS, Thamsborg G, Asmussen K, Hendricks O, Norregaard J, Piil AD, et al. Course of magnetic resonance imaging-detected inflammation and structural lesions in the sacroiliac joints of patients in the randomized, double-blind, placebo-controlled Danish multicenter study of adalimumab in spondyloarthritis, as assessed by the Berlin and Spondyloarthritis Research Consortium of Canada Methods. Arthritis Rheum. 2016;68(2):418–29.CrossRefGoogle Scholar
- 11.Maksymowych WP, Dougados M, van der Heijde D, Sieper J, Braun J, Citera G, Van den Bosch F, Logeart I, Wajdula J, Jones H, et al. Clinical and MRI responses to etanercept in early non-radiographic axial spondyloarthritis: 48-week results from the EMBARK study. Ann Rheum Dis. 2016;75(7):1328–35.CrossRefPubMedGoogle Scholar
- 12.Lambert RG, Salonen D, Rahman P, Inman RD, Wong RL, Einstein SG, Thomson GT, Beaulieu A, Choquette D, Maksymowych WP. Adalimumab significantly reduces both spinal and sacroiliac joint inflammation in patients with ankylosing spondylitis: a multicenter, randomized, double-blind, placebo-controlled study. Arthritis Rheum. 2007;56(12):4005–14.CrossRefPubMedGoogle Scholar
- 13.Sieper J, van der Heijde D, Dougados M, Mease PJ, Maksymowych WP, Brown MA, Arora V, Pangan AL. Efficacy and safety of adalimumab in patients with non-radiographic axial spondyloarthritis: results of a randomised placebo-controlled trial (ABILITY-1). Ann Rheum Dis. 2013;72(6):815–22.CrossRefPubMedGoogle Scholar
- 14.Sieper J, van der Heijde D, Dougados M, Maksymowych WP, Scott BB, Boice JA, Berd Y, Bergman G, Curtis S, Tzontcheva A, et al. A randomized, double-blind, placebo-controlled, sixteen-week study of subcutaneous golimumab in patients with active nonradiographic axial spondyloarthritis. Arthritis Rheum. 2015;67(10):2702–12.CrossRefGoogle Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.