Introduction

Semi-quantitative magnetic resonance imaging (MRI) scoring systems have improved our understanding of osteoarthritis (OA) pathogenesis [1]. Most advancements have been made in knee OA, with a paucity of research undertaken in other affected joints [2]. For hip OA, three semi-quantitative MRI measures have been developed: the Hip Osteoarthritis MRI Scoring System [3], the Hip Inflammation MRI Scoring System [4], and the Scoring Hip Osteoarthritis with MRI (SHOMRI) [5]. The SHOMRI is a valid and reliable grading system that evaluates 8 different hip OA features including articular cartilage loss, bone marrow lesions (BMLs), subchondral cysts, acetabular labrum, paralabral cysts, intra-articular bodies, effusion-synovitis, and ligamentum teres abnormalities [5, 6]. It has also been used in several different populations to evaluate OA feature prevalence [7,8,9] and progression [10].

In the original technical paper by Lee et al. [5], the authors provided explanations for OA feature grading. However, criteria for reporting OA features (i.e., prevalence and severity) and change were not discussed. With increasing use of the SHOMRI in hip OA research and our evolving understanding of disease pathogenesis, we believe it is pertinent to advance the original SHOMRI definitions by providing new criteria for feature reporting and change where appropriate. Providing clear definitions—as has been done for knee OA [11, 12]—will facilitate consistent reporting between studies and progress our understanding of hip OA development.

The first aim of this report was to rigorously define reporting of SHOMRI hip OA feature prevalence and severity, and use these definitions in a sample of 50 hips from an ongoing prospective cohort study of young adults with hip and/or groin pain participating in high-impact physical activity (Australian football or soccer) [13]. The second aim was to develop criteria to monitor change over a period of 2 years and describe the findings in the same cohort. This report provides standardized criteria that should be applied to MRI-based semi-quantitative hip grading in clinical patient cohorts.

Methods

Study design

This study used a consecutive sample of participants from the femoroacetabular impingement and hip osteoarthritis cohort (FORCe) study. Briefly, the FORCe study aims to investigate changes in hip joint structures using MRI over a 2-year period in 184 symptomatic (hip and/or groin pain) men and women participating in football (Australian football or soccer) who were free of radiographic hip OA [13]. In the present study, we used the first 25 participants (50 hips) who completed baseline and 2-year follow-up hip MRIs. In addition to the eligibility criteria used for the larger FORCe study [8, 13, 14], participants were required to have hip MRIs that included the necessary sequences to enable completion of the SHOMRI assessment. Participants were recruited between August 2015 and October 2018 from sporting clubs or organisations and via online or print advertising in Melbourne and Brisbane, Australia. This study had ethics approval (La Trobe University Human Ethics Committee [HEC15-019 and HEC16-045] and the University of Queensland Human Ethics Committee [2015000916 and 2016001694]), and all participants provided written informed consent.

Magnetic resonance imaging acquisition

At baseline and 2-year follow-up, each participant completed a 3.0 T MRI (Philips Ingenia, The Netherlands) with a 32-channel torso coil placed over the hips and pelvis. Positioning aids were used to maintain each hip in internal rotation and neutral abduction/adduction, with right and left hips imaged independently. The MRI protocol included 2D proton density (PD) spectral attenuated inversion recovery (SPAIR) sequences in coronal, sagittal, and oblique axial orientations (Table 1).

Table 1 Hip magnetic resonance imaging protocol

Development of SHOMRI definitions

To create the SHOMRI definitions, the authorship team (JJH, RBS, TML, JLK, MJS and KMC) met on two occasions to thoroughly discuss the SHOMRI scoring system and previously used semi-quantitative MRI definitions of feature prevalence, severity, and change [11, 12]. After the first meeting, the lead author (JJH) created a set of definitions which were subsequently reviewed by all authors. At the second meeting, the final definitions were discussed and agreed upon.

SHOMRI feature assessment

Two musculoskeletal radiologists (FG and JL, with 4 and 3 years of experience, respectively) blinded to radiographic and clinical findings evaluated all baseline and follow-up MRIs. Both radiologists were trained by a senior musculoskeletal radiologist (TML) with > 25 years of experience in semi-quantitative MRI assessment. All MRIs were read paired with knowledge of timepoint (baseline or follow-up) to improve reader reliability and sensitivity to OA feature change [12]. If discrepancies in scoring occurred, a consensus read was performed with the senior musculoskeletal radiologist. To determine intra-reader reliability, each radiologist re-scored 20 randomly selected MRIs 2 weeks after the initial scoring. For inter-reader reliability, each reader completed scoring on 50 MRIs.

SHOMRI feature explanations

We have previously described the SHOMRI scoring system and explanations for feature grading [5, 8]. In short, eight OA features were assessed (Table 2), including articular cartilage loss (scored 0–2), BMLs (scored 0–3), subchondral cysts (scored 0–2), acetabular labrum (scored 0–5), paralabral cysts (scored present or absent), intra-articular bodies (scored present or absent), effusion-synovitis (scored present or absent), and ligamentum teres abnormalities (scored 0–3). Three features (articular cartilage loss, BMLs, and subchondral cysts) were evaluated in 10 subregions (six femoral and four acetabular). The acetabular labrum was evaluated in four subregions (anterior, anterosuperior, superior, posterior).
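The grading ranges and subregion counts above can be held in a small lookup table, which is convenient for checking readings before analysis. The following is a minimal sketch only: the feature keys and the helpers `validate_grades` and `max_sum_score` are hypothetical names introduced for illustration and are not part of the published SHOMRI system.

```python
# Hypothetical encoding of the SHOMRI grading ranges summarized above.
# Keys and helper names are illustrative, not part of the published system.
SHOMRI_FEATURES = {
    # feature: (minimum grade, maximum grade, number of subregions graded)
    "cartilage": (0, 2, 10),
    "bml": (0, 3, 10),
    "subchondral_cyst": (0, 2, 10),
    "labrum": (0, 5, 4),
    "paralabral_cyst": (0, 1, 1),       # 0 = absent, 1 = present
    "intra_articular_body": (0, 1, 1),  # 0 = absent, 1 = present
    "effusion_synovitis": (0, 1, 1),    # 0 = absent, 1 = present
    "ligamentum_teres": (0, 3, 1),
}


def validate_grades(feature, grades):
    """Raise ValueError if grades do not match the feature's range or layout."""
    low, high, n_subregions = SHOMRI_FEATURES[feature]
    if len(grades) != n_subregions:
        raise ValueError(
            f"{feature}: expected {n_subregions} grades, got {len(grades)}"
        )
    for grade in grades:
        if not low <= grade <= high:
            raise ValueError(f"{feature}: grade {grade} outside {low}-{high}")
    return True


def max_sum_score(feature):
    """Largest possible sum score (maximum grade times number of subregions)."""
    _, high, n_subregions = SHOMRI_FEATURES[feature]
    return high * n_subregions
```

Under this encoding, the maximum sum scores follow directly from the table: 20 for cartilage, 30 for BMLs, and 20 for the labrum.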

Table 2 SHOMRI feature explanations

Demographic and patient-reported outcome measures

Demographic information (age, sex, height, weight, symptom duration, football code participation) was collected. Each participant completed the International Hip Outcome Tool-33 (iHOT-33), a valid and reliable patient-reported outcome measure that is scored using a visual analog scale ranging from 0 (worst possible score) to 100 (best possible score) [15]. The iHOT-33 is recommended for evaluating hip-related quality of life in young to middle-aged people with hip and/or groin conditions [16].

Statistical analysis

Data analyses were performed with Stata/IC 16.1 for Mac (StataCorp LLC, College Station, TX, USA). Intra- and inter-reader reliability for OA features was determined with the weighted kappa statistic, or the unweighted kappa statistic for paralabral cysts only. Descriptive statistics were used to report participant characteristics and OA features (prevalence, severity, and change).
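For readers unfamiliar with the weighted kappa statistic, the calculation can be sketched in a few lines. This is an illustration rather than the analysis code used in this study: the weighting scheme is not stated above, so linear agreement weights are an assumption here, and the function name `weighted_kappa` and its parameters are hypothetical.

```python
from collections import Counter


def weighted_kappa(reader_1, reader_2, categories, scheme="linear"):
    """Cohen's weighted kappa for two readers scoring the same items.

    categories: ordered list of possible grades (e.g., [0, 1, 2]).
    scheme: "linear" (assumed default) or "quadratic" agreement weights.
    """
    if len(reader_1) != len(reader_2) or not reader_1:
        raise ValueError("readers must score the same, non-empty set of MRIs")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}

    def weight(a, b):
        # Agreement weight: 1 for identical grades, falling with grade distance.
        distance = abs(index[a] - index[b]) / (k - 1)
        return 1 - distance if scheme == "linear" else 1 - distance ** 2

    n = len(reader_1)
    observed = sum(weight(a, b) for a, b in zip(reader_1, reader_2)) / n
    margin_1, margin_2 = Counter(reader_1), Counter(reader_2)
    # Chance-expected weighted agreement from the two readers' marginal totals.
    expected = sum(
        weight(a, b) * margin_1[a] * margin_2[b]
        for a in categories
        for b in categories
    ) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (observed - expected) / (1 - expected)
```

With only two categories (present or absent), the off-diagonal weight is zero and the function reduces to the unweighted kappa statistic.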

Results

Participants

We included 25 of the first 29 participants (86%) from the FORCe study. Four participants were excluded as they did not undergo a hip MRI at 2-year follow-up. The mean age of the included symptomatic football players was 28 years, 32% were women, and average body mass index was 24 kg/m². The average baseline iHOT-33 score was 65.9, with the median duration of symptoms being 36 months (interquartile range 24, 50). Participant characteristics are described in full in Table 3.

Table 3 Demographic characteristics and patient-reported outcome measures

Definitions for SHOMRI feature prevalence and severity

Definitions for prevalence and severity in the eight SHOMRI features are outlined in Table 4 and described in detail below.

Table 4 Definitions for reporting the prevalence, severity, and change of SHOMRI features

Cartilage defects

Cartilage defects were present if cartilage loss was evident in one or more subregions (acetabular or femoral) and were further defined as either partial- (grade 1) or full-thickness (grade 2). The location and number of subregions (0 to 10) affected by a cartilage defect were reported separately for partial- and full-thickness defects. A maximum cartilage score (0–2) across all 10 subregions and a sum score (0–20) were also determined.
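The cartilage reporting measures above can be derived from the 10 subregion grades of a single hip. A minimal sketch, assuming the grades are supplied as a list of integers; the function name `summarize_cartilage` and the output keys are hypothetical:

```python
def summarize_cartilage(grades):
    """Summarize cartilage grades for one hip.

    grades: list of 10 subregion grades (0 = none, 1 = partial-thickness,
    2 = full-thickness), in any fixed subregion order.
    """
    if len(grades) != 10:
        raise ValueError("expected 10 subregion grades")
    n_partial = sum(1 for g in grades if g == 1)
    n_full = sum(1 for g in grades if g == 2)
    return {
        "any_defect": n_partial + n_full > 0,
        "n_partial_subregions": n_partial,   # 0 to 10
        "n_full_subregions": n_full,         # 0 to 10
        "max_score": max(grades),            # 0 to 2
        "sum_score": sum(grades),            # 0 to 20
    }
```

For example, a hip with two partial-thickness defects and one full-thickness defect has a maximum score of 2 and a sum score of 4.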

Labral tears

A labral tear (any) was scored as present if a grade 2 or higher was reported in at least one of the four subregions. We further classified labral tears as simple (grade 2 or 3) or severe (grade 4 or 5). Labral tear (any) prevalence was determined for each of the four subregions. Finally, the number of subregions (0–4) affected by a labral tear, the maximum score (0–5) across all subregions, and the sum score (0–20) were determined.
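The labral definitions use a different threshold from cartilage (grade 2 or higher rather than grade 1 or higher), which is easy to overlook when coding. A minimal sketch follows; the function name and output keys are hypothetical, and flagging a hip as having a severe tear when any subregion reaches grade 4 or 5 is an assumption about how the subregion grades roll up to the hip level.

```python
LABRAL_SUBREGIONS = ("anterior", "anterosuperior", "superior", "posterior")


def summarize_labrum(grades):
    """Summarize labral grades for one hip.

    grades: dict mapping each of the four subregions to a grade (0-5).
    """
    missing = [s for s in LABRAL_SUBREGIONS if s not in grades]
    if missing:
        raise ValueError(f"missing subregions: {missing}")
    values = [grades[s] for s in LABRAL_SUBREGIONS]
    # A tear is present in a subregion when the grade is 2 or higher.
    tear_by_subregion = {s: grades[s] >= 2 for s in LABRAL_SUBREGIONS}
    return {
        "any_tear": any(tear_by_subregion.values()),
        "simple_tear": any(2 <= g <= 3 for g in values),  # grade 2 or 3
        "severe_tear": any(g >= 4 for g in values),       # grade 4 or 5
        "tear_by_subregion": tear_by_subregion,
        "n_subregions_with_tear": sum(tear_by_subregion.values()),  # 0 to 4
        "max_score": max(values),                         # 0 to 5
        "sum_score": sum(values),                         # 0 to 20
    }
```

Note that a grade 1 subregion contributes to the sum score but does not count as a tear.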

Bone marrow lesions and subchondral cysts

For BMLs and subchondral cysts, the feature was scored as present if a grade 1 or higher was observed in one or more subregions. For both features, prevalence was reported for all 10 subregions. The number of subregions affected (0–10), the maximum score across all subregions (BMLs 0–3; subchondral cysts 0–2), and the sum score (BMLs 0–30; subchondral cysts 0–20) were determined.

Other features

Ligamentum teres tears were scored as present if a partial- (grade 2) or full-thickness tear (grade 3) was reported. Finally, paralabral cysts, loose bodies, and effusion-synovitis were scored as present or absent.

Definitions for SHOMRI feature change over time

Definitions for change in the eight SHOMRI features are outlined in Table 4 and described in detail below.

Cartilage defects

Change in the number of subregions affected by cartilage defects (partial and full thickness were evaluated separately) was defined as the difference between baseline and follow-up. Cartilage defect worsening was evaluated in each subregion, and the number of subregions with worsening was determined for each hip. Overall cartilage defect worsening was defined as progression of cartilage grade in one or more subregions. Change in maximum and summed cartilage score were evaluated as the difference between baseline and follow-up scores. An example of an incident full-thickness cartilage defect after 2 years is shown in Fig. 1.

Fig. 1

Incident cartilage defect and BML. (a) Normal subchondral bone (white arrow) and articular cartilage (blue arrow) in the superolateral subregion. (b) Two-year follow-up MRI shows incident grade 2 BML (double white arrow) and full-thickness (grade 2) acetabular cartilage defect (double blue arrow) in the superolateral subregion.
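The cartilage change measures can be computed from paired baseline and follow-up grades. A minimal sketch, assuming both readings list the 10 subregions in the same order; the function name `cartilage_change` and the output keys are hypothetical:

```python
def cartilage_change(baseline, follow_up):
    """Change in cartilage grades (0-2) for one hip across 10 subregions."""
    if len(baseline) != 10 or len(follow_up) != 10:
        raise ValueError("expected 10 subregion grades at each timepoint")

    def count(grades, grade):
        return sum(1 for g in grades if g == grade)

    # A subregion worsens when its grade is higher at follow-up.
    n_worsened = sum(1 for b, f in zip(baseline, follow_up) if f > b)
    return {
        "delta_partial_subregions": count(follow_up, 1) - count(baseline, 1),
        "delta_full_subregions": count(follow_up, 2) - count(baseline, 2),
        "n_subregions_worsened": n_worsened,
        "any_worsening": n_worsened > 0,
        "delta_max_score": max(follow_up) - max(baseline),
        "delta_sum_score": sum(follow_up) - sum(baseline),
    }
```

Because partial- and full-thickness defects are counted separately, the change in partial-thickness subregions can be negative even though cartilage is not allowed to improve: a partial-thickness defect that progresses to full thickness leaves the partial-thickness count.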

Labral tears

The number of new subregions affected by a labral tear (grade 2 or above) was determined by comparing the affected subregions at baseline and follow-up. Labral tear worsening was defined as progression of labral grading in one or more subregions (any worsening), with the number of subregions exhibiting progression (0–4) also reported. Changes in maximum and summed labral scores (difference between baseline and follow-up) were determined.
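The distinction between a new labral tear and labral worsening can be made explicit in code: a new tear requires crossing the grade 2 threshold, whereas worsening is any grade increase. A minimal sketch, assuming four grades per timepoint in the same subregion order; the function name and output keys are hypothetical:

```python
def labral_change(baseline, follow_up):
    """Change in labral grades (0-5) for one hip across four subregions."""
    if len(baseline) != 4 or len(follow_up) != 4:
        raise ValueError("expected 4 subregion grades at each timepoint")
    pairs = list(zip(baseline, follow_up))
    # New tear: below grade 2 at baseline, grade 2 or above at follow-up.
    n_new_tears = sum(1 for b, f in pairs if b < 2 <= f)
    # Worsening: any increase in grade, including grade 0 to grade 1.
    n_worsened = sum(1 for b, f in pairs if f > b)
    return {
        "n_new_subregions_with_tear": n_new_tears,
        "n_subregions_worsened": n_worsened,   # 0 to 4
        "any_worsening": n_worsened > 0,
        "delta_max_score": max(follow_up) - max(baseline),
        "delta_sum_score": sum(follow_up) - sum(baseline),
    }
```

Under these definitions a hip can show worsening without a new tear, which is consistent with worsening being more frequent than new tears.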

BMLs and subchondral cysts

Improvement in BMLs and subchondral cysts was allowed between baseline and follow-up. For both features, change in the number of subregions affected (grade 1 or above) was determined by calculating the difference in affected subregions between baseline and follow-up. Feature worsening and improvement were determined in the 10 subregions. Worsening was defined as a feature score increase (e.g., grade 1 to grade 2) in one or more subregions. Improvement was defined as a feature score decrease (e.g., grade 2 to grade 1) in one or more subregions. The number of subregions with worsening and improvement was also calculated. Change (baseline to follow-up) in maximum (BMLs −3 to +3; subchondral cysts −2 to +2) and summed feature scores was determined for each hip. An example of an incident BML and subchondral cyst is shown in Fig. 2.

Fig. 2

Incident subchondral cyst and BML. (a) Normal subchondral bone (white arrow) in the femoral anterior subregion. (b) Two-year follow-up MRI shows incident grade 2 subchondral cyst and grade 3 BML (double white arrow) in the femoral anterior subregion.
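Because BMLs and subchondral cysts may either worsen or improve, the change calculation tracks both directions. A minimal sketch, assuming 10 subregion grades per timepoint in the same order; the function name `dynamic_feature_change` and the output keys are hypothetical:

```python
def dynamic_feature_change(baseline, follow_up):
    """Change in BML or subchondral cyst grades for one hip (10 subregions)."""
    if len(baseline) != 10 or len(follow_up) != 10:
        raise ValueError("expected 10 subregion grades at each timepoint")
    pairs = list(zip(baseline, follow_up))
    affected_baseline = sum(1 for g in baseline if g >= 1)
    affected_follow_up = sum(1 for g in follow_up if g >= 1)
    n_worsened = sum(1 for b, f in pairs if f > b)
    n_improved = sum(1 for b, f in pairs if f < b)
    return {
        "delta_subregions_affected": affected_follow_up - affected_baseline,
        "n_subregions_worsened": n_worsened,
        "any_worsening": n_worsened > 0,
        "n_subregions_improved": n_improved,
        "any_improvement": n_improved > 0,
        "delta_max_score": max(follow_up) - max(baseline),
        "delta_sum_score": sum(follow_up) - sum(baseline),
    }
```

A single hip can show worsening in one subregion and improvement in another, so the two flags are reported independently rather than as one net direction.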

Other features

Worsening of ligamentum teres tears was defined as an increase of ≥ 1 in the score. For the remaining features (paralabral cysts, effusion-synovitis, and loose bodies), worsening was determined if the feature was present at follow-up but not baseline. In contrast, improvement was defined as the feature being present at baseline but not follow-up.
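These rules for the remaining features translate directly into two small functions. A minimal sketch; the function names and the "unchanged" label are hypothetical conveniences, not terms defined above:

```python
def binary_feature_change(present_baseline, present_follow_up):
    """Classify change for paralabral cysts, effusion-synovitis, or loose bodies."""
    if present_follow_up and not present_baseline:
        return "worsened"   # present at follow-up but not baseline
    if present_baseline and not present_follow_up:
        return "improved"   # present at baseline but not follow-up
    return "unchanged"


def ligamentum_teres_worsened(baseline_grade, follow_up_grade):
    """Worsening is an increase of at least 1 in the ligamentum teres score (0-3)."""
    return follow_up_grade - baseline_grade >= 1
```

No improvement category is returned for the ligamentum teres because these tears were not allowed to improve.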

Reliability

Intra- and inter-reader reliability (kappa statistics) are presented in Table 5. Intra-reader reliability was almost perfect, ranging from 0.86 to 1.00 for readers 1 (FG) and 2 (JL). Inter-reader reliability ranged from 0.80 to 1.00, indicating substantial to almost perfect agreement.

Table 5 Intra- and inter-observer reliability

SHOMRI feature prevalence and severity at baseline

Partial- and full-thickness cartilage defects were present in 76% and 42% of hips, respectively (Table 6). The number of subregions affected by cartilage defects ranged from 0 to 5, with 42% of hips having a maximum cartilage score of 2 (Supplementary information). Labral tears were present in 88% of hips, with 34% of tears considered severe (Table 6 and Supplementary information). Over 50% of hips had 2 or more subregions affected by a labral tear (Supplementary information). Bone marrow lesions and subchondral cysts were present in 10% of hips (Table 6). No hips had evidence of effusion-synovitis, loose bodies, or ligamentum teres tears, and 24% of hips had paralabral cysts (Table 6). Feature sum scores are outlined in Table 7.

Table 6 Baseline hip OA feature prevalence (n = 50 hips)
Table 7 Baseline and delta (change in sum score between baseline and 2-year follow-up) OA feature sum (total) scores n = 50 hips

SHOMRI feature change over time

Change in SHOMRI feature sum scores (baseline to 2 years) is presented in Table 7. Cartilage defect worsening was evident in 48% of hips (Supplementary information). The change in the number of subregions affected by partial- and full-thickness defects ranged from −2 to 3 and 0 to 2, respectively. Close to half of all hips (44%) had a new labral tear in one or more subregions, and 78% of hips had labral tear worsening in one or more subregions (Supplementary information). The maximum change in labral score ranged from 1 to 4. For BMLs, the change in number of subregions affected ranged from −2 (two fewer subregions affected by BMLs) to 1 (one more subregion affected by BMLs) (Supplementary information). Five hips (10%) had evidence of BML worsening, with only one hip showing improvement. Four hips (8%) had one new subregion affected by a subchondral cyst, with improvement not observed in any hips (Supplementary information). Of the remaining features, only ligamentum teres tears and paralabral cysts affected more hips at follow-up than baseline (Supplementary information).

Discussion

This study extends the original technical study of the SHOMRI grading system by providing definitions for reporting feature prevalence, severity, and change. This is critical to enable use of this grading system in a clinical context to analyze structural OA disease development and progression. A high prevalence of cartilage defects and labral tears was observed in football players with hip and/or groin pain, with up to 78% of hips demonstrating MRI feature worsening over 2 years, highlighting the sensitivity of SHOMRI to assess incidence and progression of degenerative changes.

Variability in the reporting of OA feature prevalence and severity has been highlighted in two recent systematic reviews [17, 18]. To overcome this, we provide comprehensive definitions for eight OA features that will enable consistent reporting in future studies using the SHOMRI scoring tool. Comprehensive evaluation of OA feature severity is challenging, but important for understanding OA pathogenesis and the link between structural change and symptoms. Existing studies often do not report OA feature severity across the whole joint (i.e., number of subregions), opting for total or sum scores instead [8, 10]. Our definitions for severity overcome this deficiency by providing both measures, affording a comprehensive assessment of severity.

Our proposed change definitions permit SHOMRI feature improvement and worsening, but not all features were allowed to improve. For example, cartilage defects, labral tears, and ligamentum teres tears were not allowed to improve over time. This approach is consistent with semi-quantitative change definitions for knee OA [12]. MRI improvement of tissue morphology (without surgical repair) may represent the formation of scar tissue rather than normal tissue regeneration (i.e., healing) [19, 20]. We acknowledge that our definitions may not be suitable for clinical trials evaluating the efficacy of disease-modifying treatments. Bone marrow lesions and subchondral cysts are dynamic features of OA disease, with the ability to fluctuate in size over time [12, 21]. Improvement and worsening of BMLs and subchondral cysts were allowed in our change definition, as in definitions for longitudinal change in studies of knee OA [11, 12].

Longitudinal change of hip MRI features has yet to be studied in detail [22]. In the present study, we propose several measures to thoroughly describe the spectrum of structural OA change. Summed scores have drawn criticism as they represent heterogeneous change in features [11, 12]. For instance, a summed cartilage change score of 6 could be achieved through three incident full-thickness defects or six incident partial-thickness defects, which are arguably different disease states. However, with knowledge of their shortcomings and when used alongside approaches that capture change within the entire joint, summed scores still provide a sensitive measure for monitoring change in OA features. Change across the entire joint was captured through evaluation of subregions. This approach overcomes the deficiencies of summed scores and permits assessment of change in existing features and the identification of incident pathology [12]. We also include a maximum change score across the entire hip, providing a measure of overall change that can be missed with summed scores [12]. Further work is needed to understand if specific change measures are related to symptom worsening and hip OA development.
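The summed-score example above can be worked through numerically. The sketch below uses two hypothetical hips, both starting with normal cartilage in all 10 subregions; the function name `change_profile` and the grade lists are illustrative and are not data from this cohort.

```python
def change_profile(baseline, follow_up):
    """Summed, subregion-count, and maximum-score change for one hip."""
    deltas = [f - b for b, f in zip(baseline, follow_up)]
    return {
        "sum_change": sum(deltas),
        "n_subregions_worsened": sum(1 for d in deltas if d > 0),
        "delta_max_score": max(follow_up) - max(baseline),
    }


# Hypothetical hips with normal cartilage (grade 0) in all subregions at baseline.
baseline = [0] * 10
three_full = [2, 2, 2, 0, 0, 0, 0, 0, 0, 0]   # three incident full-thickness defects
six_partial = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # six incident partial-thickness defects

profile_full = change_profile(baseline, three_full)
profile_partial = change_profile(baseline, six_partial)
```

Both hips have a summed change of 6, but they differ in the number of subregions that worsened (3 versus 6) and in the change in maximum score (2 versus 1), which is why the three measures are reported together.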

Using our proposed definitions, a large proportion of symptomatic adult football players had hip OA features and demonstrated worsening of these features over 2 years. The reported prevalence of the eight OA features is largely consistent with existing investigations of similar populations [17, 18]. However, the extent of worsening in key hip OA features including labral tears (78% vs 17%) and cartilage defects (48% vs 11%) was much higher than previously reported in older non-athletic subjects [10]. As both studies used similar SHOMRI feature change definitions, differences likely reflect variations in participant characteristics (e.g., hip morphology, physical activity) and the complex nature of hip OA development [23,24,25].

Examination of MRIs was completed with knowledge of MRI sequences and timepoint but not clinical status. This approach is recommended for longitudinal investigations as it improves sensitivity to feature change [3, 11]. A consensus process was used to determine the final grading for each MRI feature. While this may take additional time to complete, it simplifies the reporting of features by providing a single grading for both readers.

We recognize there are several limitations that require consideration when using the proposed definitions. There is considerable debate surrounding the ability of MRI-defined OA features to improve over time [11, 12]. As our understanding of hip OA pathogenesis evolves, we recognize that revision of the proposed definitions may be needed to optimize the reporting of feature change. We used an optimized 3 T MRI protocol for assessment of hip OA features. We acknowledge that contrast-enhanced MRI may provide superior assessment of key OA features, including cartilage, labrum, and synovium [26,27,28,29]. However, the use of contrast-enhanced MRI in longitudinal studies is associated with risk and is not appropriate in all symptomatic populations. The binary classification of effusion-synovitis is likely to be insensitive to longitudinal change. This deficiency may preclude us from understanding the importance of effusion-synovitis in OA progression.

This study is the first to provide rigorous criteria and definitions for reporting prevalence, severity, and change of hip degenerative features using the SHOMRI grading system. Using the proposed definitions, up to 78% of hips demonstrated feature change over 2 years, demonstrating sensitivity to change. Use of our definitions will enable comparison between hip MRI studies in clinical cohorts and improve our understanding of hip OA pathogenesis and progression. Longitudinal studies are now required to provide insight into the prognostic implications of the proposed SHOMRI definitions.