Background

Knee osteoarthritis (OA) is a major public health concern. Current treatment focuses on controlling symptoms, since no interventions have yet been approved for modifying the course of the disease or improving structural alterations in joint tissues [1]. The Foundation for the National Institutes of Health (FNIH) sample was selected for a nested case–control study designed to evaluate the predictive validity of a broad spectrum of imaging and biochemical markers of disease progression in knee OA. The sample was derived from the Osteoarthritis Initiative (OAI) public database, an ongoing multi-center prospective observational cohort study of knee OA [2]. A biomarker that exhibits change over the near term and is associated with longer-term clinically important outcomes would have potential as a marker of treatment efficacy [2].

While radiography depicts structural bony tissue changes only in advanced stages of OA, magnetic resonance imaging (MRI) is able to visualize all involved joint tissues, even in the earliest stages of disease, in which radiographs are normal [3, 4]. Recent data suggest that non-cartilaginous tissue changes in particular play an important role in the onset and progression of osteoarthritis [5, 6].

Using multivariable logistic regression models to examine associations between structural MRI markers and progression of radiographic and pain outcomes, we recently showed that all baseline structural joint features, with the exception of effusion-synovitis and meniscal morphology, predicted 48-month case status. We also showed that 24-month change was associated with progression of disease for all joint features evaluated: size of bone marrow lesions, cartilage thickness and surface area, effusion-synovitis, meniscal morphology and extrusion, osteophyte size, and Hoffa-synovitis [7]. However, definitions of change using complex scoring systems are challenging and need to be defined carefully prior to engaging in detailed analyses focused on outcomes and prediction models. As only sparse data are currently available on reliability and definitions of change in semi-quantitatively assessed MRI studies, we believe that a detailed description will be helpful to investigators focusing on samples at risk for progression; these data were not covered in the recent publication [7].

Thus, the aims of our study were to describe the scoring methodology and MRI assessments used to evaluate the cross-sectional features observed in cases and controls, to define change over time for different MRI features and to report the extent of changes over a 24-month period, which may serve as a potential reference for future studies focusing on MRI features and progression over similar observational periods.

Methods

Study design

The Osteoarthritis Initiative (OAI) is an ongoing multi-center prospective observational cohort study of knee OA (http://www.oai.ucsf.edu/) that enrolled 4796 participants aged 45–79 years at four clinical centers. Clinical data, MRI scans, radiographs and serum and urine specimens were obtained at baseline, 12, 24, 36, and 48 months (M) follow-up [8]. Eligible participants for the present study were those with at least one knee with a Kellgren-Lawrence grade (KLG) of 1–3 at baseline.

Criteria for case–control selection

Radiographic progression was defined as a decrease in minimal joint space width of ≥0.7 mm in the medial tibio-femoral compartment from baseline to 24, 36 or 48 M.

Knee pain was assessed using the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) pain subscale. Symptomatic progression was defined as a persistent increase of ≥9 points on a 0–100 normalized score from baseline to 24, 36, 48 or 60 months. This difference has been documented to be clinically relevant [9].

For the nested case–control study, a predetermined number of index knees was selected in the following outcome groups for measurement of imaging biomarkers [6]: 1) case knees had both radiographic and pain progression; control knees did not have this combination, and included 2) knees with radiographic but not pain progression, 3) knees with pain but not radiographic progression, and 4) knees with neither radiographic nor pain progression. The sample size for cases and these three control groups was 194, 103, 103 and 200 knees, respectively. For the purposes of this analysis we compared 194 cases vs. 406 controls.
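The selection criteria above can be illustrated with a minimal Python sketch. It is not the study's selection code: the function names and group labels are hypothetical, the thresholds are the ones stated in the text, and the requirement that the pain increase be persistent is not modeled.

```python
JSW_LOSS_THRESHOLD_MM = 0.7   # radiographic progression cut-off stated in the text
WOMAC_PAIN_THRESHOLD = 9.0    # pain progression cut-off on the 0-100 normalized scale


def has_radiographic_progression(jsw_baseline_mm, jsw_followup_mm):
    """True if minimal medial joint space width decreased by at least 0.7 mm."""
    return (jsw_baseline_mm - jsw_followup_mm) >= JSW_LOSS_THRESHOLD_MM


def has_pain_progression(pain_baseline, pain_followup):
    """True if the normalized WOMAC pain score increased by at least 9 points.

    Simplification (assumption): persistence of the increase is not checked.
    """
    return (pain_followup - pain_baseline) >= WOMAC_PAIN_THRESHOLD


def outcome_group(radiographic, pain):
    """Map the two progression flags to the four outcome groups (labels are illustrative)."""
    if radiographic and pain:
        return "case: radiographic and pain progression"
    if radiographic:
        return "control: radiographic progression only"
    if pain:
        return "control: pain progression only"
    return "control: no progression"


if __name__ == "__main__":
    rad = has_radiographic_progression(4.0, 3.0)
    pain = has_pain_progression(10.0, 25.0)
    print(outcome_group(rad, pain))
```

Only knees with both kinds of progression count as cases; the three remaining groups are pooled as controls in this analysis.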

MRI acquisition and assessment

MRIs of both knees were acquired using 3 T systems (Siemens Trio) at the 4 OAI clinical sites. A dedicated quadrature transmit/receive knee coil was used and the sequence protocol included a coronal intermediate-weighted 2-dimensional turbo spin echo sequence, a sagittal 3-dimensional dual-echo steady-state sequence, and a sagittal intermediate-weighted fat-suppressed turbo spin-echo sequence [10].

Two musculoskeletal radiologists with 13 (FWR) and 15 (AG) years’ experience of semi-quantitative assessment of knee OA, blinded to clinical data and case–control status, read the baseline and 24 month MRIs according to a validated scoring system [11], and with knowledge of the chronological order of the scans. The following joint structures were assessed: cartilage morphology, osteophytes, subchondral bone marrow lesions (BMLs), meniscal structural damage and meniscal extrusion, Hoffa-synovitis and effusion-synovitis.

In addition, within-grade changes were coded that fulfill the definition of a definite visual change but do not fulfill the definition of a full grade change on the ordinal scales applied [12].

Reliability

One experienced musculoskeletal radiologist (FWR) re-evaluated 20 randomly selected MRIs in random order after a 4 week interval to assess intra-reader reliability. Inter-observer reliability between the two readers was determined using the same 20 cases.

Definition of change over time

BMLs

Change in overall number of subregions affected by any BML was defined as the difference between the number of subregions affected by any BML at 24 months (size > 0) and the number of subregions affected by any BML at baseline. This was further categorized into improvement, no change, and worsening in one subregion and worsening in two or more subregions. An example of incident BML at follow-up is shown in Fig. 1.
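As an illustration of this definition, the following Python sketch computes the change in the number of affected subregions and assigns the four categories. The function names are hypothetical, and per-subregion size scores are assumed to be given as equal-length lists with one entry per subregion.

```python
def bml_subregion_count_change(baseline_sizes, followup_sizes):
    """Number of subregions with any BML (size > 0) at follow-up minus the number at baseline."""
    if len(baseline_sizes) != len(followup_sizes):
        raise ValueError("baseline and follow-up must cover the same subregions")
    n_baseline = sum(1 for size in baseline_sizes if size > 0)
    n_followup = sum(1 for size in followup_sizes if size > 0)
    return n_followup - n_baseline


def categorize_count_change(delta):
    """Assign the categories described in the text to the count change."""
    if delta < 0:
        return "improvement"
    if delta == 0:
        return "no change"
    if delta == 1:
        return "worsening in one subregion"
    return "worsening in two or more subregions"


if __name__ == "__main__":
    baseline = [1, 0, 0, 2] + [0] * 10   # 14 subregions, two affected
    followup = [1, 0, 1, 2] + [0] * 10   # one additional subregion affected
    delta = bml_subregion_count_change(baseline, followup)
    print(delta, categorize_count_change(delta))
```

Note that this measure counts affected subregions only: a knee in which one BML resolves while a new one appears elsewhere is scored as no change.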

Fig. 1

Incident BML and meniscal tear. a Baseline sagittal intermediate-weighted fat-suppressed image shows normal cartilage coverage of the medial femur and tibia and no meniscal damage. There is a definite osteophyte at the posterior femur (arrow). b Follow-up image shows incident BML at the anterior medial tibia (short, large arrows) and an incident vertical meniscal tear at the posterior horn of the medial meniscus (arrowhead). In addition there is a small loose body posterior to the meniscus (long, thin arrow)

We also determined the number of subregions with worsening, and the number of subregions with improvement. In both instances we took into account within-grade changes in BML size. We further classified these measures into any subregions with worsening vs. no subregions with worsening and any subregions with improvement vs. no subregions with improvement.

To determine maximum change in BML size score, we first evaluated change in size score in each of the 14 articular subregions between baseline and 24 months. Change in size score in each subregion could range from a maximal improvement by three to a maximal worsening by three. The second step was to create an overall change in size score that was defined as the maximum change in size score across the 14 articular subregions. It was categorized into improvement, no change, worsening within grade, worsening by one grade, and worsening by two or more grades. Based on the distribution, the final grouping included: worsening by <2 grades (comprising improvement, no change, within-grade worsening and worsening by at most one grade in size score) vs. worsening by two or more grades.
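The two-step computation can be sketched in Python as follows. This is a hedged illustration with hypothetical function names. It covers only the final two-level grouping, because the text does not fully specify how the intermediate categories (for example improvement or within-grade worsening) are assigned at knee level.

```python
def max_bml_size_change(baseline_sizes, followup_sizes):
    """Step 1: per-subregion change (follow-up minus baseline).
    Step 2: the maximum of those changes across all subregions."""
    if len(baseline_sizes) != len(followup_sizes):
        raise ValueError("baseline and follow-up must cover the same subregions")
    changes = [fu - bl for bl, fu in zip(baseline_sizes, followup_sizes)]
    return max(changes)


def worsened_two_or_more_grades(baseline_sizes, followup_sizes):
    """Final grouping: worsening by two or more grades vs. worsening by fewer than two."""
    return max_bml_size_change(baseline_sizes, followup_sizes) >= 2


if __name__ == "__main__":
    baseline = [0, 1, 3, 0] + [0] * 10   # size scores 0-3 in 14 subregions
    followup = [2, 1, 1, 0] + [0] * 10   # one subregion worsens by 2, one improves by 2
    print(max_bml_size_change(baseline, followup))          # largest per-subregion change
    print(worsened_two_or_more_grades(baseline, followup))  # knee-level grouping
```

Because the maximum is taken across subregions, improvement in one subregion does not offset worsening in another.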

Osteophytes

The change in number of locations affected by any osteophyte was defined as the difference between the number of locations affected by any osteophyte at 24 months (grade > 0) and the number of locations affected by any osteophyte at baseline. This change was classified as no change, worsening in one location, and worsening in two or more locations and then further classified into no change vs. any worsening. To determine maximum worsening in osteophyte score, we evaluated change in score in each of the 12 locations between baseline and 24 months. Maximum worsening in score was defined as the greatest amount of worsening among the 12 locations. Maximum worsening in score was initially classified as no change, worsening by one grade, and worsening by two or more grades. Based on the distribution, the final categorization included no worsening vs. any worsening.

Meniscus

We assessed whether there was worsening in meniscal morphology from baseline to 24 months in each of the six meniscal subregions. We defined worsening as an increase in grade in at least one subregion. Figure 2 shows an example of increase in meniscal extrusion over time. We further categorized worsening in meniscal morphology into number of subregions with worsening (range 0–6) and whether any of the subregions had worsening (yes/no). We assessed changes in meniscal extrusion separately in the medial and lateral compartments. We categorized change in extrusion as improvement, no change, and worsening. We further dichotomized change in extrusion as no worsening vs. any worsening.

Fig. 2

Progression of meniscal damage and incident cartilage loss over 24 months. a Baseline coronal intermediate-weighted image shows horizontal-oblique tear of the body of the medial meniscus (arrow). There is no apparent cartilage damage at the tibia or femur at the medial compartment. b Follow-up image obtained 24 months later shows marked incident meniscal extrusion (black-filled arrow) and newly developed cartilage loss at the central portion of the medial tibia (white-filled arrow)

Cartilage

MOAKS uses a two-digit score for cartilage assessment that incorporates both area size per subregion and percentage of subregion affected by full-thickness cartilage loss. In this analysis separate scores for cartilage thickness and surface area were considered. The number of subregions with worsening (i.e., a higher score at 24 months vs. baseline) was defined separately for surface area and thickness. Change over time for surface area was computed in two ways: including within-grade changes and excluding within-grade changes. Within-grade scoring for cartilage refers to within-grade change in area or thickness. For both thickness and surface area, worsening was grouped into four levels: 0, 1, 2, or 3 or more subregions with worsening.
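A minimal Python sketch of this counting rule is given below. The function names are hypothetical, and within-grade worsening is assumed to be available as a separate true/false flag per subregion; the text does not describe how it was actually coded in the data set.

```python
def count_worsened_subregions(baseline, followup, within_grade_worsening=None):
    """Count subregions with a higher score at follow-up than at baseline.

    If within_grade_worsening (one boolean per subregion, an assumed encoding)
    is supplied, flagged subregions are counted as worsened as well.
    """
    if within_grade_worsening is None:
        within_grade_worsening = [False] * len(baseline)
    if not (len(baseline) == len(followup) == len(within_grade_worsening)):
        raise ValueError("all inputs must cover the same subregions")
    return sum(
        1
        for bl, fu, within in zip(baseline, followup, within_grade_worsening)
        if fu > bl or within
    )


def worsening_level(n_worsened):
    """Group the count into the four levels 0, 1, 2, or 3 or more."""
    return "3 or more" if n_worsened >= 3 else str(n_worsened)


if __name__ == "__main__":
    baseline = [0, 1, 2, 0, 1]
    followup = [1, 1, 3, 0, 1]
    within = [False, True, False, False, True]
    excluding = count_worsened_subregions(baseline, followup)
    including = count_worsened_subregions(baseline, followup, within)
    print(worsening_level(excluding), "|", worsening_level(including))
```

Running the same knee with and without the within-grade flags shows how the more sensitive definition can move it into a higher worsening level.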

Hoffa-synovitis and effusion-synovitis

Effusion-synovitis and Hoffa-synovitis were evaluated as MRI markers of inflammation. Fluid-sensitive sequences as applied in the OAI can delineate intra-articular joint fluid, but they cannot distinguish true joint effusion from synovial thickening because both appear as hyperintense signal within the joint cavity. For this reason the term effusion-synovitis has been introduced. It is scored from 0 to 3 according to the distension of the joint capsule, with 1 = small, 2 = moderate and 3 = large. Hoffa-synovitis refers to signal changes in Hoffa’s fat pad that are commonly used as a surrogate for synovitis on non-contrast-enhanced MRI. It is scored based on the amount of hyperintense signal in Hoffa’s fat pad on sagittal fat-suppressed intermediate-weighted sequences, with 1 = mild, 2 = moderate and 3 = severe.

Twenty-four-month changes in both Hoffa-synovitis and effusion-synovitis were categorized as improvement, no change, or worsening.
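The three-way categorization depends only on the sign of the score difference, as the short Python sketch below illustrates. The function name and the example scores are hypothetical and are not study data.

```python
from collections import Counter


def categorize_change(baseline_score, followup_score):
    """Classify a 24-month change in an ordinal score by the sign of the difference."""
    delta = followup_score - baseline_score
    if delta < 0:
        return "improvement"
    if delta > 0:
        return "worsening"
    return "no change"


if __name__ == "__main__":
    # Illustrative (baseline, follow-up) score pairs for four knees.
    knees = [(1, 2), (2, 2), (3, 1), (0, 1)]
    print(Counter(categorize_change(bl, fu) for bl, fu in knees))
```

Tallying the categories per group in this way gives the kind of frequency table reported for cases and controls.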

Analytic approach

Descriptive statistics were used to report frequencies for the different features and parameters for baseline and change over time. Logistic regression was used to test for statistically significant differences between cases and controls. For some features raw distributions were grouped into categories as described above. In these instances descriptive statistics are presented for both raw and categorical versions of features, and regression was used only for the categorical version. Weighted kappa statistics were applied to determine inter- and intra-observer reliability. All analyses were conducted in SAS 9.4 (SAS Institute, Cary, NC).
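For readers unfamiliar with the reliability statistic, the following Python sketch computes a weighted kappa for two sets of ordinal ratings. It is an illustration and not the SAS procedure used in the study. The text does not state the weighting scheme, so linear weights are an assumption here, with quadratic weights as an option. The function name and example ratings are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weighting="linear"):
    """Weighted Cohen's kappa: 1 - (weighted observed disagreement / weighted expected disagreement)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (reader A grade, reader B grade).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight grows with the distance between ordinal grades.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    disagreement_observed = sum(
        weight(i, j) * observed[i][j] for i in range(k) for j in range(k)
    )
    disagreement_expected = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    )
    # Undefined (division by zero) if both readers use one single grade throughout.
    return 1.0 - disagreement_observed / disagreement_expected


if __name__ == "__main__":
    reader_1 = [0, 1, 2, 3]   # illustrative grades, not study data
    reader_2 = [0, 1, 2, 2]
    print(round(weighted_kappa(reader_1, reader_2, categories=[0, 1, 2, 3]), 3))
```

Unlike unweighted kappa, this statistic penalizes a one-grade disagreement less than a three-grade disagreement, which suits ordinal scores.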

Results

Mean age of the participants was 62 years, 60 % were women and average BMI was 30 kg/m² [5]. Cases and controls were balanced on all covariates, with the exception of baseline KLG, with a higher proportion of KL3 knees in the case group (44 %) compared to the controls (33 %). Summarizing the intra- and inter-observer reliability results, all of the measures showed at least substantial agreement, ranging between 0.68 for Hoffa-synovitis and 0.97 for medial and lateral meniscal morphology. Table 1 gives a detailed overview of the reliability results.

Table 1 Intra- and inter-observer reliability

BMLs

The number of sub-regions affected by any BML ranged from zero to eight and the maximum BML score per knee ranged from zero to three. The change in number of subregions affected by any BML ranged from −3 (three fewer subregions affected at 24 months compared to baseline) to 5 (five more subregions affected at 24 months compared to baseline). Fourteen percent of subjects showed improvement in number of subregions with BMLs (fewer subregions with BMLs at 24 months as compared to baseline) and 52 % showed no change based on this definition. Seventy-three percent of the cases had any subregions with worsening (vs. 66 % in the control group).

Osteophytes

The number of locations with any osteophytes ranged from zero to 12. The maximum osteophyte score per knee was zero for 3 % of knees, one for 48 %, two for 34 % and three for 15 % of the knees. Overall there was very little change in osteophytes over 24 months. Nine percent of the cohort had at least one location that worsened in osteophyte score over 24 months. Across all locations, the maximum amount of worsening was 2 grades (i.e., zero to two or one to three) and 83 % had no change in any location.

Meniscus

Thirty percent of the knees had any meniscal tear and 28 % showed meniscal substance loss (i.e. maceration). The number of regions with meniscal morphology worsening ranged from zero to five, with 16 % of subjects having worsening in at least one subregion. Fourteen percent showed an increase in medial meniscal extrusion while only one knee had an increase in lateral extrusion.

Cartilage

The number of subregions with worsening in cartilage surface area, including within-grade changes, ranged from zero to eight with 59 % of subjects having at least one area with worsening in surface area. The number of subregions with cartilage thickness score > 0 ranged from zero to seven. Across the entire knee, the number of areas with worsening in cartilage thickness ranged from zero to six with 42 % of subjects having at least one area with worsening in thickness.

Hoffa-synovitis

MOAKS Hoffa-synovitis score ranged from zero to seven, with 24-month change ranging from −2 to 2. While only 10 % of subjects experienced worsening, more cases experienced worsening than controls (17 % vs. 6 %).

Effusion-synovitis

MOAKS effusion-synovitis score ranged from zero to three with 24 month changes ranging from −2 to 2. Forty-one percent of cases worsened compared to 18 % of controls.

Apart from meniscal damage and effusion-synovitis, baseline frequencies of all measures showed statistically significant differences for cases vs. controls. For change parameters, maximum worsening of BML score, 24 months change in osteophytes and meniscal damage and extrusion, all cartilage measures, and Hoffa- and effusion-synovitis showed significant differences between cases and controls.

Tables 2, 3 and 4 present the baseline frequencies of BMLs, osteophytes and the menisci including the grouping of the different scores into broader summary categories, while Tables 5 and 6 show in detail the frequencies for cartilage, Hoffa- and effusion-synovitis. The change observations for the different features are presented in detail in Tables 7, 8, 9 and 10.

Table 2 Baseline frequencies of semi-quantitative MRI biomarkers – BMLs
Table 3 Baseline frequencies of semi-quantitative MRI biomarkers – osteophytes
Table 4 Baseline frequencies of semi-quantitative MRI biomarkers – meniscus
Table 5 Baseline frequencies of semi-quantitative MRI biomarkers – cartilage
Table 6 Baseline frequencies of semi-quantitative MRI biomarkers – Hoffa- and effusion-synovitis
Table 7 Twenty-four month change in semi-quantitative MRI biomarkers – BMLs
Table 8 Twenty-four month change in semi-quantitative MRI biomarkers – osteophytes
Table 9 Twenty-four month change in semi-quantitative MRI biomarkers – meniscus
Table 10 Twenty-four month change in semi-quantitative MRI biomarkers – cartilage

Discussion

In this cohort of subjects at risk for OA progression, the values for several tissue-specific MRI features associated with progression of disease vary widely and show great change or fluctuation. The subgroup defined as cases based on composite progression of structural and clinical features exhibited changes to a greater extent than the controls on several features. Specifically, we observed greater change in the case group on maximum change in BMLs, worsening of BMLs in two or more subregions, worsening of cartilage surface area and thickness in three or more subregions and worsening of meniscal damage. Inflammatory markers of disease, i.e. Hoffa- and effusion-synovitis, also worsened more frequently in the case group compared to the controls, emphasizing the potential role of inflammation in disease progression [13–15]. Overall, little change was observed for osteophytes, reflecting the generally slow course of the disease.

Focusing on the identical dataset, we showed using a multivariable approach that 24-month changes in cartilage thickness, cartilage surface area, effusion-synovitis, Hoffa-synovitis, and meniscal morphology were independently associated with disease progression, suggesting that they may serve as efficacy biomarkers in clinical trials of disease-modifying interventions for knee OA [7]. Definition of change using semi-quantitative approaches is challenging as there are multiple possible definitions, including subregional or maximum-grade approaches. To gain additional understanding of the frequencies and categories encountered in this cohort, which was selected on the basis of progression or serving as controls, we performed the current analysis, which may help researchers in the future to power planned observational studies or clinical trials.

Few studies are available that have focused on longitudinal change of MRI parameters using semi-quantitative assessment. Most available studies are centered around baseline predictors of subsequent cartilage loss as the outcome [16]; only a few studies focus on cartilage as a predictor of worsening BMLs as the outcome [17]. When assessing change using semi-quantitative scoring in OA, scores are commonly presented as mean values or summed over a defined anatomical region (commonly compartment or knee) [18, 19]. For several reasons, such approaches have drawbacks that need to be considered. One of the main shortcomings is that sums are challenging to compare. As an example, a sum of six acquired over six distinct subregions may mean one lesion with a grade 6 (considered severe) while five other subregions exhibit no lesion (grade 0); alternatively, it may reflect grade 1 lesions in all six subregions. More work is needed on the prognostic implications of having widespread low-grade involvement vs. a focal severe lesion. It appears likely that both play a role with regard to disease progression [3]. Other approaches to define progression have been published recently [20].
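The ambiguity of summed scores can be made concrete with the example from the text. The short Python sketch below uses a hypothetical helper and illustrative scores. It reports the sum alongside the number of affected subregions and the maximum grade for the two patterns.

```python
def summarize(scores):
    """Return the summed score, the number of affected subregions, and the maximum grade."""
    return {
        "sum": sum(scores),
        "n_affected": sum(1 for s in scores if s > 0),
        "max_grade": max(scores),
    }


if __name__ == "__main__":
    focal = [6, 0, 0, 0, 0, 0]     # one severe lesion, five unaffected subregions
    diffuse = [1, 1, 1, 1, 1, 1]   # low-grade lesions in all six subregions
    print(summarize(focal))
    print(summarize(diffuse))
```

Both patterns share the same sum, but they differ in the number of affected subregions and in the maximum grade, which is why reporting those two measures retains information that a sum discards.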

Part of the study design was sequential reading of MRIs not blinded to time point but blinded to case or control status as it has been shown that this approach increases sensitivity to change [21]. Reading unblinded to time point also allowed for the application of within-grade changes, further increasing sensitivity to detect minor changes [12]. In assessing MRI data semi-quantitatively, we are advocating the scoring of the number of subregions or locations involved by pathology, with further stratification using cut-offs related to severity of a certain feature. In addition, an approach assessing a maximum change over a pre-defined unit, such as a knee compartment or the entire joint, adds to the understanding of the degree of change observed, which may be lost using a summative approach. Our definition of controls included both non-progressors and non-composite progressors including those that either progressed clinically (but not radiographically) or radiographically (but not clinically). A further subanalysis is needed to look at differences in changes for these subgroups separately.

Conclusions

In summary, a wide range of MRI-detected structural pathologies was present in the FNIH cohort. More severe changes, especially for BMLs, cartilage and meniscal damage, were detected primarily among the case group, suggesting that early changes in multiple structural domains are associated with radiographic worsening and symptomatic progression. In particular, the role of structural predictors of progression that are potentially amenable to therapeutic approaches, such as inflammatory markers of disease (depicted as Hoffa- and effusion-synovitis on MRI) or subchondral bone changes (visualized as BMLs on MRI), should be the focus of further evaluation. In addition, the complexity of the different semi-quantitative scoring systems needs consideration when engaging in analyses focusing on change over time. Simply summing scores does not seem to be sufficient. Further validation of analyses that take into account potentially improving features or within-grade scoring is urgently needed to take full advantage of the richness of semi-quantitative data, which are considered complementary to more quantitative approaches based on segmentation of 3D datasets.