Background

Juvenile idiopathic arthritis (JIA) is a heterogeneous childhood disease, with chronic joint inflammation as the common feature. The JIA categories differ by the number of joints affected, and the presence of extra-articular involvement [1]. Disease course and prognosis differ between JIA categories, but there is also large variability within each category [2, 3]. Therefore, efforts have been made to discern baseline clinical prognostic factors that can predict the severity, course, and long-term outcome of the disease [4, 5].

The primary goal of JIA treatment is to achieve remission [6]. Early prediction of the disease course for the individual child can facilitate tailored treatment. There is increasing evidence for the concept of “the window of therapeutic opportunity” in JIA, where early aggressive treatment with biologic agents and/or other disease-modifying anti-rheumatic drugs (DMARDs) may modify the disease course and improve long-term prognosis [7,8,9]. On the other hand, it is also essential to avoid unnecessary, costly, and potentially toxic treatment in children with a favorable prognosis.

Guzman et al. have recently presented a model for prediction of severe disease course, with outcomes developed specifically for their study [10]. In a systematic literature review, Dijkhuizen and Wulffraat state the need for prospective longitudinal studies of baseline clinical predictors using standardized validated outcome measures [4]. In the Nordic JIA cohort, we studied prediction of four established and validated outcomes, and aimed to construct prediction models that may aid decision on early aggressive treatment.

Methods

Study population

The initial prospective longitudinal multicenter Nordic JIA cohort consisted of consecutive children with incident JIA from 12 participating centers in defined geographical areas of Denmark, Finland, Norway and Sweden. All children in these areas with newly diagnosed JIA and disease onset in the study periods between 1 January 1997 and 30 June 2000 were included. The study was designed to be as close to population-based as possible, as previously reported [11].

In the current study, 440 children met the criteria of having a baseline study visit and a final study visit 8 years after disease onset. Of these, 17 patients with systemic JIA were excluded, because systemic JIA is considered to have autoinflammatory rather than autoimmune disease mechanisms, and its clinical characteristics of predominantly fever, rash and serositis differ from those of the other JIA categories [12].

The baseline study visit was planned 6 months after disease onset. At this visit, disease activity variables, complete joint count, physician’s global assessment of disease activity (physician’s GA) on a 10-cm visual analogue scale (VAS), patient’s/parent’s global assessment (GA), medication and blood tests were registered [13]. Disease onset was defined as the time of presentation of symptoms of active arthritis, and the JIA categories were determined according to the International League of Associations for Rheumatology (ILAR) criteria [14].

Outcomes

At follow-up, we evaluated 4 outcomes.

(1) The main outcome was non-achievement of remission off medication, chosen as the best available validated measure of an adverse disease state over time. This included active disease, inactive disease of less than 12 months of duration, and clinical remission on medication (according to the preliminary Wallace criteria) [15, 16]. For the remainder of the paper, not in remission or non-achievement of remission refers to non-achievement of remission off medication unless otherwise specified.

(2) and (3) Functional disability was evaluated using the Childhood Health Assessment Questionnaire (CHAQ) and the Child Health Questionnaire Parent form (CHQ-PF50), aiming to achieve a broad evaluation of functional disability using both the JIA-specific CHAQ and the generic CHQ-PF50 instruments. CHAQ addresses functional ability in different activities of everyday life [17]. The CHAQ was completed by children aged >9 years, and otherwise by their parents, and the corresponding Health Assessment Questionnaire (HAQ) by participants aged >18 years. From this point on in the text, CHAQ will refer to both the CHAQ and HAQ scores. The CHQ-PF50 consists of 50 items and 12 domains assessing health-related quality of life, yielding a physical summary score (PhS) and a psychological summary score (PsS) [17]. PhS ranges from 0 to 100, with a higher score indicating better functional ability.

(4) Joint damage was assessed using the Juvenile Arthritis Damage Index of articular damage (JADI-A), ranging from 0 to a maximum of 72, where 36 joints, or joint groups, are scored 0 for no damage, 1 for partial damage, or 2 for severe damage [18].

All 4 outcomes were dichotomized. Remission was dichotomized into clinical remission off medication (those achieving remission without medication) and non-achievement of remission off medication (those not achieving remission or achieving remission on medication). CHAQ and JADI-A were each dichotomized into a score of 0, indicating no functional disability or no joint damage, and a positive score >0. PhS was dichotomized into good functional ability, defined as a score ≥40, and functional disability, defined as a score <40. This latter cutoff is based on a reference score of 40 being one standard deviation below the mean score of healthy children in the USA [19].
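The dichotomization rules above can be sketched in code as follows. This is only an illustration under stated assumptions, not the implementation used in the study: the record type `FinalVisit`, its field names, and the function `dichotomize` are hypothetical, and a missing score is passed through as `None` rather than imputed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FinalVisit:
    """Hypothetical record of one patient's outcome data at the final study visit."""
    remission_off_medication: Optional[bool]  # True if in clinical remission off medication
    chaq: Optional[float]                     # CHAQ (or HAQ) score
    phs: Optional[float]                      # CHQ-PF50 physical summary score, 0 to 100
    jadi_a: Optional[int]                     # JADI-A score, 0 to 72


def _flag(value, is_unfavorable: Callable) -> Optional[bool]:
    """Apply a cutoff rule, keeping missing values as None."""
    return None if value is None else bool(is_unfavorable(value))


def dichotomize(visit: FinalVisit) -> Dict[str, Optional[bool]]:
    """Return True for the unfavorable level of each of the four outcomes."""
    return {
        # remission on medication counts as non-achievement of remission off medication
        "not_in_remission": _flag(visit.remission_off_medication, lambda r: not r),
        "chaq_positive": _flag(visit.chaq, lambda s: s > 0),
        "phs_below_40": _flag(visit.phs, lambda s: s < 40),
        "jadi_a_positive": _flag(visit.jadi_a, lambda s: s > 0),
    }
```

Keeping missing outcomes as `None` mirrors the fact that the four outcomes were available for different numbers of patients, so each model can drop only the patients lacking its own outcome.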

Laboratory tests

Antinuclear antibodies (ANA) and rheumatoid factor (RF) were tested at least twice, at least 3 months apart. ANA was analyzed by immunofluorescence on HEp-2 cells. Tests were interpreted according to the cutoff values of the local immunological laboratories. HLA-B27 was analyzed using standardized methods [20]. C-reactive protein (CRP) was measured with immunoassays, with values <10 mg/L considered normal.

Statistics

Conventional descriptive statistics (absolute numbers, median, 1st and 3rd quartile, and percentage) were used to describe demographics and clinical characteristics. Univariate logistic regression was performed to assess baseline variables as predictors for each outcome. Variables that were significant at p < 0.05 in the univariate analysis were considered as candidates in a prediction model.

For each outcome, multivariable logistic regression models were constructed using a combination of predefined core variables, and additional variables selected using a forward stepwise selection method. Since the predictive ability of the models is assessed using cross-validation, the conventional limitations related to the screening of a large number of covariates in multivariable models are evaded [21]. Cross-validation controlled for overfitting of the data (internal validation), and the degree of overfitting is reflected in the performance in validation sets.
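A forward stepwise procedure of this kind can be sketched as below, using the entry rule stated for this study (p < 0.05 on entry, no more than 10 variables in total, core variables always included). The sketch is illustrative only. The function name `forward_stepwise` and the callable `entry_p_value` are hypothetical: `entry_p_value(selected, candidate)` stands in for refitting the logistic model in statistical software and reading off the p-value of the newly added variable. Choosing the candidate with the smallest p-value at each step is a common convention and is an assumption here, not a detail confirmed by the text.

```python
from typing import Callable, List, Sequence


def forward_stepwise(
    core: Sequence[str],
    candidates: Sequence[str],
    entry_p_value: Callable[[List[str], str], float],
    alpha: float = 0.05,
    max_variables: int = 10,
) -> List[str]:
    """Start from the predefined core variables and add candidates one at a time."""
    selected = list(core)
    remaining = [c for c in candidates if c not in selected]
    while remaining and len(selected) < max_variables:
        # p-value each remaining candidate would have if added to the current model
        p_values = {c: entry_p_value(selected, c) for c in remaining}
        best = min(p_values, key=p_values.get)
        if p_values[best] >= alpha:
            break  # no remaining candidate meets the entry criterion
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a cross-validated setting, this selection would be rerun inside each training set, so that the validation patients play no part in choosing the variables.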

Fig. 1

Correlations between baseline variables. Lines are drawn only between pairs of baseline variables for which the sample Spearman correlation coefficient is ≥0.50. Baseline variables without correlation ≥0.50 are not included in the figure. RF, rheumatoid factor; VAS, visual analogue scale; GA, global assessment; CHAQ, Childhood Health Assessment Questionnaire

Clinical characteristics included in the Wallace provisional criteria for remission were included a priori in the prediction models: the cumulative active joint count, erythrocyte sedimentation rate (ESR), CRP, physician’s GA, and morning stiffness [22]. Uveitis activity applies only to a minority of the cohort and was therefore not included. The additional baseline variables were included in a stepwise fashion if they contributed to the multivariable model with p < 0.05 when included. Symmetric joint involvement was not considered a candidate predictor, as it correlates strongly with the specific joint involvement (Fig. 1). To ensure model simplicity, the total number of variables was not allowed to exceed 10. Once the set of variables was selected, the model coefficients βᵢ for each predictor variable xᵢ were estimated using multivariable logistic regression, and the probability of unfavorable outcome was given as:

$$ P=\frac{1}{1+e^{-A}},\quad \mathrm{where}\ A={\beta}_0+{\beta}_1{x}_1+\dots +{\beta}_n{x}_n. $$
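As a small illustration of how the logistic formula for P turns a set of coefficients and one patient's baseline values into a predicted probability, consider the sketch below. The function name `predicted_probability` is hypothetical, and any coefficients passed to it are placeholders: the fitted values of the study are reported in its tables and are not reproduced here.

```python
import math
from typing import Sequence


def predicted_probability(
    intercept: float,
    coefficients: Sequence[float],
    values: Sequence[float],
) -> float:
    """Evaluate P = 1 / (1 + exp(-A)) with A = b0 + b1*x1 + ... + bn*xn."""
    if len(coefficients) != len(values):
        raise ValueError("one coefficient is needed per predictor value")
    a = intercept + sum(b * x for b, x in zip(coefficients, values))
    # Use the algebraically equivalent form for negative A to avoid overflow in exp().
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)
```

With A = 0 the predicted probability is 0.5, and each unit increase in a predictor multiplies the odds P/(1 − P) by the exponential of its coefficient.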

For each of the four outcomes, cross-validation of the method was performed by randomly partitioning the cohort into training sets consisting of three quarters of the patients (N = 317) and validation sets consisting of one quarter of the patients (N = 106). In each realization of the random partitioning, we constructed prediction models using the algorithm described above, using only the training set to select variables and estimate coefficients. For each patient in the corresponding validation set, the multivariable logistic model provides a probability of the unfavorable outcome. By comparing the predicted probability of unfavorable outcome with the actual outcome at the final study visit, the receiver operating characteristic (ROC) curve was computed, and the area under the curve (AUC) was estimated. The median AUC with interquartile range (IQR) was estimated from 100 realizations of the random partitioning of the cohort. For each step in the cross-validation, we omitted any patients for whom the outcome or the required predictor variables were not available.
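The repeated random partitioning can be sketched as follows. This is a simplified illustration, not the analysis code of the study. It makes three assumptions: the predictor set is fixed in advance (the study also reran variable selection inside each training set), the model is fitted with a plain gradient-ascent routine standing in for statistical software, and predictors are roughly standardized. All function names are hypothetical. The AUC is computed as the proportion of unfavorable/favorable patient pairs that the model ranks correctly, with ties counting one half.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Logistic probability; A is clamped to [-30, 30] to avoid overflow."""
    a = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, a))))


def fit_logistic(rows, labels, steps: int = 300, rate: float = 0.1) -> List[float]:
    """Gradient ascent on the log-likelihood (a stand-in for a software fit)."""
    beta = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, y in zip(rows, labels):
            err = y - predict(beta, x)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        beta = [b + rate * g / len(rows) for b, g in zip(beta, grad)]
    return beta


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome levels are needed to compute an AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_holdout_auc(
    rows, labels, realizations: int = 100, train_fraction: float = 0.75, seed: int = 0
) -> Tuple[float, Tuple[float, float]]:
    """Return the median validation AUC and its (Q1, Q3) over random partitions."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    n_train = round(train_fraction * len(rows))  # 317 of 423 patients
    aucs = []
    for _ in range(realizations):
        rng.shuffle(indices)
        train, valid = indices[:n_train], indices[n_train:]
        beta = fit_logistic([rows[i] for i in train], [labels[i] for i in train])
        scores = [predict(beta, rows[i]) for i in valid]
        aucs.append(auc(scores, [labels[i] for i in valid]))
    q1, median, q3 = statistics.quantiles(aucs, n=4)
    return median, (q1, q3)
```

Because the coefficients are estimated without the validation patients, the validation AUC is not inflated by fitting to those patients, and the spread across realizations shows how sensitive the estimate is to the particular partition.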

Finally, in our cohort we tested the prediction model for severe disease course developed by Guzman et al. [10]. We tested Guzman’s model using the 4 outcome measures described above, i.e. not the outcomes for which their model was constructed. The analysis was performed using the software packages STATA version 14, and Wolfram Mathematica version 11.1.1.0.

Ethical considerations

Approvals from medical research ethical committees and data protection authorities were granted according to the regulations of each participating country. Written informed consent was obtained from parents of children aged < 16 years, and from the children themselves if aged ≥ 16 years.

Results

The main finding is that in the Nordic cohort, long-term outcome in JIA can be predicted, with acceptable sensitivity and specificity, using only a handful of readily available clinical variables.

Study cohort

Characteristics of the 440 patients in the cohort have previously been published [11]. After exclusion of the 17 patients with systemic JIA, the study cohort comprised 423 children. The median time between disease onset and the baseline study visit was 7 (IQR 6–8) months, and between disease onset and the final study visit it was 98 (IQR 95–102) months. The median time from disease onset to diagnosis was 1.6 (IQR 0.5–3.3) months. A total of 280 patients (66.2%) were female, and the median age at disease onset in the cohort was 5.5 (IQR 2.5–9.7) years (Additional file 1: Table S1).

At the baseline study visit, 227/423 patients (53.7%) had oligoarthritis, 94/423 (22.2%) had rheumatoid factor (RF)-negative polyarthritis, and 4/423 (1.0%) had RF-positive polyarthritis (Additional file 1: Table S1). The median cumulative number of active joints within the first visit was 3 (IQR 1–6), and 381/423 patients (90.1%) had one or more affected lower limb joints at the baseline visit. Antinuclear antibodies (ANA) were present in 115/410 patients (28.1%), and HLA-B27 in 85/393 patients (21.6%) [23], presented in Additional file 1: Table S1. None of the children had started biologic agents before the baseline study visit, and early medications are shown in Additional file 2: Table S2. A total of 410/423 (96.9% of the total cohort) had baseline assessments and data on remission 8 years after disease onset. The corresponding numbers were 340/423 (80.4%) for CHAQ, 199/423 (47.0%) for PhS and 216/423 (51.1%) for JADI-A.

Correlation between baseline variables

The clinical predictor variables were analyzed with respect to correlation. There was significantly positive, moderate to strong correlation between several variables, especially between cumulative number of active joints, the joint-specific variables, and the polyarthritis RF-negative category. Physician’s GA and the patient-reported outcomes also correlated positively with each other. The correlation structure between the predictor variables is illustrated in Fig. 1.

Prediction of non-achievement of remission off medication

Remission status at the final study visit was available for 410 patients. There were 166 (40.5%) children in remission without medication, while 38 (9.3%) were in remission on medication, and 206 (50.2%) were not in remission. Thus, 244/410 children (59.5%) did not achieve remission off medication. The baseline predictors of not achieving remission off medication were analyzed by univariate logistic regression and are presented in Table 1.

Table 1 Baseline clinical characteristics as predictors of non-achievement of remission off medication in univariate logistic regression

The following predictor variables were included in the multivariable prediction model for non-achievement of remission: cumulative active joint count, ESR, CRP, morning stiffness, physician’s GA, ANA, HLA-B27, and ankle joint arthritis. The first five variables were chosen a priori, and ANA, HLA-B27, and ankle joint arthritis were included through the stepwise selection method (Table 2). The model had an AUC of 0.84 in the total cohort. Cross-validation yielded a median AUC of 0.78 (IQR 0.72–0.82) in the validation sets (Table 3). The corresponding ROC curves are shown in Figs. 2 and 3.

Table 2 Prediction of unfavorable outcome by multivariable modeling of baseline clinical characteristics
Table 3 Cross-validation of the four prediction models of unfavorable long-term outcome in the Nordic JIA cohort
Fig. 2

Receiver operating characteristic (ROC) curves for the four unfavorable clinical outcomes in the total cohort. Non-achievement of remission off medication; CHAQ, Childhood Health Assessment Questionnaire; PhS, Physical Summary Score; JADI-A, Juvenile Arthritis Damage Index-Articular

Fig. 3

Receiver operating characteristic (ROC) curves for the four unfavorable clinical outcomes in the validation sets. The colored lines are the mean ROC curves for the 100 different realizations of the partitioning of the cohort into training sets and validation sets (thin gray curves). a Not in remission. b Childhood Health Assessment Questionnaire (CHAQ) >0. c Physical Summary Score (PhS) <40. d Juvenile Arthritis Damage Index-Articular (JADI-A) >0

We also developed a prediction model without the blood tests (ESR, CRP, ANA, and HLA-B27). This model yielded a median AUC of 0.76 (IQR 0.72–0.80) for non-achievement of remission in the validation sets (Additional file 3: Figure S1).

Prediction of functional disability and joint damage

The CHAQ score at the final study visit was available in 340 children, and 111 (32.7%) had a CHAQ score >0. Three of the four patients with RF-positive polyarthritis reported functional disability. For univariate logistic regression results see Additional file 4: Table S3.

The prediction model for CHAQ score >0 uses cumulative active joint count, ESR, CRP, morning stiffness, physician’s GA, finger joint arthritis, and pain VAS as variables (Table 2). The AUC of this model was 0.79 in the total cohort, and cross-validation gave a median AUC of 0.73 (IQR 0.67–0.76) in the validation sets (Table 3). The ROC curves for the total cohort and the validation sets are shown in Figs. 2 and 3, respectively. The median AUC for the model without blood tests was 0.72 (IQR 0.67–0.76) in the validation sets (Additional file 3: Figure S1).

Of the 199 patients with a physical summary score, 40 (20.1%) had a score <40. Results of the univariate analysis with PhS <40 as the outcome variable are shown in Additional file 5: Table S4. Variables included in the prediction model for PhS were cumulative active joint count, ESR, CRP, morning stiffness, physician’s GA, and pain VAS (Table 2). The AUC was 0.90 in the total cohort, and cross-validation gave a median AUC of 0.74 (IQR 0.65–0.80) in the validation sets (Table 3, Figs. 2 and 3). The median AUC for the model without blood tests was 0.73 (IQR 0.66–0.79) in the validation sets (Additional file 3: Figure S1).

The JADI-A was collected for 216 patients at the final study visit, and 29 patients (13.4%) had joint damage registered 8 years after disease onset. The baseline predictors of joint damage are presented in Additional file 6: Table S5. In the prediction model, older age at disease onset and finger joint arthritis were included in addition to the five previously included variables (Table 2). The AUC was 0.84 in the cohort, and the median AUC was 0.73 (IQR 0.63–0.76) in the validation sets. The results are summarized in Table 3 and Figs. 2 and 3. Without blood tests the median AUC in the validation sets was 0.73 (IQR 0.63–0.80) (Additional file 3: Figure S1).

Other prediction models

We tested the ability of the prediction model developed by Guzman et al. [10] to predict the four outcomes described above in our cohort. The model yielded an AUC of 0.69 for prediction of not achieving remission. For CHAQ >0, PhS <40, and JADI-A >0, the AUCs were 0.68, 0.69, and 0.71, respectively (Additional file 7: Figure S2).

Discussion

In the Nordic JIA cohort, we have developed and evaluated prediction models for long-term unfavorable outcome with acceptable sensitivity and specificity based on variables easily available at baseline, which may guide individually tailored treatment. Prediction of long-term unfavorable outcome early in the disease course may be useful in deciding when to start aggressive treatment in JIA.

To our knowledge, this is the first study on long-term prediction of well-established disease outcomes in a prospective population-based JIA cohort. Cross-validation analysis of model performance yielded AUCs of 0.78, 0.73, 0.74, and 0.73, for non-achievement of remission, CHAQ >0, PhS <40, and JADI-A >0, respectively.

An important step in developing applicable prediction models for JIA was carried out by Guzman et al. in a Canadian JIA cohort [10]. The authors recommended that their results should be tested in other JIA cohorts. We were not able to reproduce the predictive ability of their model in the Nordic JIA cohort (Additional file 7: Figure S2). One obvious reason for the discrepancy could be that Guzman’s model was constructed to predict severe disease course and not, per se, any of the four pre-established, validated adverse outcomes that we assessed. Other reasons may be differences in the population-based approach, cohort composition, or ethnicity, or overfitting of models to the cohort.

The primary goal in the treatment of children with JIA is to achieve remission off medication, and the main implication of the current study is that prediction models may be useful in guiding decisions about treatment. Previous studies have indicated that the disease course may be modified by starting appropriate treatment early [9, 24, 25]. To reach the goal of early inactive disease, a treat-to-target strategy including shared decision-making with well-informed children and parents is currently recommended [6, 9]. Even with promising advances in using gene expression profiles and biomarkers as predictors of treatment response and flare risk [26,27,28,29], the practical value of prediction based on a handful of readily available clinical variables cannot be overstated.

The main strengths of our study are the use of validated outcome measures, the simplicity of the models, and the strict cross-validations. The use of validated outcomes is called for in reports on prognosis in JIA [3, 30, 31]. Model simplicity is ensured through the model construction method, where the main variables in the preliminary Wallace criteria of remission are included in the models a priori [15, 22]. The additional variables that were included in our models have independently been associated with adverse outcomes in previous studies [4, 23, 32,33,34,35,36].

The model performance was assessed using cross-validation, where predictions were made on validation sets that were completely separate from the data used to construct the models. Repeating the model construction and evaluation over 100 random partitions guards against overfitting to the data. Despite the strictness of the model-developing procedure, we still obtained acceptable predictive ability. The robustness and applicability of the prediction rules are emphasized by the fact that when the analyses were repeated without any blood tests, the performance was similar. An online calculator based on our models is available at the web-page http://predictions.no. An iOS app has also been designed, and test versions are available on request.

One of the limitations of our study is that for some of the patients, the baseline study visit scheduled 6 months after disease onset was not the first clinical visit. Some children had therefore already started treatment, mostly with nonsteroidal anti-inflammatory drugs (NSAIDs) or intraarticular corticosteroids, and were not treatment naïve when the predictor variables were assessed. This baseline time point, however, allowed use of the cumulative active joint count during the first 6 months of the disease, which is an important measure of early disease severity in line with the ILAR criteria. A further limitation is that the primary outcome, non-achievement of remission off medication, is assessed at the final study visit, with remission requiring at least 12 months of inactive disease off medication, and therefore does not necessarily reflect the disease course during the whole 8-year period. In addition, JADI-A is a rather crude measure of joint damage, and future predictive studies should therefore include imaging in joint damage assessment. Finally, the treatment given during the disease course may have altered the disease outcome, even though biologic medications were not generally available at the beginning of the study period in 1997. The natural history of JIA without treatment is clearly impossible and unethical to study.

Conclusion

We have developed statistical models for predicting non-achievement of remission off medication, functional disability, and joint damage in children with JIA. The models are easy to use, and may provide a valuable tool to aid early treatment decisions on the need for DMARDs including biologic agents if validation in other JIA cohorts and across ethnicities can confirm our results [37]. We encourage further testing of our models before the applicability can be generalized and recommended.