Key Summary Points

Why carry out this study?

Magnetic resonance imaging (MRI) is a well-accepted technique for detecting bone marrow edema, a specific sign of osteitis in patients with axial spondyloarthritis (axSpA).

Our clinical hypothesis was that the MRI status (negative or positive) of patients in clinical remission could predict disease flare and should be considered a supplement to current clinical remission criteria.

What was learned from the study?

MRI status and anti-TNF-α treatment were independently associated with disease flare. We fitted a nomogram including gender, disease duration, HLA-B27, MRI status, and anti-TNF-α treatment; the areas under the receiver operating characteristic curve (AUROCs) for the 1-year remission probability in the training and validation groups were 0.71 and 0.729, respectively.

Our study suggests that MRI status and anti-TNF-α treatment should be considered when predicting the risk of flare in patients with axSpA. Our nomogram for predicting flare should be further validated before it is used in clinical practice.

Digital Features

This article is published with digital features, including a summary slide, to facilitate understanding of the article. To view digital features for this article go to https://doi.org/10.6084/m9.figshare.13562414.

Introduction

Axial spondyloarthritis (axSpA) is a chronic inflammatory rheumatic disease with diverse clinical manifestations, including inflammatory back pain, enthesitis, dactylitis, and syndesmophyte formation, all of which substantially impair patients' physical function and overall quality of life [1]. Previous studies reported that 20–70% of patients with axSpA experience flares, whether single, repeated, or persistent without symptom-free intervals [2, 3]. Evidence from the ESTHER trial showed that patients with axSpA can still experience flares even while receiving etanercept (ETA) [4]. Persistent flare symptoms are significantly associated with a poorer prognosis and worse overall health [3, 5]. However, little is known about the risk factors for, and mechanisms of, flares. It is therefore necessary to explore potential risk factors for flares, which may provide key information for predicting flares and for the prognosis of patients with axSpA.

Magnetic resonance imaging (MRI) is a well-accepted technique for detecting bone marrow edema, a specific sign of osteitis in patients with axSpA [6]. Moreover, MRI erosions exceeding a certain Spondyloarthritis Research Consortium of Canada (SPARCC) score threshold were highly specific for axSpA, except in postpartum women [7]. However, MacKay et al. found no significant correlation between the total SPARCC score and the Ankylosing Spondylitis Disease Activity Score with CRP (ASDAS-CRP) [8], and van der Heijde et al. reported a lack of association between clinical remission and MRI-demonstrated remission [9]. Furthermore, MRI sacroiliitis can be observed in patients with recurrent acute anterior uveitis (rAAU) who have no musculoskeletal symptoms [10]. One possible explanation is that MRI-based SPARCC scores capture aspects of disease activity that ASDAS does not, and this additional imaging information may help in assessing axSpA activity. This raises a clinical question: once a patient achieves clinical remission, can their MRI status (negative or positive) predict the likelihood of disease flare, and should it be considered a supplement to current clinical remission criteria?

In this study, we first compared MRI status for active sacroiliitis between patients with and without flare, and then evaluated whether active sacroiliitis on MRI should also be considered once clinical remission has been achieved. Previous studies suggested that gender, age, disease duration, anti-TNF-α treatment, and HLA-B27 status may influence the risk of flare [11,12,13]. However, no prediction model combining these factors has yet been reported. Our second aim was therefore to construct a nomogram to predict the probability of flare in patients with axSpA in clinical remission.

Methods

Study Design and Participants

Effects of nonsteroidal anti-inflammatory drugs in recurrence of spondyloarthritis patients after remission (NASA study, ClinicalTrials.gov ID: NCT03425812) is an ongoing multicenter, randomized, parallel-controlled study conducted in Fujian, China, from 2017 to 2021. The NASA study aims to investigate the recurrence rate among patients with axSpA in remission who withdraw from nonsteroidal anti-inflammatory therapy. Within the NASA study, participants were eligible for our analysis if they were aged ≥ 18 years at screening; fulfilled the ASAS (Assessment of Spondyloarthritis International Society) classification criteria for axSpA [14, 15]; had been followed up regularly for more than 1 year; had achieved low disease activity according to the Ankylosing Spondylitis Disease Activity Score (ASDAS) [12]; and had undergone MRI examination. These patients were followed up for 12 months (Fig. 1).

Fig. 1

Flowchart of participant selection

The ASAS classification criteria for axSpA are defined as sacroiliitis on plain radiography or MRI plus at least one SpA feature, or HLA-B27 positivity plus at least two other SpA features [16]. In the 2018 update of the nomenclature for disease activity states, an ASDAS below 2.1 is defined as low disease activity, because most patients in this category consider themselves to be in a patient-acceptable symptom state (PASS) [12, 17, 18]. Patients who achieved low disease activity (ASDAS < 2.1) sustained for 3 months or more were included. Because MRI findings suggestive of axSpA are frequently false positive in pregnant women and athletes, patients were not eligible if they had pregnancy-related low back pain [19] or engaged in intensive physical training, such as military recruits and athletes [20, 21]. Other exclusion criteria were autoimmune diseases other than axSpA, chronic infection, severe organ failure, pregnancy, and mental illness. Participants who had entered the NSAID washout phase and randomization of the NASA study were also excluded. For model development, participants were divided into a training group and a validation group: participants from the First Affiliated Hospital of Fujian Medical University formed the validation group, and participants from all other research centers formed the training group (Fig. 1).
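The eligibility logic above can be sketched as a few predicates. This is a minimal illustration, assuming a hypothetical `Patient` record in which the count of SpA features is kept separate from HLA-B27 status; the ASAS entry conditions (back pain for ≥ 3 months with onset before age 45) are not modeled, and the helper that checks sustained low activity assumes one ASDAS value per month. The disease activity cutoffs follow the 2018 nomenclature update cited above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patient:
    """Hypothetical record layout, for illustration only."""
    imaging_sacroiliitis: bool  # sacroiliitis on plain radiography or MRI
    hla_b27_positive: bool
    n_spa_features: int  # SpA features other than HLA-B27


def meets_asas_axspa(p: Patient) -> bool:
    # Imaging arm: sacroiliitis plus >= 1 SpA feature (HLA-B27 counts here).
    imaging_arm = p.imaging_sacroiliitis and (
        p.n_spa_features + int(p.hla_b27_positive) >= 1)
    # Clinical arm: HLA-B27 plus >= 2 other SpA features.
    clinical_arm = p.hla_b27_positive and p.n_spa_features >= 2
    return imaging_arm or clinical_arm


def asdas_state(score: float) -> str:
    """ASDAS disease activity state using the 2018 cutoffs."""
    if score < 1.3:
        return "inactive disease"
    if score < 2.1:
        return "low disease activity"
    if score <= 3.5:
        return "high disease activity"
    return "very high disease activity"


def sustained_low_activity(monthly_asdas: List[float], months: int = 3) -> bool:
    """True if the last `months` values (assumed one per month) are all < 2.1."""
    recent = monthly_asdas[-months:]
    return len(recent) == months and all(s < 2.1 for s in recent)


print(meets_asas_axspa(Patient(False, True, 2)))  # clinical (HLA-B27) arm
print(asdas_state(1.31), sustained_low_activity([2.4, 1.8, 1.5, 1.2]))
```

Note that inactive disease (ASDAS < 1.3) also satisfies the "below 2.1" inclusion rule, which is why the sustained-activity check compares against 2.1 rather than against the state label.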

Ethical approval for the NASA study, including the consent procedure, was granted by the Research Ethics Committee of the First Affiliated Hospital of Xiamen University (reference number: KYH2018-007) and the First Affiliated Hospital of Fujian Medical University (reference number: MRCTA, ECFAH of FMU [2018] 198), in accordance with the Declaration of Helsinki V and the Danish legislation. All participants gave written informed consent before inclusion.

Clinical Assessment

History taking and physical examination were performed by rheumatologists at baseline and at visits every 4 weeks for 1 year. General information collected at baseline included age, gender, disease duration of axSpA, tobacco smoking status, alcohol drinking status, history of inflammatory back pain, family history of axSpA, and any extra-articular manifestations (psoriasis, inflammatory bowel disease, and uveitis).

All selected participants completed the outcome measurements. Outcome measures in spondyloarthritis were recorded following the report by Zochling [22] and included ASDAS-CRP and the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) at baseline and at follow-up visits. Details of past and present antirheumatic therapies were also recorded, including nonsteroidal anti-inflammatory drugs (NSAIDs), antitumor necrosis factor-α (anti-TNF-α) treatment, and disease-modifying antirheumatic drugs (DMARDs). A therapy was counted only if the patient reported regular use and the total duration of use (regular administration plus tapering) exceeded 50% of the time from enrollment to the end of the study. Peripheral arthritis was defined as pain, swelling, and/or tenderness of any peripheral joint excluding the shoulder and hip joints.

Laboratory Assessment

Human leukocyte antigen B27 (HLA-B27; flow cytometry, Beckman Coulter Inc.) status, erythrocyte sedimentation rate (ESR, mm/h), and C-reactive protein (CRP, mg/l; immunoturbidimetry, Beckman Coulter Inc.) were also assessed. When CRP was below 5 mg/l, a fixed value of 2 mg/l was used [23].
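The CRP substitution rule feeds directly into the ASDAS-CRP formula. The sketch below uses the published ASDAS-CRP weights (back pain, duration of morning stiffness, patient global assessment, and peripheral pain/swelling, each on a 0–10 scale, plus ln(CRP + 1) with CRP in mg/l); the function and parameter names are our own, and the 5 mg/l cutoff with a 2 mg/l substitute simply mirrors the rule stated above.

```python
import math


def asdas_crp(back_pain: float, morning_stiffness: float,
              patient_global: float, peripheral_pain_swelling: float,
              crp_mg_l: float, cutoff: float = 5.0,
              substitute: float = 2.0) -> float:
    """ASDAS-CRP; the four patient-reported items are on 0-10 scales."""
    # Replace CRP values below the cutoff with a fixed value (2 mg/l here).
    crp = substitute if crp_mg_l < cutoff else crp_mg_l
    return (0.12 * back_pain
            + 0.06 * morning_stiffness
            + 0.11 * patient_global
            + 0.07 * peripheral_pain_swelling
            + 0.58 * math.log(crp + 1))


score = asdas_crp(2, 2, 2, 0, crp_mg_l=3.0)
print(round(score, 2))
```

With a measured CRP of 3 mg/l (below the cutoff, so 2 mg/l is used) and scores of 2 for the first three items, the sketch gives an ASDAS-CRP of about 1.22, which falls in the low disease activity range.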

MRI Scoring

An MRI of the sacroiliac (SI) joints was performed for each patient at the baseline visit using a Siemens Magnetom Skyra 3.0 T scanner with a Picture Archiving and Communication System. The acquired images included a semicoronal T1-weighted sequence (repetition time 700 ms, echo time 24 ms, slice thickness 3 mm, gap 0.45 mm, field of view 200 × 200 mm, two excitations, matrix 384 × 288 pixels) and a semicoronal short tau inversion recovery (STIR) sequence (repetition time 3500 ms, inversion time 150 ms, echo time 45 ms, slice thickness 3 mm, gap 0.45 mm, field of view 200 × 200 mm, three excitations, matrix 320 × 224 pixels) [7, 24, 25].

The MRIs were anonymized and evaluated by two independent radiologists with extensive, well-calibrated experience in reading axSpA MRI, who were blinded to the clinical data and the patients' final diagnoses. In case of scoring discrepancies, a third radiologist with 20 years of MRI experience scored the images. According to the ASAS definition, an MRI scan of the SI joints was recorded as positive (active sacroiliitis) if bone marrow edema (BME) was clearly present in a typical anatomical location (subchondral bone) on a T2-weighted sequence on at least two consecutive slices; if inflammation was visible on only a single slice, more than one inflammatory lesion had to be present. Other conditions, such as infection, tumor, and trauma, had to be excluded [26]. Disease activity on STIR images was scored with the SPARCC SI joint index (range 0–72) [9, 24, 27], and a SPARCC score ≥ 2 was defined as SPARCC positive [9].
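The ASAS positivity rule and the SPARCC cutoff can be written as two small predicates. This is a sketch under the assumption that a reader's assessment is summarized as the number of typical subchondral BME lesions on each consecutive semicoronal slice; that input format is hypothetical, and the code does not reproduce SPARCC scoring itself, only the dichotomization at 2.

```python
from typing import List


def asas_mri_positive(lesions_per_slice: List[int]) -> bool:
    """Apply the ASAS rule to lesion counts on consecutive slices.

    lesions_per_slice: number of typical subchondral BME lesions seen on each
    consecutive semicoronal slice, in slice order (hypothetical input format).
    """
    # More than one lesion on a single slice is sufficient.
    if any(n > 1 for n in lesions_per_slice):
        return True
    # Otherwise BME must appear on at least two consecutive slices.
    return any(a >= 1 and b >= 1
               for a, b in zip(lesions_per_slice, lesions_per_slice[1:]))


def sparcc_positive(score: int, threshold: int = 2) -> bool:
    """SPARCC SI joint score (0-72) dichotomized at the study's cutoff of 2."""
    if not 0 <= score <= 72:
        raise ValueError("SPARCC SI joint score must lie between 0 and 72")
    return score >= threshold


print(asas_mri_positive([0, 1, 1, 0]), asas_mri_positive([0, 1, 0, 1]))
print(sparcc_positive(2), sparcc_positive(1))
```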

Outcome and Censoring

The primary outcome was flare, defined as an ASDAS-CRP ≥ 2.1 [17] together with an increase in ASDAS-CRP ≥ 0.9 [28] occurring after low disease activity had been achieved, during the 12-month follow-up. Observations were censored for patients who did not experience a flare during follow-up, who were lost to follow-up, or who withdrew from the study [29].
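To make the outcome definition concrete, the sketch below turns one patient's visit sequence into the (time, event) pair used in survival analysis. It assumes that the ≥ 0.9 change is measured relative to the baseline ASDAS-CRP and that censoring occurs at the last attended visit; both are our reading of the definition, and the function name is hypothetical.

```python
from typing import List, Tuple

EPS = 1e-9  # tolerance so that a change of exactly 0.9 is not lost to rounding


def time_to_flare(baseline_asdas: float,
                  visits: List[Tuple[float, float]],
                  horizon: float = 12.0) -> Tuple[float, int]:
    """Return (months, event) for one patient; event = 1 means flare.

    visits: (month, ASDAS-CRP) pairs for attended visits, in time order.
    Assumption: the 0.9 change is measured from the baseline ASDAS-CRP.
    """
    last_seen = 0.0
    for month, score in visits:
        if month > horizon:
            break  # only the 12-month follow-up window counts
        last_seen = month
        if score >= 2.1 and score - baseline_asdas >= 0.9 - EPS:
            return month, 1
    # No flare: censored at the last attended visit (completed follow-up,
    # lost to follow-up, or withdrawn).
    return last_seen, 0
```

For example, `time_to_flare(1.0, [(1, 1.2), (2, 2.3)])` returns `(2, 1)`, because 2.3 is at least 2.1 and the rise from baseline is 1.3, whereas a visit at 2.2 after a baseline of 1.5 would not count as a flare.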

Quality Control

The NASA study followed the principles of Good Clinical Practice [30], the Declaration of Helsinki V, and the Danish legislation. A standard quality control protocol was established. An investigators' meeting was held before study initiation to discuss the study scheme, research operations, and sample detection methods in detail. Standard training in data collection and sample detection was provided to investigators from all study centers. A multicenter coordinating committee was set up to coordinate research matters, and professional clinical research auditors were employed to ensure the quality of the collected data. Each center established regulations, management procedures, and standard operating procedures (SOPs) [31] to strengthen researcher training and ensure the progress of the trial.

Statistical Analysis

Medians and interquartile ranges (IQR) of continuous variables and frequencies (percentages) of categorical variables were calculated. Mann–Whitney tests were used for comparison of patient characteristics.

A total of 15 potential predictors were selected based on our literature review [11, 12], expert consensus from a Delphi study [32], and our previous cohort studies [13, 25, 33]. From these, four predictors identified by multiple Cox regression and one predictor chosen by literature review and expert consensus were used to construct the nomogram (Figure S1) [34]. Cox regression models were used to explore potential confounders among all participants. Variables with p < 0.05 in crude models were entered into multiple Cox regression models to identify independent predictors, using the forward likelihood ratio method; variables entered the stepwise model at p < 0.05 and were removed at p > 0.10. Kaplan–Meier curves were used to estimate mean relapse-free time and to assess the proportional hazards (PH) assumption for categorical variables. Goodness-of-fit testing based on Schoenfeld residuals was used to assess the PH assumption for continuous variables [29].
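The Kaplan–Meier product-limit estimator, and a mean relapse-free time derived from it, can be sketched in a few lines. This sketch computes the mean as the area under the survival curve up to a chosen horizon τ (a restricted mean); statistical software may instead truncate at the largest observed time, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (t, S(t)) at each distinct flare time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        flares = leaving = 0
        # Group all patients (flares and censored) sharing this time.
        while i < len(data) and data[i][0] == t:
            flares += data[i][1]
            leaving += 1
            i += 1
        if flares:
            surv *= 1.0 - flares / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def restricted_mean(curve: List[Tuple[float, float]], tau: float) -> float:
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


curve = kaplan_meier([2, 4, 4, 6, 8], [1, 1, 0, 1, 0])
print(curve, restricted_mean(curve, 12.0))
```

Patients censored at the same time as a flare are kept in the risk set for that flare, which is the usual Kaplan–Meier convention.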

In the training group, crude Cox regression models were fitted, and factors reaching p < 0.25 were entered into forward stepwise multiple Cox regression to determine the predictors (according to each of the response criteria examined); factors entered the stepwise model at p < 0.05 and were removed at p > 0.27. The coefficients of the multiple Cox regression model were used to generate the nomogram, whose validity was then evaluated in the validation group. Discrimination was evaluated with the concordance index (C-index) and the AUROC, and calibration with calibration curves. Decision curve analysis (DCA) was performed to assess the clinical usefulness of the nomogram [33,34,35]. Unlike the AUROC, DCA evaluates the net benefit of a prognostic strategy across a range of threshold probabilities [36, 37].
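Harrell's C-index, used here to summarize discrimination, is the proportion of usable patient pairs in which the patient who flared earlier received the higher predicted risk. A minimal O(n²) sketch follows; it counts tied risk scores as one half and skips pairs with tied times, which is one common convention (packages differ in how they handle ties).

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C for right-censored data (higher risk = earlier flare)."""
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # the earlier member of a usable pair must have flared
        for j in range(len(times)):
            if times[i] < times[j]:  # j was still flare-free at times[i]
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.5, 0.7, 0.3, 0.2]
print(harrell_c_index(times, events, risk))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which is the scale on which the values of about 0.66 reported in the Results should be read.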

We also constructed a clinical experience model using the same procedure and calculated its C-index, AUROC, and DCA to assess whether the nomogram outperformed it.
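The quantity compared in DCA is net benefit. For a binary outcome (for example, flare within 1 year with censoring ignored), net benefit at threshold probability p_t is TP/n − FP/n × p_t/(1 − p_t). The stdca.r routine mentioned below extends this to time-to-event data using Kaplan–Meier estimates, which this simplified sketch does not attempt; the function names are illustrative.

```python
from typing import Sequence


def net_benefit(y: Sequence[int], risk: Sequence[float], pt: float) -> float:
    """Net benefit of intervening when predicted risk >= pt (0 < pt < 1).

    Simplified: binary outcome, censoring ignored.
    """
    n = len(y)
    tp = sum(1 for yi, ri in zip(y, risk) if ri >= pt and yi == 1)
    fp = sum(1 for yi, ri in zip(y, risk) if ri >= pt and yi == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y: Sequence[int], pt: float) -> float:
    """Reference strategy: intervene in every patient."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# A decision curve evaluates these over a grid of thresholds; "treat none" is 0.
y = [1, 1, 0, 0, 1, 0]
risk = [0.8, 0.6, 0.4, 0.2, 0.35, 0.1]
for pt in (0.1, 0.3, 0.5):
    print(pt, round(net_benefit(y, risk, pt), 3),
          round(net_benefit_treat_all(y, pt), 3))
```

A model is clinically useful over the range of thresholds where its curve lies above both the "treat all" and "treat none" references.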

Data were collected with the research data collecting system (RDC, version 1.0, patent application number 202010087680.0). The R packages “rms,” “survival,” “foreign,” and “survivalROC” (R version 3.6.2, The R Foundation for Statistical Computing) were used to generate the nomogram, calculate the C-index, and plot the calibration, AUROC, and Kaplan–Meier curves. DCA curves were drawn with the source file “stdca.r,” obtained from www.mskcc.org. A two-sided p value < 0.05 was considered statistically significant.

Results

Demographic and Clinical Characteristics

Between 2017 and 2019, 372 participants were screened for inclusion in the NASA cohort, of whom 251 (67.5%) met the inclusion criteria and completed the baseline visit (Fig. 1). Overall, the median age was 31 years (IQR: 27–39 years), 66.9% were male, and the median symptom duration at baseline was 4 years (IQR: 2–6 years) (Table 1). Most participants (89.2%, n = 224) were receiving NSAID therapy, 65.3% (n = 164) were receiving DMARDs, and 49.4% (n = 124) were receiving anti-TNF-α treatment. The median ASDAS was 1.31 (IQR: 0.69–2.63), and 55.8% (n = 140) of participants were MRI positive. Compared with the non-flare group, participants in the flare group had a longer symptom duration at the first visit, lower proportions receiving anti-TNF-α treatment and with negative MRI status, and a higher SPARCC score of the SI joints (all p < 0.05). The mean relapse-free time was 8.7 months (95% CI 8.21–9.19) for all participants, 7.78 months (95% CI 7.07–8.48) for MRI-positive participants, and 9.87 months (95% CI 9.27–10.47) for MRI-negative participants (log-rank p < 0.05). Peripheral joint involvement was present in 33.5% of patients (84/251): 40 (32.0%) in the non-flare group and 44 (34.9%) in the flare group, with no significant difference between groups (p = 0.625). At baseline, 25.1% of patients (63/251) had non-radiographic axSpA (nr-axSpA): 36 (28.8%) in the non-flare group and 27 (21.4%) in the flare group, again with no significant difference (p = 0.178).
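As an arithmetic check of the two categorical comparisons above, the counts can be run through a pooled two-proportion z test, which is equivalent to Pearson's chi-square test with 1 degree of freedom and no continuity correction. The test used for categorical variables is not named here, so this choice is an assumption, as are the group sizes (125 non-flare and 126 flare patients), which are inferred from the reported counts and percentages. Under these assumptions the sketch gives p ≈ 0.62 and p ≈ 0.18, close to the reported 0.625 and 0.178.

```python
import math


def two_proportion_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p value of the pooled two-proportion z test
    (equivalent to Pearson's chi-square, 1 df, no continuity correction)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))


# Group sizes inferred from the reported percentages: 125 non-flare, 126 flare.
p_peripheral = two_proportion_p(40, 125, 44, 126)  # peripheral joint involvement
p_nr = two_proportion_p(36, 125, 27, 126)          # nr-axSpA at baseline
print(round(p_peripheral, 3), round(p_nr, 3))
```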

Table 1 Characteristics of participants included in this study

Independent Risk Factors of Flares

Crude Cox proportional hazards models among all participants showed that active sacroiliitis on MRI and a history of anti-TNF-α treatment were significant predictors of flare at the 0.05 level (Table 2). In the multiple Cox regression model including these variables, both MRI status (HR 1.792, 95% CI 1.230–2.611, p = 0.002) and anti-TNF-α treatment (HR 0.507, 95% CI 0.349–0.736, p < 0.0001) were significantly associated with disease flare.

Table 2 Predictors associated with disease flare using univariate and multiple Cox regression models in training group

Construction and Validation of the Nomogram Prediction Models

Of the 251 eligible participants, 144 were included in the training group and 107 in the validation group. Their characteristics are shown in Table 3.

We first selected four predictors (symptom duration at first visit, HLA-B27, MRI status, and anti-TNF-α treatment) using univariate and multiple Cox regression models in the training group (Table 4). Together with 11 other predictors selected through literature review [11, 12], expert consensus from a Delphi study [32], our previous cohort studies [13, 25], and considerations of simplified clinical application [33], a total of 15 potential predictors were considered for the nomogram. Finally, five predictors (gender, symptom duration at first visit, HLA-B27, MRI status, and anti-TNF-α treatment) were incorporated into the nomogram (Fig. 2).
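How Cox coefficients become nomogram points and a 1-year probability can be sketched as follows. All parameters are hypothetical: the MRI and anti-TNF-α coefficients reuse the log hazard ratios from the multiple Cox model reported above (ln 1.792 and ln 0.507), while the gender and duration coefficients, the axis ranges, and the baseline 12-month flare-free probability are placeholders, since the fitted nomogram parameters are not given. Points are scaled so that the predictor with the widest range of effect spans 0–100 (a common convention, also the default in the R rms package), and the predicted flare-free probability is S(12 | x) = S0(12)^exp(β·x).

```python
import math
from typing import Dict

# HYPOTHETICAL parameters for illustration; not the fitted nomogram.
COEFS: Dict[str, float] = {
    "male": 0.20,                      # placeholder
    "duration_years": 0.05,            # placeholder, per year
    "hla_b27_positive": 0.30,          # placeholder
    "mri_positive": math.log(1.792),   # log HR reported in the Results
    "anti_tnf": math.log(0.507),       # log HR reported in the Results
}
AXIS_RANGE = {"male": (0, 1), "duration_years": (0, 20),
              "hla_b27_positive": (0, 1), "mri_positive": (0, 1),
              "anti_tnf": (0, 1)}
S0_12 = 0.80  # placeholder baseline 12-month flare-free probability (all x = 0)


def nomogram_points(x: Dict[str, float]) -> float:
    """Total points: 0 at the low-risk end of every axis."""
    spans = {k: abs(b) * (AXIS_RANGE[k][1] - AXIS_RANGE[k][0])
             for k, b in COEFS.items()}
    scale = 100.0 / max(spans.values())
    total = 0.0
    for k, b in COEFS.items():
        lo, hi = AXIS_RANGE[k]
        low_risk_end = lo if b > 0 else hi
        total += scale * b * (x[k] - low_risk_end)
    return total


def flare_free_12m(x: Dict[str, float]) -> float:
    """Cox model prediction S(12 | x) = S0(12) ** exp(linear predictor)."""
    lp = sum(b * x[k] for k, b in COEFS.items())
    return S0_12 ** math.exp(lp)


patient = {"male": 1, "duration_years": 4, "hla_b27_positive": 1,
           "mri_positive": 1, "anti_tnf": 0}
print(round(nomogram_points(patient), 1), round(flare_free_12m(patient), 3))
```

Because total points are a linear rescaling of the Cox linear predictor, the probability axis at the bottom of a nomogram is simply this survival formula evaluated along the points scale.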

Fig. 2

Nomogram model for predicting flare in axSpA patients achieving low disease activity

The calibration curves of the nomogram showed good agreement between predicted and observed probabilities in both the training and validation sets (Fig. 3a, b). The C-indices for predicting overall relapse in the training and validation groups were 0.66 (95% CI 0.646–0.67) and 0.667 (95% CI 0.653–0.681), respectively. The AUROCs for the 1-year remission probability in the training and validation groups were 0.71 and 0.729, respectively (Fig. 3c, d).

Fig. 3

Calibration curves of the nomogram for the training set (a) and validation set (b). AUROCs for the 1-year remission probability in the training (c) and validation (d) groups

Compared with Clinical Experience Model

We further constructed a clinical experience model including gender, disease duration, and HLA-B27, based on clinical experience and literature review [11, 12], for comparison with the nomogram. Compared with the clinical experience model, the calibration curves of the nomogram showed good agreement between predicted and observed probabilities (Fig. 4a, b). The C-indices of the clinical experience model in the training and validation groups were 0.547 (95% CI 0.534–0.559) and 0.576 (95% CI 0.561–0.591), respectively, and its AUROCs for predicting the 1-year remission probability were 0.589 and 0.554, respectively (Fig. 4c, d). According to the DCA curves, the nomogram provided a higher net benefit than the clinical experience model, indicating better clinical utility (Figure S2).

Fig. 4

The C-indices of the clinical experience model in the training (a) and validation groups (b), respectively. AUROCs of the clinical experience model for predicting 1-year remission probability in the training (c) and validation groups (d)

Discussion

Using data from a multicenter randomized clinical trial, we found that MRI status and anti-TNF-α treatment were independent predictors of flare in patients with axSpA. We further constructed a nomogram for predicting flare that showed adequate discrimination and calibration and good clinical utility.

Many previous studies have reported that MRI performs well in detecting bone marrow edema for the early diagnosis of axSpA [6]; however, the use of MRI for monitoring treatment response and guiding management is still not widely accepted [27, 32]. In our previous study, we assessed the ability of MRI to predict flare in a small cohort [25]. In this study, we used prospective data from 251 participants to evaluate the predictive value of MRI and to construct a predictive model for flare in patients with axSpA. Cox proportional hazards regression was used to identify independent risk factors for disease flare in patients who had reached a patient-acceptable symptom state (PASS, ASDAS < 2.1). MRI status at baseline was an independent risk factor for flare, suggesting that MRI status should also be considered in patient management after clinical remission. Anti-TNF-α treatment was another independent predictor of disease flare. This finding is consistent with previous studies suggesting that anti-TNF-α treatment might directly affect disease flare [11, 12]. Our previous study found that anti-TNF-α treatment significantly improved clinical symptoms and joint function in axSpA patients [13], which might partly explain this association. Our results therefore support previous evidence of the therapeutic benefit of anti-TNF-α treatment in axSpA.

Clinically, various factors should be considered and balanced when planning to modify patient management after clinical remission, such as gender, age, disease duration, HLA-B27, MRI results, and anti-TNF-α treatment [11,12,13]. However, some of these measures are used mainly for research purposes. For example, the MRI SPARCC scoring system provides an objective measure of disease activity in axial SpA but is rarely used in clinical practice [32], mainly because it requires MRI and scoring training and is time-consuming for clinicians. The ASDAS faces the same problem in practice. We therefore transformed SPARCC scores into a simple dichotomous variable, positive or negative MRI status, and constructed a nomogram that may help rheumatologists predict the 1-year probability of remission with a simple, visual scoring tool. Our nomogram contains only five simple variables, making it a straightforward visual tool for prediction in clinical practice. To the best of our knowledge, this is the first nomogram for predicting maintained remission and disease flare in axSpA to inform treatment modification. The AUROCs of the nomogram ranged from 0.71 to 0.729, indicating acceptable discrimination. The nomogram could help clinicians quickly identify patients at high and low risk of flare and decide whether closer follow-up is warranted for higher-risk patients.

We also compared the nomogram with a clinical experience model that included gender, disease duration, and HLA-B27 status. The C-indices of the clinical experience model in the training and validation groups were below 0.60, and its AUROCs ranged from 0.554 to 0.589, both lower than those of the nomogram. DCA, a method for evaluating alternative prognostic strategies [36, 37], also showed an advantage for the nomogram.

This study has some limitations. Because of the limited number of participants, larger studies are needed to verify the sensitivity and specificity of the model. The participants were also drawn from a clinical trial, which might limit the generalizability of the conclusions; studies in other countries and regions are needed to confirm our results. Finally, additional practical predictors could be considered.

Conclusions

In conclusion, using data from a well-designed clinical trial, this study found that active sacroiliitis on MRI and anti-TNF-α treatment should be considered when estimating the likelihood of remission and disease flare in patients with axSpA who have achieved low disease activity. We developed and validated a nomogram for predicting remission and flare probability in these patients. This simple nomogram showed adequate discrimination, calibration, and clinical utility, and may be a useful tool for both patients and clinicians in post-remission evaluation.