Key Summary Points

Why carry out this study?

Progress in understanding the mechanisms of ulcerative colitis (UC) immunopathogenesis has led to an increase in the number of treatments available, and choice of therapy for patients with UC is currently predicated on disease severity, treatment history, and benefit–risk profiles.

Various statistical techniques could aid the development of models that predict outcomes of treatments and assist decisions on benefits and risks, which is particularly important in the context of dosing because the lowest effective tofacitinib dose should be used for maintenance therapy in patients with UC.

This post hoc analysis assessed whether various statistical techniques could predict the outcomes of tofacitinib 5 or 10 mg twice daily (BID) maintenance therapy, in patients with UC who had previously achieved a response with tofacitinib 10 mg BID induction therapy.

What was learned from the study?

Prediction models demonstrated insufficient accuracy for determining loss of response at week 8 or steroid-free remission at week 52 in OCTAVE Sustain, or delayed response in OCTAVE Open, whereas rectal bleeding and endoscopy subscores were the primary determinants of disease worsening and improvement, respectively.

The prediction models and association analyses were unable to predict differences between tofacitinib 5 or 10 mg BID maintenance therapy; however, it is possible that data related to key variables not collected during the clinical trials included in this analysis could have significantly improved the accuracy of the models, and further studies with larger sample sizes are required.

Digital Features

This article is published with digital features, including a graphical abstract to facilitate understanding of the article. To view digital features for this article, go to https://doi.org/10.6084/m9.figshare.23294930.

Introduction

Ulcerative colitis (UC) is a chronic, immune-mediated disease, which is characterized by recurrent periods of relapsing and remitting symptoms, including diarrhea, fecal urgency, and rectal bleeding [1, 2]. Given the chronic nature of the disease, patients with UC require long-term therapy, with the ultimate goal of inducing and maintaining sustained steroid-free remission.

With progress in understanding the mechanisms behind UC immunopathogenesis, the number of treatment options for patients with UC has expanded [3,4,5]. The choice of therapy for patients with UC is predicated on disease severity and the benefit–risk profile of medical therapy [6]. The concept of personalized therapy is based on the premise of identifying specific clinical or laboratory variables capable of identifying patients who may respond to certain treatments. While personalized medicine for UC is not yet well developed, this could further streamline future therapeutic choices [7, 8].

Tofacitinib is an oral small molecule Janus kinase inhibitor for the treatment of UC. The efficacy and safety of tofacitinib 10 mg twice daily (BID) in patients with moderately to severely active UC were demonstrated in an 8-week phase II induction study (NCT00787202) [9], two 8-week phase III induction studies (OCTAVE Induction 1 and 2; NCT01465763 and NCT01458951) [10], and a 52-week phase III maintenance study (OCTAVE Sustain; NCT01458574) [10], and were evaluated further in an open-label long-term extension study (OCTAVE Open; NCT01470612) [11] and a phase IIIb/IV study (RIVETING; NCT03281304) [12].

The use of various statistical techniques can aid in the development of models to predict the outcomes of patients with UC treated with tofacitinib, which could assist healthcare professionals in decisions on benefits and risks including in the context of dosing. This is particularly important because the lowest effective tofacitinib dose should be used for maintenance therapy in patients with UC [13]. A previous analysis of data from patients with UC in OCTAVE Induction 1 and 2 who received 8 weeks of induction therapy with tofacitinib 10 mg BID demonstrated that early and clinically meaningful predictions of responder status were possible using these techniques [14].

The aim of this post hoc analysis was to assess whether similar techniques could be used to predict early and long-term outcomes of tofacitinib 5 or 10 mg BID as maintenance therapy in patients with UC who had previously responded to tofacitinib 10 mg BID induction therapy. Outcomes included loss of response and steroid-free remission at week 8 and week 52 of OCTAVE Sustain, respectively, as well as differences in loss of response/discontinuation patterns between treatment groups. This analysis also assessed delayed response in patients who did not respond to the initial 8-week induction therapy.

Methods

Patients

Full details of the OCTAVE clinical program study design, and inclusion and exclusion criteria, have been described previously [10, 11]. Briefly, patients with UC who completed OCTAVE Induction 1 and 2 with a clinical response (defined as a decrease from induction study baseline total Mayo score of ≥ 3 points and ≥ 30%, plus a decrease in rectal bleeding subscore of ≥ 1 point or an absolute rectal bleeding subscore of 0 or 1) were eligible to participate in OCTAVE Sustain and were re-randomized to receive tofacitinib 5 mg BID, tofacitinib 10 mg BID, or placebo. Patients who did not respond in OCTAVE Induction 1 and 2 were eligible to participate in OCTAVE Open and received tofacitinib 10 mg BID for at least another 8 weeks.
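The clinical response definition above can be written as a small check. This is only an illustrative sketch of the stated criteria; the function and argument names are hypothetical and do not come from the OCTAVE analysis code.

```python
def is_clinical_response(baseline_total, current_total, baseline_rb, current_rb):
    """Return True if the clinical response definition quoted in the text is met.

    baseline_total / current_total: total Mayo score at induction baseline and now.
    baseline_rb / current_rb: rectal bleeding subscore at induction baseline and now.
    All names are hypothetical, chosen for illustration only.
    """
    decrease = baseline_total - current_total
    # Decrease from baseline of at least 3 points ...
    enough_points = decrease >= 3
    # ... and of at least 30% (integer form of decrease / baseline >= 0.30).
    enough_percent = 10 * decrease >= 3 * baseline_total
    # Plus: rectal bleeding subscore decreased by >= 1, or is 0 or 1 in absolute terms.
    rb_ok = (baseline_rb - current_rb) >= 1 or current_rb in (0, 1)
    return enough_points and enough_percent and rb_ok
```

For example, a fall in total Mayo score from 12 to 9 is a 3-point decrease but only a 25% decrease, so it would not count as a clinical response under this definition.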

Compliance with Ethics Guidelines

The study protocols were approved by the Institutional Review Board or Independent Ethics Committee for each participating center. This post hoc analysis is based on the NCT01465763, NCT01458951, NCT01458574, and NCT01470612 studies. Written informed consent was obtained from all patients, per the ethics committee-approved protocols. All studies were conducted in compliance with the ethical principles derived from the Declaration of Helsinki and in compliance with all International Council for Harmonisation Good Clinical Practice Guidelines.

Outcomes and Models

Prediction Analyses

Two modeling techniques (logistic regression and Least Absolute Shrinkage and Selection Operator [LASSO] regression analyses) were used to generate a model that could predict the following outcomes: (1) loss of response (based on change in partial Mayo score [PMS]; Table S1 in the supplementary material) at week 8 of OCTAVE Sustain (week 16 overall) with tofacitinib 5 or 10 mg BID, (2) steroid-free remission (based on PMS) at week 52 of OCTAVE Sustain following treatment with tofacitinib 5 or 10 mg BID or placebo, and (3) delayed induction response in OCTAVE Open following non-response in OCTAVE Induction 1 and 2 and extended treatment with tofacitinib 10 mg BID for an additional 8 weeks (Fig. 1). Response (based on change in PMS) and steroid-free remission (based on PMS) have been defined previously (Table S1) [14, 16].

Fig. 1

Figure has been adapted from Sandborn et al. [11] and Winthrop et al. [15]. These are both open access articles under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations were made

Overview of the tofacitinib OCTAVE UC clinical program, and the outcomes of interest. (a) Final complete efficacy assessment at week 8/52. Treatment continued up to week 9/53. (b) Clinical response in OCTAVE Induction 1 and 2 was defined as a decrease from induction study baseline total Mayo score of ≥ 3 points and ≥ 30%, plus a decrease in rectal bleeding subscore of ≥ 1 point or an absolute rectal bleeding subscore of 0 or 1. (c) Non-responders from OCTAVE Induction 1 and 2 received tofacitinib 10 mg BID in OCTAVE Open. (d) Remission was defined as a total Mayo score of ≤ 2 with no individual subscore > 1 and a rectal bleeding subscore of 0. (e) Only patients receiving tofacitinib 5 or 10 mg BID were included in the analysis. BID twice daily, OLE open-label, long-term extension, UC ulcerative colitis.

This analysis only included those patients who received active treatment (tofacitinib 10 mg BID) in the induction studies and were re-randomized in OCTAVE Sustain or proceeded to OCTAVE Open. A total of 593 patients from OCTAVE Induction 1 and 2 entered OCTAVE Sustain. Of these patients, 70 entered OCTAVE Sustain after responding to placebo and were subsequently excluded from this analysis. Of the patients who received active treatment in the induction studies, 176 received tofacitinib 5 mg BID, 173 received tofacitinib 10 mg BID, and 174 received placebo in OCTAVE Sustain.

Two modeling strategies were utilized in these analyses, an empirical model and a hypothesis-based model. The empirical model included separate analyses that used variables from baseline to week 8 of OCTAVE Induction 1 and 2 and variables that were only available at baseline of OCTAVE Induction 1 and 2. The hypothesis-based model included variables that were selected on the basis of information extracted from the current literature and the expert opinions of authors. A hypothesis-based model based on total Mayo score and PMS, individual subscores, medication use, laboratory variables, and medical history was included among other analyses in order to evaluate the contribution of each score towards the precision of the predictive models. The full lists of variables that were entered into the models are shown in Tables S2–S4 in the supplementary material and included PMS and subscores, age, body mass index, disease duration, prior medication use (including corticosteroids), vital signs at baseline, and laboratory variables at baseline or week 8.

Prediction analyses were performed using the Mayo score [16]. Analyses performed using the empirical model included only the PMS, and analyses performed using the hypothesis-based model included both the PMS and total Mayo score, as defined in Table S1.

Logistic Regression

Logistic regression models were fitted for each treatment group, either with no variable selection or using forward selection. In forward selection, starting from an intercept-only model, one significant variable was added at a time until no remaining variable met the significance level for entry into the model. An area under the receiver operating characteristic curve (AUROC) value of greater than 0.9 was considered predictive of response [17, 18].
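For readers less familiar with the AUROC, the sketch below computes it from predicted scores using the rank-based (Mann–Whitney) formulation: the probability that a randomly chosen patient with the outcome receives a higher score than a randomly chosen patient without it, counting ties as one half. This is a generic illustration, not the SAS procedure used in the analysis; the function names are hypothetical, and only the 0.9 threshold is taken from the text.

```python
def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for patients with the outcome, 0 otherwise.
    scores: model-predicted scores or probabilities, in the same order.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def is_predictive(auroc_value, threshold=0.9):
    """Apply the threshold stated in the text (AUROC greater than 0.9)."""
    return auroc_value > threshold
```

An AUROC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination.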

Data were analyzed using three approaches. The first approach included the whole dataset of patients in the logistic regression model; the second approach used a 70:30 training–testing split for the identification of significant explanatory variables; and the third approach used k-fold cross-validation. For k-fold cross-validation, the whole dataset was randomly split into k different partitions (k = 5 or 10), and one partition at a time was iteratively considered. The remaining k − 1 partitions were used to predict outcomes for the patients in the omitted partition. Five repetitions of the whole cross-validation approach were performed for each k. Model performance was evaluated through the average AUROC value over the repetitions.
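The repeated k-fold procedure described above can be sketched as follows. The sketch assumes that one AUROC value is obtained per held-out partition and that all values are averaged with equal weight; the text does not specify this level of detail. The `evaluate` callback (fit on the training indices, return the AUROC on the held-out indices) is a hypothetical placeholder for the model-fitting step.

```python
import random


def kfold_partitions(n, k, rng):
    """Randomly split patient indices 0..n-1 into k disjoint partitions."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_mean(n, k, repeats, evaluate, seed=0):
    """Average evaluate(train_idx, test_idx) over all folds and repetitions.

    evaluate is a user-supplied (hypothetical) function that fits a model on
    train_idx and returns the AUROC for the patients in test_idx.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(repeats):
        folds = kfold_partitions(n, k, rng)
        for i, test_idx in enumerate(folds):
            # The remaining k - 1 partitions form the training set.
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            values.append(evaluate(train_idx, test_idx))
    return sum(values) / len(values)
```

With k = 5 or 10 and `repeats=5`, this mirrors the structure described in the text: every patient is held out exactly once per repetition.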

Least Absolute Shrinkage and Selection Operator Logistic Regression

LASSO logistic regression analyses, which shrink (regularize) coefficients to avoid overfitting [19, 20], were carried out in a similar way to the logistic regression modeling described above, except that the model with the smallest Akaike’s information criterion was selected instead of using forward selection. Inclusion of the whole dataset of patients did not change the predictive accuracy. Consequently, the LASSO logistic regression analyses reported here only utilized the 70:30 training–testing and k-fold cross-validation approaches.
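To illustrate how the LASSO penalty shrinks coefficients and can set some of them exactly to zero, the sketch below fits an L1-penalized logistic regression by proximal gradient descent (soft-thresholding). This is a generic textbook algorithm and not the SAS implementation used in the analysis; the penalty `lam`, the step size, and the iteration count are arbitrary assumptions, and the selection of the penalty by Akaike’s information criterion is not shown.

```python
import math


def soft_threshold(value, amount):
    """Shrink value toward zero by amount; values within +/- amount become 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam, step=0.1, iters=2000):
    """Minimize mean log-loss + lam * sum(|beta_j|); the intercept is unpenalized.

    X: list of feature rows, y: list of 0/1 outcomes (illustrative inputs).
    Returns (intercept, list of coefficients).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(iters):
        # Residuals: predicted probability minus observed outcome.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * g0
        # Gradient step followed by soft-thresholding (the L1 proximal operator).
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta
```

A larger `lam` removes more variables from the model, which is how the LASSO performs variable selection and shrinkage in a single step.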

All modeling analyses were performed using SAS 9.4 software.

Association Analyses

Association analyses were carried out to establish differences in loss of response/discontinuation patterns between treatment groups. Bivariate analyses were performed to compare discontinuation as a result of insufficient response in the tofacitinib 5 and 10 mg BID groups using chi-square and Fisher’s exact tests for categorical variables and the Kruskal–Wallis test for continuous variables. A k-nearest neighbor (kNN) analysis was performed to evaluate whether patients who received tofacitinib 5 mg BID and had loss of response in OCTAVE Sustain were more closely associated with patients who received tofacitinib 10 mg BID and had either loss of response or sustained response. Visualizations of individual response and remission patterns were performed using patient-level data with the aim of identifying the treatment patterns over time for each patient group. Three parameters were assessed: differences in the discontinuation and remission patterns among the treatment groups; 2- and 3-point PMS responder status between study baseline and week 52 of OCTAVE Sustain; and change in total Mayo score over time between study baseline and week 52 of OCTAVE Sustain.
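The idea behind the kNN analysis can be sketched as follows: each patient of interest (here, a patient with loss of response on tofacitinib 5 mg BID) is compared with labeled reference patients (here, patients on tofacitinib 10 mg BID), and the majority label among the k closest reference patients is recorded. The Euclidean distance on numerically coded variables, the majority-vote rule, and all names below are assumptions for illustration; the text does not report the distance metric or the value of k.

```python
import math
from collections import Counter


def nearest_neighbor_labels(query, neighbors, k):
    """Labels of the k reference patients closest to query.

    neighbors: list of (feature_vector, label) pairs, e.g. label is
    "sustained" or "loss" for the reference group.
    """
    ranked = sorted(neighbors, key=lambda item: math.dist(query, item[0]))
    return [label for _, label in ranked[:k]]


def tally_associations(queries, neighbors, k):
    """Count, across query patients, the majority label among their k neighbors.

    An odd k avoids ties when there are only two labels.
    """
    counts = Counter()
    for q in queries:
        labels = nearest_neighbor_labels(q, neighbors, k)
        counts[Counter(labels).most_common(1)[0][0]] += 1
    return counts
```

If most query patients are tallied under "sustained", they resemble the sustained responders in the reference group more closely than those who lost response.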

In the analyses of discontinuation patterns, to determine p values, logistic regression was used to test if the average probability of discontinuation differed between treatment groups. Dummy variables were created to represent whether a patient had ceased participating in the study at any point. The dummy discontinuation variables were used as the dependent variables, and the treatment groups (tofacitinib 5 and 10 mg BID, and placebo) were used as the independent variable. In the analyses of change in total Mayo score over time, the proportion of total Mayo score comprising each subscore was calculated for all patients. Dummy variables were created to represent whether a patient had either an increase or decrease in total Mayo score over time. Regression models were run using the subscores as the dependent variables, and the dummy variables as the independent variables. This was repeated for each treatment group. The statistical analyses were performed using R 4.1 software.
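The two derived quantities used in the analysis of total Mayo score over time can be sketched as follows: the proportion of the total Mayo score made up by each of its four subscores, and a dummy variable for the direction of change. The function names, the handling of a total score of 0, and the handling of an unchanged score are assumptions, as the text does not describe these cases.

```python
def subscore_proportions(stool_frequency, rectal_bleeding, endoscopy, pga):
    """Proportion of the total Mayo score contributed by each subscore.

    Returns None when the total is 0, because the proportions are then
    undefined (an assumption; this case is not described in the text).
    """
    parts = {
        "stool_frequency": stool_frequency,
        "rectal_bleeding": rectal_bleeding,
        "endoscopy": endoscopy,
        "pga": pga,  # Physician Global Assessment
    }
    total = sum(parts.values())
    if total == 0:
        return None
    return {name: value / total for name, value in parts.items()}


def change_direction(baseline_total, week52_total):
    """Dummy variable: 1 for an increase (worsening), 0 for a decrease
    (improvement), None if unchanged (an assumption)."""
    if week52_total > baseline_total:
        return 1
    if week52_total < baseline_total:
        return 0
    return None
```

For a patient with subscores of 2, 1, 2, and 1, the rectal bleeding subscore makes up one sixth of the total Mayo score of 6.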

Results

Loss of Responder Status at Week 8 of OCTAVE Sustain

Across all three modeling approaches, empirical logistic regression analyses that included variables from baseline only yielded AUROC values ranging from 0.50 to 0.79 for predicting loss of response. The equivalent values from the analyses that included variables from baseline to week 8 ranged from 0.61 to 0.86 (Table 1). For the models that could, to a certain extent, discriminate between the groups of interest that were assessed (i.e., tofacitinib 10 or 5 mg BID; AUROC > 0.6), the relative importance of the variables in the individual models is shown in Table S5 in the supplementary material.

Table 1 Summary of AUROC values from logistic regression analyses of loss of responder status at week 8 of OCTAVE Sustain

The AUROC values were close to 0.9 (range 0.81–0.86) for the analysis that included the whole set of patients and used variables from baseline to week 8 (Table 1). Results of the hypothesis-based logistic regression analyses of loss of responder status were generally similar to the empirical analysis results (Table 2). The relative importance of each variable was not provided for the poorly performing hypothesis-based model because none of the variables selected by experts could discriminate between continued response and loss of response. In addition, results were generally similar with tofacitinib 5 and 10 mg BID (Tables 1 and 2).

Table 2 Summary of AUROC values from the hypothesis-based logistic regression analyses of loss of responder status at week 8 of OCTAVE Sustain

Steroid-Free Remission at Week 52 of OCTAVE Sustain

Empirical logistic regression analyses that included variables that were selected from baseline only resulted in AUROC values that ranged from 0.65 to 0.66 for predicting steroid-free remission (based on PMS) when the treatments were combined and using the training–testing split and fivefold cross-validation models (Table 3). The equivalent values from the analyses that included variables from baseline to week 8 ranged from 0.73 to 0.79 when the treatments were combined (Table 3). Applying variable selection methods to improve the efficiency of the models did not increase the accuracy of the models. Results of the hypothesis-based logistic regression analyses were generally similar to the empirical analysis results (Tables 3 and 4; Table S8 in the supplementary material). For the models that could, to a certain extent, discriminate between the groups of interest that were assessed (i.e., tofacitinib 10 or 5 mg BID or placebo; AUROC > 0.6), the relative importance of the variables in the individual models is shown in Tables S9 and S10 in the supplementary material. In addition, results were generally similar across the two treatment groups (Tables 3 and 4; Table S8). Applying a tenfold cross-validation approach, and the addition of LASSO logistic regression analyses, did not improve the predictive power of the model (Tables S11–S14 in the supplementary material).

Table 3 Summary of AUROC values from logistic regression analyses predicting steroid-free remission at week 52 of OCTAVE Sustain without applying variable selection

Table 4 Summary of AUROC values from hypothesis-based logistic regression analyses predicting steroid-free remission at week 52 of OCTAVE Sustain applying variable selection

Delayed Response at Week 8 of OCTAVE Open

In total, 295 tofacitinib-treated patients who were non-responders in OCTAVE Induction 1 and 2 enrolled in OCTAVE Open as delayed responders. These patients represent only a portion of the patients who entered OCTAVE Open. As per Fig. 1, patients who had completed or had treatment failure during OCTAVE Sustain could also enter OCTAVE Open. Therefore, in total, 944 patients entered OCTAVE Open and received either tofacitinib 5 or 10 mg BID per protocol.

In the empirical logistic regression analyses of delayed response among patients who received tofacitinib 10 mg BID for an additional 8 weeks following induction therapy using all three modeling approaches, AUROC values, based on the PMS and individual partial Mayo subscores, were less than 0.80 in the models that included variables from baseline only (PMS, range 0.52–0.63; individual subscores, range 0.52–0.66) or variables from baseline to week 8 (PMS, range 0.55–0.76; individual subscores, range 0.58–0.75; Table 5).

Table 5 Summary of AUROC values from logistic regression analyses predicting delayed response at week 8 of OCTAVE Open

The results of the hypothesis-based logistic regression analyses were generally similar to those of the empirical analysis (Table 6). For the models that could, to a certain extent, discriminate between the groups of interest that were assessed (i.e., tofacitinib 10 mg BID; AUROC > 0.6), the relative importance of the variables in the individual models is shown in Table S15 in the supplementary material. In addition, the findings from the analyses based on PMS and total Mayo score were generally similar (Table 6). Applying a tenfold cross-validation approach did not improve the predictive power of the model (Tables S16 and S17 in the supplementary material).

Table 6 Summary of AUROC values from hypothesis-based logistic regression analyses predicting delayed response at week 8 of OCTAVE Open and applying variable selection

Association Analyses in OCTAVE Sustain

In the bivariate analysis, the following categorical variables were identified as significant: sex (p = 0.005), presence of extraintestinal manifestations at induction study baseline (p = 0.027), immunosuppressant use prior to induction (p = 0.003), and a history of musculoskeletal and connective tissue disorders (p = 0.037). In the analysis of continuous variables, only uric acid at week 8 of the induction studies (p = 0.042) was identified as significant. Consecutive multivariable analyses were performed; however, no significant variables were identified (data not shown).

Using the significant variables extracted from the bivariate analysis, the kNN analysis determined that patients with loss of response in the tofacitinib 5 mg BID group were more closely associated with patients who received tofacitinib 10 mg BID and had sustained response, compared with patients who had loss of response in the tofacitinib 10 mg BID group (Table 7).

Table 7 Summary of results from k-nearest neighbor analysis evaluating loss of response in patients who received tofacitinib 5 mg BID in OCTAVE Sustain, using patients who received tofacitinib 10 mg BID as neighbors

Disease activity graphs were generated to evaluate the patterns of response for individual patients in each treatment group. Visualization of data from OCTAVE Sustain demonstrated that the patterns of remission, lack of remission, and discontinuation (based on OCTAVE Open study criteria) in patients receiving tofacitinib 5 vs 10 mg BID were generally similar over time (p > 0.05; Fig. 2). Moreover, patients with prior tumor necrosis factor inhibitor (TNFi) exposure demonstrated numerically higher rates of lack of remission and discontinuation compared with patients who were TNFi-naïve; however, these differences were not statistically significant. Patients treated with placebo demonstrated a significantly different pattern of disease activity compared with patients who were treated with tofacitinib 5 or 10 mg BID (p < 0.05).

Fig. 2

Visualization of remission rates over time stratified by TNFi exposure among patients in OCTAVE Sustain receiving (a) tofacitinib 10 mg BID, (b) tofacitinib 5 mg BID, or (c) placebo. Remission was defined as having a PMS ≤ 1. BID twice daily, PMS partial Mayo score, TNFi tumor necrosis factor inhibitor

Visualization patterns of response, lack of response, and discontinuation were generally similar in analyses that defined response as changes in PMS of 2 points vs 3 points (Figs. S1 and S2, respectively, in the supplementary material).

Interestingly, in the analysis of the contribution of individual subscores to the change in total Mayo score over time, the rectal bleeding subscore was the predominant subscore in patients with disease worsening (indicated by an increase in total Mayo score) between baseline and week 52 of OCTAVE Sustain (contribution of the rectal bleeding subscore was 24.7% [p < 0.001], 19.6% [p < 0.001], and 25.2% [p < 0.001] higher for patients with disease worsening compared with patients with disease improvement in those receiving tofacitinib 5 mg BID, tofacitinib 10 mg BID, and placebo, respectively; Fig. 3). The difference in the proportion of the total Mayo score made up by the rectal bleeding subscore was not statistically significant when the three treatment groups were compared.

Fig. 3

Visualization of change in total Mayo score from study baseline to week 52 of OCTAVE Sustain in patients receiving (a) tofacitinib 10 mg BID, (b) tofacitinib 5 mg BID, or (c) placebo. Individual patients are shown along the x-axis. BID twice daily

By contrast, the endoscopic subscore was the predominant subscore in patients who had disease improvement, indicated by a decrease in the total Mayo score between baseline and week 52 of OCTAVE Sustain in all three treatment groups (contribution of the endoscopic subscore was 15.9% [p = 0.021], 14.5% [p = 0.046], and 12.4% [p = 0.096] higher for patients with disease improvement compared with patients with disease worsening in those receiving tofacitinib 5 mg BID, tofacitinib 10 mg BID, and placebo, respectively; Fig. 3). The difference in the proportion of the total Mayo score made up by the endoscopic subscore was not statistically significant when the three treatment groups were compared.

Discussion

This post hoc analysis of data from patients with UC in OCTAVE Sustain and OCTAVE Open utilized statistical modeling to predict patient loss of responder status (based on change in PMS) at week 8 of OCTAVE Sustain, steroid-free remission (based on PMS) at week 52 of OCTAVE Sustain, and delayed response at week 8 of OCTAVE Open. In all analyses, AUROC values were below the threshold for accurate prediction of response (AUROC value < 0.9), and there were no notable differences in model outcomes when 2- and 3-point changes in PMS were used as definitions of response. The AUROC values were moderately acceptable at discriminating responders vs non-responders (i.e., AUROC > 0.8) in the prediction modeling analyses that used the whole dataset, including data from baseline to week 8 of OCTAVE Induction 1 and 2. However, models validated using the training–testing split or the cross-validation datasets generally had AUROC values below the threshold for moderate to accurate prediction (AUROC values < 0.8) [17, 18]. Comparisons of the two active treatment arms of OCTAVE Sustain (tofacitinib 5 and 10 mg BID) demonstrated that patient characteristics associated with responses to treatment were generally similar when evaluated using the variables that were available in this clinical trial. However, the variables that were identified as important within a particular model should be interpreted with caution. As the overall performance of the models was rather poor, the variables that were driving the outcomes could also not be considered specific enough to discriminate between the tofacitinib 5 and 10 mg BID groups. However, the variables listed such as age, PMS, and corticosteroid use have been previously described as factors associated with UC disease activity [14, 21, 22]. 
Despite the multiple efforts that were made to refine the research methodology by applying advanced prediction modeling methods and multi-omic approaches, which allow for more robust prediction by combining individual components and sharpening the approach to disease prediction, further refinement is suggested [23,24,25].

Notably, among patients with an increase in total Mayo score from study baseline to week 52 of OCTAVE Sustain, the rectal bleeding subscore was identified as the primary determinant of disease worsening. By contrast, among patients with a decrease in total Mayo score from study baseline to week 52 of OCTAVE Sustain, the endoscopic subscore was identified as the primary determinant of improvement.

A previous assessment of data from OCTAVE Induction 1 and 2 used statistical modeling techniques to identify predictors of early treatment response among patients receiving induction therapy with tofacitinib 10 mg BID [14]. This study reported that logistic regression and random forest models were able to predict week 4 and week 8 responder status using only four measures (PMS, partial Mayo subscores, total cholesterol, and C-reactive protein) at different timepoints. An early change in the PMS was identified as the most important predictor of responder status. Here, we explored the extent to which treatment outcomes after week 8 of OCTAVE Induction 1 and 2 could be predicted on the basis of a change in PMS, as well as steroid-free remission (based on PMS) or delayed response among patients with UC. Patients included in this analysis were receiving either tofacitinib 5 or 10 mg BID as maintenance therapy or placebo in OCTAVE Sustain, or were non-responders at the end of OCTAVE Induction 1 and 2 and received extended induction for an additional 8 weeks in OCTAVE Open. There is no unambiguous reason why the models in this analysis were unable to accurately predict treatment response; however, factors such as the inclusion/exclusion criteria for the patient population in the global OCTAVE clinical program, number of patients analyzed, study design (i.e., omission of variables that could have been more predictive), the subjective nature of patient-reported symptoms in the PMS, and flare severity may have influenced the models’ predictive capabilities.

Visualization of remission and discontinuation patterns across treatment groups demonstrated that, among TNFi-naïve patients, disease activity patterns were generally similar in those receiving tofacitinib 5 or 10 mg BID. These visualizations suggest that the differences in measurable disease activity characteristics between the tofacitinib treatment groups were too small to be accurately assessed, even with advanced data analysis techniques. The relatively low patient numbers included in this analysis may also have influenced the performance of the models, and this was partly confirmed by the small increase in power with the models that utilized the whole set of patients.

Previous studies have evaluated the role of biomarkers, such as fecal calprotectin, in predicting endoscopic response and possible relapse among patients with UC [26, 27]. A prospective, real-world study of patients with UC in Spain demonstrated that fecal calprotectin levels could act as a potential surrogate marker for endoscopic remission [27]. The absence of such biomarkers in the OCTAVE clinical program may have influenced the overall predictive value of the models, and future studies might consider including a different array of variables.

This analysis included visualizations of disease activity and discontinuation patterns that provided some insight into the effects of maintenance treatment over time. Notably, while patients who received tofacitinib 5 or 10 mg BID exhibited generally similar patterns of response, patients who received placebo demonstrated a distinct outcome pattern, compared with those who received tofacitinib, and this is consistent with previously reported results of induction study efficacy data [10].

Furthermore, the visualizations established that a change in rectal bleeding subscore was the primary determinant of disease worsening, as indicated by an increase in total Mayo score between baseline and week 52 of OCTAVE Sustain. This is consistent with a previous meta-analysis, which found that a normal rectal bleeding subscore correlated with the achievement of endoscopic remission among patients with UC [28]. The rectal bleeding subscore is a semiquantitative, subjective, patient-reported variable that has demonstrated fair correlation with endoscopic activity in patients with UC [29]. However, similar to stool frequency, it is subject to substantial bias. Conversely, endoscopic activity (measured using the Mayo endoscopic subscore) is relatively objective and, in the global OCTAVE clinical program, was reviewed centrally. Here, the rectal bleeding subscore was not normally distributed. It is possible that the variance in the total Mayo score among patients with disease worsening was driven by a subgroup of outliers who were reporting higher rectal bleeding subscores compared with other variables, such as stool frequency, Physician Global Assessment, and endoscopic subscores, which either remained the same or improved. However, a change in the rectal bleeding subscore may be an early indicator of change in disease activity in patients with UC [30]. Furthermore, a strong correlation between clinical symptoms and endoscopic disease severity has been reported [31]. Therefore, it is possible that improvements in the endoscopic subscore may be indicative of an improvement in a patient’s clinical symptoms.

This study had some limitations. Firstly, the number of patients included in this prediction analysis was relatively small, especially in some treatment subgroups, which limits the ability to interpret the data. Also, the prediction models have not been confirmed or validated by other data sources. External validation is key to determining the reproducibility and generalizability of the models to other patient populations. In addition, all patients who enrolled in OCTAVE Sustain had at least achieved a clinical response following OCTAVE Induction 1 and 2, and this may have contributed to the generally similar patterns of response with both tofacitinib doses in OCTAVE Sustain. Finally, this study only included data from patients who met the inclusion/exclusion criteria for the global OCTAVE clinical program. This may limit the generalizability of the results in the general population of patients with UC, and further studies that include prospective and real-world validation of the predictive models should be performed.

Conclusion

Using data from the global OCTAVE clinical program, we could not generate models that predicted loss of responder status with maintenance therapy and/or steroid-free remission with sufficient accuracy. However, in-depth analyses of the individual disease score patterns indicated that there were particular differences in Mayo subscores when patients with disease worsening and disease improvement were compared; the rectal bleeding subscore was the primary determinant of disease worsening, while the endoscopic subscore was the primary determinant of disease improvement. The development of prediction models contributes to a growing body of research that is exploring the use of various statistical techniques to predict treatment outcomes with potential application in clinical practice. Notably, it is possible that data on key variables that are as yet unknown, and that were not collected during the clinical trials included in this analysis, could have significantly improved the accuracy of the models; further studies with larger sample sizes are required.