Introduction

Therapeutic drug monitoring (TDM) is an important part of the management of inflammatory bowel disease (IBD) [1, 2]. It has been demonstrated to be cost-effective and associated with improved therapeutic outcomes compared to empiric treatment [3, 4]. TDM is commonly used to guide clinical decisions related to dose optimization at the time of loss of response to tumor necrosis factor alpha antagonists (anti-TNFs) in patients with Crohn’s disease (CD) [5]. Therapeutic trough concentration assessments have been suggested for patients with secondary failure to anti-TNF medications [1, 3, 4], and concentration-based dose optimization has been shown to guide clinical decisions that result in increased remission rates and reduced hospitalizations in patients with active IBD [4, 6, 7]. The addition of fecal calprotectin (FCP) testing to anti-TNF TDM may further optimize management decisions in IBD [8]. The potential utility of TDM for predicting future response to therapy and for proactively adjusting anti-TNF dosage to prevent loss of response has also been suggested [9, 10]. There is less evidence to support the use of TDM and FCP for other classes of biologics, including ustekinumab (UST), in the clinical setting.

Ustekinumab has been demonstrated to be effective for the treatment of CD [11] and ulcerative colitis (UC) [12], and there is a growing body of evidence on exposure–efficacy relationships [13, 14, 15, 16, 17], although there remains a lack of consensus on serum [UST] thresholds at different time points [13]. Exposure–efficacy data suggest that a drug concentration of > 1 µg/mL is associated with improved clinical and endoscopic outcomes [13]. Different assays have been used in studies of UST TDM to date, adding further ambiguity [18, 19], and the real-world impact of UST TDM on clinical decision making remains poorly described in the literature relative to what is known about TDM for anti-TNFs.

We conducted a cross-sectional congruency study to address these gaps in the understanding of UST TDM. The primary objective was to evaluate the impact of providing TDM-related information (i.e., serum [UST] and anti-drug antibodies [ADAb] to ustekinumab) with and without information from FCP testing, on clinical decisions in patients with CD in a real-world clinical setting. We hypothesized that the provision of UST TDM with or without FCP results would impact clinical decisions made in the management of CD patients treated with UST and therefore establish a role for TDM in the treatment of CD patients with UST. We explored the association between UST TDM ± FCP-related information and measures of CD activity. A sub-study retrospectively examined the association between UST TDM results (i.e., “therapeutic” versus “subtherapeutic” serum [UST]) and treatment outcomes at the next follow-up visit. Taken together, these three critical pieces of information could help rationalize the incorporation of UST TDM into clinical practice.

Methods

Study Population

Consecutive outpatients aged 18 to 80 years with documented CD who had been initiated on UST either subcutaneously (SC) or intravenously (IV) for at least 4 weeks with the most recent dose of UST within the last 12 weeks were eligible. Patients were excluded if they had a confirmed diagnosis of UC, an ostomy, or prior extensive bowel resection.

Study Design

This was a cross-sectional, multicenter, non-interventional study conducted at 11 Canadian sites that had experience using TDM as a decision-making tool for patients on anti-TNFs. Study enrollment occurred between April 2017 and January 2018. The institutional review board at each study site approved the protocol (see Supplement for additional information on protocol amendments), and all patients provided written informed consent. During the single study visit, patients provided a medical and medication history and completed the Harvey–Bradshaw index (HBI) questionnaire for disease activity (Fig. 1). A blood sample was obtained for TDM (i.e., serum [UST] and [ADAb] to ustekinumab); the sampling time was not protocolized, so samples were not necessarily drawn at trough. The treatment decision (D1) was taken prior to, and independent of, the patient’s inclusion in the study and according to standard clinical practice. UST TDM results were provided to participating clinicians after their decisions were made, and they recorded whether their treatment decision would hypothetically change based on the provision of UST TDM results alone (D2) and, where available, TDM + FCP (D3).

Fig. 1

mUST-DECIDE overall study design. FCP, fecal calprotectin; ICF, informed consent form; HBI, Harvey–Bradshaw index; SOC, standard of care; TDM, therapeutic drug monitoring; UST, ustekinumab

A review panel of four expert gastroenterologists was convened, and each patient case was reviewed by three panel members who made a hypothetical clinical decision (D1) and then re-evaluated the case based on the provision of UST TDM (D2) and then UST TDM + FCP (D3).

Study Evaluations

The primary outcome was the congruency of clinical decisions made by participating clinicians before and after access to UST TDM results. Clinical decisions after access to TDM results included “no action” (i.e., no change in decision compared to baseline) or “action” (i.e., a change in decision compared to baseline, which could include request for further investigation such as laboratories, imaging [or other], dose optimization, treatment discontinuation, and treatment switch). Secondary outcomes included the congruency of additional decision pairs, including those of the review panel (i.e., D1 vs. D3 and D2 vs. D3). The review panel followed a majority rule approach for clinical decisions, which were not protocolized; if a consensus of at least two of three members was not reached, the case was labeled “disagreement.”
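The majority rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (at least two of three reviewers agreeing, otherwise "disagreement"); the function names and the decision labels passed in are hypothetical and are not taken from the study's data dictionary.

```python
from collections import Counter


def panel_decision(votes):
    """Return the decision shared by at least two of three reviewers.

    `votes` is a list of three decision labels (hypothetical strings).
    If no label is chosen by at least two reviewers, the case is
    labeled "disagreement", as described in the text.
    """
    if len(votes) != 3:
        raise ValueError("expected exactly three reviewer votes")
    decision, count = Counter(votes).most_common(1)[0]
    return decision if count >= 2 else "disagreement"


def is_congruent(decision_before, decision_after):
    """A decision pair is congruent when the decision did not change."""
    return decision_before == decision_after
```

For example, `panel_decision(["no action", "dose optimization", "no action"])` yields `"no action"`, and comparing a D1 label with a D2 label via `is_congruent` marks the pair as congruent or discordant.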

Adverse events (AEs) were recorded from the time a signed and dated informed consent form was obtained until 30 days after the initial study visit.

UST and ADAb to UST Serum Concentration Measurements

Serum [UST] and presence of ADAb to UST were assessed using an enzyme-linked immunosorbent assay (ELISA) [20] (detection range 0.005–20 µg/mL) and a drug-tolerant radioimmunoassay (RIA) [21] (lower limit of detection: 3 AU/mL), respectively (Sanquin Research Labs, Netherlands). A high level of agreement has been shown between the Sanquin UST and ADAb assay and the assay used in the UST registration trials [18] (and unpublished data from Sanquin).

Sub-study

A retrospective chart review was completed in patients who had a follow-up visit ≥ 30 days after the initial visit. Improved disease control [22] (exploratory outcome) was defined as a composite assessment outcome meeting ≥ 1 disease control criterion (symptomatic, endoscopic/imaging, biochemical) without any of the non-response criteria (inadequate or loss of response, worsening of any disease control criteria, initiation of any CD-related medications, and AEs). Serum [UST] was categorized as therapeutic, subtherapeutic, or uninterpretable based on its position in a two-compartment pharmacokinetic (PK) model. (Subjects on Q8W dosing were assessed as per the log-linear model, which projected a therapeutic level of ≥ 4.5 µg/mL at 4 weeks and ≥ 1.0 µg/mL at 8 weeks.) This allowed for interpretation of non-trough sampling [13, 23].
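To make the log-linear idea concrete, the sketch below interpolates a time-dependent threshold between the two anchor points given above (4.5 µg/mL at 4 weeks and 1.0 µg/mL at 8 weeks after a Q8W dose). It is an illustration only: the function names are hypothetical, the restriction to the 4 to 8 week window is an assumption, and the study's actual PK model is not reproduced here.

```python
import math


def therapeutic_threshold(weeks_since_dose, c_week4=4.5, c_week8=1.0):
    """Threshold (µg/mL) on a straight line in log-concentration vs. time.

    Assumption: only the 4-8 week window is handled, because the text
    gives anchor values for those two time points only.
    """
    if not 4.0 <= weeks_since_dose <= 8.0:
        raise ValueError("sketch covers 4 to 8 weeks after the last dose only")
    slope = (math.log(c_week8) - math.log(c_week4)) / (8.0 - 4.0)
    return math.exp(math.log(c_week4) + slope * (weeks_since_dose - 4.0))


def classify(concentration, weeks_since_dose):
    """Label a non-trough sample relative to the interpolated threshold."""
    threshold = therapeutic_threshold(weeks_since_dose)
    return "therapeutic" if concentration >= threshold else "subtherapeutic"
```

Under these assumptions, the threshold at 6 weeks is the geometric mean of the two anchors, roughly 2.1 µg/mL, so a 6-week sample of 3.0 µg/mL would be labeled therapeutic and one of 1.5 µg/mL subtherapeutic.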

Statistical Methods

A sample size of 100 patients was estimated to have approximately 90% power to detect a 15% change in clinical decisions (assuming 25% discordant pairs). For the primary endpoint, a two-sided McNemar’s test with a significance level of 0.05 was used. No corrections were made for multiple testing.
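McNemar’s test compares paired binary decisions and depends only on the discordant pairs: cases that moved from "no action" to "action" and cases that moved the other way. The sketch below computes the exact (binomial) two-sided version with the Python standard library. Whether the exact or the chi-square form was used in this study is not stated, so the choice of the exact form here is an assumption, and the function name is hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: pairs that changed from "no action" to "action"
    c: pairs that changed from "action" to "no action"
    Under the null hypothesis each discordant pair is equally likely
    to fall in either direction, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a net change
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

When the two discordant counts are equal, for example `mcnemar_exact(10, 10)`, the p-value is 1.0: many individual decisions can change while the net proportion of "action" decisions stays the same.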

Statistical analyses were performed by or under the authority of the sponsor. All authors had access to the data and reviewed and approved the final manuscript.

Results

Patient and Treatment Information

A total of 110 patients were enrolled and completed the study. Patient demographics, baseline disease characteristics, and medication use are summarized in Table 1. The mean duration of UST therapy was 14.4 months (SD 12.5). Fifty-five patients received induction dosing with SC UST and 46 with IV UST. The median induction doses by body weight were 360 mg and 390 mg for SC and IV UST, respectively. During maintenance, 66 patients received UST therapy at 8-week intervals, with the remaining at 4–6-week intervals. The median disease duration was 16.2 years, the majority (90.0%) had previously failed at least one anti-TNF, the mean HBI score was 4.0 (± 3.95), and 77 (70.0%) patients were in remission (i.e., HBI score < 5).

Table 1 Demographics, baseline characteristics, and CD medication history (n = 110)

Fecal Calprotectin

Each patient was scheduled to have an FCP test per routine care, but FCP values were available for only 72 (65.5%) patients at the time of evaluation. The median (IQR) FCP was 208 (103–432) ug/uL. Among these 72 patients, the majority (n = 51, 70.8%) were in symptomatic remission, while 8 (11.1%) had mild disease and 13 (18.1%) had moderate disease, according to HBI scores.

C-Reactive Protein (CRP)

CRP results were available for 89 (80.9%) patients with a median (IQR) of 3.3 mg/L (1.3–6.7), which included 63 (70.8%) in HBI symptomatic remission, 10 (11.2%) having mild disease, and 16 (18.0%) having moderate disease.

Primary Analysis: Clinical Decisions by Participating Clinicians Before and After TDM (± FCP)

Overall, treatment decisions by participating clinicians before (D1) and after the provision of UST TDM results (D2) were unchanged (Fig. 2a) (i.e., the number of actions changing to no further actions canceled out the number of no actions to actions). The most common “actions” (i.e., clinical decisions) were “dose optimization,” followed by “treatment switch” and “further investigation” (Supplementary Table 1S). At an individual patient level, 39.1% (95% CI 29.8–48.4%) of the hypothetical clinical decisions were different when UST TDM results were made available. The addition of FCP to TDM results (D3) also failed to have a net impact on treatment decisions (Fig. 2b, Supplementary Table 2S). At an individual level, 50.0% of decisions would have been different if UST TDM + FCP results had been available. The addition of FCP to UST TDM results did not change the net proportions of clinical decisions by participating clinicians (D3 versus D2, P = 0.10; Supplementary Table 3S), and only 15.3% of individual decisions would have been different.
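As a rough guide to how an interval such as the one reported for the 39.1% figure can be obtained, the sketch below computes a normal-approximation (Wald) confidence interval for a proportion. The interval method used in the study is not stated, so this is an assumption, and the result may differ slightly from the published bounds; the function name is hypothetical.

```python
import math


def wald_ci(p, n, z=1.959964):
    """Normal-approximation confidence interval for a proportion.

    p: observed proportion (0-1), n: number of paired decisions,
    z: standard normal quantile (default gives a 95% interval).
    """
    se = math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


lower, upper = wald_ci(0.391, 110)
print(f"{lower:.3f} to {upper:.3f}")
```

With p = 0.391 and n = 110, this gives an interval of about 30% to 48%, close to the reported interval.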

Fig. 2

Congruency of CD treatment decisions by participating clinicians before and after provision of a UST TDM (n = 110) and b UST TDM + FCP (n = 72) results. FCP, fecal calprotectin; TDM, therapeutic drug monitoring; UST, ustekinumab

Secondary Analysis: Clinical Decisions by Review Panel Before and After TDM (± FCP)

The review panel decisions were broadly similar to the primary clinical decisions, and the level of agreement across the review panel decisions was high (range: 83.3% to 95.5%). The net proportions of clinical decisions were not different before (D1) and after the provision of UST TDM results (D2) (Fig. 3a), but they did change based on TDM + FCP results (D3) (Fig. 3b, Supplementary Table 4S). At an individual patient level, 22.7% (95% CI 14.8–30.7%) of decisions would have been different if the panel had access to UST TDM results and 66.7% of individual decisions would have been different if UST TDM + FCP results had been available. The addition of FCP to UST TDM results changed the net proportions of clinical decisions by the expert panel (D3 versus D2, P = 0.004; Supplementary Table 6S), and 59.7% of individual decisions would have been different.

Fig. 3

Congruency of CD treatment decisions by the review panel before and after provision of a UST TDM (n = 110) and b UST TDM + FCP (n = 72) results. FCP, fecal calprotectin; TDM, therapeutic drug monitoring; UST, ustekinumab

Subgroup Analyses

Serum [UST] and [ADAb to UST]

Sites were advised to take serum samples at trough; however, specific sampling was variable since this was not protocolized. Subgroup analyses of serum [UST] and [ADAb] were conducted only in patients who were sampled at trough. The median serum [UST] according to dose frequency and disease activity status is summarized in Table 2. Serum [UST] was generally higher in patients receiving Q4W dosing relative to Q8W dosing and appeared independent of disease activity status. No patient was positive for serum ADAb to UST.

Table 2 Median serum trough* [UST], µg/mL (IQR; n), by dosing frequency and disease activity

Sub-study

A subset of 53 patients had a subsequent follow-up visit more than 30 days after the initial visit and an interpretable serum [UST]. Among them, 44 (83.0%) had therapeutic, 9 (17.0%) had subtherapeutic serum [UST], and 35 (66.0%) were in symptomatic remission at the initial visit. In the subgroup of patients with available HBI at follow-up, five out of 17 (29.4%) of the therapeutic subgroup had some clinical disease activity (HBI ≥ 5), whereas none (n = 0/4) of the subtherapeutic subgroup had disease activity (i.e., all patients had HBI < 5). Serum [UST] at baseline was not associated with clinical decisions (Fig. 4a). After a median of 148 days (range 41–411), 50.9% of patients (n = 27) were in complete disease control; their clinical outcomes appeared independent of achieving therapeutic serum [UST] at the initial visit (odds ratio [OR] 0.80, 95% confidence interval [CI] 0.19–3.38; Fig. 4b).
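For readers less familiar with odds ratios, the sketch below shows how an OR and its Wald 95% CI are computed from a 2 × 2 table. The cell counts used in the example are hypothetical: the text reports only the group sizes (44 therapeutic, 9 subtherapeutic) and the overall number in disease control (27), not the full table. The counts were chosen solely to be consistent with those margins, and the function name is likewise an assumption.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Wald CI for the 2 x 2 table [[a, b], [c, d]].

    a: therapeutic, disease control      b: therapeutic, no control
    c: subtherapeutic, disease control   d: subtherapeutic, no control
    All four cells must be non-zero for this approximation.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical cell counts (not reported in the text), consistent with the margins.
print(odds_ratio_ci(22, 22, 5, 4))
```

A confidence interval that spans 1, as the reported interval does, is compatible with no association between therapeutic serum [UST] and disease control at follow-up.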

Fig. 4

a Association of serum [UST] with clinical decisions (action vs. no change) in the sub-study (n = 53) and b impact on disease outcomes at follow-up visit (n = 53). CI, confidence interval; OR, odds ratio; UST, ustekinumab

Safety

Overall, 33 (30.0%) patients had one or more AEs from enrollment until 30-day follow-up. The most common AEs were drug inefficacy (n = 27, 24.5%) (prior to the first protocol amendment, any subject with HBI > 5 was categorized as “lack of efficacy”), gastrointestinal disorders (n = 3, 2.7%), and skin and subcutaneous tissue disorders (n = 2, 1.8%). One patient reported one serious adverse event (small intestinal obstruction), which was judged by the investigator as unrelated to UST.

Discussion

mUST-DECIDE was a phase IV, cross-sectional, multicenter, non-interventional study conducted in patients with documented CD who were predominantly receiving UST maintenance therapy at baseline. Patients had longstanding, highly refractory CD, yet a majority were in remission.

The study failed to demonstrate an impact of routine UST TDM on clinical decisions. No effect was detected at the level of participating clinicians or a review panel consisting of gastroenterology experts. This contrasts with what has been reported with anti-TNFs for the treatment of CD [4, 6, 8, 24], where TDM is commonly used in clinical practice [1,2,3]. Despite the lack of a net difference in clinical decisions, the proportions of individual CD treatment decisions made by participating clinicians (39.1%) and the review panel (22.7%) changed after the provision of UST TDM results, indicating that the availability of TDM may impact clinical decision making. A proportion of these changed decisions (23.6% in the expert group) were related to ordering further testing, presumably to assess for active disease (additional radiology testing/FCP information). The changed clinical decisions tended to be uniform in both directions, with similar numbers of actions changing to no further actions, and vice versa, thus explaining the net neutral overall result. The statistical relevance of the changed decisions cannot be ascertained since there was no control group in this study, though they appear lower than expected from similar experiments with anti-TNFs [4, 6, 8, 24].

Similar studies have been reported in the anti-TNF literature. For example, a single-center study of 36 IBD patients from the University of Alberta found 69.4% of decisions would be different based on IFX TDM results [8].

Interpretation of UST TDM results remains poorly described in the literature compared to the widely accepted thresholds for anti-TNF TDM [1, 4, 7]. An analysis from the UNITI trials reported that serum [UST] was proportional to dose and treatment efficacy, which included clinical remission and endoscopic efficacy [13], and PK analyses suggested that trough concentration targets for clinical remission during maintenance treatment with UST ranged from 0.8 to 1.4 µg/mL [13]. Other cohort studies have suggested higher threshold UST maintenance levels in anti-TNF refractory CD patients, ranging from 1.7 to 4.5 µg/mL [14, 15, 17]. In a Canadian cohort of highly refractory CD patients treated with SC UST during induction and optimized maintenance, clinical and endoscopic outcomes were improved in patients with serum [UST] higher than 4.5 µg/mL, as measured with a homogeneous mobility shift assay [14]. Notably, substantial absolute differences in [UST] have been reported between different assays [19], which may help explain the approximately twofold to threefold higher serum [UST] reported in these studies compared to the UNITI trials. Thus, there is no clearly established threshold for therapeutic [UST], and median serum [UST] during maintenance treatment of CD has not been consistently shown to correlate with clinical or endoscopic remission rates [15, 17, 25]. In the mUST-DECIDE sub-study, a threshold of 1 µg/mL at trough was used to explore this association, given the PK data from the UNITI trials. In patients mainly in clinical remission, serum [UST] at the initial study visit did not predict clinical decisions (i.e., “action” or “no action”), and there was no association between serum [UST] and short-term clinical outcomes at the next follow-up visit.

Access to FCP results after the UST TDM-related information did not alter the net proportions of decisions by participating clinicians but did change those of the review panel. The FCP information resulted in a larger proportion of changed individual decisions than UST TDM alone (50.0% and 66.7% of decisions by participating clinicians and the review panel, respectively). This larger proportion of changes in the review panel suggests a greater adherence to clinical guidelines to objectively measure active inflammation to inform clinical decisions [5]. Similar to the UST TDM results, changed decisions tended to balance out in both directions, with similar numbers of actions changing to no actions, and vice versa.

The proportions of changes to treatment decisions (based on UST TDM) after provision of FCP results were 15.3% and 59.7% for decisions made by participating clinicians and the review panel, respectively. This suggests that in expert hands, FCP appears to influence CD treatment decisions and highlights the importance of a complete drug concentration and biomarker profile in properly assessing the clinical course of action, with the caveat that this study did not investigate the role of FCP alone. It remains possible that FCP alone could drive a majority of clinical decisions, suggesting a more limited role for UST TDM in the context of active inflammation. Fecal calprotectin has been demonstrated as a valuable monitoring [26, 27] and decision-making tool [8]. In fact, an algorithm has been proposed for IFX [28].

No patients in this study were positive for serum ADAb to UST, which is similar to other results varying from no ADAb [14, 15, 16] to very low incidence in the UNITI (2.3%) [29] and UNIFI trials (4.6%) [12]. Notably, the presence of ADAb in IM-UNITI did not preclude efficacy of UST, and no definitive conclusions could be drawn on the effect of ADAb [29].

This study has several limitations. Clinical management, including the UST dosing regimen, was not protocolized, in order to reflect the use of UST in the real world. A specific serum sampling time was also not protocolized nor uniformly executed, and thus, serum samples to determine [UST] were collected at trough in only ~ 50% of patients. This underscores the practical challenges of sampling UST-treated patients who are dosed subcutaneously every 8 to 12 weeks. Further, there are no published studies on interpretation of non-trough serum [UST], and it is unclear how the participating sites intended to interpret non-trough samples. Despite this, the review panel was able to extrapolate the therapeutic nature of the non-trough PK samples. FCP information was available for only a subset of 72 patients, and the sub-study included only 53 patients. The patient population comprised stable, treatment-resistant patients who had been exposed to multiple therapies and prior surgeries, possibly limiting the role of serum [UST] in clinical decision making. These patients had already exhausted their available treatment options, and physicians may have been predisposed to maintaining their current dose in the absence of symptoms. The majority of patients were in remission (i.e., HBI < 5), and the findings should not be extrapolated to patients exhibiting a secondary loss of response to UST.

In conclusion, obtaining UST TDM information in the clinical management of consecutive CD patients on maintenance treatment with UST did not alter theoretical decision making for clinicians or an expert review panel, whereas adding FCP to UST TDM altered clinical decisions for the review panel but not for clinicians. A sub-study showed no association between baseline serum [UST] and short-term clinical outcomes at the next follow-up visit. The divergence in the impact of TDM and FCP between clinicians and the review panel highlights a need for greater understanding and education in assessing for active inflammation using FCP and interpreting UST TDM results. Further studies to clarify the use and impact of these tests in clinical practice across different clinical scenarios (e.g., reactive/loss of response) are warranted.