Prediction and evaluation of high-risk patients with primary biliary cholangitis receiving ursodeoxycholic acid therapy: an early criterion

Background and aims: Current treatment guidelines recommend ursodeoxycholic acid (UDCA) as the first-line treatment for newly diagnosed primary biliary cholangitis (PBC) patients. However, up to 40% of patients respond inadequately to UDCA monotherapy, and evaluating UDCA response at 12 months may result in a long period of ineffective treatment. We aimed to develop a new criterion to reliably identify non-responders much earlier.
Methods: Five hundred sixty-nine patients with a mean follow-up of 59 months (median: 53; IQR: 32–79) were randomly divided into a training cohort (70%) and a validation cohort (30%). The ability of different combinations of total bilirubin (TBIL), alkaline phosphatase (ALP), and aspartate aminotransferase (AST) threshold values to predict outcomes was assessed at 1, 3, or 6 months after the initiation of UDCA therapy. The endpoints were defined as adverse outcomes, including liver-related death, liver transplantation, and complications of cirrhosis. Adverse outcome-free survival was compared using various published criteria and a proposed new criterion.
Results: A new criterion for evaluating UDCA response at 1 month was established as ALP ≤ 2.5 × upper limit of normal (ULN), AST ≤ 2 × ULN, and TBIL ≤ 1 × ULN (Xi’an criterion). The 5-year adverse outcome-free survival rate of UDCA responders defined by the Xi’an criterion was 97%, significantly higher than that of non-responders (64%). The ability of the Xi’an criterion to accurately distinguish high-risk patients was confirmed in both early- and late-stage PBC.
Conclusions: The Xi’an criterion has a similar or even higher ability to distinguish high-risk PBC patients than other published criteria, and it can facilitate early identification of patients requiring new therapeutic approaches.
Supplementary Information: The online version contains supplementary material available at 10.1007/s12072-022-10431-7.


Introduction
Primary biliary cholangitis (PBC) is an immune-mediated liver disease characterized by chronic inflammation of the intrahepatic bile ducts that causes progressive ductal damage and liver fibrosis [1]. PBC has heterogeneous clinical features, and some patients develop cirrhosis, hepatic failure, and liver-related death during disease progression [2,3]. Currently, ursodeoxycholic acid (UDCA) is the first-line therapy for PBC; it can improve liver biochemistry indicators, ameliorate disease-associated symptoms, and suppress the progression of liver fibrosis [4,5]. However, a significant proportion of patients have an inadequate response to UDCA, which leads to a higher risk of liver-related progression [6]. To ensure adequate clinical management and personalized care, it is necessary to define and establish reliable parameters for identifying subgroups of patients at high risk.
In the past few decades, standard serum liver biochemistry testing under UDCA treatment has been used to predict treatment response and liver-related complications. Several criteria for UDCA treatment have been developed for patient risk stratification, such as the Rotterdam, Barcelona, Rochester-II, Paris-I, Paris-II, Toronto, and Ehime criteria [7][8][9][10][11][12][13]. These prognostic risk-stratification models assess therapeutic effects using liver biochemical parameters 6, 12, or 24 months after the initiation of UDCA treatment. A 12-month period is conventionally used to identify patients in need of second-line therapies [6]. However, these criteria have an important limitation: patients with an inadequate response, who are at higher risk of disease progression, may receive ineffective treatment for a long period before they are identified.
Notably, approximately 50% of patients might need additional treatments to reach therapeutic goals [14]. The rate of progression varies greatly among individual patients [15]. Although more patients are being recognized with earlier-stage disease, a considerable proportion of patients still progress rapidly [4,16]. Mean survival in patients with a bilirubin level of 2 mg/dL is 4 years, and that in patients with a bilirubin level of 6.0 mg/dL is only 2 years [17].
In this study, we retrospectively reviewed the clinical parameters and ascertained liver-related events. To identify patients who can likely benefit from early initiation of second-line therapy, we selected biochemical indicators at different time-points and constructed a new risk stratification criterion to predict insufficient responses to UDCA treatment.

Study design
We collected and analyzed data from 569 patients diagnosed with PBC between 2004 and 2021 in the Xijing Hospital of the Fourth Military Medical University (Xi'an, China). The diagnosis and treatment of PBC were based on international guidelines [6,15]. Briefly, PBC was diagnosed when at least two of the following three criteria were met: (i) biochemical evidence of cholestasis with elevation of ALP, (ii) positivity for anti-mitochondrial antibodies, and (iii) liver biopsy findings consistent with PBC. All participants were treated regularly with UDCA at 13-15 mg/kg/day. We only included PBC patients who were treated with UDCA continuously for at least 1 year after the diagnosis. Patients were excluded if they had an end-point within 6 months, viral hepatitis (hepatitis B or C), alcoholic liver disease, primary sclerosing cholangitis, steatohepatitis, or overlapping autoimmune hepatitis.

Definitions of biochemical response and endpoints
The biochemical response to UDCA treatment was evaluated according to six previously published definitions: (1) Barcelona criteria, a decrease in ALP level of more than 40% from baseline or a return to normal levels after 1 year of treatment; (2) Paris-I criteria, ALP < 3 × ULN, AST < 2 × ULN, and bilirubin ≤ 1 mg/dL after 1 year of UDCA treatment; (3) Paris-II criteria, AST and ALP ≤ 1.5 × ULN, with a normal bilirubin level after 1 year of UDCA therapy; (4) Rochester-II criteria, ALP < 2 × ULN at 12 months of UDCA therapy; (5) Rotterdam criteria, normalization of abnormal albumin and/or bilirubin levels after 1 year of UDCA treatment; (6) Ehime criteria, a 70% decrease from baseline level or a normal level of GGT after 6 months of UDCA treatment.
For the present study, all the definitions mentioned above were applied and evaluated using the same endpoint, that is, the occurrence of an adverse outcome as defined by at least one of the following events: liver-related death, liver transplantation, and complications of cirrhosis (namely ascites, variceal bleeding, or hepatic encephalopathy). Data were censored at the time of death or liver transplantation for patients who died or underwent transplantation, and at the time of presenting with a cirrhosis-related complication or the last follow-up for living non-transplanted patients. If a living non-transplanted patient developed more than one cirrhosis-related complication during follow-up, data were censored at the time of the first presentation of a cirrhosis-related complication. To improve prognostic performance as early as possible, different cut-off values of ALP and AST levels with a normal TBIL at 1, 3, or 6 months were assessed to define new criteria.
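The threshold-based definitions above translate directly into boolean rules. The following is a minimal sketch of four of them (Barcelona, Paris-I, Paris-II, Rochester-II), assuming liver tests are already expressed as multiples of the ULN; the `LiverTests` container and all function names are hypothetical and are not taken from the study's analysis code.

```python
from dataclasses import dataclass


@dataclass
class LiverTests:
    """Hypothetical container: enzyme values as multiples of ULN,
    bilirubin additionally as an absolute value in mg/dL."""
    alp_uln: float
    ast_uln: float
    tbil_uln: float
    bilirubin_mg_dl: float


def barcelona(baseline_alp_uln: float, year1: LiverTests) -> bool:
    # Responder if ALP fell by more than 40% from baseline or is normal.
    # Assumes baseline_alp_uln > 0.
    decrease = (baseline_alp_uln - year1.alp_uln) / baseline_alp_uln
    return decrease > 0.40 or year1.alp_uln <= 1.0


def paris_i(year1: LiverTests) -> bool:
    # ALP < 3 x ULN, AST < 2 x ULN, and bilirubin <= 1 mg/dL at 1 year.
    return (year1.alp_uln < 3.0 and year1.ast_uln < 2.0
            and year1.bilirubin_mg_dl <= 1.0)


def paris_ii(year1: LiverTests) -> bool:
    # AST and ALP <= 1.5 x ULN with a normal bilirubin level at 1 year.
    return (year1.alp_uln <= 1.5 and year1.ast_uln <= 1.5
            and year1.tbil_uln <= 1.0)


def rochester_ii(year1: LiverTests) -> bool:
    # ALP < 2 x ULN at 12 months.
    return year1.alp_uln < 2.0


# Illustrative patient (made-up values, not study data).
example = LiverTests(alp_uln=2.2, ast_uln=1.2, tbil_uln=0.8, bilirubin_mg_dl=0.9)
results = {
    "Barcelona": barcelona(4.0, example),
    "Paris-I": paris_i(example),
    "Paris-II": paris_ii(example),
    "Rochester-II": rochester_ii(example),
}
```

The same patient can be a responder under one definition and a non-responder under another, which is why all definitions were evaluated against a single common endpoint.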
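The censoring rule described above amounts to taking the earliest adverse event if any occurred, and the last follow-up otherwise. A minimal sketch follows, assuming each patient record holds a UDCA start date, a last follow-up date, and a list of event dates (with `None` for events that did not occur); the function name and record layout are hypothetical.

```python
from datetime import date
from typing import Optional, Sequence, Tuple


def outcome_time(start: date,
                 last_follow_up: date,
                 event_dates: Sequence[Optional[date]]) -> Tuple[float, bool]:
    """Return (follow-up in years, whether an adverse outcome occurred).

    event_dates may contain death, transplantation, and cirrhosis
    complication dates; only the first one counts.
    """
    observed = [d for d in event_dates if d is not None]
    if observed:
        end, event = min(observed), True
    else:
        end, event = last_follow_up, False
    # 365.25 days per year is a simplifying assumption.
    return (end - start).days / 365.25, event


# Made-up example: ascites in 2012, variceal bleeding in 2013.
years, had_event = outcome_time(
    date(2010, 1, 1), date(2015, 1, 1),
    [date(2013, 1, 1), date(2012, 1, 1), None],
)
```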

Statistical analysis
Quantitative variables were presented as medians with interquartile ranges (IQR). Comparisons of the biochemical liver tests at baseline and at 1, 3, 6, or 12 months were performed using the Wilcoxon signed-rank test for paired data. Categorical variables were presented as counts with percentages and compared by the Chi-squared test or Fisher's exact test. Adverse outcome-free survival was estimated using the Kaplan-Meier method and compared by the log-rank test. The effect of baseline variables or of the 1-, 3-, 6-, or 12-month biochemical response to UDCA on survival was estimated using the Cox proportional-hazards regression model. The average hazard ratio (HR) and 95% confidence interval (CI) were used to quantify the strength of the statistical links between the tested variables and survival. Univariate Cox regression analyses were applied to the training cohort to identify prognostic factors with different cut-off values of liver tests.
The C-index, likelihood ratio Chi-square, area under the time-dependent receiver operating characteristic (timeROC) curve, sensitivity, specificity, positive (PPV) and negative (NPV) predictive values, as well as positive (PLR) and negative (NLR) likelihood ratios, were calculated for all definitions to assess their performance in predicting long-term outcomes. The Akaike information criterion (AIC) was also calculated to compare the loss of information for different models. Bootstrapping with 1,000 samples was used for model validation. The C-index and its 95% CI were calculated using the survcomp package in R. Statistical analyses were carried out using SPSS software (version 22.0; SPSS Inc., Chicago, IL, USA). The survival curve was plotted using R 3.5.2 with the survival and rms packages. The timeROC curve was plotted with the timeROC package. All analyses were two-sided, and p values < 0.05 were considered statistically significant.
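For readers unfamiliar with the Kaplan-Meier method, the product-limit estimate multiplies, at each event time, the fraction of at-risk patients who remain event-free. The sketch below is a plain-Python illustration under that textbook definition; it is not the R/SPSS code used in the study, and the function name and input data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time with at least one event.

    events[i] is True for an adverse outcome, False for censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


# Made-up follow-up times in years; True marks an adverse outcome.
curve = kaplan_meier([1, 2, 3, 4, 5], [True, False, True, False, True])
```

In this toy example, censored patients at years 2 and 4 shrink the risk set without lowering the curve, which is how the estimator handles incomplete follow-up.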
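The classification metrics listed above all derive from a 2 × 2 table of criterion result against outcome. A minimal sketch follows, assuming "positive" means non-responder by a given criterion and "event" means an adverse outcome; the function name and the counts in the example are hypothetical, not study data.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """tp: non-responders with an event, fp: non-responders without,
    fn: responders with an event, tn: responders without.

    Assumes every denominator is non-zero (for example specificity < 1
    for the PLR); no guarding is done in this sketch.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "plr": sensitivity / (1.0 - specificity),
        "nlr": (1.0 - sensitivity) / specificity,
    }


# Made-up counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=60, fn=10, tn=290)
```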

Characteristics of study population
A total of 569 patients were included and randomly divided into the training (N1 = 393) and validation (N2 = 176) cohorts at a ratio of 7:3 (Fig. 1). The median follow-up was 53 months (IQR 32-79). Among all patients, 476 (84%) were female, and 387 were in early stage (histological stage I-II). There were no significant differences in baseline characteristics between the training and validation cohorts (Table 1).

Adverse outcome-free survival
In the entire cohort, adverse outcomes were recorded in 71 patients (12.5%), including 18 liver-related deaths, 3 liver transplantations, and 50 complications of cirrhosis (30 with ascites, 13 with variceal bleeding, 5 with both ascites and variceal bleeding, one with hepatic encephalopathy and ascites, and one with hepatic encephalopathy, ascites, and variceal bleeding). Adverse outcome-free survival rates at 3, 5, and 10 years were 93%, 87%, and 75%, respectively (Fig. S1). Among patients with adverse outcomes, the mean time to the end-point was 3.5 years (median, 3.0 years). Importantly, 29 (41%) of them had an end-point within 2 years, and 56/71 (79%) within 5 years (Table 2). Among the 29 patients with an end-point within 2 years, 14 were in early stage and 13 were in late stage (the stage of 2 patients was not available). These results showed that a considerable proportion of patients, including early-stage patients, progressed rapidly within 2 years. Hence, stratifying risk in these patients only after 12 months of UDCA treatment, as the guidelines recommend, could delay adjunct therapy. Therefore, we aimed to identify an earlier criterion for risk stratification.

Cut-off values of biochemical parameters for risk stratification
We first analyzed the dynamic changes of biochemical indicators within 1 year in the entire cohort (Fig. 2). The serum levels of ALP, GGT, AST, and ALT at 1 month had decreased by ~40%, and TBIL by ~25%, compared with baseline values. These biochemical values fluctuated slightly and remained almost stable thereafter. In univariate Cox regression analysis, the biochemical parameters associated with prognosis were ALP ≤ 2.5 × ULN, ALP ≤ 2 × ULN, AST ≤ 1.5 × ULN, AST ≤ 2.5 × ULN, and TBIL ≤ 1 × ULN at 1, 3, or 6 months (Table S1). Thus, we applied these cut-off values in further analysis.
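Combining the cut-off values listed above with the three time-points yields a small grid of candidate rules. The sketch below enumerates that grid and shows how one rule would be applied to a patient; the rule representation and names are hypothetical, the grid uses only the cut-offs named in this paragraph, and the Xi'an rule uses the thresholds stated in the abstract (ALP ≤ 2.5 × ULN, AST ≤ 2 × ULN, TBIL ≤ 1 × ULN at 1 month).

```python
from itertools import product
from typing import Dict, List

# Cut-offs as listed in the text (multiples of ULN); TBIL is fixed at normal.
ALP_CUTOFFS = (2.0, 2.5)
AST_CUTOFFS = (1.5, 2.5)
TBIL_CUTOFF = 1.0
MONTHS = (1, 3, 6)

# Every combination of time-point, ALP cut-off, and AST cut-off.
CANDIDATES: List[Dict[str, float]] = [
    {"month": m, "alp": alp, "ast": ast, "tbil": TBIL_CUTOFF}
    for m, alp, ast in product(MONTHS, ALP_CUTOFFS, AST_CUTOFFS)
]

# The criterion as stated in the abstract.
XIAN = {"month": 1, "alp": 2.5, "ast": 2.0, "tbil": 1.0}


def is_responder(alp_uln: float, ast_uln: float, tbil_uln: float,
                 rule: Dict[str, float]) -> bool:
    """True if all three tests, measured at rule['month'], meet the rule."""
    return (alp_uln <= rule["alp"]
            and ast_uln <= rule["ast"]
            and tbil_uln <= rule["tbil"])


# Made-up 1-month values for two patients.
responder = is_responder(2.4, 1.8, 0.9, XIAN)
non_responder = is_responder(2.0, 1.0, 1.2, XIAN)  # TBIL above normal
```

Each candidate rule would then be scored on the training cohort, for example by the hazard ratio between responders and non-responders, before one is selected.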

The proportion of adverse outcomes in non-responders defined by the Xi'an criterion was 21.2%, which was higher than with the Barcelona (15.0%), Paris-II (16.9%), and Ehime (16.5%) criteria, and slightly lower than with the Paris-I (24.6%), Rochester-II (21.3%), and Rotterdam (22.6%) criteria. Non-responders judged by the Xi'an criterion thus showed a higher or at least comparable proportion of adverse outcomes compared with the published criteria. Because the Xi'an criterion was established from data after only 1 month of UDCA treatment, we considered it effective. We then further examined the Xi'an criterion using a separate cohort for validation. In the validation cohort, the response rate was 54% with the Xi'an criterion. Similarly, the rate of adverse outcomes in responders was only 3.9% with the Xi'an criterion, compared with 7.6-10.7% with the other published criteria. The rate of adverse outcomes in non-responders defined by the Xi'an criterion was 23.4%, which was lower than that with Rochester-II (27.6%). Responders defined by the Xi'an criterion had a higher adverse outcome-free survival in both early- and late-stage patients in the training cohort (Fig. S2A). In the validation cohort, non-responders defined by the Xi'an criterion had a lower adverse outcome-free survival than responders in early-stage patients, while there was no statistical difference in late-stage patients (p = 0.063, Fig. S2B). In the entire cohort, the Xi'an criterion showed good discrimination in both early- and late-stage patients (Fig. S3), as well as in cirrhotic and non-cirrhotic patients (Fig. S4).

Discrimination of high-risk patients with rapid progression by Xi'an and other published criteria
Among the 71 patients with adverse outcomes, 29 (41%) had an end-point within 2 years in the entire cohort. Next, we divided patients with adverse outcomes into 3 groups: rapidly progressive (adverse events within 2 years), moderately progressive (adverse events from 2 to 5 years), and slowly progressive (adverse events after more than 5 years), in both the training and validation cohorts (Fig. 4). In the training cohort, the Xi'an criterion accurately identified 82% of rapidly progressive patients, which is higher than Barcelona (20%), Paris-I (67%), Paris-II (73%), Rotterdam (67%), Rochester-II (27%), and Ehime (64%). In the validation cohort, 91% of rapidly progressive patients were correctly identified by the Xi'an criterion, which is higher than Barcelona (36%), Paris-I (64%), Paris-II (73%), Rotterdam (63%), Rochester-II (45%), and Ehime (67%). Among moderately progressive patients in the training cohort, the Xi'an criterion identified 88%, lower only than Paris-II (94%) and Ehime (91%). Furthermore, in slowly progressive patients, the Xi'an criterion remained effective in identifying patients with adverse events. These results showed that the Xi'an criterion had a superior ability to discriminate high-risk PBC patients, especially those with rapid progression.
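The grouping and the per-group identification rates described above can be expressed compactly. The following sketch assumes each patient with an adverse outcome is summarized as a pair (years to event, classified as non-responder); the boundary handling (exactly 2 years counted as rapid, exactly 5 years as moderate) is an assumption, as are the function names and the example data.

```python
from typing import Iterable, Tuple


def progression_group(years_to_event: float) -> str:
    # Assumed boundaries: <= 2 years rapid, <= 5 years moderate, else slow.
    if years_to_event <= 2:
        return "rapid"
    if years_to_event <= 5:
        return "moderate"
    return "slow"


def identified_fraction(patients: Iterable[Tuple[float, bool]],
                        group: str) -> float:
    """Share of patients in `group` whom a criterion flagged as non-responders."""
    flags = [non_responder for years, non_responder in patients
             if progression_group(years) == group]
    if not flags:
        raise ValueError(f"no patients in group {group!r}")
    return sum(flags) / len(flags)


# Made-up patients with adverse outcomes: (years to event, non-responder?).
patients = [(0.8, True), (1.5, True), (1.9, False), (3.0, True), (6.2, False)]
rapid_rate = identified_fraction(patients, "rapid")
```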

Discussion
Stratified therapy is an important strategy in the clinical management of PBC patients. Several agents, such as obeticholic acid (OCA), fibrates, and budesonide, have proved effective for patients with an insufficient UDCA response [19]. At present, there is also a trend toward developing earlier intervention paradigms for PBC patients [3]. A clinical trial (NCT04076527) is currently ongoing to assess whether OCA can improve clinical outcomes in newly diagnosed PBC patients. In addition, a phase 3 clinical trial (NCT02823353) enrolled newly diagnosed PBC patients to receive fenofibrate combined with UDCA. In this study, we designed an earlier criterion, the Xi'an criterion, which applies simple qualitative thresholds to liver tests after 1 month of UDCA treatment to discriminate patients at high risk of disease progression.
Notably, up to 40% of PBC patients will have a suboptimal biochemical response to UDCA, as assessed by binary response criteria and/or prognostic models [20]. The biochemical response to UDCA treatment strongly predicts long-term outcome. Responders defined by the Paris-I criteria had a 10-year transplant-free survival rate of 90%, compared to 51% for non-responders [12]. Consequently, early identification of this subgroup of patients is essential for guiding clinical practice. In this study, we determined a new definition of the biochemical response by focusing on biochemical parameters as early as possible and by including liver-related death, liver transplantation, and any clinical decompensation event of liver cirrhosis in the endpoints. These multiple end-point criteria are likely to better reflect the various patterns of PBC progression and to be more specific to the disease course [11]. Notably, the Xi'an criterion is a simple qualitative criterion, like the Barcelona, Paris, and Rotterdam criteria, which makes it easy for clinicians to use in guiding clinical practice and making early prognostic judgments.
The disease progression of PBC patients varies greatly. Our data showed that 29/71 (41%) patients with adverse outcomes had an end-point within 2 years after initial diagnosis, and that 82% of these patients in the training cohort and 91% in the validation cohort were accurately categorized as non-responders by the Xi'an criterion, which is much higher than with Barcelona, Paris-I, Paris-II, Rotterdam, Rochester-II, and Ehime. In a recent study by Zhang et al. [21], 47% of patients with adverse outcomes had an end-point within 5 years, compared to 78% in our study. However, the proportion of late-stage patients in our study (28%; 149/527) is approximately 2 times higher than in Zhang's study (15%; 11/72). Even in early-stage disease, 36% (4/11) of patients with adverse outcomes had an end-point within 5 years [11]. For these rapidly progressing patients, especially those who progressed within 2 years, the Xi'an criterion is more effective in identifying high-risk patients than the other criteria analyzed in this study. Early use of second-line agents for these high-risk patients may improve biochemical tests and prolong survival without adverse outcomes. In addition, the Xi'an criterion had the highest C-index, specificity, PPV, and PLR in both the training and validation cohorts, and its area under the ROC curve (AUROC) was much higher than those of the other published criteria. These results showed that the Xi'an criterion provides an effective and reliable tool for predicting long-term outcomes.
In 2017, the EASL Clinical Practice Guidelines proposed various criteria as tools to select patients for second-line therapies and to improve the design of clinical trials in PBC [6]. Multiple clinical trials were conducted to determine the safety and efficacy of other drugs, such as OCA, bezafibrate, and elafibranor, in patients with an incomplete response to UDCA [22][23][24]. Most clinical trials defined incomplete response in patients who had been treated with UDCA for at least 12 months [25,26]. Our study showed that the levels of the biochemical parameters used in these criteria fluctuated only slightly from 1 to 12 months, and the Xi'an criterion showed excellent predictive effectiveness. However, further research is needed to determine whether the Xi'an criterion is appropriate for defining biochemical response, for use as the response definition in clinical research, and for guiding PBC management and the choice of second-line treatment.
Zhang et al. proposed that previously published criteria, including Paris, Barcelona, Toronto, and Ehime, applied at 3 and 6 months significantly discriminated high-risk patients [21]. That study shows that earlier biochemical indicators can also be used to determine the prognosis of patients. Consistent with the results of Zhang et al., biochemical parameters at 3 and 6 months in our cohort were also relevant markers for predicting poor prognosis. In particular, our study found that the indicators at 1 month after UDCA treatment can also effectively predict the prognosis of patients. The Paris-I criteria are considered the best for predicting prognosis in late-stage PBC [12], while the Paris-II criteria perform better in early-stage PBC [11]. We therefore assessed the discriminatory capability of the Xi'an criterion at different stages: responders defined by the Xi'an criterion had a higher adverse outcome-free survival in both early- and late-stage patients in the training cohort. In the validation cohort, the Xi'an criterion showed good discrimination in early-stage patients, while there was no statistical difference in late-stage patients (p = 0.063). There were only 34 late-stage patients in the validation cohort, so the sample size may not have been large enough to detect a statistically significant difference.
This study had some limitations. First, it was a single-center, retrospective study; further validation in multicenter studies with larger cohorts is warranted. Second, the mean follow-up period of 5 years was relatively short. However, given that the mean time to an adverse outcome was 3.6 years, we consider an average follow-up of 5 years sufficient to forecast the prognosis of PBC patients.
In summary, we have designed and validated, for the first time, a new early criterion for distinguishing high-risk PBC patients in a Chinese population. Our data indicated that PBC patients with ALP ≤ 2.5 × ULN, AST ≤ 2 × ULN, and TBIL ≤ 1 × ULN (Xi'an criterion) after 1 month of UDCA treatment were likely to have a better prognosis. For rapidly progressive patients, the Xi'an criterion is highly reliable and has a better overall predictive capacity than other published criteria. In addition, the Xi'an criterion provides significant prognostic information in both early- and late-stage PBC and offers an additional tool for the clinical evaluation of PBC patients. Most importantly, it can be readily applied to rapidly identify PBC patients who require additional therapeutic approaches.