Introduction

Primary sclerosing cholangitis (PSC) is a low-prevalence chronic progressive inflammatory disease [1]. Although elevated cholestatic laboratory values are suggestive, they can neither diagnose nor monitor PSC activity over its median 12–20-year duration [2,3,4]. Therefore, an alternative non-invasive exam is needed to diagnose PSC, detect its complications, including the development of clinically-relevant strictures, and predict outcomes. A meta-analysis, confirming international imaging guidelines, showed that non-invasive magnetic resonance cholangiopancreatography (T2-MRCP) is the imaging study of choice for diagnosing PSC due to its high accuracy and very high (94%) specificity and positive likelihood ratio [5].

Ruiz et al developed the Anali scoring system with and without extracellular Gd-chelate to predict the most important factors determining PSC survival, including liver decompensation, liver-related death and/or the need for orthotopic liver transplant (OLT) [6]. Lemoinne et al validated both scores using an external cohort [7]. The Anali scores accurately predicted 4-year radiologic progression from baseline with subsequent validation in a retrospective multi-center study [6, 7]. Grave et al confirmed moderate inter-reader agreement for the Anali no gadolinium score and Anali arterial-phase using extracellular gadolinium-based contrast agent, as well as their predictive value [8]. Although Grigoriadis et al found poor to moderate inter-reader agreement for the Anali no gadolinium score and Anali arterial-phase with gadoxetic acid, challenging their clinical utility, they also confirmed that both Anali scores correlated with clinical outcomes, highlighting MRI’s value in determining PSC prognosis [9].

Recently, Poetter-Lang et al reported on the value of gadoxetic acid-enhanced MRI (GA-MRI) to non-invasively and confidently diagnose potential functional stricture (PFS) in PSC patients, with an additional predictive value [10]. They found that this simple binary stratification could not only diagnose FS requiring ERCP dilatation and/or stenting, but also predict OLT or liver-related death in their cohort by diagnosing very advanced-stage PSC, i.e., hepatocellular dysfunction (HD) [10].

In PSC patients, both the Anali scores and PFS can be simultaneously evaluated prognostically on GA-MRI, including T2-weighted MRCP. Therefore, using gadoxetic acid instead of conventional gadolinium chelates, our purpose was two-fold: first, to validate Anali scores with and without gadolinium (ANALIGdAP, ANALIGdHBP and ANALINoGd) and, second, to compare their prognostic ability with the recently-proposed potential functional stricture (PFS).

Patients and methods

Patients

Our institutional ethics review board approved this retrospective, single-center study. All patients gave written, informed consent for MRI and interventional procedures. Only patients with confirmed PSC according to EASL guidelines [11] who underwent GA-MRI between October 2007 and March 2022 were included. Patients were excluded if they had small-duct PSC, secondary sclerosing cholangitis, confounding liver illnesses (autoimmune hepatitis (AIH), primary biliary cholangitis (PBC), alcoholic liver disease (ALD), etc.), or a current or prior malignancy unrelated to PSC, e.g., lung cancer; if they had undergone orthotopic liver transplantation (OLT) prior to MRI; if they were under the age of 18; and/or if their GA-MRI exam was incomplete. Among patients with malignancies, only those with PSC-associated malignancies, i.e., cholangiocarcinoma, hepatocellular carcinoma (HCC), and gallbladder cancer, were enrolled in the study. Thus, our final cohort comprised adults with large-duct PSC who had at least one multiparametric GA-MRI, including conventional 2D- and 3D-T2-weighted-MRCP (Fig. 1 Flowchart).

Fig. 1

Flowchart. Between 2007 and 2022, 8564 patients underwent a standardized 3.0 Tesla contrast-enhanced MRI of the liver. Of these, 1312 patients were excluded because they were imaged using a contrast agent other than gadoxetic acid. A diagnosis other than sclerosing cholangitis further eliminated 7016 patients. Among patients with sclerosing cholangitis, additional exclusions were due to: secondary sclerosing cholangitis (SSC) (70), either overlap syndrome and/or small-duct PSC (14), incomplete HBP (10), previous OLT (11), age under 18 years (3), and prior malignancy (5)

Clinical data

Demographic and clinical data obtained from electronic medical records included patient age, gender, body mass index (BMI), date of and indication for MRI, follow-up imaging exams, duration of PSC, and the presence of liver cirrhosis or concomitant inflammatory bowel disease (Table 1). Laboratory tests performed within two weeks of MRI, plus clinical scores that indicated disease severity, including MELD [12], Revised Mayo-Risk-Score (RMRS) [4], Fib-4 [13], APRI [14], ALBI [15], UK-PSC risk scores [16] and Prognostic Index of the Amsterdam-Oxford model (PI-AOM) [17], were recorded (Table 2). Clinical events that occurred after inclusion, i.e., orthotopic liver transplantation (OLT), death, cause of death, and cirrhosis decompensation (defined as the occurrence of variceal bleeding, ascites, hepatic encephalopathy, or hepatorenal syndrome), were recorded.

Table 1 Characteristics of 123 primary sclerosing cholangitis (PSC) patients
Table 2 Clinical scores, splenic volume, laboratory tests in 123 patients

Definition of sequelae

Patients entered the survival analyses at the time of GA-MRI. In March 2022, patient records were censored at the date last seen if they had not experienced any sequelae. Survival status (alive, deceased), and date and type of whichever liver-related event first occurred were recorded (Table 1). OLT, death and decompensation (including encephalopathy, ascites, bleeding of oesophageal varices) were recorded as sequelae. The new occurrence of cholangiocarcinoma (CCA), gallbladder cancer, or hepatocellular carcinoma (HCC) was noted but not considered a sequela if patients were still alive at the end of the study. Diagnostic or therapeutic endoscopic retrograde cholangiopancreatography (ERCP) was not considered an event.

Disease severity classification

PSC severity and expected prognosis were based upon previously validated scores, including the RMRS, Fib-4, APRI, ALBI, Short-Term UK-PSC risk score, and PI-AOM for PSC [7, 16, 18, 19] (Table 2). We further classified these continuous scores into categorical groups, e.g., RMRS: low-risk (≤ 0); intermediate-risk (> 0 and < 2); and high-risk (≥ 2) groups [20, 21]. Binary classification of the Fib-4 was 0 (≤ 1.3) and 1 (> 1.3) [22], and the APRI was 0 (≤ 1.17) and 1 (> 1.17) [22]. The ALBI score was grouped into ≤ −2.60 (Grade 1), > −2.60 to ≤ −1.39 (Grade 2), and > −1.39 (Grade 3) [23, 24], and the PI-AOM into low-risk, low-intermediate-risk, moderate-risk, and high-risk [25] (Table 2).
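The cut-offs listed above can be written as a small lookup. The following Python sketch is for illustration only: the function names are hypothetical and are not part of the study's analysis, which was performed in R and SPSS. The PI-AOM and UK-PSC groups are omitted because their numeric thresholds are not stated in the text.

```python
def rmrs_group(score: float) -> str:
    """Revised Mayo Risk Score: low (<= 0), intermediate (> 0 and < 2), high (>= 2)."""
    if score <= 0:
        return "low"
    if score < 2:
        return "intermediate"
    return "high"


def fib4_binary(score: float) -> int:
    """Fib-4: 0 if <= 1.3, 1 if > 1.3."""
    return int(score > 1.3)


def apri_binary(score: float) -> int:
    """APRI: 0 if <= 1.17, 1 if > 1.17."""
    return int(score > 1.17)


def albi_grade(score: float) -> int:
    """ALBI: Grade 1 (<= -2.60), Grade 2 (> -2.60 to <= -1.39), Grade 3 (> -1.39)."""
    if score <= -2.60:
        return 1
    if score <= -1.39:
        return 2
    return 3


# Hypothetical example values, not patient data.
print(rmrs_group(1.5), fib4_binary(2.0), apri_binary(0.5), albi_grade(-2.0))
```

Values that fall exactly on a threshold are assigned to the lower group, following the "≤" signs in the definitions above.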

MRI exam protocol

All examinations were performed on a 3 Tesla MR scanner (MAGNETOM Trio Tim or PrismaFit, Siemens). T2-weighted-MRCP was performed according to Hoeffel et al’s protocol and adhered to International PSC Study Group recommendations [26, 27]. MRCP images included a respiratory-triggered, 3D, heavily T2-weighted sequence in the coronal plane and a breath-hold, thick slab, single-shot, 2D, heavily T2-weighted sequence in the coronal and oblique coronal projections. Dynamic MR images were obtained in the transverse plane, covering the whole liver during end-expiratory breath hold. T1-weighted VIBE images were obtained pre-contrast and during the arterial (using automatic bolus tracking with a TWIST sequence (TR: 68.77, TE: 1.52, FA: 30°, slice thickness: 20 mm, FOV: 350, matrix 148 × 192, and only one NEX)), portal venous (70 s), transitional (300 s) and hepatobiliary (20 min) phases (HBP) after injection of gadoxetic acid (Primovist® in Europe, Eovist® in USA). In addition, an axial T2-weighted sequence, an axial T2-weighted sequence with fat-suppression, an axial in- and out-of-phase T1-weighted gradient echo sequence, diffusion-weighted sequences with three b-values (50, 300 and 600) and an ADC map, and axial and coronal plane HBP images (with flip angles of 20° and 35°) were obtained. The examination parameters for the whole MRI exam are given in Table 1S.

Image analysis

MRI exams were anonymized, then independently evaluated on a commercially available PACS workstation by five readers, R-A, R-B, R-C, R-D and R-E with 2, 3, 4, 6 and > 20 years of experience in abdominal radiology, respectively. To assess the intra-reader agreement, R-C and R-E reviewed the images twice, at least 12 weeks apart.

Blinded to all clinical data except PSC diagnosis, the radiologists assessed the ANALI scores for each patient. Among the individual parameters of the Anali scores, intrahepatic bile duct dilatation (IHBD) was scored = 0 if the ducts measured 3 mm or smaller, = 1 if any duct measured 4 mm, and = 2 if any duct measured 5 mm or larger. Liver dysmorphia was considered present, score = 1, if there was atrophy, lobulation of the liver contour and/or increased caudate-to-right liver lobe ratio [6, 7]. Otherwise, dysmorphia was considered absent, i.e., score = 0. Portal hypertension (PH) was considered present, score = 1, if collateral vessels, with or without splenomegaly, were observed. Otherwise, PH was scored = 0, indicating its absence. Liver enhancement was assessed on both the arterial-phase (AP) and hepatobiliary-phase (HBP) images. If parenchymal enhancement was uniform, it was scored = 0. Otherwise, heterogeneous liver parenchymal enhancement was scored = 1.

Then Anali scores were calculated as follows: ANALINoGd = (1 × dilatation of intrahepatic bile duct (IHBD)) + (2 × liver deformity) + (1 × PH), range 0 to 5, and ANALIGd = (1 × liver deformity) + (1 × parenchymal enhancement heterogeneity), range 0 to 2 [6, 7].

Thereafter, Anali scores were dichotomized as follows. ANALINoGd: low risk (0–2 points) and high risk (3–5 points), i.e., binary, ≤ 2 and > 2. ANALIGd in AP and HBP: low risk and high risk, i.e., binary, ≤ 1 and > 1.
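The component scoring, the two weighted sums, and the dichotomization described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the IHBD helper assumes that duct diameters are reported in whole millimetres, as the 3 mm / 4 mm / 5 mm categories suggest.

```python
def ihbd_score(max_duct_mm: int) -> int:
    """IHBD component: 0 if <= 3 mm, 1 if 4 mm, 2 if >= 5 mm (whole-mm input assumed)."""
    if max_duct_mm <= 3:
        return 0
    if max_duct_mm >= 5:
        return 2
    return 1


def anali_no_gd(ihbd: int, dysmorphy: int, portal_hypertension: int) -> int:
    """ANALINoGd = 1 x IHBD + 2 x liver deformity + 1 x PH, range 0 to 5."""
    return 1 * ihbd + 2 * dysmorphy + 1 * portal_hypertension


def anali_gd(dysmorphy: int, heterogeneous_enhancement: int) -> int:
    """ANALIGd = liver deformity + parenchymal enhancement heterogeneity, range 0 to 2."""
    return dysmorphy + heterogeneous_enhancement


def high_risk_no_gd(score: int) -> bool:
    """ANALINoGd: low risk 0-2 points, high risk 3-5 points."""
    return score > 2


def high_risk_gd(score: int) -> bool:
    """ANALIGd (AP or HBP): low risk <= 1, high risk > 1."""
    return score > 1


# Hypothetical reading: 4 mm duct, dysmorphy present, collaterals present.
score = anali_no_gd(ihbd_score(4), dysmorphy=1, portal_hypertension=1)
print(score, high_risk_no_gd(score))
```

The same `anali_gd` function serves both ANALIGdAP and ANALIGdHBP; only the phase on which enhancement heterogeneity is judged differs.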

On 20-minute HBP images, patients were also dichotomized into normal contrast excretion through the biliary system (NFS) or impaired excretion (PFS), i.e., at 20 minutes, no contrast was seen reaching the first-order left hepatic duct (LHD), the right hepatic duct (RHD), the common hepatic duct (CHD/hilum), or the common bile duct (CBD), or no contrast was excreted at all [10]. For further details regarding PFS, please refer to Poetter-Lang et al [10].

Furthermore, estimated splenic volume [mL] was calculated as 30 + 0.58 × L × D × T [28]. A cut-off value of 381.1 cm³ was chosen to differentiate normal-sized from enlarged spleens (Table 2) [29].
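As a worked illustration of the formula above, the following Python sketch computes the estimated volume and applies the cut-off. It rests on assumptions that the text does not state explicitly: that L, D and T are three orthogonal splenic dimensions measured in centimetres (so that the result is in mL, with 1 mL = 1 cm³), and that a spleen is called enlarged when the estimate exceeds the cut-off. The function names are hypothetical.

```python
SPLENOMEGALY_CUTOFF_ML = 381.1  # cut-off quoted in the text (1 cm^3 = 1 mL)


def spleen_volume_ml(l_cm: float, d_cm: float, t_cm: float) -> float:
    """Estimated splenic volume [mL] = 30 + 0.58 x L x D x T (dimensions assumed in cm)."""
    return 30 + 0.58 * l_cm * d_cm * t_cm


def is_enlarged(volume_ml: float) -> bool:
    """Assumption: volumes strictly above the cut-off count as enlarged."""
    return volume_ml > SPLENOMEGALY_CUTOFF_ML


# Hypothetical measurements, not patient data.
volume = spleen_volume_ml(10.0, 5.0, 4.0)
print(round(volume, 1), is_enlarged(volume))
```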

Statistical analysis

Metric data are presented as means ± standard deviations or medians and quartiles, depending upon their distribution. Nominal data are presented as absolute frequencies and percentages. Categorical data were evaluated by the chi-squared test or Fisher’s exact test.

Inter- and intra-reader agreement between radiologists was assessed using Fleiss’ and Cohen’s kappa, respectively. Fleiss’ kappa was determined separately for each parameter, as well as for the low-risk vs high-risk (i.e., binary) classification for all 3 ANALI scores, indicating inter-reader agreement. For readers C and E, the intra-reader Cohen’s kappa was obtained. Then 95% confidence intervals (CI) were calculated for each value.
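For readers unfamiliar with Fleiss’ kappa, the statistic can be computed from a table that counts, for each patient, how many readers chose each category. The Python sketch below implements the standard textbook formula; it is not the code used in this study (which relied on R and SPSS), the function name is hypothetical, and confidence intervals are not computed. It is undefined when all ratings fall into a single category, because chance agreement is then 1.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of readers assigning subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of readers")

    # Observed agreement per subject, then averaged over subjects.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall proportion of ratings per category.
    n_categories = len(counts[0])
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical binary ratings (low-risk, high-risk) by five readers for four patients.
example = [[5, 0], [4, 1], [0, 5], [1, 4]]
print(round(fleiss_kappa(example), 3))
```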

Kappa values < 0 indicated poor, 0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement [30]. Event-free survival was defined as the time interval from MR diagnosis to the first liver-related event occurrence. Kaplan-Meier estimates were calculated, and survival curves were compared with the log-rank test, when applicable. Univariate Cox proportional hazards regression analysis was performed to evaluate the association between outcomes and binary data for all 4 imaging parameters, i.e., PFS vs NFS, and (low-risk vs high-risk) ANALI scores, clinical scores and laboratory parameters. Multivariate analysis with adjustment for age and sex was also performed.
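The kappa agreement bands quoted at the start of this paragraph can be expressed as a simple lookup, sketched here in Python. The function name is hypothetical, and values between two quoted bands (e.g., 0.205) are assigned to the lower band as an assumption.

```python
def agreement_label(kappa: float) -> str:
    """Map a kappa value to the agreement bands used in this study [30]."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


for value in (0.31, 0.43, 0.74, 0.81):
    print(value, agreement_label(value))
```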

For event-free survival analysis, we calculated the results of each reader separately and then averaged them, yielding a mean event-free survival for the PFS and each ANALI score.

For survival analyses, cohort data were dichotomized into low- and high-risk scores, for ANALINoGd, i.e., ≤ 2 and > 2, and for both the ANALIGdAP and ANALIGdHBP, i.e., ≤ 1 and > 1. The level for statistical significance was set at p < 0.05. Statistical analyses were performed using RStudio (Version 1.4.1717) and IBM SPSS (version 26).
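To make the event-free survival estimate concrete, the Kaplan-Meier (product-limit) estimator can be sketched in plain Python. This is a generic illustration, not the study's code: the function name and the follow-up times are hypothetical, and the log-rank test and Cox regression are not shown. In practice, one curve would be estimated per risk group (e.g., ANALINoGd ≤ 2 vs > 2).

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.

    durations: follow-up time per patient; events: 1 if a sequela occurred,
    0 if the patient was censored at that time.
    """
    events_at = Counter(t for t, e in zip(durations, events) if e)
    leaving_at = Counter(durations)  # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve


# Hypothetical follow-up times in years; the second patient is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```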

Results

Cohort characteristics

We included 123 patients, 50 F/73 M, mean age 40.5 ± 14.2 years (range, 18.3–77.6 years), diagnosed with PSC according to society-guideline MRCP features [27, 31]. Of these, 44 (34.1%) had biopsy-confirmed PSC. Inflammatory bowel disease was histologically verified in 81 patients: 49 with ulcerative colitis; and 32 with Crohn’s disease. Mean duration of the PSC was 7.4 years. Mean follow-up post-MRI was 3.9 years (Table 1).

By March 1st, 2022, 37 (30%) of our 123 patients had experienced at least one sequela [17 OLT (14%), 7 liver-related deaths (6%), and 13 decompensation only (11%)]. The causes of death were hepatic dysfunction or acute-on-chronic liver failure (ACLF) in 4 patients, CCA in 1 patient, gallbladder carcinoma in 1 patient and septic shock of biliary origin in 1 patient. The reasons for OLT were end-stage liver disease with hepatic dysfunction in 14 patients and recurrent cholangitis in 3 patients.

Other recorded events included development of hepatobiliary cancers, i.e., an additional 3 CCA and 1 gallbladder cancer, but no HCCs. These 4 cancers were not considered sequelae since patients were still alive at the end of the study, in accordance with the original study design [7].

There was no statistically significant difference in gender or BMI between the two groups (all p > 0.05). However, patients with sequelae were significantly older (p = 0.001), and their PSC duration was almost 1.5 times longer (p = 0.038) (Table 1).

Inter- and intrareader agreement

Inter-reader agreement for PFS vs NFS, for each variable of the three ANALI scores with and without gadoxetic acid, and for the binary classification, i.e., low-risk vs high-risk, of the three ANALI scores with and without gadoxetic acid is shown in Table 3. All components and scores demonstrated statistical significance, p < 0.001. Fleiss’ kappa (ϰ) was highest, 0.81, for binary PFS, showing almost perfect agreement between the five readers. Individual variables of the ANALI scores exhibited moderate to substantial agreement (0.41–0.74). Intrahepatic duct dilatation and arterial-phase parenchymal enhancement heterogeneity had the lowest agreement (for both ϰ = 0.41, moderate). Liver deformity had higher agreement (ϰ = 0.74, substantial), while evaluation of portal hypertension and HBP parenchymal enhancement heterogeneity had moderate agreement (ϰ = 0.41 and 0.46, respectively).

Table 3 Inter-reader agreement for all 5 readers using Fleiss’ Kappa correlation with 95% confidence intervals (CI) of potential functional stricture (PFS) and ANALI scores and their components

The ANALINoGd showed fair agreement (ϰ = 0.31), versus moderate agreement for the ANALIGdHBP and ANALIGdAP (ϰ = 0.43 and ϰ = 0.47), respectively.

Cohen’s kappa for R-E was 0.87 for the PFS (almost perfect), but only 0.69, 0.72, and 0.77 for the ANALI scores, respectively, i.e., substantial to almost perfect agreement overall. For R-C, Cohen’s kappa values were 0.82 for PFS (almost perfect), but only 0.65, 0.71, and 0.73 for the ANALI scores, respectively, i.e., also substantial to almost perfect agreement overall.

Prognostic performance of the imaging and clinical scores

Mean adjusted hazard ratios for low- vs high-risk imaging and clinical scores were as follows: ANALINoGd (6.12), PFS (3.12), ANALIGdAP (3.59), ANALIGdHBP (3.56), APRI (3.26), ALBI (3.69), and AP (3.26) (Table 4).

Table 4 Mean low- vs high-risk estimates for clinical outcomes (i.e., liver-related death, liver transplantation or decompensated cirrhosis) for PFS and ANALI scores for readers A-E, as well as splenic volume and clinical scores

All mean reader imaging scores (i.e., PFS and 3 ANALI scores) differed significantly (p < 0.0001) in patients with vs without sequelae (Table 5).

Table 5 Mean of scores for readers A-E per sequelae for the 123 primary sclerosing cholangitis (PSC) patients

The NPV was highest for the ANALINoGd, 95.2%, while the PPVs for all four scores ranged from 50% to 59.5% (Tables 1, 2 and 3S). See the supplementary section for further data on the mean reader scores and all data on individual scores for all five readers (Table 2S) and (Fig. 2a–c).
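For reference, the predictive values reported here follow from a 2 × 2 table of dichotomized score (high- vs low-risk) against outcome (sequela vs none). The Python sketch below shows the standard definitions; the function name and the counts are hypothetical and are not taken from the study data.

```python
def predictive_values(tp: int, fp: int, fn: int, tn: int):
    """Return (PPV, NPV) from a 2 x 2 table.

    tp: high-risk with sequela, fp: high-risk without sequela,
    fn: low-risk with sequela,  tn: low-risk without sequela.
    """
    ppv = tp / (tp + fp)  # share of high-risk patients who had an event
    npv = tn / (tn + fn)  # share of low-risk patients who stayed event-free
    return ppv, npv


# Hypothetical counts for illustration only.
ppv, npv = predictive_values(tp=30, fp=20, fn=10, tn=40)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```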

Fig. 2

No functional stricture (NFS). OLT for HD. A 36-year-old male PSC patient. Axial (a) T2-weighted images, coronal maximum intensity projection from 3D MRCP (b) and post-contrast T1-weighted images with fat suppression in arterial (c, axial), portal venous (d, coronal) and HBP (e, coronal and f, axial) phases. ANALI scores with and without gadoxetic acid in the arterial and HB phases were the same for all five readers (A-E): 2 in the ANALIGdAP, 2 in the ANALIGdHBP, and 5 in the ANALINoGd. In other words, all readers graded duct dilatation ≥ 5 mm, liver dysmorphy (i.e., liver contour lobulation (⇥ in a) ± increased caudate-to-right liver lobe ratio (* in a)) as 1, portal hypertension with collateral vessels (⇥ in d) ± splenomegaly (* in d) as 1, and heterogeneous parenchymal enhancement as 1 (⇥ in c and f). All 5 readers called no functional stricture (NFS) since contrast in the intra- and extrahepatic bile ducts on the HBP image indicates excretion (⇥ in e)

The Kaplan-Meier curves for the outcome-free survival of all patients showed a markedly higher probability of sequelae over time in patients with high-risk vs. low-risk imaging scores (Fig. 5a–d). The difference was always most visible one year after MRI, especially for the ANALINoGd. The log-rank test was statistically significant for all four imaging scores (ANALINoGd, ANALIGdAP, ANALIGdHBP and PFS; p < 0.001).

Fig. 3

No functional stricture (NFS). OLT for HD. A 55-year-old male PSC patient. Coronal maximum intensity projection from 3D MRCP (a), axial T2-weighted image (b) and post-contrast T1-weighted images with fat suppression in arterial phase (c, axial) and portal venous (d, coronal) and HB phase (e, axial and f, coronal). ANALIGdAP, ANALIGdHBP, and ANALINoGd scores were calculated by all 5 readers. Readers A-E graded gadoxetic acid-enhanced AP images as 2, and HBP images as 2. In other words, all readers registered parenchymal enhancement heterogeneity as present in the AP (⇥ in c), as well as in the HBP (⇥ in e). But on non-contrast images, ANALINoGd scores were 4, 4, 3, 3, and 2 for readers A, B, C, D, and E, respectively. Readers A, B and D scored the IHBD as 1 (4 mm, ⇥ in a), readers C and E as 0 (≤ 3 mm, ⇥ in d in a). Readers A, C, and D rated the liver as deformed = 1 (⇥ in d in b), whereas readers B and E judged the liver as normal = 0. Reader E scored no signs of portal hypertension = 0, whereas, due to collateral vessels (⇥ in d), all other readers scored PH as = 1. All 5 readers called no functional stricture (NFS) since contrast excretion is seen within the intra- and extrahepatic bile ducts in the HBP (⇥ in d, f)

Spleen volumetrics

Spleen volumes were significantly higher (p < 0.001) in the patients with adverse outcomes and significantly more patients with splenomegaly had liver-related outcomes (p < 0.001), highlighting the importance of splenomegaly as a marker of disease severity (Table 2, 3S).

Discussion

We found that PFS and all three ANALI scores (ANALINoGd, ANALIGdAP, ANALIGdHBP) could non-invasively predict PSC outcomes, i.e., liver-related death, the need for OLT, and decompensation. Both the ANALI scores and PFS are derived from a routine gadoxetic acid-enhanced MRI, including T2-weighted MRCP. Although the inter-reader agreement for the raw ANALI scores was only fair to moderate, confirming the results of previous studies [8, 9], this agreement improved substantially when the scores were dichotomized into low- and high-risk. The relatively low inter-reader agreement of the Anali scores can be explained by tiny differences in reader measurements, quite understandable since we are dealing with millimetres (Fig. 4). Whereas one reader might measure a duct as 3 mm, another might call it 4 mm. By dichotomizing the ANALI scores, we get the overall picture of the patient’s disease status, i.e., relatively mild vs relatively severe. In fact, that is the point of prognostication, i.e., to determine the big picture. Furthermore, the predictive value of these binary single- and mean-reader ANALI scores (low- and high-risk) was statistically significant, confirming the value of ANALI scores as an outcome prognosticator [6,7,8,9]. In contrast, the PFS had almost perfect inter- and intra-reader agreement, which we attributed to its simplicity, i.e., presence or absence of biliary contrast excretion [10]. Therefore, the combined use of Anali scores and PFS is an appealing prognosticator for short- and medium-to-long-term monitoring.

Fig. 4

Functional stricture (FS). OLT for HD. A 59-year-old male PSC patient. Coronal maximum intensity projection from 3D MRCP (a), post-contrast T1-weighted images with fat suppression in arterial phase (b, axial), portal venous (c, coronal) and HB phase (d, axial and e, coronal). Also, ERCP (f, coronal) and 4-week follow-up coronal T1-VIBE HBP, after ERCP (g). Potential functional stricture (PFS) was diagnosed by all five readers due to the absent contrast excretion in the HBP (↑ in e). ERCP the following day confirmed a FS which was dilated (⇥ in f). Follow-up HBP shows normal excretion (⇥ in g), i.e., NFS, now. All five readers scored the ANALIGdHBP as 2 (⇥ in d) because the parenchymal enhancement was rated as heterogeneous and liver dysmorphy was judged present (➞ in b). However, for the ANALIGdAP, readers A, C and E scored the parenchymal enhancement as heterogeneous (⇥ in b), i.e., 1, while readers B and D scored the enhancement as homogeneous, i.e., 0. This means the overall ANALIGdAP scores were 2, 1, 2, 1, 2 for readers A, B, C, D, and E, respectively. For the ANALINoGd, readers A, B, C, D, and E assigned scores of 3, 2, 5, 3, and 2, respectively. For the individual parameters, IHBD was scored as 0, 0, 2 [maximal duct diameter as ≥ 3 mm (⇥ in a)], 0, 0, respectively. All readers felt that liver dysmorphia [enlarged caudate lobe (* in b), lobulated liver surface (➞ in b)] was present, i.e., 1. All but reader B felt that PH was present due to collateral vessels (⇥ in c)

As the PFS and all three ANALI scores correlated well with all clinical scores and laboratory tests in assessing PSC severity, all should be taken into consideration when managing the patient. By considering the whole GA-MRI, one can determine whether ERCP is required, i.e., presence of a PFS, or whether high-risk ANALI scores are due to liver cirrhosis as a consequence of PSC’s natural course, i.e., hepatocellular dysfunction (HD) [6, 7, 10].

A direct relationship has been shown between PFS and sequelae [10], just as has been shown previously for ERCP-diagnosis of dominant stricture (DS) and its sequelae [11, 32, 33]. This highlights the importance of timely diagnosis and treatment of functional stricture (FS) in determining PSC’s course [10]. Like DS, FS may exacerbate cholestasis, leading to further inflammation and fibrosis, and finally, the development of liver cirrhosis and decompensation (HD) requiring OLT [10, 34]. Furthermore, recurrent biliary tract obstruction, i.e., DS or FS, causing recurrent cholangitis may accelerate PSC progression through inflammation and fibrosis [35]. As liver damage is related to stricture severity causing bile flow obstruction [36], PFS is not only a diagnostic tool but also a predictive surrogate. By making an early diagnosis of functional stricture, prompt ERCP to confirm and treat a dominant stricture may reduce the risk of long-term liver injury in these patients [10, 33].

We observed that ANALINoGd was a stronger predictor of adverse outcomes than PFS and ANALIGd (Fig. 5). We attributed this to the fact that it takes into account extrahepatic features of advanced cirrhosis, too, i.e., not just dilatation of intrahepatic bile duct (IHBD) and liver deformity, but also portal hypertension. In addition, the wider range of ANALINoGd, from 0 to 5, rather than 0 to 2 for the ANALIGd, probably stratifies the patients better. However, we have to interpret these results with caution, since the confidence interval (CI) was wide. Further prospective studies are needed.

Fig. 5

a–d Kaplan-Meier curves for adverse outcome-free survival for the low- vs high-risk PSC patients based upon ANALINoGd, ≤ 2 and > 2 (a), ANALIGdAP, ≤ 1 and > 1 (b), ANALIGdHBP, ≤ 1 and > 1 (c), and PFS vs NFS (d)

Even so, the prognostic ability of PFS is comparable to that of both ANALIGd scores [6, 7]. This highlights the main advantage of Poetter-Lang et al’s recently-introduced PFS [10]. Because MRI is risk-free compared to invasive ERCP, clinicians are likely to act far more quickly to work up suspected FS. The high inter- and intra-reader agreement of PFS is likely to increase diagnostic confidence, potentially contributing to earlier management of DS if detected. This will probably be just as important for FS as it has been for DS, which complicates the clinical course in up to 50% of PSC patients [37]. Thus, imaging can generate reliable prognostic models, and may help avoid or postpone progression to severe fibrosis of end-stage PSC. We suggest that more weight should be given to the PFS if a clinically-relevant stricture is the concern, while the ANALINoGd score should be given preference in long-term counselling.

The mean NPVs were very high (i.e., ca 90%) for all ANALI scores and ca 80% for the PFS, while the PPVs were never more than 50–60%. Such high NPVs can help clinicians identify PSC patients who are unlikely to have an event, i.e., at low risk. This can help to avoid unnecessary invasive tests and treatments.

Furthermore, our results can be implemented to tailor the MR exam according to the clinical indication. That is, for the follow-up of stable PSC patients, we can perform MRI without contrast agent, while in symptomatic PSC patients with elevated liver enzymes or tumor marker(s), we can inject gadoxetic acid. If a PFS is present, ERCP with brush cytology can clarify the presence and nature of the stricture, i.e., either benign or malignant [10]. This should be further evaluated in a prospective multi-center study.

Spleen volume also correlated significantly with PSC adverse outcomes. As in other chronic liver diseases, splenomegaly and imaging signs of portal hypertension herald advanced PSC, and increased risk of further event(s) [29].

Our study had several limitations. First, it was a retrospective single-center study with inherent potential bias. However, we assume that this is not a serious limitation, as this is a conformity study for established scores, i.e., Anali scores. Secondly, the mean follow-up after MRI (3.9 years) was relatively short given the natural course of PSC. Therefore, our results should be interpreted as mid-term prognostic outcomes and further long-term evaluation is warranted. Thirdly, our study population presented for imaging with moderate-to-advanced disease; therefore, the majority of events occurred within 12 months of MRI. As this is not expected in PSC, developing predictive models based on patients with advanced disease might overestimate the risk. We believe this relatively high percentage of adverse events within a short follow-up time is because our tertiary care patients were severely ill compared to the average PSC patient. This is a well-known phenomenon in tertiary centers, such as ours [17]. Furthermore, our retrospective study design, by skewing our inclusion cohort, caused rather short mean follow-up times (3.9 years) as compared to our long observation period, i.e., 7.4 years.

In conclusion, after dichotomization, the four MR-derived scores proved good prognosticators, particularly in terms of their NPVs and especially for the ANALINoGd. Furthermore, they appear to be complementary, i.e., the ANALINoGd seems a better longer-term predictor of PSC-related outcomes, while the PFS is superior if a potential functional stricture is the main clinical concern.