Background

Primary Sclerosing Cholangitis (PSC) is a chronic, cholestatic liver condition that results in inflammation and fibrosis that can involve the entire biliary tree [1]. PSC is a progressive disorder and can lead to cirrhosis, portal hypertension and liver failure [1].

The annual incidence of PSC in Europe and the United States is approximately 1 per 100,000 people in the general population [2]. The disease can occur at any age but is more prevalent in adults aged 30–60 years and is more common in men than in women. Approximately 70–80% of patients with PSC have associated inflammatory bowel disease (IBD), such as ulcerative colitis or Crohn’s disease [3]. There is currently no licensed medication to prevent the progression of PSC, which can lead to increasing disability and even death [4]. For patients with end-stage liver disease due to PSC, the only therapeutic option currently available is liver transplantation [4].

Although overall disease progression can be slow, patients with PSC can experience a range of debilitating symptoms. In the early stages of the disease, symptoms include tiredness and fatigue. In more advanced cases, symptoms include pruritus, jaundice, abdominal pain, weight loss, fevers, hyperpigmentation, vitamin deficiencies and metabolic bone disease [5], all of which can have a significant impact on health-related quality of life (HRQOL) [6, 7].

In chronic diseases and terminal illness, it is increasingly recognised that maintaining HRQOL is an important consideration when treatment is aimed at maintenance rather than cure, or when treatment has a high level of toxicity [8]. Many of the current therapeutic interventions in PSC are aimed at managing symptoms. Measuring the impact of these interventions and preserving HRQOL are therefore important aspects of PSC care. This requires patient-reported outcome measures (PROMs) that are sensitive enough to capture changes in HRQOL or symptoms over time.

There is growing evidence that PROMs make a positive contribution to clinical practice and research [9]. In clinical practice, aggregate-level PROM data can help us understand the burden of chronic medical conditions, identify health inequalities [10] and determine new areas for therapeutic intervention. They can also play a key role in benchmarking and audit [11]. At the individual patient level, PROMs can be used to monitor treatment response, adverse effects and benefits in routine practice [12], facilitating communication between clinicians and patients about HRQOL, symptom management and control [13,14,15].

A previous review has investigated the quality of life (QOL) instruments used in liver transplant recipients [16]. However, no comprehensive review of PROMs used in patients with PSC has been undertaken to date. There is a clear need to evaluate the measurement properties of the PROMs currently used in this population to determine the optimal measures for use in future research and routine care. Therefore, the objectives of this systematic review were to: (a) identify and categorise PROMs currently used in research involving the PSC population; and (b) investigate their measurement properties, to help inform the selection of PROMs for use in future PSC research and routine practice.

Methods

The following guidelines were used, where applicable, to inform the conduct and reporting of this study: (i) the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [17] guidance (see Additional file 1 for the PRISMA checklist), (ii) the COnsensus based Standards for the selection of health Measurement INstruments (COSMIN) guidance [18] and (iii) the updated method guidelines for systematic reviews in the Cochrane Collaboration Back Review Group [19]. The study was registered with PROSPERO (Registration Number: CRD42016036544).

Search strategy

A systematic search of the Medline, EMBASE and CINAHL electronic databases was conducted from inception to 15 February 2018. The search terms “Primary sclerosing cholangitis” and “Patient reported outcome measures” were used, alongside synonyms and related terms (see Additional file 2 for the full search strategy). These terms were combined with the COSMIN search filters developed by VU University Medical Centre Amsterdam and the University of Oxford (available on the COSMIN website: http://www.cosmin.nl/). In addition, the reference lists of papers included in the full-text review were hand-searched [20, 21].
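The way the search concepts were combined can be illustrated with a minimal Python sketch. It assumes the conventional structure of such strategies (synonyms joined with OR within a concept, concepts joined with AND); the helper names and term lists are hypothetical placeholders rather than the strategy in Additional file 2, and the real database interfaces differ in field tags, truncation and phrase syntax.

```python
def or_block(terms):
    """Join the synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*concepts):
    """Combine concepts with AND, so a record must match every concept."""
    return " AND ".join(or_block(terms) for terms in concepts)


# Hypothetical term lists; the actual strategy is given in Additional file 2.
psc_terms = ["Primary sclerosing cholangitis", "PSC"]
prom_terms = ["Patient reported outcome measures", "quality of life", "questionnaire"]

query = build_query(psc_terms, prom_terms)
print(query)
```

Under the same assumption, a COSMIN measurement-properties filter would enter as a further concept list passed to `build_query`.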

Inclusion criteria

Studies were eligible if:

a) The study included PROMs meeting the FDA definition [22].

b) Study participants were patients with PSC.

In addition:

c) Studies that evaluated at least one measurement property (i.e. reliability, validity, responsiveness, interpretability) were included in the COSMIN quality review.

No restrictions were placed on participants’ age or gender, or on the language, publication date or country of origin of the study.

Selection of studies

Two reviewers (FI/GT or GT/GK) independently screened studies by title and abstract to determine eligibility. The full texts of potentially eligible studies were then retrieved and screened independently by two reviewers (FI/GT or GT/GK). The protocol specified that discrepancies would be discussed with a third investigator (MG, DK or AS) to reach consensus; however, this was not required.
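The dual-screening step can be sketched as follows, assuming each reviewer records a simple include/exclude decision per record. The function name and record identifiers are hypothetical; records on which the two reviewers disagree are returned separately, mirroring the protocol’s referral of discrepancies to a third investigator.

```python
def reconcile(reviewer_1, reviewer_2):
    """Split two reviewers' include/exclude decisions into agreements and disputes.

    Each argument maps a record identifier to True (include) or False (exclude).
    """
    if reviewer_1.keys() != reviewer_2.keys():
        raise ValueError("both reviewers must screen the same set of records")
    agreed, disputed = {}, []
    for record_id in sorted(reviewer_1):
        if reviewer_1[record_id] == reviewer_2[record_id]:
            agreed[record_id] = reviewer_1[record_id]
        else:
            # Under the protocol, these would be discussed with a third investigator.
            disputed.append(record_id)
    return agreed, disputed


# Hypothetical decisions for three records.
first = {"rec1": True, "rec2": False, "rec3": True}
second = {"rec1": True, "rec2": False, "rec3": False}
agreed, disputed = reconcile(first, second)
print(agreed, disputed)
```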

Data extraction

Two reviewers (GT plus FI, GK or AS) independently extracted data from each study using a predefined form (including study design and patient-level characteristics). Information extracted for each PROM included: constructs, therapeutic area, domains, number of items, scoring method, recall period, administration, completion time, data collection, cost/permission and measurement properties (reliability, validity, responsiveness, interpretability).

Content comparison of included PROMs

A summary of the PROMs used in studies of PSC patients was prepared, including an overview of their domains and specific content. The PROMs were categorised according to their domains to facilitate comparison of the measures used in PSC studies to date.
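The domain-based comparison amounts to tallying which health domains each PROM covers. A minimal sketch, assuming each PROM has already been mapped to a list of domain labels; the PROM names and domain assignments below are hypothetical and are not taken from Additional file 3.

```python
from collections import Counter


def domain_frequencies(prom_domains):
    """Count how many PROMs cover each health domain.

    prom_domains maps a PROM name to the domains it includes; a domain is
    counted at most once per PROM, even if several items target it.
    """
    counts = Counter()
    for domains in prom_domains.values():
        counts.update(set(domains))
    return counts


# Hypothetical PROMs and domain assignments, for illustration only.
example = {
    "PROM_A": ["fatigue", "pain"],
    "PROM_B": ["fatigue", "emotion"],
    "PROM_C": ["pain", "fatigue", "fatigue"],
}
print(domain_frequencies(example).most_common())
```

Counting each domain once per PROM keeps the tally a count of measures, so an instrument with many fatigue items does not outweigh one with a single item.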

Quality assessment

The COSMIN checklist [23] was used to assess the methodological quality of studies that reported on the measurement properties of PROMs. Two reviewers (FI/GT or GT/AW) independently completed the COSMIN checklist. The protocol specified that discrepancies would be discussed with a third reviewer; however, this was not required. Each measurement property was scored according to the quality of reporting in the publication, using a four-point rating scale: ‘excellent’, ‘good’, ‘fair’ and ‘poor’. The methodological quality of each study was rated by taking the lowest score per domain (the ‘worst score counts’ method). For example, if any item in the reliability domain was scored ‘poor’, the overall methodological quality of reliability was rated ‘poor’.
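The ‘worst score counts’ rule can be expressed compactly by encoding the four-point scale as an ordered type and taking the minimum over the items in a domain. This is a sketch under stated assumptions: the names are hypothetical, and the sample-size thresholds (≥100 excellent, 50–99 good, 30–49 fair, <30 poor) are our assumption about the COSMIN four-point scoring for boxes such as internal consistency and reliability, to be checked against [23].

```python
from enum import IntEnum


class Rating(IntEnum):
    """COSMIN four-point scale, ordered so that min() picks the worst rating."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4


def worst_score_counts(item_ratings):
    """Overall methodological quality of one domain = its lowest item rating."""
    if not item_ratings:
        raise ValueError("a domain needs at least one rated item")
    return min(item_ratings)


def sample_size_rating(n):
    """Assumed sample-size thresholds; check against the COSMIN scoring guidance."""
    if n >= 100:
        return Rating.EXCELLENT
    if n >= 50:
        return Rating.GOOD
    if n >= 30:
        return Rating.FAIR
    return Rating.POOR


# Hypothetical reliability domain: well-designed items, but only 19 retest patients.
reliability_items = [Rating.EXCELLENT, Rating.GOOD, sample_size_rating(19)]
print(worst_score_counts(reliability_items).name)  # POOR
```

The sketch shows why a single weak item, such as a small sample, is enough to cap the rating for the whole domain.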

Evidence synthesis

Measurement property evidence was synthesised using the standardised criteria developed by Terwee 2011 [23]. The overall evidence for each measurement property of each PROM was summarised according to the number of studies, their methodological quality and the consistency of their findings. On this basis, each measurement property per PROM was rated as “+” (positive), “?” (indeterminate) or “-” (negative), and this rating was combined with an assessment of the overall level of supporting evidence (strong, moderate, limited, conflicting or unknown), as proposed by the Cochrane Back Review Group [24].
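The combination of methodological quality and consistency into a level of evidence can be sketched as a small decision function. The rules below are an assumption based on how the Cochrane Back Review Group levels are commonly applied in COSMIN-based reviews (strong: consistent findings in multiple good-quality studies or one excellent study; moderate: consistent findings in multiple fair-quality studies or one good study; limited: one fair study; conflicting: inconsistent findings; unknown: only poor-quality studies) and should be checked against [23, 24]; the names are hypothetical.

```python
from enum import IntEnum


class Quality(IntEnum):
    """Methodological quality of a study for one property (worst score counts)."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4


def level_of_evidence(study_qualities, consistent=True):
    """Assumed level of evidence for one measurement property of one PROM.

    consistent: whether the usable studies' findings point in the same direction.
    """
    usable = [q for q in study_qualities if q > Quality.POOR]
    if not usable:
        return "unknown"  # no studies, or only poor-quality studies
    if not consistent:
        return "conflicting"
    n_good_or_better = sum(q >= Quality.GOOD for q in usable)
    if Quality.EXCELLENT in usable or n_good_or_better >= 2:
        return "strong"
    if n_good_or_better == 1 or len(usable) >= 2:
        return "moderate"
    return "limited"  # a single study of fair quality


print(level_of_evidence([Quality.POOR, Quality.POOR]))  # unknown
print(level_of_evidence([Quality.FAIR]))  # limited
```

With these rules, two poor-quality studies give an ‘unknown’ level and a single fair-quality study gives a ‘limited’ level.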

Results

Study selection

In total, 8074 studies were identified; 5893 remained after removal of duplicates and 150 remained after screening of titles and abstracts (Fig. 1). Following review of the 150 full texts, 37 studies, using 36 different PROMs, were included.

Fig. 1

PRISMA flowchart describing the identification, selection and inclusion of studies on PROM assessment in Primary Sclerosing Cholangitis
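The exclusion counts implied by these totals follow by subtraction at each stage. A minimal sketch (the function name is hypothetical); the derived numbers are simple arithmetic on the totals reported above, and Fig. 1 may break the exclusions down further.

```python
def prisma_exclusions(identified, after_deduplication, after_screening, included):
    """Derive the number of records removed at each stage by subtraction."""
    counts = [identified, after_deduplication, after_screening, included]
    if any(later > earlier for earlier, later in zip(counts, counts[1:])):
        raise ValueError("each stage must retain no more records than the previous one")
    return {
        "duplicates removed": identified - after_deduplication,
        "excluded at title/abstract": after_deduplication - after_screening,
        "excluded at full text": after_screening - included,
    }


print(prisma_exclusions(8074, 5893, 150, 37))
# {'duplicates removed': 2181, 'excluded at title/abstract': 5743, 'excluded at full text': 113}
```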

Table 1 summarises the general characteristics of the included studies. The study designs comprised 17 cross-sectional studies, five randomised controlled trials (RCTs), four case-control studies, two validation studies, two pilot studies, two before-and-after studies, one cost-effectiveness study, one case-matched study, one longitudinal study, one cohort study and one retrospective case series.

Table 1 Characteristics of included studies

Twenty-seven of the 37 included studies used PROMs to examine the impact of PSC on patients, and seven of these measured the effectiveness of treatments. In addition, one study evaluated the cost-effectiveness of liver transplantation, one assessed health utilities and two were validation studies of PROMs: the National Institute of Diabetes and Digestive and Kidney Diseases Liver Transplant (NIDDK-QA) and the Primary Sclerosing Cholangitis Patient Reported Outcome (PSC PRO).

In total, 3742 patients with PSC were recruited to the included studies (sample size range n = 4–1000). All participants were adults, except in one study [25] that included patients with a mean age of 11.6 years. Studies were heterogeneous in population demographic characteristics. In the 35 studies that reported gender, the proportion of male PSC patients ranged from 15 to 97%. Five studies reported mean Mayo risk scores, which estimate survival in PSC, spanning a relatively wide range (−0.1 to 2.87) [6, 26,27,28,29]. Twenty-four studies described the proportion of PSC patients with IBD, which ranged from 7 to 100%. In 12 studies, the percentage of PSC patients who had received a liver transplant ranged from 12 to 100%.

Characteristics of PROMs

Characteristics of the 36 included PROMs are presented in Table 2. The most frequently used PROM was the Short Form 36 health survey (SF-36) (n = 15), followed by the Chronic Liver Disease Questionnaire (CLDQ) (n = 6) and the Primary Biliary Cirrhosis (PBC)-40 (n = 5). All other PROMs were used in ≤3 studies (Table 1).

Table 2 Characteristics of included PROMs

The seven generic measures were: the 15 Dimensional Health-Related Quality of Life Measure (15D©) [30, 31]; the SF-36® [6, 27,28,29, 32,33,34,35,36,37,38,39,40,41,42,43]; the Short Form 6 Dimension (SF-6D) [27]; the Psychological General Well-being Index (PGWBI) [44]; the Paediatric Quality of Life Inventory™ generic core scale (PedsQL™) [25]; the EuroQol (EQ-5D) [37, 45, 46]; and the World Health Organization Quality of Life assessment instrument (WHOQOL-BREF) [37].

The ten disease-specific measures were: the Short Form Liver Disease Quality of Life questionnaire (LDQOL 1.0) [32]; the CLDQ [27, 29, 38, 39, 42, 43]; the NIDDK-QA [26, 28]; the Rome II Modular Questionnaire; the Cleveland Global Quality of Life questionnaire (CGQOL) [34]; the Short Inflammatory Bowel Disease Questionnaire (SIBDQ) [32, 47]; the Oresland scale; the PSC PRO [43]; the PBC-27 [35, 40, 41]; and the PBC-40 [32, 35, 40, 41, 43].

The 17 symptom-specific PROMs included: the Fatigue Impact Scale (FIS) [29, 37, 44]; the Gastrointestinal Symptom Rating Scale (GSRS) [44]; the Fisk Fatigue Severity Scale (FFSS) [36, 42, 48]; the Multidimensional Fatigue Inventory (MFI) [48]; the visual analogue scale (VAS) [48,49,50]; the 5-D Itch Scale [42, 43]; the pruritus numerical rating scale [51]; the Hospital Anxiety and Depression Scale (HADS) [29]; the Beck Depression Inventory (BDI) [44, 52]; the Inventory of Depressive Symptomatology (IDS) [50]; the Patient Health Questionnaire (PHQ-9) [6, 32]; the Schedule for Affective Disorders and Schizophrenia (SADS) [52]; the Female Sexual Function Index (FSFI) [34]; the International Index of Erectile Function (IIEF) [34]; the Epworth Sleepiness Scale (ESS) [21]; and the Composite Autonomic Symptom Scale 31 (COMPASS 31) [21].

The two other measures were the Lifetime Drinking History (LDH) and the Health Habits and History Questionnaire (HHHQ), which focused on alcohol consumption and dietary intake, respectively.

Content comparison of included PROMs

The most frequent health domains (n = 6) included across the measures were: fatigue, pain, physical functioning, emotion, anxiety and general health.

Generic PROMs measured domains such as pain, physical functioning, emotion, mental health and depression. The disease- and symptom-specific PROMs targeted gastrointestinal symptoms (such as abdominal pain and gastroduodenal symptoms), sexual problems, somatic symptoms, depression, mood disturbance and vegetative features (Additional file 3).

Quality assessment

Only three studies investigated the measurement properties of PROMs: two evaluated the NIDDK-QA [26, 28] and one evaluated the PSC PRO [43].

For the NIDDK-QA, one validation study [28] included 76 patients with primary biliary cirrhosis (PBC) and 17 with PSC. A second study examined health status and QOL in patients with cholestatic disease before and after liver transplantation; in this study, the NIDDK-QA was administered to 65 PBC and 92 PSC patients [26]. In the PSC PRO validation study, 102 patients with PSC completed the PSC PRO and four other questionnaires (SF-36, CLDQ, PBC-40 and 5-D Itch Scale) via an ePRO website [43]. The results of the validation studies are presented in Table 3 and summarised below.

Table 3 Results of measurement properties of NIDDK-QA

Internal consistency

All three validation studies appropriately calculated Cronbach’s alpha to estimate internal consistency. Reported values ranged from 0.87 to 0.94 for the NIDDK-QA and from 0.86 to 0.94 for the PSC PRO, suggesting good internal consistency. Under the COSMIN criteria, the methodological quality of this evidence for the NIDDK-QA was rated ‘poor’ in both studies, primarily because of small sample sizes and a lack of information on the proportion of missing items and how missing items were handled. The PSC PRO study was rated ‘fair’ because it did not explicitly report missing items or the sample size for the unidimensionality analysis.
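Cronbach’s alpha compares the sum of the individual item variances with the variance of the total score: α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. A minimal Python sketch of that formula follows; the function name and the item scores are hypothetical and do not come from the NIDDK-QA or PSC PRO studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same items")
    items = list(zip(*responses))  # one tuple of scores per item
    totals = [sum(row) for row in responses]  # total score per respondent
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical scores: 4 respondents x 3 items.
scores = [[3, 4, 3], [2, 2, 3], [4, 4, 4], [1, 2, 1]]
print(round(cronbach_alpha(scores), 2))  # 0.94
```

Because the same (sample) variance estimator is used for the items and for the total, switching both to population variances would give the same α.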

Reliability

Kim et al. (2000) [28] assessed the test-retest reliability of the NIDDK-QA by administering it on two occasions approximately 2 weeks apart in 19 patients. Although Pearson’s correlation was high at 0.80 (range 0.82 to 0.94), the methodological quality for this measurement property was rated ‘poor’ because of the small sample size. For the PSC PRO, 53 patients completed the questionnaire a second time within 3 months, and correlations between administrations were high (range 0.70–0.88). The reliability of the PSC PRO was rated ‘fair’ because of the length of time between administrations.
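A test-retest correlation pairs each patient’s score at the first administration with their score at the second. A minimal sketch of Pearson’s r for such paired scores follows, with hypothetical names and data; it is not the analysis code used in [28] or [43].

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired scores from two administrations."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired scores")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cov / sqrt(ss_1 * ss_2)


# Hypothetical domain scores for five patients at test and retest.
first_scores = [62, 70, 55, 81, 74]
retest_scores = [60, 72, 58, 79, 75]
print(round(pearson_r(first_scores, retest_scores), 2))

# r reflects linear association only: a uniform shift still gives r = 1.
print(pearson_r([10, 20, 30], [20, 30, 40]))  # 1.0
```

The second example shows that Pearson’s r does not penalise a systematic change in scores between administrations, which is worth bearing in mind when interpreting high test-retest correlations.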

Validity

Kim et al. (2000) [28] assessed concurrent validity by investigating correlations between the NIDDK-QA and the SF-36. The authors postulated that the observed correlations between theoretically related domains, such as physical function and health satisfaction (r = 0.86 and 0.72, respectively), demonstrated the concurrent validity of the tool. However, the methodological quality for this measurement property was also rated ‘poor’ owing to the absence of details on the measurement properties of the comparator scale (SF-36) in this population, and to issues with sample size and missing data.

Kim et al. (2000) [28] also assessed discriminant validity and reported significant differences in item- and domain-level NIDDK-QA scores. Again, the methodological quality for this property was rated ‘poor’ because of issues with sample size and with the proportion and handling of missing data.

For the PSC PRO, 26 patients with PSC took part in cognitive interviews to assess content validity, which was rated ‘excellent’ according to the COSMIN checklist. An external validation cohort of 102 patients completed the PSC PRO along with the SF-36, CLDQ, PBC-40 and 5-D Itch Scale; all correlations were statistically significant. Structural validity was rated ‘fair’ because of the sample size relative to the number of items.

Evidence synthesis

Both NIDDK-QA studies reported limited information on internal consistency, reliability and validity (concurrent and discriminant). Under the COSMIN guidance, these properties were rated as indeterminate because of the poor methodological quality of both studies (Tables 4 and 5) (Additional file 4) [23]. The PSC PRO study [43] was of higher methodological quality than the NIDDK-QA studies; however, as there was only one study, the level of evidence is limited.

Table 4 Methodological quality of each study per measurement property and PROM
Table 5 Quality of measurement properties

Discussion

This review identified 37 studies using 36 different PROMs in patients with PSC; however, only one of these tools was developed specifically for the PSC population in accordance with FDA guidelines. The rationale for PROM use in the included studies varied. Most studies sought to measure the burden of disease using constructs such as HRQOL and symptom severity; others examined treatment effectiveness, cost-effectiveness and health utility. No study investigated the use of real-time PROM monitoring to directly inform PSC patient care in a routine clinical setting. Only three studies evaluated the measurement properties of PROMs in PSC patients: two evaluated the NIDDK-QA [26, 28] and one evaluated the PSC PRO [43]. Because of weaknesses in methodological quality, there is currently limited evidence to support the use of these PROMs in the PSC population; however, the PSC PRO is a promising new measure, designed with patient input, that requires further validation.

Clinicians or researchers wishing to use PROMs in PSC patients may consider using both generic and disease-specific measures. The choice of measure should be informed by its psychometric properties and by patient input [53]. Generic measures such as the SF-36, although not formally validated in PSC patients, are widely used and allow the burden of PSC to be compared with that of other chronic diseases, whilst the EQ-5D and SF-6D can provide estimates of health utility to inform cost-effectiveness analysis [54]. The PSC PRO may provide a more detailed assessment of the symptoms relevant to PSC patients and their impact, and may help identify patients with varying disease severity [43, 55].
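How health utilities feed into cost-effectiveness analysis can be sketched with quality-adjusted life years (QALYs): time spent in a health state is weighted by its utility, and options are compared by the incremental cost per QALY gained (ICER). This is a minimal, undiscounted sketch assuming utilities on the usual scale where 0 is dead and 1 is full health; the function names, utilities and costs are hypothetical and are not drawn from [54] or from the included studies.

```python
def qalys(utilities, years_per_state):
    """Undiscounted QALYs: sum of utility x years spent at that utility."""
    if len(utilities) != len(years_per_state):
        raise ValueError("one duration per utility value")
    return sum(u * t for u, t in zip(utilities, years_per_state))


def icer(cost_new, cost_comparator, qaly_new, qaly_comparator):
    """Incremental cost per QALY gained for a new option versus a comparator."""
    delta_qaly = qaly_new - qaly_comparator
    if delta_qaly == 0:
        raise ValueError("no QALY difference; the ICER is undefined")
    return (cost_new - cost_comparator) / delta_qaly


# Hypothetical annual utilities (e.g. derived from EQ-5D or SF-6D) over two years.
q_new = qalys([0.80, 0.75], [1, 1])
q_old = qalys([0.70, 0.60], [1, 1])
print(round(icer(12000, 6000, q_new, q_old)))  # 24000
```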

Although the PSC PRO was developed with input from patients with and without IBD, its questions on IBD symptoms appear fairly limited. This is important because 70–80% of PSC patients have co-existent IBD, most frequently ulcerative colitis [3]. IBD is a long-term comorbidity and can occur even after liver transplantation [56]. The clinical course of patients with PSC and concomitant IBD can differ from that of IBD or PSC alone [57]. Compared with patients with IBD alone, PSC-IBD patients have a higher incidence of rectal sparing, colorectal neoplasia, pouchitis following ileal pouch anal anastomosis (IPAA) and pancolitis, and an overall poorer prognosis [57, 58]. Thus, PSC-IBD patients experience additional symptoms and burdens that affect activities of daily living and, consequently, HRQOL [59]. Additional use of an IBD measure such as the IBS-QOL may therefore be warranted [60].

Following further validation, the PSC PRO has the potential to inform PSC patient care in a number of ways. It could be used in clinical trials to assess the impact of new treatments, or at the individual patient level in routine clinical practice to facilitate shared decision making and tailor care to individual patient needs. This approach has been highly successful in other settings, such as cancer, where routine monitoring using ePROs reduced emergency room admissions by 7% and hospital admissions by 4%, helped patients stay on treatment longer, improved patient quality of life by 31% and increased survival by an average of 5 months, at low cost [61, 62].

Strengths and limitations

This study is the first systematic review of PROMs used in PSC, conducted in accordance with the PRISMA [63] and COSMIN [64] guidelines. The use of COSMIN criteria has permitted a structured and comprehensive evaluation of the identified measures. However, the NIDDK-QA studies evaluated in this review were carried out before the COSMIN guidance was available, and their level and detail of reporting may have been deemed acceptable at the time of publication. Another important consideration for research studies and clinical trials in rare diseases such as PSC is the small size of study populations. Because guidelines such as COSMIN judge methodological quality partly on sample size, it can be difficult to demonstrate sound methodological quality when only small numbers of patients are available for the development and validation of PROs [65]. International multicentre studies may be one way to overcome the small numbers of patients available to future studies that aim to develop and evaluate PROs for use in PSC.

Conclusion

In conclusion, a wide variety of PROMs are used to assess HRQOL and symptom burden in patients with PSC, but none has undergone comprehensive validation in this patient group. The PSC PRO is a promising new measure of symptoms and symptom impact in PSC patients; however, further validation work is required. Collection of PROs from PSC patients can provide valuable information in research settings and in routine clinical practice to improve PSC patient care.