Health-related quality of life in primary hepatic cancer: a systematic review assessing the methodological properties of instruments and a meta-analysis comparing treatment strategies

Purpose Patient-reported outcomes including health-related quality of life (HRQoL) are important oncological outcome measures. The validation of HRQoL instruments for patients with hepatocellular and cholangiocellular carcinoma is lacking. Furthermore, studies comparing different treatment options with respect to HRQoL are sparse. The objective of the systematic review and meta-analysis was, therefore, to identify all available HRQoL tools regarding primary liver cancer, to assess the methodological quality of these HRQoL instruments and to compare surgical, interventional and medical treatments with regard to HRQoL. Methods A systematic literature search was conducted in MEDLINE, the Cochrane library, PsycINFO, CINAHL and EMBASE. The methodological quality of all identified HRQoL instruments was assessed according to the COnsensus-based Standards for the selection of health status Measurements INstruments (COSMIN) standard. Subsequently, the quality of reporting of HRQoL data was assessed. Finally, wherever possible HRQoL data were extracted and quantitative analyses were performed. Results A total of 124 studies using 29 different HRQoL instruments were identified. After the methodological assessment, only 10 instruments fulfilled the psychometric criteria and could be included in subsequent analyses. However, quality of reporting of HRQoL data was insufficient, precluding meta-analyses for 9 instruments. Conclusion Using a standardized methodological assessment, specific HRQoL instruments are recommended for use in patients with hepatocellular and cholangiocellular carcinoma. HRQoL data of patients undergoing treatment of primary liver cancers are sparse and reporting falls short of published standards. Meaningful comparison of established treatment options with regard to HRQoL was impossible, indicating the need for future research. Supplementary Information The online version contains supplementary material available at 10.1007/s11136-021-02810-8.


Introduction
Besides survival and treatment-associated adverse events, patient-reported outcomes (PROs) are arguably the most relevant outcome parameters in oncology. A PRO is defined as 'any outcome evaluated directly by the patient himself or herself and is based on patient's perception of a disease and its treatment(s)' [1]. PROs have many potential advantages as they may elucidate the relationship between clinical endpoints and the patient's well-being [1], allowing for a more comprehensive evaluation of patients' health [2].
Health-related quality of life (HRQoL) is a multidimensional PRO measure that is of special interest in oncology as it provides a 'personal assessment of the burden and impact of a malignant disease and its treatment' [1], thus adding valuable information for a true risk-benefit assessment. This is of special interest when prognosis is limited, as in primary malignancies of the liver. HRQoL tools can be distinguished into generic, cancer-specific, cancer-type-specific and utility-(preference-)based instruments [3]. While definitions, implementation, evaluation and analyses of survival and toxicity/complication endpoints have been well standardized over the last decades, PROs are still under-evaluated and under-reported in most clinical settings. Multiple studies have aimed to define suitable HRQoL tools for different clinical settings, e.g. [4,5], including cancer patients [6][7][8].
Hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (CCA) account for more than 95% of all primary malignant liver tumours. Hepatitis B and C infections are the most prominent risk factors for HCC [9]. More than 840,000 patients were newly diagnosed with HCC or CCA in 2018, and numbers are estimated to rise to > 1.3 million annually by 2040 [10]. Although age-standardized incidence rates are moderate in the Western World, they are high in most parts of Asia and parts of West Africa [10], making HCC one of the most frequent tumours in these parts of the world. Prognosis is dismal, with 5-year overall survival being around 15% in the USA and 5% in low-income countries [9]. Surgical resection, medical treatment (e.g. chemotherapy, kinase inhibitors) and interventional treatments like radiofrequency ablation (RFA) and transarterial chemoembolization (TACE) constitute the three mainstays of treatment for both HCC and CCA.
Therefore, the objectives of this systematic review and meta-analysis were threefold: (1) to perform a systematic review to identify all published HRQoL tools for primary liver cancer (HCC/CCA); (2) to assess the methodological quality and clinical relevance of these HRQoL measures; and (3) to synthesize quantitative data by means of a meta-analysis to compare surgery vs. interventional treatments vs. systemic therapies with regard to HRQoL.

Material and methods
This systematic review and meta-analysis is reported in line with current PRISMA guidelines [11]. The study was registered in the PROSPERO database on 18th July 2017 (registration number CRD42017068103).

Eligibility criteria
Studies investigating HRQoL in HCC or CCA patients were included independent of language or year of publication. All types of studies, i.e. randomized controlled trials (RCT), cohort-type studies (CTS), case-control studies (CCS) and cross-sectional studies, were included in our search, with the exception of case reports. Furthermore, studies in animals (non-human studies) were excluded. The patient (P) and outcome (O) terms of the PICOT (patient-intervention-comparison-outcome-time) scheme were used to build a search strategy. The search used the 'outcome' term to identify PROMs describing quality of life or HRQoL and the 'patient' term to find studies including patients with HCC or CCA. Supplement 1 shows the search strategy for MEDLINE performed via OvidSP. If studies included mixed patient populations (e.g. including HCC patients together with metastatic cancer patients and other tumours), only those trials were included from which HRQoL data could clearly be extracted for HCC and CCA patients.

Information sources
The following databases were searched [12]: MEDLINE, the Cochrane library, PsycINFO, CINAHL and EMBASE. Where necessary, authors were directly contacted to retrieve missing information.

Search
Sensitive search strategies were developed for all databases using wildcards and adjacency terms where appropriate. Supplement 1 shows the search strategy for MEDLINE performed via OvidSP. The search strategies for the other databases were adapted to the specific vocabulary of each database.

Study selection
Search results were imported into EndNote software (EndNote X7.7, Thomson Reuters) [13], and duplicates were removed by using the automated duplicate removal function of EndNote. Subsequently, titles and abstracts of studies were screened by two authors (KW, ALM) for fulfilment of inclusion and exclusion criteria. Remaining duplicates were removed manually. For the remaining studies, full text articles were obtained, which were then screened for eligibility by two authors independently (KW, ALM). Reasons for exclusion of full text articles were recorded (Fig. 1). All remaining articles were included in the qualitative syntheses (objectives 1 and 2). For objective 3 (quantitative assessment), all articles using adequate HRQoL measures (i.e. fulfilling objective 2) were included in the assessment of quality of reporting of HRQoL data and risk of bias assessment of individual studies. HRQoL data were extracted wherever possible and grouped according to the three clinical settings: (a) surgery; (b) interventional therapy and (c) medical treatment.
HRQoL assessments were then grouped into 3-month periods. In a next step, quantitative data analysis was performed for those HRQoL measures for which ≥ 2 quantitative data time points were available.
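The grouping step can be illustrated with a minimal Python sketch. The binning convention (baseline as period 0, months 1 to 3 as period 1, and so on), the function names and the example data are assumptions made for illustration only; the review does not specify how period boundaries were handled.

```python
import math
from collections import defaultdict


def period_of(month):
    """Map months since treatment to a 3-month period index.

    Assumed convention: baseline (month 0) is period 0, months >0-3 are
    period 1, months >3-6 are period 2, and so on.
    """
    if month < 0:
        raise ValueError("month must be non-negative")
    return math.ceil(month / 3)


def group_assessments(assessments):
    """Collect the distinct 3-month periods reported for each instrument."""
    periods = defaultdict(set)
    for instrument, month in assessments:
        periods[instrument].add(period_of(month))
    return {name: sorted(p) for name, p in periods.items()}


def eligible_for_analysis(grouped, min_time_points=2):
    """Keep instruments with at least `min_time_points` time points."""
    return sorted(name for name, p in grouped.items()
                  if len(p) >= min_time_points)


# Hypothetical (instrument, month of assessment) pairs
example = [("FACT-Hep", 0), ("FACT-Hep", 3), ("FACT-Hep", 12), ("EQ-5D", 6)]
grouped = group_assessments(example)
eligible = eligible_for_analysis(grouped)
```

In this hypothetical example only the FACT-Hep, with three distinct time points, would pass the threshold of at least two time points.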

Data collection process
Data were extracted by two authors independently (KW, ALM) and collected on pre-specified piloted forms. If required data were not reported in a study, the authors were contacted to obtain the remaining data. Differences in data extraction were resolved by consensus with a third author (MKD).

Data items
The following data items were collected: title, author, year of publication, country where study was performed, journal, language, cancer type, intervention, control, co-interventions, primary endpoint, secondary endpoints, HRQoL tool used, type of study, number of centres, start and end dates of study and intervention, number of patients (total), number of patients allocated to intervention(s), number of patients allocated to control, number of patients evaluated for HRQoL (at each point in time), number of withdrawals, exclusions, conversions, duration of follow-up, HRQoL data at baseline and during follow-up, analysis strategy, subgroups measured and subgroups reported. Furthermore, the following baseline characteristics of patients (for both intervention and control group) were recorded: age, gender, severity of illness, co-morbidities and other relevant baseline characteristics.

Evaluation of methodological quality of the HRQoL measures
The methodological quality of HRQoL measures was assessed based on specific psychometric criteria. Owing to the lack of uniform consensus on how to appraise PRO measures, criteria were applied based on published recommendations [3,14] in accordance with U.S. Food and Drug Administration guidance [15], the Oxford University PROMs Group guidelines and the COnsensus-based Standards for the selection of health status Measurements INstruments (COSMIN) [16]. The criteria and benchmarks laid out in Table 1 were used for evaluation and have been used in previous publications [4,5]. A rating scale described in previous publications was applied to allocate a mark for each domain [4,5]: 0 no evidence reported; − evidence not in favour; + evidence in favour; ± conflicting evidence. Lack of basic psychometric evaluation was defined by a priori consensus as evaluation of fewer than 2 positive (+) aspects (other than feasibility and interpretability) in HCC/CCA patients. Evaluation was limited to primary hepatic cancers (HCC/CCA), i.e. the psychometric properties of some instruments might have been evaluated in other types of cancer, but not in HCC/CCA patients. Where psychometric data were lacking for a given instrument, searches were conducted in Medline to identify additional studies that had evaluated the psychometric properties of the HRQoL instrument in closely related patient cohorts (e.g. patients with chronic liver disease).
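The a priori rule for 'lack of basic psychometric evaluation' can be expressed as a short Python sketch. The domain names, the dictionary layout and the example ratings are hypothetical and serve only to illustrate the counting rule described above; they are not data from the review.

```python
# Marks follow the rating scale in the text:
# "0" no evidence, "-" not in favour, "+" in favour, "±" conflicting.
EXCLUDED_DOMAINS = {"feasibility", "interpretability"}


def count_positive(ratings):
    """Count '+' marks, ignoring feasibility and interpretability."""
    return sum(1 for domain, mark in ratings.items()
               if mark == "+" and domain not in EXCLUDED_DOMAINS)


def lacks_basic_evaluation(ratings):
    """True if fewer than 2 positive aspects were found."""
    return count_positive(ratings) < 2


# Hypothetical ratings for two made-up instruments
instrument_a = {"reliability": "+", "validity": "+", "responsiveness": "0",
                "feasibility": "+", "interpretability": "+"}
instrument_b = {"reliability": "+", "validity": "±", "responsiveness": "0",
                "feasibility": "+", "interpretability": "+"}
```

Here `instrument_a` would be retained, whereas `instrument_b` has only one qualifying positive aspect and would be classed as lacking basic psychometric evaluation.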

Evaluation of the quality of reporting of HRQoL data
For assessment of reporting, the studies were analysed using the following questions: (a) Is HRQoL data analysis described in the methods section? (b) Has an a priori statistical analysis plan for HRQoL outcomes been implemented, addressing common problems like missing data and multiple testing? (c) Is HRQoL raw data presented? (d) Is individual patient data reported? (e) Which summary scores are used for HRQoL data? (f) Which time points of HRQoL assessment are described in the methods section? (g) For which time points is HRQoL data reported in the results section?

Table 1 Psychometric criteria used to assess the quality of the patient-reported outcome measures. Adapted from [4,5]
Domain: Criteria
Responsiveness: There are a number of methods to measure responsiveness, including t tests, effect size, standardized response means or Guyatt's responsiveness index. There should be statistically significant changes in score of an expected magnitude
Appropriateness: Assessment whether the content of the instrument is appropriate to the questions which the clinical trial is intended to address
Interpretability: Subjective assessment whether the scores of the instrument are interpretable for patients or physicians
Acceptability: Acceptability is measured by the completeness of the data supplied; ≥ 80% of the data should be complete
Feasibility: Qualitative assessment whether the instrument is easy to administer and process
Floor-ceiling effect: A floor or ceiling effect is considered if 15% of respondents are achieving the lowest or the highest score on the instrument

Assessment of risk of bias in individual studies
For RCTs, risk of bias was judged using The Cochrane Collaboration tool for assessing quality and risk of bias [17]. Risk of bias for non-randomized, interventional trials was assessed with the ROBINS-I tool (Risk Of Bias In Non-randomized Studies of Interventions, formerly known as ACROBAT-NRSI) as recommended by the Cochrane Collaboration [11]. Non-randomized, non-interventional studies were assessed using the Newcastle-Ottawa risk of bias tool [18], and cross-sectional studies were assessed using the AHRQ checklist. RCTs were judged to be at an overall high risk of bias if there was a serious risk of bias in any of the following domains: random sequence generation, allocation concealment, missing data. For non-randomized trials, the following overall risk of bias judgement for individual studies was used in line with Cochrane recommendations [11]: (a) low risk of bias: the study is judged to be at low risk of bias for all domains; (b) moderate risk of bias: the study is judged to be at low or moderate risk of bias for all domains; (c) serious risk of bias: the study is judged to be at serious risk of bias in at least one domain, but not at critical risk of bias in any domain; (d) critical risk of bias: the study is judged to be at critical risk of bias in at least one domain.
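Rules (a) to (d) for non-randomized trials amount to taking the most severe judgement across all domains. A minimal Python sketch of this rule follows; the function name and the example inputs are hypothetical, and judgements outside the four listed levels (for example 'no information') are deliberately not handled.

```python
# Severity levels in ascending order, as listed in rules (a) to (d)
ORDER = ("low", "moderate", "serious", "critical")


def overall_risk_of_bias(domain_judgements):
    """Return the overall judgement, i.e. the worst domain-level judgement."""
    if not domain_judgements:
        raise ValueError("at least one domain judgement is required")
    # ORDER.index raises ValueError for any label not in ORDER
    return max(domain_judgements, key=ORDER.index)


# Hypothetical domain-level judgements for one study
example_study = ["low", "moderate", "low", "serious", "low"]
overall = overall_risk_of_bias(example_study)
```

For the hypothetical study above, one serious domain and no critical domain yield an overall judgement of serious risk of bias.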

Statistical analysis
Data were entered in RevMan software (Review Manager, Version 5.3. Copenhagen: The Nordic Cochrane Center, The Cochrane Collaboration, 2014) [19]. The level of significance was set at an alpha of 0.05. A random-effects model (inverse variance) was used because of clinical heterogeneity between the included trials. Heterogeneity was evaluated using the I² statistic. Values lower than 25% were considered as low, values between 25% and 75% as possibly moderate, and values of I² over 60% as considerable heterogeneity. HRQoL in HCC/CCA patients was compared by meta-analysis for the following types of interventions: (a) surgery; (b) interventional therapies (e.g. TACE, RFA) and (c) systemic therapies (e.g. chemotherapy).
Only studies using the FACT-G/FACT-Hep could be used for meta-analysis (see results section). As these subscores are continuous variables, the mean difference in the FACT-G/FACT-Hep subscores was used as effect measure.
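To make the pooling approach concrete, the following Python sketch computes a mean difference per study and pools the studies with inverse-variance weights in a random-effects model, together with I². It assumes the DerSimonian-Laird estimator of between-study variance and a normal-approximation 95% confidence interval; these choices, all function names and the example numbers are assumptions for illustration and are not taken from the review's own analysis files.

```python
import math


def mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return the mean difference (a minus b) and its variance."""
    md = mean_a - mean_b
    var = sd_a ** 2 / n_a + sd_b ** 2 / n_b
    return md, var


def random_effects_pool(effects, variances):
    """Inverse-variance random-effects pooling (DerSimonian-Laird tau^2)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                    # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "i2": i2,
            "ci95": (pooled - 1.96 * se, pooled + 1.96 * se)}


# Hypothetical subscale mean differences and variances from two studies
result = random_effects_pool([0.0, 4.0], [1.0, 1.0])
```

With only two or three studies, as in this review, the between-study variance is estimated very imprecisely, which is one reason such analyses are best read as exploratory.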

Study selection
We identified 3811 studies by database search and 12 additional studies by hand search, resulting in a total of 3823 records. Of those, 453 were duplicates (Fig. 1). After screening titles and abstracts, a further 2888 records were excluded according to inclusion and exclusion criteria. Subsequently, another 358 articles were excluded after full text analyses for the following reasons: no HRQoL tool (n = 74), other type of cancer (no HCC/CCA) (n = 48), no primary data (n = 198), ongoing study without report (n = 21), double publication (n = 15) and no full text available (n = 2). The remaining 124 studies were included in the final qualitative syntheses (Fig. 1).
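The study flow can be checked arithmetically. The short Python sketch below uses only the counts reported in the paragraph above; the variable names are chosen for illustration.

```python
# Counts as reported in the study selection paragraph
database_records = 3811
hand_search_records = 12
duplicates = 453
excluded_on_title_abstract = 2888
full_text_exclusions = {
    "no HRQoL tool": 74,
    "other type of cancer": 48,
    "no primary data": 198,
    "ongoing study without report": 21,
    "double publication": 15,
    "no full text available": 2,
}

total_records = database_records + hand_search_records
after_deduplication = total_records - duplicates
full_texts_assessed = after_deduplication - excluded_on_title_abstract
excluded_on_full_text = sum(full_text_exclusions.values())
included = full_texts_assessed - excluded_on_full_text

print(total_records, after_deduplication, full_texts_assessed,
      excluded_on_full_text, included)
```

The reported figures are internally consistent: 3823 records, 3370 after removal of duplicates, 482 full texts assessed, 358 full text exclusions and 124 included studies.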

Health-related quality of life instruments
In total, 29 different HRQoL instruments were identified in 124 studies by our search (Figs. 2 and 3). Of those, 26 different HRQoL PROMs were identified in HCC patients, 8 in CCA patients and 4 different tools in mixed patient cohorts. Multiple studies used more than one HRQoL tool (Table 1). The identified instruments covered all types of HRQoL (generic, cancer-specific, cancer-type-specific and utility-based HRQoL instruments) (Fig. 2). Despite being labelled as HRQoL instruments in the studies, a number of the identified instruments solely address cancer symptoms and thus lack the multidimensionality that is required for HRQoL; they were, therefore, excluded from further analyses (Fig. 3 step 1). These were (a) MD Anderson symptom inventory; (b) ESAS: Edmonton symptom assessment scale; (c) MD Anderson symptom inventory-gastrointestinal and (d) FHSI-8: FACT hepatobiliary symptom index. The remaining 25 instruments (117 studies) were included in the further analyses (Fig. 3). These 25 instruments use two to eight domains covering various aspects of quality of life (e.g. physical and mental health, role functioning and symptom burden). The EORTC QLQ-C30 and the FACT-G have cancer-type-specific supplements (EORTC QLQ-HCC18 and FACT-Hep) which can only be used in combination with the more general questionnaire. The questionnaires comprise 5 (EQ-5D) to 47 questions (NIDDK-QA) and have a recall period from the 24 h (EQ-5D) to

Methodological assessment of HRQoL instruments
The methodological quality of the remaining 25 HRQoL instruments was assessed as outlined in the methods section. Results are shown in Table 3. If no data for a given HRQoL instrument were available for HCC/CCA patients, additional Medline searches were performed to identify methodology studies that evaluated the PROM in closely related patient populations like chronic liver disease. These studies are indicated in Table 3.
The most frequently evaluated dimension in all HRQoL tools was reliability (test-retest reliability and internal consistency). With a test-retest correlation of more than 0.70, adequate performance for 6 out of 12 PROMs (SF-36, FACT-G, EORTC QLQ-HCC18, FACT-Hep, NIDDK-QA and QOL-LC) was confirmed [41,88,120,[141][142][143][144][145][146]. For the EQ-5D, correlation coefficients ranging from 0.58 to 0.98 were observed, showing that not all scales in this PROM are reliable enough [141]. Internal consistency was evaluated with the calculation of Cronbach's α. A value greater than 0.70 was considered sufficient according to COSMIN guidelines [16]. This could be observed in 8 out of 12 HRQoL tools (NHP, SF-36, WHO-BREF, EORTC QLQ-C30, FACT-G, FACT-Hep, NIDDK-QA and QOL-LC) [27,77,88,120,141,142,[144][145][146][147][148][149][150][151]. Concerning validity, all three pre-defined categories (content, criterion and construct validity) were rarely evaluated together. More frequently, only one or two aspects of validity were examined. Content validity was evaluated by investigating the process of questionnaire creation. In case of the FACT-G, FACT-Hep and EORTC QLQ-HCC18, the process described included qualitative studies with inclusion of expert opinions, patient reports and current literature [28,144,152]. Merely three PROMs (FACT-Hep, FACT-Hep and NIDDK-QA) were compared to the gold standard (i.e. an already established questionnaire), thus testing criterion validity [144][145][146]. In order to evaluate construct validity, group comparisons using performance status (such as the Karnofsky Performance Status) were used for the EORTC QLQ-HCC18 and FACT-Hep questionnaires as it is known that a higher performance status correlates with better HRQoL [41,88]. Construct validity within the SF-36 was evaluated using the correlation with hypothesized scores (conceptually related and unrelated scores) [141,148,149]. Kim et al. compared item scores between ambulatory patients and liver transplant recipients and examined correlations between the domain scores of NIDDK-QA vs. SF-36 and Mayo risk score, respectively [146]. The Wilcoxon signed-rank test was used by Chie et al. to evaluate whether the changes in score before and after treatment were significant. For example, patients undergoing surgical treatment suffered significantly more pain after treatment than before, which reflects an adequate responsiveness of the EORTC QLQ-HCC18 [41]. Steel et al. evaluated the clinically meaningful changes of the FACT-Hep over time and found significant decrements in all subscales from baseline to 3-month follow-up [147]. The SF-36 performed poorly during the evaluation of floor and ceiling effects, with distinctly more than 15% of patients (the set cut-off) scoring the highest or lowest possible score [148,149]. Valid acceptability and feasibility were assumed when the response rate was > 80%, or the time to complete the questionnaire was 10 min or less [24,27,46,56,85,88,120,126,141,148,149,153]. The interpretability of all PROMs was considered acceptable as higher scores in QoL scales represent higher HRQoL, and higher scores within the symptom scales represent lower HRQoL.
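For readers less familiar with internal consistency, Cronbach's α for a scale of k items is α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The Python sketch below implements this standard formula with sample variances; the function name and the example responses are hypothetical and not taken from any included study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of scores per item; position j in every list
    belongs to the same respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("all items need the same number of respondents")
    totals = [sum(item[j] for item in items) for j in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses of four patients to a three-item scale
responses = [[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 3, 4]]
alpha = cronbach_alpha(responses)
sufficient = alpha > 0.70  # cut-off used in this review
```

In this made-up example the three items move closely together, so α is well above the 0.70 cut-off.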
Due to a lack of data concerning the basic psychometric evaluation or negative results, only the following 10 HRQoL instruments were considered methodologically adequate according to the pre-specified criteria (see methods section) and were subsequently included in further analyses (Table 3)

Data synthesis for HRQoL tools
For generic HRQoL instruments like the SF-36, EQ-5D or WHO-BREF, no meta-analysis following treatment was possible, either because primary data were insufficiently reported (supplement 4) or because only single articles reporting raw data were identified. Similarly, for cancer (type)-specific HRQoL tools like EORTC QLQ-C30, EORTC QLQ-HCC18 and QOL-LC, meta-analysis of HRQoL data following treatment was impeded either by insufficient reporting during follow-up (supplement 3) or by studies comparing interventions that were too heterogeneous for meta-analysis. Only for the FACT-G and FACT-Hep questionnaires were clinically comparable interventions analysed in several studies: six studies contained surgical study groups [35,37,43,81,99,116], two studies contained data on RFA [37,116], and five studies reported extractable data in TACE patients [73,99,103,116,123]. Although FACT-G or FACT-Hep was used in several studies investigating medical treatment options for HCC, these were either single-arm studies [32,34,94], contained placebo control groups [31,36,38,53,137] or compared two medical treatment options [72,136], thus precluding a comparison to interventional/surgical treatments. Similarly, some studies used the FACT-G or FACT-Hep questionnaire to compare different interventional treatments [73,103,116,122], again impeding meta-analysis. Consequently, only 3 studies using the FACT-G/FACT-Hep remained for meta-analysis (Fig. 3 step 3).

Meta-analyses
For the comparison of surgical resection vs. TACE, only two studies reported raw data at baseline and during follow-up [99,116] (supplement 5A). Poon et al. split the surgical cohort into two distinct subgroups: those with a complete follow-up of two years and those with a shorter follow-up. This is likely to introduce major bias as patients completing 2-year follow-up are likely to be healthier and have less aggressive tumour diseases. We, therefore, pooled the data for the two surgical groups. Supplement 5A shows the results of this exploratory meta-analysis of the mean difference in FACT subscores (functional, physical, social and emotional well-being) at 12 months post-intervention/surgery. One additional analysis was possible: the comparison of surgery vs. RFA, as data are reported in the two studies by Huang et al. and Toro et al. [37,116]. Supplement 5B shows the results of the exploratory meta-analysis for the 12-month post-interventional/postoperative follow-up, again comparing mean differences in FACT subscores.

Discussion
HRQoL represents an important domain of clinical outcomes in oncology. While definitions, implementation, evaluation and analyses of survival and toxicity/complication endpoints have been well standardized over the last decades, PROs are still under-evaluated and under-reported in most clinical settings. Multiple studies have aimed to define suitable HRQoL tools for different clinical settings, e.g. [4,5], including cancer patients [6][7][8]. However, no concise evaluation has been performed for patients with primary liver cancers (HCC or CCA).
Although 124 studies were included in this systematic review, we were able to complete only the first two objectives of our study, namely to identify and evaluate HRQoL measures in HCC/CCA patients. Meta-analysis of study results comparing the outcome of surgical, interventional or medical treatments for HCC/CCA patients with regard to HRQoL was barely possible due to the use of different HRQoL instruments, lack of data or insufficient reporting.
We identified 29 different HRQoL instruments, which indicates vast heterogeneity and lack of consensus in this field. Similar results have been reported before in other diseases [6][7][8]. Furthermore, many of the identified tools lacked basic HRQoL characteristics like multidimensionality [154,155]. Hence, many authors seemed to be unaware of the difference between mere symptom measures and HRQoL instruments. In addition, validation of HRQoL is poor for most instruments in HCC/CCA patients (Table 2). As expected, the best psychometric data were available for cancer-type-specific HRQoL instruments, like the EORTC QLQ-HCC18 or the FACT-Hep. Interestingly, even for common generic and disease-specific HRQoL tools, like the Spitzer quality of life index and the EORTC QLQ-C30, data in HCC/CCA patients are sparse. Hence, evaluation of these common tools in this patient cohort seems necessary in future studies. In addition, even for HRQoL measures developed especially for liver cancer patients, psychometric properties were less stringent than might have been expected. The EORTC QLQ-HCC18 shows mixed psychometric results [41,88]. FACT-Hep, on the other hand, although showing good psychometric properties, has been validated only in mixed patient populations including patients with liver metastases and pancreatic cancer in addition to HCC/CCA patients [144,147]. Similarly, the preference-based HRQoL instrument EQ-5D has been extensively evaluated in chronic liver disease, but little psychometric data are available in HCC/CCA patients. Future studies should address these shortcomings.
Nevertheless, our analysis revealed suitable HRQoL instruments with sound psychometric properties that should be used in all future HRQoL studies. For generic HRQoL measurement, the SF-36 [156] can be recommended. The SF-36 is a generic HRQoL instrument consisting of 36 items divided into eight scales (Physical Functioning, Emotional Role Functioning, Physical Role Functioning, Bodily Pain, General Health, Vitality, Social Functioning, Mental Health, Health Transition) [156]. The number of response choices per item ranges from two to six. The scores for each scale range from 0 to 100. A higher score indicates a better QOL. The time frame of the SF-36 is 'last week' [141].
For cancer-specific HRQoL measurement in HCC/CCA patients, the EORTC QLQ-C30 [157] and the FACT-G can be recommended. Both have limited, but acceptable psychometric properties in HCC/CCA patients and have been used extensively in this patient cohort. The 30-item QLQ-C30 measures five functional scales (physical, role, emotional, cognitive and social functioning), global health status, financial difficulties and eight symptom scales (fatigue, nausea and vomiting, pain, dyspnoea, insomnia, appetite loss, constipation and diarrhoea). The scores vary from 0 (worst) to 100 (best) for the global health status and functional scales, and from 0 (best) to 100 (worst) for symptomatic scales [157]. The FACT-G consists of 27 items for the assessment of four domains of QOL: (1) Physical Well-Being and (2) Socio-Family Well-Being contain seven items each; (3) Emotional Well-Being contains six items and (4) Functional Well-Being contains seven items. The time frame of the FACT-G is 'last week'. Each item is scored on a 5-point ordinal scale, where 0 indicates not at all and 4, very much [152].
Cancer-type-specific HRQoL should be measured via the EORTC QLQ-HCC18 or FACT-Hep. The EORTC QLQ-HCC18 is an 18-item HCC-specific supplemental module developed to augment QLQ-C30 and to enhance the sensitivity and specificity of HCC-related QOL issues. It contains six multi-item scales addressing fatigue, body image, jaundice, nutrition, pain and fever, as well as two single items addressing sexual life and abdominal swelling. The scales and items are linearly transformed to a 0 to 100 score, where 100 represents the worst status [28,88]. The FACT-Hep is a 45-item self-reported instrument that consists of the 27-item FACT-G (see above), and the 18-item hepatobiliary cancer subscale, which assesses specific symptoms of hepatobiliary cancer and side effects of treatment. The FACT-G and hepatobiliary cancer subscale scores are summed to obtain the FACT-Hep total score [37,144]. The QoL-LC questionnaire shows good psychometric properties but has been developed and tested exclusively in Chinese patients, thus, limiting its generalizability. Similarly, NIDDK-QA as a cancer-type-specific HRQoL tool has been used in only one study and, thus, cannot be recommended currently.
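The scoring steps described above can be sketched in a few lines of Python. The sketch assumes that symptom items are answered on a 4-point scale coded 1 to 4 and that a scale's raw score is the mean of its items, which follows the general EORTC scoring approach but is stated here as an assumption; handling of missing items is omitted. Function names and example responses are hypothetical.

```python
def symptom_scale_score(item_responses, min_response=1, max_response=4):
    """Linearly transform a symptom scale to 0-100 (100 = worst status).

    Assumes items coded min_response..max_response and no missing items.
    """
    if not item_responses:
        raise ValueError("at least one item response is required")
    raw = sum(item_responses) / len(item_responses)
    return (raw - min_response) / (max_response - min_response) * 100


def fact_hep_total(fact_g_score, hepatobiliary_subscale_score):
    """FACT-Hep total = FACT-G score + hepatobiliary cancer subscale score."""
    return fact_g_score + hepatobiliary_subscale_score


# Hypothetical responses to a two-item symptom scale
example_score = symptom_scale_score([2, 3])
```

A patient answering 2 and 3 on a two-item scale has a raw score of 2.5, which maps to the midpoint of the 0 to 100 range.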
For utility-based HRQoL measurement, the EQ-5D [158] has been identified as the instrument of choice. It fulfils basic psychometric requirements, and a sound database is available in HCC/CCA patients. The EQ-5D consists of five items (mobility, self-care, usual activities, pain/discomfort and anxiety/depression). Each item has three response categories: no problems, some problems and extreme problems. The sixth item is a global health evaluation scale, ranging from 0 (the worst imaginable health state) to 100 (the best imaginable health state). The time frame of the EQ-5D instrument is the present moment.
The quality of reporting of the HRQoL results was insufficient overall. Few trials addressed common methodological problems of HRQoL data like multiple testing, missing data or a priori hypotheses. Raw data were rarely reported, and summary measures (mean, median etc.) as well as follow-up regimes varied widely between studies. In addition, the methodological quality of the studies was generally poor. Thus, despite a total of 124 studies available, evidence regarding HRQoL in HCC/CCA patients is limited.
It is astonishing that reporting of HRQoL data does not seem to have improved over the last decades despite the publication of multiple guidelines and recommendations concerning HRQoL reporting. Few of the included studies fulfilled basic reporting standards for HRQoL like the ones proposed by Basch et al. [159], Staquet et al. [160], the International Society for QoL research (ISOQOL) [161] or the CONSORT-Patient-reported outcome extension [162].
These shortcomings in the methodological quality and reporting were the main reasons for the insufficient meta-analyses in our study. Studies had to be excluded at various points along the way (Fig. 3). The planned comparison of treatment options (surgery vs. medical treatment vs. interventional treatment) with regard to HRQoL can, therefore, be regarded as exploratory at best. Future high-quality HRQoL trials, adhering to basic reporting standards, are urgently needed to address these shortcomings.
One of the main strengths of the current study is the use of a comprehensive search strategy to identify all relevant publications. Furthermore, to our knowledge, this is the first study that assesses the methodological quality of HRQoL tools in HCC/CCA patients according to internationally accepted standards [3,15,16], thereby identifying suitable HRQoL instruments for use in future studies. In addition, this study can be used as an easy reference standard to identify available studies and raw data for the design and sample size calculation in future HCC/CCA trials. The transparent analysis process in this study can be regarded as a further strength.
The main limitation of our analysis is the heterogeneity of included studies, patients and trial designs. The variations in the application, analyses and reporting of HRQoL between studies made data synthesis difficult. The meta-analyses should be regarded as exploratory at best.
In summary, clear recommendations for generic, cancer-specific, cancer-type-specific and preference-based HRQoL instruments in HCC/CCA patients can be given. Meta-analysis of data comparing different treatment options in HCC/CCA patients was severely limited due to methodological weaknesses of the included studies and shortcomings in reporting. Future trials should address these aspects and adhere to HRQoL reporting standards.
Author contributions KW, ALM, MKD and CE are responsible for conception and design of the study. KW, PH and ALM performed the acquisition and analysis of the data, and drafted the manuscript. KW, PH, PP, CE, MKD and ALM offered substantial contributions to interpretation of the data and critically revised the manuscript. All authors gave their final approval of this version of the manuscript and are accountable for all aspects of the work.
Funding Open Access funding enabled and organized by Projekt DEAL. No funding was used to create this review.

Data availability Not applicable.
Code availability Not applicable.

Declarations
Conflict of interest All authors declare no conflict of interest.
Ethical approval Not applicable.

Consent for publication Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.