Key Points for Decision Makers

Despite extensive discussions advocating for value-based health technology assessment and policymaking in China, no empirical study has been conducted in this country.

Through interviews with 34 Chinese stakeholders and a review and analysis of 16 government documents, we identified 12 value attributes measuring severity of disease, health benefit, safety, economic impact, innovation, organizational impact, health equity, and quality of evidence.

These value attributes could be used for the development of a VAF to support transparent, consistent, and robust health technology value assessment in China.

1 Introduction

The requirement to meet rising healthcare needs with scarce resources is shifting healthcare systems from ‘volume-driven’ to ‘value-driven’ services and funding models [1, 2]. In a value-driven system, healthcare choices and decisions are made based on the comprehensive assessment of health technologies (e.g., drugs, devices, medical or surgical procedures, and health programs) [3]. In response to this shift, a number of value assessment frameworks (VAFs) have recently been developed to support health technology assessment (HTA) and subsequent coverage policymaking. These VAFs facilitate transparent and consistent decision making, and promote the adoption and diffusion of innovative health technologies in healthcare systems [4,5,6,7]. However, most existing VAFs were developed for high-income countries [8]. There is a lack of VAFs developed in low- and middle-income countries that account for population preferences and the limited resources available in the local setting.

China is one of the most populous countries, with 1.44 billion residents and 12% (approximately 172 million) over 65 years of age [9]. China’s healthcare system is facing unprecedented challenges in meeting the healthcare needs of its population. It was estimated that the healthcare spending in China accounted for 6.6% of its gross domestic product (GDP) in 2019. The current healthcare delivery model is primarily volume-based and the expenditures are expected to exceed 9% of China’s GDP by 2035 [10, 11]. As part of the latest healthcare reforms, the National Healthcare Security Administration (NHSA) was established to adopt a centralized approach to drug pricing, coverage, and decision making [12, 13]. The value of new health technologies is being considered in this process, albeit informally. Developing a framework to guide the value assessment of new technologies can support consistent and efficient coverage policymaking, which is critical to the establishment of an accessible, equitable, and sustainable healthcare system for China.

Recently, the Evidence and Value: Impact on DEcisionMaking (EVIDEM) framework has been adapted to support HTA in China [14, 15]. The EVIDEM framework was developed through literature review, with some attributes noted as context-sensitive [16]. However, stakeholder engagement in the adaptation of the EVIDEM framework to the context of China was limited. The input from patients and the general public was missing in the adaptation. The objective of this study was to identify key value attributes for developing a VAF for China through interviews with Chinese stakeholders.

2 Methods

2.1 Overview

This study was conducted as part of the development of a VAF for HTA and coverage policymaking in China. We previously completed a systematic literature review to summarize existing VAFs, which informed the present study design [8]. This study focused on the identification and selection of value attributes for the VAF through incorporation of multiple stakeholders’ perspectives. A future study will conclude the program of work via a survey among Chinese stakeholders to develop a VAF that includes all values identified and accounts for the dependence between value attributes and the uncertainty in the coverage decision-making process.

2.2 Study Design

We designed a qualitative study that was informed by the principles of qualitative description (QD) to elicit stakeholders’ perspectives on important attributes for assessing the value of new health technologies [17, 18]. QD seeks to provide a rich description of a phenomenon, a process, or the perspectives and perceptions of people who have direct experience with the phenomenon of interest [17, 18]. A central element of QD is staying close to the data provided by participants and generating an overarching description of the phenomenon without too much interpretation [17]. Thus, QD emphasizes the importance of collecting and collating perceptions of events or experiences from target populations to advance our understanding of health-related phenomena, as well as healthcare planning or services [19]. It is a research design well regarded for addressing applied research questions with healthcare policy and practice relevance [19].

2.3 Study Setting and Participants

Members of the public are the consumers of healthcare services and key drivers of health technology use [20]. It is critical to engage them in the development of VAFs to align healthcare decisions with public preferences. However, the engagement of patients and members of the public was generally limited in existing VAFs [8]. Therefore, patients and participants recruited from the public (hereafter referred to as the public) are one of the key stakeholder groups for our study. In sampling and recruitment, we particularly considered factors that could influence the public’s perspectives on and expectations of new health technologies, to reflect the diversity of perspectives, experience, and expertise. These factors include the public’s geographical region (e.g., Northwest China vs. South China), residence (urban vs. rural) and insurance type (the urban employee basic medical insurance [UEBMI] vs. urban-rural residents basic medical insurance [URRBMI]) [21,22,23,24]. There are seven geographical regions in China [25] and considerable disparities exist in their levels of economic development and health investment, with East China and South China ranking highest, Northwest China and Southwest China ranking lowest, and Northeast China, North China and Central China in the middle [26]. Substantial urban-rural differences in personal income and economic development also persist despite increased urbanization in China in recent years [27]. Public health insurance programs are the major form of health insurance for people in China and cover over 95% of the population [21]. There are two public health insurance programs in China: UEBMI, which provides coverage to working or retired urban residents in the formal sector; and URRBMI (merged from the New Cooperative Medical Scheme and Urban Resident Basic Medical Insurance), which provides coverage to urban residents and rural residents who are not eligible for UEBMI [21, 22, 28, 29].
Public health insurance programs are operated and organized by the local government and there is substantial variation in the amount of funding and coverage available between the different public health insurance programs, which is further complicated by an additional layer of funding and coverage availability in different regions.

Guided by the considerations above, as well as the methodological guidelines for qualitative inquiry, participants were sampled and recruited using purposeful sampling procedures. Specifically, we used criterion, maximum variation, and snowball sampling techniques [30]. With respect to criterion sampling, policymakers, healthcare providers, industry representatives, and academic researchers were asked to describe their experience in HTA or health technology-related decision making or policymaking using a predeveloped screening questionnaire (see Appendix 1 in the electronic supplementary material [ESM]). The public were required to be (1) older than 18 years of age, and (2) able to understand and communicate in Mandarin. We used the maximum variation sampling approach to ensure that selected policymakers, healthcare providers, industry representatives and academic researchers varied in years of work experience, professional status (e.g., senior vs. junior), expertise (e.g., physicians vs. nurses), residence area and geographical region. The public were selected to vary in age, sex, residence area, geographical region, insurance type, socioeconomic status (e.g., occupation, education, and work activity) and current health status (e.g., presence vs. absence of disease diagnosis). Snowball sampling supplemented our recruitment efforts by asking participants to link the interviewer to individuals who might be willing and able to participate. As is customary in inductive qualitative research, sampling, data collection, and data analysis happened concurrently. Therefore, sampling continued until data saturation was achieved, that is, until the amount, variation and depth of the data were deemed sufficient to generate a comprehensive description of value attributes from multiple stakeholders [31,32,33].
Data saturation was determined via independent coding of the data by two coders, as well as consensus-based discussions among team experts in qualitative methods, HTA, and health policy. Given the descriptive aims of our work, as well as the inclusion of multiple stakeholders who are involved in various stages of HTA and policymaking, we expected to achieve data saturation following the completion of 35 semi-structured interviews with 7–10 participants in each stakeholder group.

Due to the coronavirus disease 2019 (COVID-19) pandemic, we used China’s major social media platform (i.e., WeChat) as the primary recruitment tool and one of the interview platforms [34]. The lead researcher (MZ) screened and selected participants following the above sampling strategy.

2.4 Data Collection

We conducted one-on-one, open-ended, semi-structured interviews with participants. Virtual web-based technology (i.e., WeChat) or online conferencing software (e.g., Microsoft Teams, Tencent Meeting) was used. Interviews focused on encouraging participants to describe their perceptions about important attributes when evaluating the value of a new health technology or their perspectives on the characteristics that a health technology with high value should have [35]. At the end of each interview, participants were asked to name any relevant documentation that they consulted or felt relevant to the assessment of value for health technology. Documents recommended by interview participants were also reviewed by the study team.

Interviews were conducted between 19 June 2021 and 7 October 2021. All interviews were audio-recorded and transcribed verbatim, except for interviews with two policymakers at their request. Each interview was rendered anonymous during the transcription process. Two interviewers examined the transcripts following predeveloped transcription guidelines to ensure the accuracy of transcriptions. The guidelines provided general formatting rules for removing identifying information, capturing nuances (e.g., long pauses from participants), and highlighting strongly expressed opinions (e.g., a raised voice marked in italics to communicate emphasis). Field notes were also created by the interviewers after each interview to document contextual information and to capture reflective thoughts about interview content that was perceived to be relevant to the data analysis.

To facilitate the interview with stakeholders with different backgrounds and knowledge, we developed an interview guide for each stakeholder group (see Appendix 2 in the ESM). Trained qualitative interviewers pilot tested each version of the interview guide with senior researchers to ensure that questions were asked in an appropriate and consistent way to obtain the most relevant information. Plain language was used in the guide for the public.

2.5 Data Analysis

The data analysis consisted of three stages. First, we used conventional content analysis and the constant comparison technique to generate relevant concepts and categories from the interviews and the documents reviewed [19, 36,37,38]. The content analysis was carried out immediately after each interview so that emerging questions or issues could be incorporated in the subsequent interviews. Two coders reviewed the transcripts and government-issued regulations and documents independently and identified key concepts that we described as value attributes. Value attributes were then organized into categories based on the content described. The identification, definition, and organization of value attributes were finalized by consensus among the team. Starting from the value attributes identified in the first round of five interviews, the transcript and any new document files suggested by the interviewee in each subsequent interview were added to the data set for analysis. The coders used the constant comparison technique to determine whether any new value attributes or categories needed to be generated [37]. Where any new attributes or categories were identified, the interviewers went back to previously coded data to ensure that they were coded in each transcript. This iterative coding process was supplemented by analytical memoing by the coders, which captured the generation of, and justification for, new attributes and their categories until data saturation was achieved [39].

The second component of our analysis involved the use of multiple criteria to guide decisions related to retaining or dropping value attributes. The exclusion criteria were informed by the findings of our recently published systematic review of existing VAFs [8]. Some previous VAFs included societal context, such as political, historical and cultural milieu as contextual attributes and recommended measuring them qualitatively [8, 16, 40,41,42]. However, the qualitative measurement methods for these attributes or the approaches of incorporating the measurement results of these attributes into the decision-making process were unknown or were not reported in these VAFs [8, 16, 40,41,42]. This could increase the risk for inconsistency in VAF application, as well as lack of transparency in decision making; both of these potential procedural issues contradict VAFs’ primary goals of accurate and reliable value assessment to inform healthcare decisions [20]. Thus, attributes that were not measurable quantitatively or qualitatively due to unclear definition(s) from the participants, or that may pose challenges for reliable measurement using currently available methods, were excluded.

Third, response levels for each attribute were generated through discussion and consensus among the research team. The discussion was informed by the suggestions from the interviewees, and the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) system [43, 44]. In GRADE, quality of evidence for each outcome is divided into four levels: very low, low, moderate and high [43]. The magnitude of effect for a single outcome can be divided into four ranges (i.e., trivial, small, moderate, or large effect) by three thresholds (i.e., the small-, moderate-, and large-effect thresholds) [43]. We adopted the four levels for quality of evidence and this four-range approach to define response levels of other attributes included in the VAF. Symbols and color coding were adopted for the response levels to facilitate the understanding and use of the framework.
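
The four-range approach can be illustrated with a minimal Python sketch. The threshold values and the function name `effect_range` are hypothetical and chosen only for illustration; they are not values used in this study or prescribed by GRADE, and the sketch assumes that an effect exactly equal to a threshold is assigned to the higher range.

```python
from bisect import bisect_right

# Hypothetical thresholds on an arbitrary effect scale (illustration only):
# small-, moderate-, and large-effect thresholds, in ascending order.
THRESHOLDS = (0.1, 0.3, 0.5)
RANGES = ("trivial", "small", "moderate", "large")


def effect_range(effect, thresholds=THRESHOLDS):
    """Map the absolute size of an effect to one of four ranges.

    Three ascending thresholds split the scale into four ranges; an effect
    equal to a threshold is assigned to the higher range (an assumption).
    """
    return RANGES[bisect_right(thresholds, abs(effect))]


if __name__ == "__main__":
    for value in (0.05, 0.1, 0.45, -0.9):
        print(value, effect_range(value))
```

Any real application would need thresholds agreed for each attribute and outcome; the sketch only shows how three thresholds yield four ordered response levels.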

All transcripts, memos and documents from this study were managed using NVivo (Release 1.0, 18 March 2020). Participants’ sociodemographic characteristics were summarized using descriptive statistics and frequencies, calculated in Microsoft Excel (Microsoft Corporation, Redmond, WA, USA).

2.6 Ethical Approval and Consent

Approval to conduct this study was granted by the Hamilton Integrated Research Ethics Board (HiREB), and all participants provided informed consent to participate in this study. An honorarium was provided to each participant after the interview. We used a series of strategies for all phases of the study to promote the rigor and trustworthiness of our research and reporting procedures, which are outlined in Appendix 3 in the ESM.

3 Results

This study was reported following the Standards for Reporting Qualitative Research (SRQR) reporting guideline [45].

3.1 Participant Characteristics

A total of 34 online interviews were conducted and the mean duration of each interview was 64 min (range 23–95). Data saturation was achieved after 29 interviews (1 policymaker, 8 healthcare providers, 6 academic researchers, 9 public, and 5 industry representatives), but we conducted another 5 interviews (3 policymakers and 2 industry representatives) to ensure the inclusion of participants with various demographic characteristics in these two stakeholder groups (see Appendix 4 in the ESM). A total of 16 government-issued documents were identified and analyzed (see Appendices 5 and 6 in the ESM). The policymakers were from hospital, provincial and NHSA agencies. Healthcare providers included physicians, nurses, and pharmacists, while HTA researchers were from academia, consulting companies, or non-governmental organizations. Industry representatives were working in various departments, including market access, research and development, and health economics outcomes research in pharmaceutical or medical device companies.

Table 1 presents the characteristics of the study participants. Participants resided in 13 different provinces that spanned all seven of the geographic regions in China; 50% of the participants identified as female. Most participants were from North or East China (n = 24, 70.6%), living in urban areas (n = 29, 85.3%), with UEBMI (n = 29, 85.3%) and with a bachelor’s degree or higher (n = 31, 91.2%). Of the 9 public participants, most were from North or East China (n = 5, 55.6%), living in urban areas (n = 6, 66.7%), and with UEBMI (n = 6, 66.7%). Within this group, nearly half of the participants identified as female (n = 4, 44.4%). Of the 16 government-issued documents reviewed, 15 (93.8%) were released in the last 5 years (i.e., between 2016 and 2021). Documents were published by the National Health Commission (n = 6, 37.5%), the General Office of the State Council (n = 5, 31.3%), the NHSA (n = 4, 25%), and the National Medical Products Administration (n = 1, 6.2%).

Table 1 Participant characteristics

3.2 Attribute Identification and Selection

Table 2 displays the descriptions of all the value attributes included in the VAF, as well as illustrative quotes from the coded data. A total of 12 value attributes grouped into eight categories were included: (1) severity of disease; (2) health benefit, including survival, clinical outcomes, and patient-reported outcomes (PROs); (3) safety; (4) economic impact, including budget impact to payer, out-of-pocket costs to patients, and cost effectiveness; (5) innovation; (6) organizational impact; (7) health equity; and (8) quality of evidence. Appendix 7 in the ESM presents the generation of categories and value attributes in the form of a coding tree. All participants discussed the importance of health benefits, safety, economic impact, and health equity (see Appendix 4 in the ESM). Most participants discussed the current health system context and the potential organizational impact of new health technologies (n = 31, 91.18%) and quality of evidence (n = 26, 76.47%). They believed that quality of evidence should be separately rated for each characteristic. Half of the participants discussed cost effectiveness (n = 17, 50%), severity of disease (n = 17, 50%) and the value of innovation in addressing unmet needs (n = 17, 50%). Most interviewees (n = 28, 82.4%) believed that the rankings or the relative importance of the other attributes varied across diseases of different levels of severity, and that different priorities should be assigned to the disease for coverage decision making. All 12 value attributes were discussed across all stakeholder groups; the one exception was the attribute of ‘cost effectiveness’, which was not discussed by any participant from the public (see Appendix 8 in the ESM). The public participants emphasized the importance of health benefits, safety, and out-of-pocket costs to patients. One public participant discussed innovation in addressing unmet needs.
When discussing quality of evidence, the public defined evidence as recommendations from healthcare providers and other patients. All 12 value attributes were mentioned in the government-issued policy documents. However, only two documents (12.5%) discussed severity of disease.

Table 2 List of value attributes included in the value assessment framework

Ethics and societal implications were mentioned in the interviews but it was not clear whether and how to measure them. Ten participants (29.41%) discussed ethics; however, four of them did not give a clear description of ethics, while the remaining six described ethics with substantial variation, ranging from healthcare professionals’ behaviors to no harm to patients, which overlapped with safety. Societal implications discussed by participants were extremely broad, including the demographic, cultural, economic, legal, and political context in China. It was not clear whether or how to measure these implications in the value framework and therefore they were excluded.

3.3 Attribute Levels

We categorized the severity of disease into three levels to reflect life-threatening or critical disease, severe disease, and moderate or mild disease, as discussed by the participants. For quality of evidence, we used four levels: high, moderate, low, and very low. For attributes measuring health benefits, safety, cost effectiveness, innovation, and health equity, we used four levels: excellent, good, fair, and poor. For attributes measuring costs and organizational impact, we used four levels: none, low, moderate, and high.
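
As an illustration only, the attribute levels described above could be encoded as a simple lookup structure. The sketch below is written in Python; the dictionary `ATTRIBUTE_LEVELS` and the helper `validate_profile` are hypothetical names, and the assignment of the none/low/moderate/high scale to budget impact and out-of-pocket costs assumes that these are the 'costs' attributes referred to in the text.

```python
# Level scales as described in the text.
SEVERITY_LEVELS = ("life-threatening or critical", "severe", "moderate or mild")
EVIDENCE_LEVELS = ("high", "moderate", "low", "very low")
PERFORMANCE_LEVELS = ("excellent", "good", "fair", "poor")
IMPACT_LEVELS = ("none", "low", "moderate", "high")

# Hypothetical encoding of the 12 value attributes and their response levels.
# Assumption: "costs" covers budget impact and out-of-pocket costs.
ATTRIBUTE_LEVELS = {
    "severity of disease": SEVERITY_LEVELS,
    "survival": PERFORMANCE_LEVELS,
    "clinical outcomes": PERFORMANCE_LEVELS,
    "patient-reported outcomes": PERFORMANCE_LEVELS,
    "safety": PERFORMANCE_LEVELS,
    "budget impact to payer": IMPACT_LEVELS,
    "out-of-pocket costs to patients": IMPACT_LEVELS,
    "cost effectiveness": PERFORMANCE_LEVELS,
    "innovation": PERFORMANCE_LEVELS,
    "organizational impact": IMPACT_LEVELS,
    "health equity": PERFORMANCE_LEVELS,
    "quality of evidence": EVIDENCE_LEVELS,
}


def validate_profile(profile):
    """Raise ValueError if a profile uses an unknown attribute or level."""
    for attribute, level in profile.items():
        if level not in ATTRIBUTE_LEVELS.get(attribute, ()):
            raise ValueError(f"invalid level {level!r} for {attribute!r}")


if __name__ == "__main__":
    validate_profile({"safety": "good", "organizational impact": "low"})
    print(len(ATTRIBUTE_LEVELS), "attributes encoded")
```

A structure of this kind would make it straightforward to check that a technology's value profile uses only the defined levels before it is scored or presented to decision makers.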

4 Discussion

This qualitative descriptive study has identified 12 important value attributes for a VAF for health technology value assessment and decision making in China. The included attributes represent a broad range of value components related to severity of disease, health benefit, safety, economic impact, innovation, organizational impact, health equity, and quality of evidence.

Using semi-structured interviews and document analysis, this qualitative study involved multiple stakeholders, including patients and members of the public, policymakers, healthcare providers, HTA researchers and industry representatives for attribute identification. We identified attributes that capture aspects important to the stakeholders in China for health technology value assessment and coverage decision making by (1) purposively selecting participants who have a diverse background and experience with health technology use, assessment and coverage decision making in China; (2) inductively analyzing the participants’ insightful and contextual descriptions and discussions; and (3) deliberately supplementing and triangulating the interview data with review of government documents related to HTA and coverage policies. Among existing VAFs, the attributes were often identified through literature review or selected by a few healthcare providers, health economists and policymakers without direct input from the public [8, 14]. Omitting such input risks missing value attributes that are important to the public and healthcare providers, two key parties involved in healthcare decision making.

Similar to most existing VAFs, our VAF includes severity of disease, health benefit, safety, and quality of evidence [8]. However, there are important differences in measuring these attributes based on inputs from the qualitative study.

First, previous frameworks usually measure severity of disease as part of burden of disease along with unmet needs or size of population. Sometimes, they include both burden of disease and budget impact to payer, or both unmet needs and innovation [7, 16, 46]. There are overlaps between these attributes. For example, the budget impact to the payer takes into account the size of the population, and unmet need has been one of the criteria used to determine the novelty of a health technology [47, 48]. In addition, some multicriteria decision analysis (MCDA) frameworks included severity of disease alongside other attributes in the weighted-sum model [7, 49, 50], which assumes independence and compensation between attributes [51]. Therefore, the inclusion of disease severity in the weighted-sum model ignores the potential interactions and dependence between disease severity and other value attributes, which have been suggested by previous studies and by our discussions with participants about the relative importance of attributes for diseases at different levels of severity [51, 52]. In our framework, severity of disease was used to construct disease scenarios at different levels of severity. Budget impact to payer incorporates size of the population, while innovation incorporates unmet needs. In each scenario, the relative weights of the remaining attributes are to be determined separately.
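
The compensation property of the weighted-sum model, and the effect of determining weights separately within each severity scenario, can be illustrated with a minimal Python sketch. All weights and scores below are hypothetical and were chosen only to make the point; using a weighted sum within each scenario is an assumption made for this illustration and not a description of how the weights in our framework will be elicited or aggregated.

```python
def weighted_sum(scores, weights):
    """Additive MCDA value: sum of weight x score over the attributes."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical 0-1 performance scores for two technologies (illustration only).
drug_a = {"health benefit": 1.0, "safety": 0.0, "health equity": 0.5}
drug_b = {"health benefit": 0.4, "safety": 1.0, "health equity": 0.5}

# A single hypothetical weight set: the poor safety of drug A is fully
# compensated by its health benefit, so both drugs get the same total value.
single_weights = {"health benefit": 0.5, "safety": 0.3, "health equity": 0.2}

# Hypothetical scenario-specific weights: the relative importance of the
# remaining attributes differs by disease severity.
scenario_weights = {
    "life-threatening or critical": {
        "health benefit": 0.7, "safety": 0.1, "health equity": 0.2},
    "moderate or mild": {
        "health benefit": 0.3, "safety": 0.5, "health equity": 0.2},
}

if __name__ == "__main__":
    print("single weight set:",
          round(weighted_sum(drug_a, single_weights), 2),
          round(weighted_sum(drug_b, single_weights), 2))
    for severity, weights in scenario_weights.items():
        print(severity,
              round(weighted_sum(drug_a, weights), 2),
              round(weighted_sum(drug_b, weights), 2))
```

With these illustrative numbers, the two technologies are indistinguishable under the single weight set, whereas their ranking reverses between the two severity scenarios, which is the kind of dependence that a single weighted sum including severity as one more attribute cannot represent.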

Second, the health benefit of a health technology is measured through three value attributes in our framework: survival, clinical outcomes (excluding survival), and PROs. However, the value framework developed by the Professional Society for Health Economics and Outcomes Research includes quality-adjusted life-years (QALY) as a core value attribute [53]. Although QALY was not included as a separate value attribute in our VAF, both survival and PROs (including health-related quality of life) were identified as important value attributes. This categorization was used because some interviewees did not mention QALY, which might be due to the fact that they were not familiar with the concept of QALY. Another reason was that those participants who discussed QALY were concerned about the limitations of QALY in capturing value attributes such as equity.

Third, quality of evidence was included in some existing VAFs as an overall rating of quality of evidence on all attributes [16, 50, 54]. For example, quality of evidence was included as an attribute in the weighted-sum model alongside other attributes in the EVIDEM framework [16]. The relative importance of quality-of-evidence ratings for different attributes (e.g., clinical outcome vs. economic impact) was left to the users’ judgment [16]. It was not clear to which value attributes quality of evidence should apply or how the overall quality-of-evidence rating was generated in EVIDEM. In our framework, we rate quality of evidence using the GRADE approach and incorporate it in the assessment of performance level for each attribute.

There is no consensus on the inclusion of cost effectiveness, alongside costs and health benefits, in a VAF [51, 55,56,57]. Some argue that cost effectiveness overlaps with costs and effectiveness and suggest removing either cost effectiveness or costs and effectiveness [51, 55]. We include attributes on budget impact to the payer, out-of-pocket costs to patients, and health benefit alongside cost effectiveness in our framework. This was because cost effectiveness was mentioned by interviewees from all stakeholder groups, except those from the public. Furthermore, cost effectiveness measures the marginal effect of a health technology versus the comparator, which supplements, rather than replaces, the measures of cost and health outcomes in the value assessment [58].

It has been increasingly recognized that value is multi-dimensional and value assessment has expanded beyond the current cost per QALY gained approach [53, 59, 60]. Even with modifiers, the cost per QALY gained method may still be limited in capturing all the dimensions of value and it is difficult to set appropriate thresholds to facilitate decision making [61,62,63]. MCDA is an alternative approach that has been proposed to measure value [8, 62, 64]. This approach originated in the discipline of operational research and is concerned with decision-making situations where multiple dimensions are to be combined or aggregated [64]. It has been increasingly explored in healthcare decision making and adopted or piloted by various HTA agencies and VAFs around the world [8, 40, 46, 57, 65,66,67]. The multiple value attributes identified in our study offer an opportunity to evaluate the utility of the MCDA methods [8, 62, 64]. Subsequently, we will construct a survey using the identified value attributes among healthcare stakeholders in China. The survey will include hypothetical drugs described by the identified attributes, with their levels varied experimentally. Appendix 9 in the ESM gives an example of the value profile that could be used in the survey and in real-world decision making.
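
To show what 'hypothetical drugs described by attributes varying in their levels' could look like in practice, the following Python sketch enumerates every combination of levels for a small subset of attributes and draws a random sample of profiles. The experimental design of the planned survey has not been specified here; the full-factorial enumeration, the three-attribute subset, and the function names `all_profiles` and `sample_profiles` are assumptions made purely for illustration.

```python
import itertools
import random

# Illustrative subset of attributes and their levels (not the survey design).
LEVELS = {
    "survival": ("excellent", "good", "fair", "poor"),
    "safety": ("excellent", "good", "fair", "poor"),
    "out-of-pocket costs to patients": ("none", "low", "moderate", "high"),
}


def all_profiles(levels):
    """Yield every combination of attribute levels (full factorial)."""
    names = list(levels)
    for combo in itertools.product(*(levels[name] for name in names)):
        yield dict(zip(names, combo))


def sample_profiles(levels, k, seed=0):
    """Draw k distinct hypothetical drug profiles at random (seeded)."""
    rng = random.Random(seed)
    return rng.sample(list(all_profiles(levels)), k)


if __name__ == "__main__":
    print(sum(1 for _ in all_profiles(LEVELS)), "possible profiles")
    for profile in sample_profiles(LEVELS, 3):
        print(profile)
```

Because the number of combinations grows multiplicatively with the number of attributes, a survey covering all identified attributes would in practice need a reduced design rather than the full enumeration shown in this sketch.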

Our study has a few limitations. First, people from less developed regions in China were underrepresented in our study due to their limited access to the internet and to the online platforms where we posted our recruitment advertisement and performed the interviews. Second, the perspectives of policymakers might be underrepresented due to restrictions on government officials participating in research. We conducted a document analysis to at least partly address this limitation. Third, ethics and societal implications were not included in our VAF because of their unclear definitions and the difficulty of measuring them either qualitatively or quantitatively. This might limit the capacity of the framework to capture some ethical and societal concerns.

China has made considerable effort in improving patients’ accessibility to quality healthcare services while striving for the efficiency of healthcare resource use. The government has developed and adopted various policies, including the zero drug mark-up policy, reform of public hospital payment method, national health insurance negotiation, and centralized drug procurement [21, 68,69,70]. The concept of value assessment in HTA has also been increasingly discussed and debated at the national level in China [71]. These policies and progress present both challenges and opportunities for the application of our VAF in China. Despite the rapid development of HTA in China in recent years, a few issues have yet to be addressed. These issues include a lack of HTA researchers with sufficient training and experience, the lack of a national HTA agency that produces and endorses HTA reports, and a lack of translation of HTA evidence into decision making on the introduction or reimbursement of health technologies [72, 73]. Furthermore, the inclusion of most new drugs relies on the centralized drug procurement process that involves price negotiation between the representatives of pharmaceutical companies and the NHSA [74]. Priority has been given to drugs for cancers, rare diseases, chronic diseases and children’s diseases [74]. The decentralized HTA system and the emphasis on price, cost-effectiveness analysis (CEA) results, and certain diseases in the price negotiation process might result in a lack of relevant data to support the value assessment of health technologies on attributes such as health equity, innovation, and organizational impact. Thus, the use of our VAF could be impacted. However, the NHSA has recently adopted a scoring checklist to facilitate the assessment of health technologies across multiple dimensions that is similar to multicriteria decision making (MCDM) [75].
Since last year, the National Health Commission has also released a series of national guidelines for the comprehensive clinical evaluation of drugs [76]. The guidelines have included all the attributes in our VAF, except for severity of disease. These changes could open up great opportunities to validate and apply our VAF in health technology value assessment and coverage decision making in China to improve equity and accessibility of new health technologies.

5 Conclusions

Twelve value attributes were identified for the development of a VAF to support transparent, consistent, and robust health technology value assessment in China.