Key Points for Decision Makers

Patient-reported outcome measures (PROMs) for rare diseases face potential challenges resulting from small patient populations and disease heterogeneity.

Data collection, psychometric evaluation and each specific type of PROM present their own challenges.

Each of these challenges has potential solutions that can be considered and selected to fit specific contexts.

1 Introduction

A patient-reported outcome (PRO) is a report of the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by anyone. Accordingly, a patient-reported outcome measure (PROM) is a tool, such as a questionnaire or a survey, used to measure and collect data on a PRO, usually related to health-related quality of life (HRQoL), symptoms or treatment side effects or experience with care (adherence, satisfaction or health status) [1].

Various types of PROMs exist to capture PROs, with the main distinction being between generic measures and disease/condition/treatment-specific measures [2]. Generic PROMs are not specific to a disease, condition or treatment and can be used across different populations. They capture broader aspects such as quality of life (QoL); HRQoL; physical function; physical, mental and emotional health; social function; and pain [2]. Examples include the Short Form-36 (SF-36) and the World Health Organization Quality of Life (WHOQOL) questionnaire. Disease-group-specific PROMs relate to a group of similar conditions or diseases. These PROMs tend to be more sensitive than generic measures, but less sensitive than PROMs tailored to a specific disease, such as a particular rare disease (RD). A common example in oncology is the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 (EORTC QLQ-C30) [3]. Disease/condition/treatment-specific PROMs (hereafter referred to as ‘disease-specific’) are tailored to measure symptoms, effects of treatment or other aspects related to a specific condition or disease [2].

Generic and disease-specific PROMs can be further divided into preference-based and non-preference-based measures. Non-preference-based PROMs are scored as profiles or by summing item responses to provide a total score that is interpretable on its own [2]. Preference-based PROMs are scored in a way that allows health state utility values (HSUVs) to be derived. Instead of being summed, responses are converted into an index score (based on societal preferences for a particular health state), which allows calculation of quality-adjusted life-years (QALYs) [24]. The most commonly used preference-based PROMs are the EuroQol 5-Dimensions (EQ-5D), the Health Utilities Index Mark 3 (HUI3) and the Short Form 6 Dimension (SF-6D) [4].
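As a simple illustration of how HSUVs are turned into QALYs, the sketch below multiplies the utility of each health state by the time spent in it and sums the products. It is a minimal, undiscounted example under stated assumptions (hypothetical utilities and durations, constant utility within each period), not the calculation used by any particular HTA body; economic models submitted for HTA typically also apply discounting and model transitions between health states.

```python
from typing import Sequence


def qalys(utilities: Sequence[float], years: Sequence[float]) -> float:
    """Undiscounted QALYs: sum of (HSUV x years spent in that health state).

    Assumption (illustrative only): utility is constant within each period.
    """
    if len(utilities) != len(years):
        raise ValueError("utilities and years must have the same length")
    for u in utilities:
        # HSUVs are anchored at 1 (full health) and 0 (dead); some value sets
        # allow negative ("worse than dead") values, so only the upper bound is checked.
        if u > 1.0:
            raise ValueError("utility values cannot exceed 1 (full health)")
    return sum(u * t for u, t in zip(utilities, years))


# Hypothetical patient: 2 years at utility 0.8, then 3 years at utility 0.5.
print(qalys([0.8, 0.5], [2.0, 3.0]))
```

For this hypothetical patient, 2 years at 0.8 and 3 years at 0.5 give 1.6 + 1.5 = 3.1 QALYs.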

PROMs are increasingly used to derive information on a treatment’s value and are often taken into account during health technology assessment (HTA) processes when deciding whether to provide a treatment for routine use [5]. Patient perspectives provide crucial information for decision makers in these contexts [6], particularly in RDs [7]. The high unmet need, the severe and disabling nature of these conditions and the scarcity of adequate data for RDs mean that clinical trials need creative and pragmatic supplements to conventional measures to capture treatment effects from the patient perspective [7] and help ensure that meaningful outcomes are measured. Well-designed PROMs can support clinical endpoints, which are often challenging in RDs and may rely on surrogate endpoints [8]. Some countries (e.g. the UK) use QALYs to quantify health outcomes for HTA and use preference-based generic PROMs to derive HSUVs for calculating the QALYs included in economic models [9]. Other, non-QALY-based HTA systems (e.g. Germany) use PROMs as sources of additional evidence for the deliberative process [10, 11]. Currently, HTA bodies focus largely on generic PROMs, and the use of PROM evidence in decision making is inconsistent [10].

The use of PROMs in HTA for rare disease treatments (RDTs) also poses a number of challenges, some of which are not specific to RDs but may be exacerbated by the inherent characteristics of such conditions:

  • Data are usually collected from small patient populations [12], which may result in imprecise aggregate results.

  • Conditions and presentations can be heterogeneous, which makes it difficult to capture meaningful and generalizable outcomes [3, 12,13,14,15].

  • Information on and understanding of disease progression and natural history are lacking, which makes it difficult to know which PROMs to use or how to develop new PROMs [12, 13, 16, 17].

  • The number of studies is insufficient, which makes it difficult to obtain representative samples in literature reviews [18].

  • Many issues that are important to patients are not captured with existing measures/methods [19].

  • Existing value frameworks largely fall short of consistently measuring outcomes that matter to patients [16].

  • Psychometric and linguistic validation of newly developed PROMs is challenging to attain [12].

  • Patients are often children or have disease-related cognitive impairments, which makes it hard or impossible for them to self-report and often creates a reliance on proxy measures such as parent-proxy reports [13].

These challenges have important implications for the use of PROMs in HTA for RDTs; a thorough understanding of challenges (and potential solutions) can be beneficial for all stakeholders involved in these processes.

The aim of this research was to review the current literature on the use of PROMs in RDs and identify key factors to consider when using PROMs for HTA of RDTs. These factors are then interpreted and discussed, with the goal of providing useful, evidence-based insights that can support HTA stakeholders when considering PROM results during RDT appraisal. This work does not address the details of selecting, adapting or developing PROMs for a particular RD, as this is a complex process that requires an entirely separate piece of research.

This study was conducted within IMPACT HTA, an EU Horizon 2020 project examining new and improved methods in costs, health outcomes and economic evaluation in the context of HTA and health system performance measurement (https://www.impact-hta.eu/). This work package (WP10) focuses specifically on HTA appraisal of medicinal products for RDs. Results will feed into a guidance document intended for HTA stakeholders on the use of PROMs in HTA for rare diseases.

2 Methods

2.1 Study Design

A scoping review of scientific (PubMed) and grey (Google) literature was conducted following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist [20] to ensure accurate and comprehensive reporting. PubMed was the only database searched, because test searches in other databases (ScienceDirect, SpringerLink) returned overlapping hits. All hits from each search string were exported to Excel. One reviewer screened the titles (eliminating unrelated articles), read the abstracts (eliminating unsuitable articles not detected by title screening), read the full text of promising and included articles and extracted the information. A second, senior researcher read and reviewed the full-text articles selected for inclusion and the extracted information. To ensure all relevant information was captured, literature already available to the project was also included, and the reference lists of all selected articles were checked to retrieve relevant literature not captured by the search strings. Searches were conducted up to May 2020, with no date or study design limitations.

In PubMed, search terms were broad and included patient reported outcome measure*, patient reported outcome*, prom*, rare disease*, RDT*, orphan medicinal product*, OMP*, challenge*, recommend*, health technology assessment, HTA and appraisal*. For the grey literature, search terms included patient reported outcome measure, rare disease and health technology assessment. The full search string combinations are listed in Appendix 1 in the electronic supplementary material (ESM).

To encompass a wide perspective from both scientific and real-world practice viewpoints, the search was broad and incorporated various types of literature. This included original research, reviews, commentaries, discussion papers, policy papers, conference/webinar/symposium presentations and position papers.

2.2 Article Selection

Articles were included if they were in English and provided any insight into PROMs for RDs in terms of the advantages, challenges and potential solutions, both in general and specifically related to HTA.

Articles were excluded if they only described the application or development of a PROM without a description of the advantages, challenges or potential solutions relevant for use with RDs or if they referred to aspects of PROMs not relevant for the purposes of this research.

The PubMed search identified 103 scientific articles, resulting in 44 records included for analysis (see Fig. 1). This included 23 original research, nine reviews, four commentaries/editorials/short communications, three conference/webinar/symposium presentations, two discussion/perspective papers, two reports and one position statement. The remaining 59 articles were excluded because they were related to QoL or experience, but not PROMs; they were about effectiveness, not QoL; or they were generally about evidence.

Fig. 1 Article selection flow chart

2.3 Information Extraction

Relevant information from all included articles identified as being related to the advantages, challenges and potential solutions of using PROMs for RDTs was extracted and summarized in an Excel template. Extracted information included authors, date of publication, journal, title, country, type of research, research objective(s)/research question(s) and key advantages/challenges/solutions mentioned in the text. This information was used to identify the aspect of PROMs in HTA to which each article was most applicable (the article ‘focus’). An overview of the characteristics of selected articles is displayed in Table 1, and the detailed extracted information is in Appendix 2 in the ESM.

Table 1 Characteristics of included studies

Key findings were derived from each article included in the analysis and grouped into pre-defined categories related to PROMs for RDTs. The categories were defined as the areas that all stakeholders need to understand to help ensure successful use of PROMs for RDTs in HTA: (1) psychometric properties, (2) existing generic PROMs, (3) existing disease-group-specific PROMs, (4) existing disease-specific PROMs and (5) creating new disease-specific PROMs. Consideration was also given to whether the PROMs were preference or non-preference based. The authors discussed the potential solutions identified in the included articles with regard to factors that may hinder or facilitate their implementation. This is explored further in the Discussion to indicate which solutions may be more or less feasible to implement and why, and what HTA bodies could do to facilitate the success of such solutions.

3 Results

Our results, summarized in Table 2, outline the potential challenges and their solutions when using PROMs for RDTs, and implications for HTA.

Table 2 Challenges and solutions in using patient-reported outcome measures to inform the appraisal of rare disease treatments

3.1 General Considerations for the Use of Patient-Reported Outcome Measures (PROMs) in Rare Diseases (RDs)

3.1.1 Potential Data Collection/Measurement Challenges and Solutions

Diversity of Use of PROMs in RDs Researchers are using a wide variety of PROM types for the same condition or group of conditions, making the comparison of results across populations more challenging [15, 21]. Recommended core outcome measures based on existing guidelines could be developed to provide a standard set of PROMs for specific RDs or groups of RDs to ensure improved consistency and comparability across populations [5, 15, 21]. Disease and treatment characteristics from the perspective of all stakeholders, especially patients and carers, should be considered when developing these core outcome measures. Concept elicitation interviews, for example, could be conducted with as many patient and carer groups as possible to evaluate differences in patient experiences across disease subtypes. These studies should take into account the most important features that cause variation in disease experience, such as disease group, age, ethnicity, or disease severity. They should further aim to identify the most important symptoms within various subtypes and focus on core signs and symptoms that apply to most or all patients. Working with patient and clinical experts at an early stage is essential for capturing the meaning and importance of all potential endpoints [5, 13].

Small, Heterogeneous Populations The small sample sizes and heterogeneous populations inherent to RDs result in sampling, data collection and statistical analysis issues, which often mean that conventional methods of selecting, developing or adapting PROMs are not effective [15, 22].

Small population sizes create sampling and data collection issues when trying to recruit enough patients for clinical trials or PROM development/validation [23]. Collaborating with patient advocacy groups and clinical care networks may help to maximize patient recruitment [23]. Statistical methods such as Bayesian item response theory, available in some software, can help to overcome the small sample size challenge while maintaining adequate psychometric qualities [24].

Multicentre or international data collection can increase sample sizes and allow pooling of data from different locations. This may include, for example, using a research network to collect data, in which all sites identify eligible participants via electronic health records and use various recruitment methods. This enables efficient identification of eligible participants, an available sample and a standardized approach that allows information to be pooled [25]. However, challenges remain when collecting multi-site data. First, the PROMs may require linguistic and cultural validation. Neglecting cultural specificities may lead to a lack of cultural validity, which makes results difficult to compare and may result in dropouts and missing data [15]. Moreover, it may be difficult to engage patients over a wide area [26], and cross-cultural variations in research protocols can exist between centres [14]. Extra efforts are therefore essential to engage participants, and data collection should be standardized across locations as much as possible. Collecting and pooling international data can effectively overcome the small sample size issue only if the study design and methods are of high quality and psychometric, linguistic and cultural validity have been established [14].

In establishing cultural validity, a Rasch measurement theory analysis can first be used to determine whether significant country or language differences exist [15]. To better ensure cultural validity, it has been recommended to consider six types of cross-cultural equivalence: conceptual, semantic, operational, item, measurement and functional equivalence [27, 28], with the first three being particularly important [28]. One approach to achieving conceptual equivalence is the simultaneous development of instruments in different cultural settings. To ensure semantic equivalence, forward and back translation and cognitive debriefing in a small sample of the target population are recommended [28,29,30]. Finally, an understanding of response styles in different settings and the use of different measurement approaches may help to address operational equivalence [28].
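To make the Rasch-based check more concrete, the sketch below shows the dichotomous Rasch model and a simple comparison of item difficulties estimated separately in two language versions. It assumes the difficulty estimates (in logits) have already been obtained and placed on a common scale, for example through anchoring; the item names, values and the 0.5-logit flagging threshold are hypothetical. A full analysis would also test the statistical significance of each difference.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability that a person with ability theta endorses an item (Rasch model)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def flag_country_differences(difficulties_a, difficulties_b, threshold=0.5):
    """Return {item: difficulty contrast} for items whose difficulty differs between
    two language versions by at least `threshold` logits (hypothetical cut-off)."""
    flagged = {}
    for item, b_a in difficulties_a.items():
        if item in difficulties_b:
            contrast = difficulties_b[item] - b_a
            if abs(contrast) >= threshold:
                flagged[item] = contrast
    return flagged


# Hypothetical difficulty estimates (logits), assumed to be on a common scale.
version_a = {"fatigue": 0.2, "pain": -0.5, "mobility": 1.0}
version_b = {"fatigue": 0.3, "pain": 0.4, "mobility": 1.1}

print(flag_country_differences(version_a, version_b))
# Same ability (theta = 0), different probability of endorsing the 'pain' item:
print(rasch_probability(0.0, version_a["pain"]), rasch_probability(0.0, version_b["pain"]))
```

Under the Rasch model, the probability of endorsing an item depends only on the gap between the person's ability and the item's difficulty, so an item whose difficulty shifts between countries (here the hypothetical ‘pain’ item) would be endorsed differently by people with the same underlying level of the trait.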

In terms of heterogeneity, there may be substantial variability even within one RD, so measures in which patients answer the same questions may not capture each manifestation of the disease [31]. However, collecting information on every PROM for every domain of a disease can be too demanding on patients, especially with a small sample, which can result in fatigue and missing data [15]. Therefore, a primary challenge is to identify a PROM that has the most appropriate content possible, as well as a method of data collection in which patients can realistically participate [23]. PROMs need to be tailored to the patient, condition and therapy, but should at the same time contain some core comparable outcome measures, as mentioned previously [31].

Difficulty with Self-Reporting Obtaining information from patients with RDs can often be challenging. Patients are not always able to self-report [13, 32], as they are often children and/or may have cognitive impairment or functional limitations from their illness. Moreover, individual and cultural differences may influence how people interact with instruments. Furthermore, patients' responses may not only reflect disease and treatment experience, but also other environmental or contextual factors [31]. Therefore, self-report measures might not accurately capture the patient experience [28].

Several possible ways of dealing with these challenges exist. PROM information can be obtained through proxy PROMs or PROM instruments designed for children, although the reliability of parent-proxy responses still needs further investigation since child and adult preferences can differ [33]. PROMs can be completed with the help of an interviewer when patients are able to report but have physical impairments and cannot complete paper, computer or phone measures [23]. When no self-report or parent proxy is possible, it may be more suitable to use other outcome assessment measures, such as clinician- or observer-reported measures, performance outcomes or survival-based outcome measures [32], although it is also important to minimize clinician burden in terms of recruitment and data entry [26].
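One common way to examine how closely parent-proxy reports agree with children’s self-reports is a Bland-Altman analysis. The sketch below computes the mean difference (bias) between paired proxy and self-report scores and the 95% limits of agreement. The scores are hypothetical, and the 1.96 multiplier assumes approximately normally distributed differences; this illustrates agreement analysis in general rather than a method used in the reviewed studies.

```python
import statistics


def bland_altman(self_scores, proxy_scores, z=1.96):
    """Mean proxy-minus-self difference (bias) and 95% limits of agreement."""
    if len(self_scores) != len(proxy_scores) or len(self_scores) < 2:
        raise ValueError("need at least two paired scores of equal length")
    diffs = [p - s for s, p in zip(self_scores, proxy_scores)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd


# Hypothetical 0-100 HRQoL scores for the same five children.
child_self_report = [70, 65, 80, 55, 60]
parent_proxy = [62, 60, 78, 50, 58]
bias, lower, upper = bland_altman(child_self_report, parent_proxy)
print(f"bias = {bias:.1f}, limits of agreement = [{lower:.1f}, {upper:.1f}]")
```

In the hypothetical data the bias is -4.4 points, meaning parents rate their child’s HRQoL slightly lower than the children do; wide limits of agreement would suggest that proxy and self-reports cannot be used interchangeably.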

HTA bodies usually prefer established PROMs that are easy to interpret and are often critical of poor-quality PROM data [10]. While poor-quality data understandably cannot be accepted, it is important that decision makers recognize innovative approaches to PROM use and development, given the paucity of PROMs for RDs and the challenges posed by small, heterogeneous populations.

3.1.2 Potential Challenges and Solutions with Psychometric Properties

PROMs are often not fit for purpose; the evaluators are often not convinced that a PROM is measuring what it claims, or supporting evidence may be insufficient [34]. As such, it is difficult for HTA bodies and payers to accept the results that can be realistically expected from RD PROMs [31]. Thus, if a drug is being developed for approval, then discussion and collaboration with the relevant agencies is essential to both ensure the PROM is attuned to their standards and to come to an agreement about generating evidence to reduce uncertainty [5, 23, 33].

PROMs need to be as valid and responsive as possible [9, 13] and allow for accurate interpretation of results [31], yet PROMs for RDs are often not validated for the population in which they are being used [5, 14]. Evaluating the psychometric properties of PROMs in RDs is challenging, as small population sizes and lower-quality data mean conventional methods are not always appropriate [15, 35, 36]. To deal with this, RD populations can be combined with populations with similar disease presentations to increase the sample size. Rasch measurement theory in particular is a potential solution for small samples when combining RD populations with similar disease presentations: differential item functioning (DIF) analysis can then determine whether the responses of the combined groups are equivalent and, if they are, the samples can be used together as one larger sample to validate the PROM [16]. Conventional methods are still used to assess the psychometric properties of RD PROMs [17, 37], but these require larger sample sizes and may not always be appropriate for such PROMs. Mixed-methods psychometric research has been proposed as the best fit for RDs, as it can help to maximize clinical interpretability, improve conceptual understanding and avoid potential measurement problems [16].
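As a simpler, swapped-in illustration of DIF testing (it is not a Rasch method), the sketch below computes the Mantel-Haenszel common odds ratio for one item when an RD sample (focal group) is compared with a similar-disease sample (reference group), with respondents matched on their total score, and converts it to the ETS delta scale. The group roles and counts are hypothetical; with the small samples typical of RDs, such statistics have limited power and should be interpreted cautiously.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched score levels.

    strata: iterable of (ref_yes, ref_no, focal_yes, focal_no) counts, one tuple per
    total-score level. Reference = similar-disease sample, focal = RD sample (assumed).
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty score level contributes no information
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return num / den


def mh_d_dif(strata):
    """ETS delta-scale DIF: -2.35 * ln(alpha_MH); values near 0 suggest little DIF."""
    return -2.35 * math.log(mantel_haenszel_odds_ratio(strata))


# Hypothetical counts for one item at three total-score levels.
item_strata = [(8, 4, 5, 3), (12, 3, 6, 2), (15, 1, 7, 1)]
print(mantel_haenszel_odds_ratio(item_strata), mh_d_dif(item_strata))
```

Under the commonly used ETS convention, an absolute D-DIF below 1 is generally treated as negligible; larger values would argue against pooling the two samples for that item without further investigation.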

Current PROMs for RDs have practical limitations in terms of feasibility and response rates, and they often have poor content and face validity due to issues with data quality [10]. Establishing validity can be challenging [18], yet validity is perhaps the most important psychometric property to address. Content validity (whether the PROM measures concepts of importance to patients) is of utmost importance and should always be checked [38]. Vinik et al. [39] described the validation of a condition-specific questionnaire for an RD using different approaches (e.g. correlation, linear regression) to assess elements such as floor or ceiling effects and invariance, but these are conventional approaches and not ideal for such small sample sizes. However, some approaches do not require large sample sizes. For example, face validity/generalizability can be checked by expert panel review, and content validity can be checked by linking items to international classification systems [38]. Hybrid concept-elicitation/cognitive interviews can also be used to test content validity in new populations [13].
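Floor and ceiling effects are often summarized as the share of respondents scoring at the lowest or highest possible value. The sketch below computes these shares and flags them against a cut-off; the 15% value is a commonly used convention, treated here as an adjustable assumption, and the scores are hypothetical.

```python
def floor_ceiling(scores, min_score, max_score, cutoff=0.15):
    """Share of respondents at the lowest/highest possible score, flagged vs. cutoff."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == min_score) / n
    ceiling_share = sum(1 for s in scores if s == max_score) / n
    return {
        "floor": floor_share,
        "ceiling": ceiling_share,
        "floor_effect": floor_share > cutoff,
        "ceiling_effect": ceiling_share > cutoff,
    }


# Hypothetical 0-100 scores from a small RD sample (n = 10).
sample_scores = [0, 0, 10, 25, 40, 55, 60, 100, 100, 100]
print(floor_ceiling(sample_scores, min_score=0, max_score=100))
```

With 10 respondents, a single person changes each share by 10 percentage points, which illustrates why such conventional criteria are unstable in the small samples typical of RDs.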

3.2 Generic PROMs

HTA bodies often prefer validated generic PROMs [10], as the standardized questions used allow for comparability across diseases and populations [31]. Preference-based generic PROMs (e.g. EQ-5D, HUI, SF-6D, 15D, Assessment of Quality of Life, Quality of Well-Being scale) from which HSUVs can be derived are preferred when economic analyses are conducted: “… having utility data is of course critical for accurate cost effectiveness analysis and there aren’t that many instruments out there that do have the utility information” [9].

However, generic PROMs can pose the challenge of being unresponsive and missing important disease- and population-specific data [14, 15, 18, 31]. This is a particular issue for PROMs in RDs because the small, heterogeneous samples and variation in treatment impact increase the possibility of generic PROMs being insufficiently applicable.

Since disease-specific PROMs tend to be more sensitive to changes in health within a specific population or disease [15], it has been suggested that a generic and a disease-specific instrument be used in a complementary way for RDs. This enables comparability across populations and provides sufficient data for economic analysis, while also allowing small but important condition-specific changes to be detected [14]. An objective, systematic approach for RDs might be to develop a variety of measures that include some constant features of generic measures as well as items covering personal and societal factors relevant to patients and disease-specific aspects. This would include, for example, basic QoL questions with added disease-specific questions. For instance, an HTA-specific approach similar to that of the European Network for HTA could use a disease-specific PROM for effect assessment and a generic PROM for utility analysis [31]. Combining generic and disease-specific instruments requires HTA bodies to be willing to accept such evidence. Since many prefer generic PROMs, this would necessitate a change in requirements, or at least discussion with decision makers in a particular country to agree on what they would accept for a given RDT.

3.3 Disease-Group-Specific PROMs

The main advantage of disease-group-specific PROMs is that they are more widely applicable to various conditions than disease-specific PROMs, and—while they are not as sensitive as disease-specific PROMs—they are more sensitive than generic measures. It is nearly impossible to create disease-specific PROMs for every RD, making disease-group-specific PROMs across similar conditions a practical alternative.

The challenges with using disease-group-specific PROMs primarily revolve around their applicability and responsiveness. First, many of these instruments are not sufficiently specific to the target disease and may include items that are not applicable to the target disease/population [16]. Thus, the responsiveness of disease-group-specific PROMs to the RD or manifestation to which they are applied may not be well established [36] or sufficient to capture the RD’s specificity [18]. Transferring existing instruments from one context of use to another is valuable but needs to be done in a way that maximizes applicability. To facilitate the use of existing instruments, previously created item banks can be used to select and match instruments to the concept of interest (COI). Where possible, instruments closest to the COI that can be disaggregated should be selected so that only relevant subscales are included [13]. A systematic review may be needed to identify the most relevant PROM. Existing supportive tools can facilitate the selection process [5], such as the COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) guidelines [40], the ePROVIDE™ database (which contains a range of information on PROMs, including critical reviews of their measurement properties) [40, 41] or PROMIS (a cooperative research programme that aims to develop, validate and standardise item banks to capture PROM data across a wide range of conditions and domains) [15, 42]. To increase the applicability of these instruments in general, their scope may need to be narrowed so that concept-specific instruments are created for a closely related group of RDs rather than for any similar disease [16].

Similarly, if there is substantial heterogeneity in manifestation of the RD in question, which is often the case, it may not be possible to measure distinct outcomes across the population [23], making the application of an existing disease-group-specific PROM difficult. A multi-attribute questionnaire may therefore be useful when working with disease-group PROMs, to make the PROM more applicable across heterogeneous manifestations of an RD. Mixed-methods frameworks may also be a practical approach to optimize the applicability of a PROM in a new context of use [3]. The US FDA suggests using mixed methods in clinical trials to capture patient experience qualitatively and quantitatively and gives recommendations for identifying what is important to patients [7, 43,44,45].

3.4 Disease-Specific PROMs

The advantage of disease-specific PROMs is that they are more sensitive and responsive than generic PROMs and disease-group PROMs, making them more likely to capture meaningful outcomes of specific conditions.

Disease-specific measures pose the challenge that they can only make comparisons within the same patient group [14]. As disease-specific and generic instruments assess different aspects of QoL, the use of both instruments in a complementary way has been suggested [14].

Moreover, if multiple disease-specific measures for different conditions (and manifestations of a condition) exist and are used, this can lead to outcome measure heterogeneity. Outcome measure heterogeneity hinders the reliable and reproducible capture of a significant change in disease or health status and the synthesis and meta-analysis needed for evidence-base generation [46]. To manage this challenge, recommended core outcome measures could be developed for disease-specific instruments to ensure a level of comparability [5, 15].

Disease-specific PROMs for RDs are generally lacking [15, 17, 46,47,48]. Those that have been developed may have been validated in a population other than the target population, and clarity about them within the expert community is often lacking [49]. If no (validated) PROMs exist for an RDT, validated PROMs from other, similar diseases could be considered, or a new PROM can be created if resources permit. In addition, in QALY-based systems, HTA bodies often prefer generic preference-based PROMs yielding HSUVs (e.g. EQ-5D). The ‘mapping’ technique can potentially convert disease-specific PROM responses into HSUVs derived from generic PROMs, but the lack of concordance between disease-specific and generic PROMs makes mapping complicated in practice, and the degree of ‘overlap’ between the instruments should be assessed in advance using appropriate correlation tests [31, 50].

Furthermore, the number of preference-based disease-specific PROMs available to inform cost-utility analyses is limited; only four have been identified in RDs: the Amyotrophic Lateral Sclerosis Utility Index (ALSUI), the Aberrant Behaviour Checklist Utility Index (ABC-UI) for fragile X syndrome, the Myelofibrosis 8 Dimensions, and a preference-based scoring algorithm for the Short Bowel Syndrome health-related Quality of Life (SBS-QoL) scale [51]. Utility data are often lacking for conditions affecting infants and young children because most instruments are not designed for such young age groups, yet about 80% of RDs affect children [52]. Additionally, benchmarking against HSUVs estimated for similar diseases is limited by how closely the health states being compared actually resemble each other [53]. Thus, the use of preference-based disease-specific PROMs in HTA is generally limited to interventions for which it is inappropriate to use a generic PROM.

The development of new algorithms to derive HSUVs from disease-specific PROMs is encouraged to evaluate RDTs where the use of generic PROMs is not appropriate. The range of HSUVs in diseases with similar characteristics can be used as a benchmark to validate results of such new preference-based instruments in RDs. For example, HTA bodies could use utilities benchmarked from similar diseases to define reasonable intervals for the incremental cost-utility ratio produced by preference-based RD-specific PROMs [53].
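One way to read this suggestion is sketched below, under strong simplifying assumptions: a constant utility in the treated and comparator states over a fixed, undiscounted time horizon, a known incremental cost, and a hypothetical set of HSUVs reported for similar diseases that defines the plausible range for the treated state. The ICUR implied by the new RD-specific instrument is then checked against the interval produced by the lowest and highest benchmark utilities. All function names and values are illustrative.

```python
def icur(delta_cost, delta_qaly):
    """Incremental cost-utility ratio; only defined here for a positive QALY gain."""
    if delta_qaly <= 0:
        raise ValueError("this sketch only handles a positive incremental QALY gain")
    return delta_cost / delta_qaly


def benchmark_icur_interval(delta_cost, years, comparator_utility, benchmark_utilities):
    """ICUR interval implied by the lowest and highest benchmark utilities."""
    low_u, high_u = min(benchmark_utilities), max(benchmark_utilities)
    # A higher treated-state utility means a larger QALY gain and hence a lower ICUR.
    lowest = icur(delta_cost, (high_u - comparator_utility) * years)
    highest = icur(delta_cost, (low_u - comparator_utility) * years)
    return lowest, highest


def check_against_benchmark(new_utility, delta_cost, years, comparator_utility,
                            benchmark_utilities):
    """Return (is_within_interval, icur_from_new_instrument)."""
    new_icur = icur(delta_cost, (new_utility - comparator_utility) * years)
    lowest, highest = benchmark_icur_interval(
        delta_cost, years, comparator_utility, benchmark_utilities)
    return lowest <= new_icur <= highest, new_icur


# Hypothetical inputs: incremental cost 30,000, 2-year horizon, comparator utility 0.40,
# and treated-state utilities reported for similar diseases.
similar_disease_hsuvs = [0.55, 0.62, 0.70]
print(benchmark_icur_interval(30_000, 2, 0.40, similar_disease_hsuvs))
print(check_against_benchmark(0.60, 30_000, 2, 0.40, similar_disease_hsuvs))
print(check_against_benchmark(0.45, 30_000, 2, 0.40, similar_disease_hsuvs))
```

Here the benchmark range of 0.55 to 0.70 yields an ICUR interval of roughly 50,000 to 100,000 cost units per QALY, so a new instrument implying a treated-state utility of 0.60 (about 75,000 per QALY) falls inside it, whereas one implying 0.45 (about 300,000 per QALY) does not.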

3.5 Creating New Disease-Specific PROMs

The advantage of developing new disease-specific PROMs is that they can be well tailored to a specific disease and are therefore highly likely to capture meaningful outcomes. However, creating a good new PROM is extremely time- and resource-intensive, often requiring several steps, patient and clinician engagement, and qualitative and possibly quantitative analysis [3, 24, 54]. This problem is amplified for RDs because heterogeneous disease presentation and small populations make it difficult to access and recruit enough patients to collect data for PROM development [38].

Several approaches can be used to optimize the PROM development process. For instance, computer-assisted technology (CAT) can streamline instrument development by helping to reduce the response burden on patients and increase completion rates, and multi-attribute questionnaires using skip patterns and computer adaptive testing can be customized to the individual. Such technologies present a small, targeted number of questions, each selected based on the person’s answers to previous questions. This also allows disease-specific items highlighted by patients to be incorporated into the questionnaire, tailoring the PROM to a patient’s specific symptoms without developing a completely new instrument [17]. In addition, web-based approaches such as electronic PROMs allow data to be collected internationally, locally and in real time. This gives patients the freedom and flexibility to complete PROMs when convenient for them, which can reduce dropout and missing data, and allows data to be collected from multiple sources and locations [13, 15, 26].
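The adaptive logic described above can be sketched as follows, assuming a hypothetical item bank already calibrated with a Rasch model. The next item is the unanswered one with maximum Fisher information at the current ability estimate (a standard selection rule in computerized adaptive testing), and ability is updated by expected a posteriori (EAP) estimation over a grid with a standard normal prior. Item names, difficulties, simulated answers and grid settings are illustrative assumptions; operational systems add stopping rules and item exposure control.

```python
import math


def rasch_p(theta, difficulty):
    """Probability of endorsing an item under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta, difficulty):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_p(theta, difficulty)
    return p * (1.0 - p)


def next_item(theta, bank, administered):
    """Most informative item not yet administered, or None if the bank is exhausted."""
    remaining = [item for item in bank if item not in administered]
    if not remaining:
        return None
    return max(remaining, key=lambda item: item_information(theta, bank[item]))


def eap_theta(responses, bank, bound=4.0, step=0.1):
    """EAP ability estimate on a grid with an (unnormalized) N(0, 1) prior.

    responses: {item: 1 (endorsed) or 0 (not endorsed)}.
    """
    n_points = int(round(2 * bound / step))
    num = den = 0.0
    for i in range(n_points + 1):
        theta = -bound + i * step
        weight = math.exp(-0.5 * theta * theta)  # prior density up to a constant
        for item, answer in responses.items():
            p = rasch_p(theta, bank[item])
            weight *= p if answer == 1 else 1.0 - p  # likelihood of this answer
        num += theta * weight
        den += weight
    return num / den


# Hypothetical item bank: item -> difficulty (logits), and simulated answers.
item_bank = {"walking": -1.0, "fatigue": 0.2, "pain_at_night": 1.5}
simulated_answers = {"walking": 1, "fatigue": 1, "pain_at_night": 0}

theta_hat, answered = 0.0, {}
while (item := next_item(theta_hat, item_bank, answered)) is not None:
    answered[item] = simulated_answers[item]
    theta_hat = eap_theta(answered, item_bank)
    print(item, answered[item], round(theta_hat, 2))
```

In practice the loop would stop once the posterior is precise enough, so that most patients answer only a few items; administering the whole bank here simply keeps the example short.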

A further difficulty is that the natural history of most RDs is poorly understood. Without sufficient information about the disease, it can be difficult to identify concepts of interest for meaningful treatment benefit and to determine clearly which outcome(s) should be measured [16]. To address this gap, all available sources of information should be used to understand the natural history of an RD. Engaging with patient advocacy groups and the RD community can help provide the full picture, from symptom onset to correct diagnosis and treatment [13]. It has also been recommended that a PRO consortium that incorporates the patient voice throughout all stages could help develop PROMs that capture disease-specific patient experiences and challenges [55]. It is crucial to partner with and listen to patients and caregivers early and systematically, both to identify meaningful treatment outcomes that resonate with their experiences, preferences, expectations and values and to compensate for the lack of natural history knowledge [5, 7, 56]. Involving patients or patient representatives, for example through discussions or interviews, can help explore and prioritize patients’ health concerns for a given RD [19, 57] and build understanding of its natural history. This approach can also identify the most common symptoms among patients and those they consider most important [58].

Effective approaches to developing PROMs are not always clear. For instance, the current FDA guidance for reviewing and evaluating existing PROMs does not address disease-specific issues in PROM development, which are especially important for RDs [18]. When developing a disease-specific PROM, it may therefore be helpful to consult existing guidelines and published examples of development stages: the FDA has documented guidelines [46], and an example of PROM development stages has been published [59].

4 Discussion

RDs pose unique challenges and require PROM strategies that are flexible and innovative [3, 16]. This scoping review synthesized the challenges and potential solutions reported in the literature for the use of PROMs for RDTs in HTA. To our knowledge, this is the first study to review the literature thoroughly and comprehensively identify the key challenges and existing potential solutions for the use of PROMs for RDTs in HTA. This work can help HTA stakeholders understand the specificities of using and developing PROMs (and associated HSUVs) in RDTs for HTA.

An overarching takeaway is that HTA stakeholders must be aware of the potential challenges that may arise when using PROMs in RDTs. These challenges are similar in rare and non-rare diseases but are exacerbated in RDs (e.g. heterogeneity of disease presentation and diversity of outcomes, data collection and psychometric property challenges due to small sample sizes, lack of sensitivity of generic PROMs and lack of disease-specific PROMs). Because of these challenges, the added treatment benefit that PROMs aim to demonstrate in HTA may not be accurately captured or interpretable, so HTA stakeholders need to recognize the need for innovative solutions. This research identified some potential solutions, such as the use of core outcome measures, stakeholder communication to agree on acceptable and feasible PROM data and combining populations with similar diseases for PROM development or validation. However, many reported solutions are still conventional and not necessarily appropriate for RDTs; there remains a substantial need for more effective, innovative solutions. The research team reviewed and discussed the solutions identified in this search for appropriateness and feasibility.

Solutions that were agreed to be both appropriate and feasible to implement for RDTs were as follows:

  • Development of a core outcome measure set across a disease, disease subtype or similar diseases. This requires initial upfront agreement and development; however, once developed, it is a tool that can be used sustainably with minor adjustments.

  • Proactive stakeholder collaboration and discussion to agree on acceptable and feasible PROM data. This requires stakeholders to be willing to commit time to planning but has the potential to save a substantial amount of time later in the process.

  • Combining populations with similar disease characteristics to increase sample size. Guidelines and best practices are needed for this but, if done properly, it provides a promising way to overcome the limited sample sizes in RDs.

  • CAT to streamline the PROM development process. Although this depends on resources, using available tools to overcome as many data collection challenges as possible does not require substantial time or structural changes.

  • Take existing guidance into account (e.g. FDA). Referring to and following any available high-quality guidance is only a matter of taking the time to do the research.

  • Use of disease-group PROMs when no disease-specific PROM exists. This is a very promising solution but requires further research into what a ‘disease group’ actually entails. In this paper, we used the term ‘disease group’ to refer to any methods using PROMs across similar diseases, but definitions vary in the literature: some refer to disease families [16], others to symptom- or function-specific PROMs or PROMs that capture similar symptoms in analogous conditions [47], and still others to using PROMs similar to those for common diseases [60]. Thus, the parameters of such PROMs still need to be better defined.

A solution that was considered appropriate and probably feasible to implement was mixed-methods research, which can help avoid potential measurement issues and maximize the applicability of disease-group-specific PROMs, although it requires an investment of time and resources.

Solutions that were agreed to be appropriate and feasible but that entailed more potential challenges were as follows:

  • Use of both generic and disease-specific measures. This could be a very valuable solution, but it depends on the specific HTA body and the type of data it is willing to accept. For example, if an HTA agency only accepts preference-based generic PROMs, adding disease-specific measures will likely have minimal impact. This solution would therefore require HTA agencies to be willing to accept a broader range of QoL data. In QALY-based systems, it would also require clarity on how much weight data not included in the economic model carry in the decision.

  • Mapping disease-specific measures to generic PROMs to enable HSUVs to be derived for QALY-based systems. This approach is sound in itself, and many HTA agencies are willing to use it, but it relies heavily on the degree of overlap between generic and disease-specific measures. This makes the mapping exercise particularly difficult for RD PROMs, and very few mapping algorithms are available in RDs [31, 50, 51]. HTA bodies may therefore need to recognize that it is often not possible to map disease-specific PROMs onto generic ones, in which case an alternative approach should be used to generate HSUVs, such as drawing on published literature or conducting ad hoc valuation studies.

  • Multi-site or international data collection. This is a good way to overcome small sample sizes in PROM development and validation but poses challenges for establishing cross-cultural validity, and may therefore require more consolidated guidance that is consistently followed in order to produce data of sufficient quality.

  • Developing new disease-specific PROMs when none exist. While this solution can lead to PROMs that are well tailored to the disease and can be designed to be preference based, it is very resource- and time-intensive relative to the number of users and may often be beyond stakeholders’ resources.

In terms of putting some of these solutions into practice and working towards further innovative solutions, HTA decision makers need to be willing to accept forms of QoL evidence beyond those currently expected. HTA PROM requirements and preferences differ across jurisdictions, so no single solution can be recommended for all; however, the general challenges are relevant to all stakeholders, and solutions can be tailored to particular requirements through proactive collaboration between key stakeholders. This is in line with suggestions in the literature that, for PROMs to be integrated into HTA in a more standardised and sustainable way that adds value to the assessment, international agreement on evidentiary requirements, accepted by all stakeholders, is needed [5, 16]. It has further been suggested that patient-relevant outcomes and endpoints should be discussed in advance with HTA bodies and other stakeholders via joint scientific advice meetings or qualification procedures, so that optimal evidence-generation plans can be designed and agreed on. When patient evidence indicates a need for novel PROMs, or for adapting existing outcome measures to make them more relevant to patients, innovative measures or methodologies (such as individualized outcome measures) for capturing patient benefit should be accepted in the HTA process [5, 16].

Additionally, the deliberative HTA process needs to allow sufficient consideration of evidence on QoL. In countries with a QALY-based system, HSUVs from a generic PROM are used in the economic model, and the disease-specific PROM (if considered) is deliberated on. The former is likely to carry more weight in the decision, but this approach needs to be re-evaluated given that generic PROMs frequently fail to capture relevant outcomes accurately in RDs [14, 15, 18, 31]. If disease-specific PROMs were considered and given equal weight, using generic and disease-specific PROMs together would become a more feasible solution. In countries with non-QALY systems, both generic and disease-specific PROMs would be deliberated in parallel, but the generic PROM is often easier to interpret because committee members are more familiar with such measures. One possible solution would be to develop a benchmark that supports the interpretation and comparison of results from the different PROMs.

This study has some limitations that should be acknowledged. First, the searches were conducted only in PubMed. While this is a comprehensive database, searching other databases may have yielded additional articles. Second, we cannot recommend one concrete approach for selecting and using PROMs for RDTs in HTA, as this is a complex process involving additional factors that were beyond the scope of this research. Finally, some issues relevant to HTA bodies were not captured by the searches. These include changes in disease course over time and the value of slowing disease progression or maintaining function, which are significant for patients but difficult to capture and factor into clinical benefit. Impact on family is similarly important and relevant for decision makers to factor into disease severity and treatment benefit.

Despite these limitations, this study contributes a situational analysis of where we are today and points to areas where further PROM research is needed, along with constructive discussions around what may or may not be acceptable for improving the development and use of PROMs for HTA in RDTs.

5 Conclusion

The usefulness of PROMs in HTA for RDTs may be undermined by practical challenges. A better understanding of the potential advantages, challenges and solutions when using PROMs for RDTs can help improve their use in HTA. This review provides an overview of the critical issues and some potential solutions for the use of PROMs for RDs in HTA. Some solutions can be taken forward, but many are conventional and may have limitations in RDs. There is a pressing need for HTA stakeholders to acknowledge these limitations and discuss innovative approaches and non-standard solutions.