Key Points for Decision Makers

There are no functional dyspepsia-specific clinical outcome assessments that have adequate documentation to support their use for US labeling purposes in regulated clinical trials.

A new clinical outcome assessment for patients with functional dyspepsia is needed, and it should reflect the preliminary conceptual model proposed by the authors.

1 Introduction

Functional dyspepsia (FD) is a complex disorder defined by upper gastrointestinal symptoms, including epigastric pain and burning, postprandial fullness, early satiation, bloating, belching, nausea, and vomiting, in the absence of any evident structural disease [1]. The definition of FD will likely continue to evolve with the publication of the revised Rome IV diagnostic criteria in 2016 [2]; however, the current definition, formulated by the Rome task force in 2006 as part of Rome III, includes two subtypes, postprandial distress syndrome and epigastric pain syndrome, which may overlap to varying degrees [3]. Postprandial distress syndrome is characterized by postprandial fullness and early satiation, and epigastric pain syndrome by epigastric pain and burning. The prevalence of FD as defined by Rome III is difficult to determine because the diagnostic criteria require an upper endoscopy to rule out structural disease [3]. As a result, the literature commonly reports the prevalence of “uninvestigated dyspepsia,” with estimates ranging from 5 to 15 % depending on the population and definition used in individual studies [1].

The Critical Path Institute’s Patient-Reported Outcome (PRO) Consortium [4], in conjunction with advisors from the US Food and Drug Administration (FDA), has identified the need for a well-defined and reliable patient-reported instrument to assess treatment benefit in FD clinical trials. Although self-reporting is central to identifying and evaluating treatments for this symptom-defined condition, it is unclear to what extent the development of existing FD symptom PRO questionnaires was consistent with the FDA’s Guidance for Industry, “Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims” (hereafter referred to as “FDA PRO Guidance”) [5]. Therefore, the PRO Consortium established the Functional Dyspepsia Working Group (WG) to develop a PRO instrument for use as a primary endpoint assessment in FD clinical trials, and to submit the instrument for qualification under the FDA’s Drug Development Tools (DDT) Qualification Program [6].

As a first step, the WG set out to build the evidentiary basis for constructing a new PRO instrument to evaluate FD treatment efficacy claims. To this end, both a symptoms review and an instrument review were conducted. In keeping with the FDA PRO Guidance [5], the objectives of these activities were to document the primary symptoms of FD as reported in the empirical literature, to evaluate the extent to which existing questionnaires target those symptoms, and to identify any missing evidence that would affect the questionnaires’ use in regulated clinical trials to assess treatment efficacy claims intended for product labeling. Where existing instruments did not measure the concepts of interest (i.e., the primary symptoms of FD) or failed to meet the FDA’s evidentiary standards, these findings would inform the decision whether to develop a new instrument or repurpose an existing one. In this context, the FDA distinguishes between two terms: a “questionnaire” is a set of questions or items shown to a respondent to obtain answers for research purposes, whereas an “instrument” consists of a means to capture data (i.e., a questionnaire) plus all the information and documentation that supports its use (e.g., methods and instructions for administration or responding, a standard format for data collection, and methods for scoring, analysis, and interpretation of results) [5].

2 Methodology

2.1 Symptoms and Instrument Literature Searches

Literature searches were conducted in the Embase®, PsycINFO®, and MEDLINE® databases via the OvidSP platform. Searches were limited to English-language studies in humans. To avoid capturing studies of unrelated or previously misclassified conditions, results were also limited to articles published from January 2006 onward, the year the Rome III criteria for FD were published [3]; the search closed in May 2013, after which instrument development transitioned to qualitative research. Search terms included “functional dyspepsia,” “nonulcer dyspepsia,” “idiopathic dyspepsia,” “essential dyspepsia,” “postprandial distress syndrome,” and “epigastric pain syndrome,” at least one of which had to appear in the article abstract. In addition, abstracts were required to include at least one search term related to symptoms, questionnaires, or qualitative research methods. The reference lists of all retrieved articles were also inspected manually to identify relevant citations not captured by the specified search criteria.

Search results were screened by abstract for relevance, and articles were retrieved for full review according to pre-defined eligibility criteria. Articles were selected for review if they were specific to FD in adults, referenced symptoms of FD as experienced by patients, or referenced FD symptom-specific questionnaires or published qualitative research with patients (i.e., referenced a qualitative design, methodology, or analysis, such as interviews, focus groups, or patient reports). Articles were excluded if they concerned FD in a pediatric population; concerned symptom assessment in a mixed population (i.e., the population was not exclusive to patients with FD and included patients with other gastrointestinal conditions); reported data on patients recruited on the basis of pre-Rome II criteria; reported FD symptoms without specifying the qualitative methodology or the named questionnaire used to collect them; focused on questionnaires used for diagnostic purposes only (not as study endpoints); or were derived from non-peer-reviewed research (e.g., conference proceedings or dissertations). For each article selected for full-text review, the study aim, sample demographics, methodology, and results were extracted into a data extraction spreadsheet of pre-specified variables.

In addition to the literature review, the Patient-Reported Outcome and Quality of Life Instruments Database (PROQOLID) was searched to identify existing PRO instruments used to measure FD symptoms. PROQOLID is a database of PRO instruments managed by the Mapi Research Trust [7]; it includes information on psychometric properties, development and validation, available translations, and conditions of use from over 1000 reports on PRO instruments. Questionnaires focusing on the patient experience of FD symptoms, or containing domains that do so, were considered for review. Questionnaires evaluating the impacts of FD, health-related quality of life, or quality of life were excluded, as were any questionnaires not available in English.

2.2 Conceptual Model Development

Collectively, the symptom concepts (e.g., epigastric pain and burning) identified in the reviewed articles were used to construct a preliminary conceptual model of the symptom experience of FD [8]. For comprehensiveness, the conceptual model included all concepts reported in each of three sources: existing PRO instruments, published qualitative research, and the Rome III criteria [8–10]. The published qualitative research included documented instances of symptom concepts spontaneously reported by patients with FD. Published interviews, focus groups, and written reports in which patients were allowed to describe their symptoms freely, outside the confines of a previously described or developed instrument, were all considered eligible for analysis.

As proposed by Wilson and Cleary [8], a conceptual model is a heuristic classification scheme that links a specified disease state or condition to its proximal and increasingly distal health outcomes. In general, proximal concepts tend to be unidimensional, are directly related to the condition and the effects of treatment, and are more readily characterized by primary trial outcomes. Distal concepts tend to be multidimensional, are indirectly related to the condition and the effects of treatment, and are less readily characterized by primary trial outcomes [9, 10]. In the present context, the preliminary conceptual model was intended to capture only proximal, condition-level concepts (i.e., symptoms of FD).

2.3 Instrument Evaluation

Following the instrument literature review, existing PRO instruments were evaluated using a two-step approach. First, the adequacy of concept coverage was evaluated by comparing the concepts assessed by the reviewed instruments to the primary FD symptoms specified in the Rome III criteria (postprandial fullness, early satiation, postprandial nausea, excessive belching, epigastric pain, epigastric burning, and upper abdominal bloating) [3]. Second, instruments deemed to have adequate conceptual coverage were further evaluated against the evidentiary requirements set forth in the FDA PRO Guidance [5], and Clinical Outcome Assessment (COA) criteria established by the FDA’s Study Endpoints and Label Development (SEALD) group [11].
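The first (concept-coverage) step amounts to a set comparison against the seven Rome III core symptoms. The Python sketch below is a minimal illustration under stated assumptions: it assumes each instrument's items have already been mapped by reviewers to standardized concept labels (e.g., an item on “nausea” counted as postprandial nausea), and the instrument names and contents in the example are hypothetical, not data from Table 1. Only instruments with no missing core concept would proceed to the second, evidentiary step.

```python
# Hypothetical sketch: concept-coverage check against the seven Rome III core symptoms.
# Assumes items were already mapped to these standardized concept labels by reviewers.
ROME_III_CORE = frozenset({
    "postprandial fullness",
    "early satiation",
    "postprandial nausea",
    "excessive belching",
    "epigastric pain",
    "epigastric burning",
    "upper abdominal bloating",
})


def coverage(instrument_concepts: set[str]) -> tuple[bool, set[str]]:
    """Return whether all seven core symptoms are covered, plus any that are missing."""
    missing = set(ROME_III_CORE - instrument_concepts)
    return not missing, missing


def passes_step_one(instruments: dict[str, set[str]]) -> list[str]:
    """Names of instruments with complete coverage; only these would go on to the
    evidentiary evaluation against FDA PRO Guidance and COA criteria."""
    return sorted(name for name, concepts in instruments.items() if coverage(concepts)[0])


# Hypothetical instruments (not the instruments reviewed in this article).
example = {
    "Instrument A": set(ROME_III_CORE) | {"heartburn"},
    "Instrument B": {"epigastric pain", "postprandial nausea", "upper abdominal bloating"},
}
print(passes_step_one(example))  # ['Instrument A']
print(sorted(coverage(example["Instrument B"])[1]))
# ['early satiation', 'epigastric burning', 'excessive belching', 'postprandial fullness']
```

Expressing the criterion as a set difference makes the omitted concepts explicit for each instrument, which mirrors the per-symptom omission counts reported in Sect. 3.3. Concepts outside the core set (such as “heartburn” above) neither help nor hurt coverage in this step.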

3 Results

3.1 Symptoms Literature Search

The literature search identified a total of 787 articles: 780 potentially relevant articles from the initial database search (Embase®, PsycINFO®, and MEDLINE®) and seven additional articles from the reference lists of retrieved articles. Of these, 56 articles were included in the final review (Fig. 1).

Fig. 1

Symptoms literature search flow diagram. *Articles were excluded if they concerned FD in a pediatric population or a mixed population (i.e., not exclusive to FD), reported data on patients recruited into a study based on pre-Rome II criteria, reported FD symptoms without specifying a qualitative methodology or naming a specific questionnaire by which they were collected, focused on questionnaires used for diagnostic purposes only, or were derived from non-peer-reviewed research. FD functional dyspepsia

3.2 Conceptual Model for Functional Dyspepsia Symptoms

The FD symptom conceptual model derived from the symptoms literature review is presented in Fig. 2 as a Venn diagram. The bottom left circle (a) contains the concepts listed in the Rome III criteria for FD [3]. The top circle (b) contains the additional concepts assessed in existing questionnaires, which largely fall into one of two categories: irritable bowel syndrome (IBS)-like symptoms [3] and heartburn or gastroesophageal reflux disease (GERD)-like symptoms [12]. Other dyspeptic symptoms that did not fit either category are listed separately. Heartburn or GERD-like symptoms have been shown to co-occur with dyspeptic symptoms and are considered interconnected with the key diagnostic symptoms of FD [1]. Further, patients with FD frequently experience IBS-like symptoms, making it difficult to separate IBS symptoms from the diagnostic symptoms of FD.

Fig. 2

Conceptual model for FD symptoms. Qualitative patient reports from Pilichiewicz et al. [13]. *Postprandial distress syndrome symptoms, as defined by Rome III criteria. †Epigastric pain syndrome symptoms, as defined by Rome III criteria. FD functional dyspepsia, GERD gastroesophageal reflux disease, IBS irritable bowel syndrome, PRO patient-reported outcome

Finally, the bottom right circle (c) contains the symptoms reported by patients with FD in published qualitative (non-instrument-based) research. These concepts include five of the seven Rome III diagnostic symptoms [3] (postprandial fullness, postprandial nausea, excessive belching, epigastric pain, and upper abdominal bloating), as well as vomiting. Excluding articles describing the development of FD instruments, only one article was identified that describes spontaneous patient reports of FD symptoms [13]. In this study, which examined the relationship between FD symptoms and dietary patterns, 41 participants [n = 20 patients with FD (17 women; mean age = 45 ± 3 years; age range = 23–73 years) and n = 21 healthy controls (18 women; mean age = 40 ± 4 years; age range = 20–74 years)] completed symptom diaries in which they recorded any symptoms experienced, their severity, and the time at which they occurred. Participants reported a total of 612 symptoms, which were classified as meal-related, meal-unrelated, or other on the basis of their time of occurrence. Meal-related symptoms were reported as bloating, nausea, upper-abdominal pain, belching, epigastric pain, fullness, vomiting, and discomfort. Information on the other symptom categories was not reported. With the exception of vomiting, all meal-related symptoms overlapped with the Rome III diagnostic criteria [3].

3.3 Instrument Evaluation: Conceptual Coverage of Functional Dyspepsia Instruments

A total of 16 PRO instruments assessing symptoms of FD were identified. The conceptual coverage of each instrument was evaluated against the seven primary symptom concepts defined by the Rome III criteria [3]. Three instruments were found to measure all seven symptoms: the Dyspepsia Symptom Severity Index (DSSI), the Nepean Dyspepsia Index (NDI), and the Short-Form Nepean Dyspepsia Index (SF-NDI). The conceptual coverage of all identified FD instruments is summarized in Table 1.

Table 1 Coverage of core concepts from Rome III FD diagnostic criteria in existing instruments

Of the seven primary symptoms of FD, the most commonly measured by existing FD instruments were postprandial nausea/nausea and epigastric pain/pain, with 13 out of 16 instruments containing items assessing these two symptoms. Eleven of the 16 instruments also had items measuring “upper abdominal bloating” or “bloating.” Among the 13 instruments with incomplete conceptual coverage of the seven primary FD symptoms, “epigastric burning” and “postprandial fullness” were the symptoms most often omitted from assessment (omitted in 11 and ten instruments, respectively).

3.4 Instrument Evaluation: Description and Developmental History of Functional Dyspepsia Instruments

The instruments judged to have adequate conceptual coverage (the DSSI, NDI, and SF-NDI) were further evaluated with regard to their recall periods, item wording, response options, scoring, and development histories (Table 2).

Table 2 Assessment of instrument face validity and developmental history

Recall period. An instrument’s recall period is the time period that respondents are asked to consider when replying to items. The DSSI, NDI, and SF-NDI all use a recall period of the 2 weeks before assessment. The FDA has indicated that, particularly for symptom assessment (i.e., concepts that vary over short periods of time), recall periods asking respondents to focus on their current or recent state (e.g., the past 24 h) are preferable to longer intervals [5]. If FD symptoms vary within and/or between days, the instruments reviewed here may raise concerns about potential recall bias and an inability to capture daily variation in FD symptoms.

Item wording. An item was considered well constructed if it contained no medical jargon or slang, double negatives, or overlapping or unbalanced response options; assessed only one concept (i.e., was not double-barreled); and did not assess vague or ambiguous concepts. Item wording was deemed adequate in both the full and short forms of the NDI; however, the DSSI includes an item assessing “discomfort,” a broad concept that may be better measured with multiple items assessing its various dimensions.

Response options. Response options are the range of choices a questionnaire offers respondents when replying to individual items. The DSSI, NDI, and SF-NDI all use 5-point Likert-type scales. For the DSSI, the wording of response options was considered clear and appropriate, and the options were appropriately ordered and clearly distinguished from one another.

The NDI, however, uses a “bother” scale in addition to frequency and intensity scales to assess symptoms, and its scoring involves summing the frequency, intensity, and bother scores. Incorporating bother scores may raise concerns because bother is a complex evaluative concept that may encompass aspects of symptom frequency, severity, and associated impact. Additionally, items assessing symptom bother leave more room for individual variation among participants than items assessing symptom frequency or severity, and it may be unclear how individual patients arrive at their responses. The concept of “bother” may also prove difficult to implement in global trials, where translation of the instrument will likely be required [31]. Finally, incorporating multiple scales in one instrument may raise concerns, as some symptoms lend themselves to assessment along particular dimensions better than others. For example, patients may find it more difficult to rate vomiting in terms of intensity than in terms of frequency; therefore, the NDI’s assessment of both frequency and intensity may be problematic for some symptoms.
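To illustrate how summing ratings across dimensions can obscure how a score arose, the Python sketch below sums frequency, intensity, and bother ratings for a single symptom. It is hypothetical throughout: the 0–4 coding of the 5-point scales, the `SymptomRating` class, and the example ratings are assumptions for illustration and do not reproduce the published NDI scoring algorithm.

```python
from dataclasses import dataclass

# Assumed coding for a 5-point Likert-type scale: 0 (lowest) to 4 (highest).
# This is an illustrative assumption, not the published NDI scoring manual.
SCALE_MIN, SCALE_MAX = 0, 4


@dataclass(frozen=True)
class SymptomRating:
    """Hypothetical ratings of one symptom on three dimensions."""
    frequency: int
    intensity: int
    bother: int

    def __post_init__(self) -> None:
        # Reject values outside the assumed 0-4 coding.
        for value in (self.frequency, self.intensity, self.bother):
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"rating {value} is outside {SCALE_MIN}-{SCALE_MAX}")

    def summed_score(self) -> int:
        """Sum of the frequency, intensity, and bother ratings."""
        return self.frequency + self.intensity + self.bother


# Two hypothetical respondents rating vomiting: frequent but mild vs. rare but severe.
frequent_mild = SymptomRating(frequency=4, intensity=1, bother=2)
rare_severe = SymptomRating(frequency=1, intensity=4, bother=2)
print(frequent_mild.summed_score(), rare_severe.summed_score())  # 7 7
```

Both hypothetical respondents receive the same summed score even though their symptom experiences differ, which is one way the interpretability concerns about combining dimensions could surface when scores are compared.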

Development history. An instrument’s development history consists of the methods used to inform its construction. The development reports for the DSSI, NDI, and SF-NDI provided some evidence of patient involvement. However, few details were provided on the characteristics of the participant populations or on how the data generated from these activities informed item development. Furthermore, none of the instrument development papers referred to “concept saturation” (i.e., the point at which additional interviews are unlikely to generate new information, which provides confidence in qualitative research results). In light of these findings, further evidence is needed to confirm that the instruments adequately assess all symptoms relevant and important to patients with FD.

4 Discussion

Functional dyspepsia is a complex disorder defined by upper gastrointestinal symptoms, including epigastric pain and burning, postprandial fullness, early satiation, bloating, belching, nausea, and vomiting, in the absence of any evident structural disease. Despite the centrality of patient assessment in identifying patients and evaluating treatments for this symptom-defined condition, it is unclear to what extent existing PRO instruments for assessing FD symptoms were developed according to principles consistent with the approach outlined in the FDA PRO Guidance for supporting product labeling. In keeping with that approach, the objectives of our symptoms and instrument literature reviews were to document the primary symptoms of FD as reported in the empirical literature, to evaluate the extent to which existing instruments target those symptoms, and to identify any missing evidence that would affect their use in product labeling.

The identified symptoms of FD are summarized in the preliminary conceptual model and include symptoms outside the Rome III criteria (i.e., IBS-like symptoms, GERD-like symptoms, and other, less clearly clustered dyspeptic symptoms), as well as one additional symptom identified from qualitative research reports in the literature (i.e., vomiting). Except for vomiting, all symptoms identified from the published qualitative research were also specified in the Rome III criteria [3]. Heartburn or GERD-like symptoms identified in the symptoms literature review have been shown to co-occur with dyspeptic symptoms and are considered interconnected with the key diagnostic symptoms of FD. Further, patients with FD frequently experience IBS-like symptoms, making it difficult to separate IBS symptoms from the diagnostic symptoms of FD. It is therefore likely that future qualitative research soliciting direct patient reports of symptoms in the FD population will elicit not only the symptoms specified in the Rome III criteria but also a larger set of symptoms as interviews proceed toward conceptual saturation. Whether the conceptual framework of a future PRO instrument should be restricted to a narrow subset of disease-defining symptoms or should incorporate a broader range of FD-predominant symptoms will be informed by qualitative and quantitative data from interviews with patients with FD.

Sixteen instruments were identified in the literature search for existing PROs used in populations with FD. Of these, three were found to measure all seven core symptoms of FD as defined by the Rome III criteria: the DSSI, NDI, and SF-NDI. However, when these three instruments were evaluated in light of regulatory evidentiary recommendations [5], several issues were identified that could jeopardize qualification of the instruments for substantiation of a labeling claim.

First, all three instruments have a recall period of 2 weeks, which is longer than recommended by the FDA PRO Guidance (Sect. D.1, Content validity: recall period) [5]. In general, the appropriateness of a recall period should be established for the population, disease state, and application of the instrument; more specifically, factors such as the variability, duration, frequency, and intensity of the concept measured should be considered. Therefore, given the inherent variability of FD symptoms, the advantage of momentary assessment in reducing recall bias, and the stated objective of regulatory acceptance, a 2-week recall period would likely not be considered acceptable.

Second, the NDI and SF-NDI assess all symptoms in terms of frequency, intensity, and bother. Certain symptoms, such as vomiting, lend themselves to assessment along some dimensions (e.g., frequency) better than others (e.g., intensity), and bother, as an evaluative concept, is complex and may encompass aspects of symptom frequency, severity, and impact. Further, qualitative evidence of the content validity of these instruments is not available in the empirical literature. In particular, although concepts were elicited from patients during the development of these instruments, there is no evidence that a conceptual framework was developed or that conceptual saturation was achieved. Finally, evidence of cognitive debriefing is lacking for all three instruments. Consistent with these findings, a literature review by Ang et al. [32] of studies in patients with FD that employed clinical outcome assessments similarly concluded that no existing instrument appeared to be sufficiently content valid for use in a Rome III-defined FD trial.

In light of the preliminary conceptual model and the deficiencies of these existing instruments relative to the approach outlined in the FDA PRO Guidance, fundamental modifications and newly documented evidence would be prerequisites for deploying any of these instruments in a biopharmaceutical development program targeting an FDA-approved product label. Therefore, given the clear clinical need, the WG recommends the development of a new PRO instrument to assess FD symptoms and, in particular, symptom improvement. The conceptual model for FD symptoms derived from our symptoms literature review can serve as a foundation for the qualitative research needed to develop such an instrument; the model will be informed by data from qualitative interviews with patients with FD and refined throughout the instrument development process.

The next step in developing a new FD symptom-focused PRO instrument is to conduct concept elicitation interviews among patients with a Rome III-confirmed FD diagnosis to identify and document a comprehensive set of symptoms from the patient perspective. To this end, the WG has developed a semi-structured interview guide, based on the present work, to facilitate qualitative research in a carefully selected group of patients with FD; this research is intended to provide documented evidence suitable for qualification under the FDA’s DDT program [33]. Because a number of symptoms in the literature-derived conceptual model overlap with those of other functional gastrointestinal disorders, key objectives of future concept elicitation interviews will be to establish whether these overlapping concepts are important and relevant symptoms from the perspective of patients with FD, and to gather evidence that developing a well-defined and reliable patient-reported measure of treatment benefit in FD is feasible.

5 Limitations

The preliminary conceptual model currently includes concepts assessed by PRO instruments that, although used in studies to assess FD symptoms, were not originally developed in an FD population. The FDA PRO Guidance [5] stresses the importance of ensuring that “the items and domains of an instrument are appropriate and comprehensive relative to its intended measurement concept, population, and use.” Qualitative research may therefore show that some of the concepts currently included in the conceptual model are not important or relevant to patients with FD.

Owing to our eligibility criteria, articles were excluded if they concerned FD in a pediatric population or symptom assessment in a mixed population (i.e., not exclusive to patients with FD), reported data on patients recruited on the basis of pre-Rome II criteria, reported FD symptoms without specifying the qualitative methodology or named questionnaire used to collect them, focused on questionnaires used for diagnostic purposes only (not study endpoints), or were derived from non-peer-reviewed research (e.g., conference proceedings or dissertations). As a result, instruments that could have been modified and redeveloped may have been missed. However, in light of the results of our instrument evaluation, this appears unlikely.

6 Conclusion

No existing PRO instruments were identified that assessed all seven core Rome III symptoms of FD and adhered to principles consistent with the instrument development approach outlined in the FDA PRO Guidance for supporting product labeling. In light of these findings, the WG recommends the development of a new PRO instrument to measure FD symptoms, and provides a preliminary conceptual model for consideration in the requisite qualitative research.