1 Introduction

Today, patients are increasingly involved in information gathering and decision making at all levels of the healthcare system [1]. Patient self-reports of adverse drug events (ADEs) are an important additional source of information on the safety of drugs because they differ from healthcare professional reports [2–7]. Healthcare professionals often underestimate symptomatic ADEs experienced by patients [7, 8]. The added value of patient reports is acknowledged by the Food and Drug Administration (FDA) as well as the European Medicines Agency [9, 10]. The FDA advises the use of patient-reported outcome (PRO) questionnaires for the measurement of outcomes that are best known by patients [9] (e.g., pain [11]). In PRO questionnaires, the patient is the direct source of information without interpretation of the responses by a healthcare professional [9, 12].

Patient-reported ADE questionnaires can be open-ended or checklist-based. Compared to open-ended questionnaires, checklist-based questionnaires are more sensitive in identifying potential ADEs [13, 14]. However, these methods may lack specificity in the detection of true ADEs [13]. Adding questions per ADE on its nature and causality might solve this problem. To assess unknown ADEs of (new) drugs and to compare ADE profiles of different drugs, a generic PRO questionnaire is needed that can measure all possible ADEs [13, 15]. Most available patient-reported ADE questionnaires focus on specific ADEs, such as gastrointestinal ADEs [16] or ADEs specific for a drug class, such as inhaled corticosteroids [17] or chemotherapy [18]. Previously, a generic questionnaire was developed that contained approximately 600 symptoms classified by body category [19]. More recently, a questionnaire with 84 ADEs classified in 19 body categories was developed [3]. Although both questionnaires have been piloted, no explicit validation has been reported. Furthermore, both questionnaires lack questions supporting causality assessment and questions about the nature of the ADE such as those regarding seriousness, severity, frequency, and time course, which are relevant attributes in the evaluation of the ADE [20, 21].

The aim of our study was to develop and test a generic questionnaire for identifying ADEs and assessing their nature (e.g., frequency, severity) and causality as reported by patients. We tested the content validity and feasibility of the questionnaire as well as the reliability for reporting ADEs.

2 Method

The study consisted of three parts: (1) development of a draft ADE questionnaire, (2) content validation and revision of the questionnaire in an iterative process, and (3) feasibility and reliability testing of the revised questionnaire.

2.1 Questionnaire Development

The questionnaire consists of four sections with questions about: (1) general patient characteristics; (2) drug use in the past 4 weeks, diseases for which these drugs were used, and whether the patient had other diseases; (3) ADEs experienced in the past 4 weeks using structured checklists; and (4) for each ADE a question to describe the ADE in the patient's own words with additional questions about its nature and causality. We expected that a period of 4 weeks would be sufficient for capturing a wide range of ADEs for which patients would be able to recall the relevant details. In the development phase, ADEs were selected, named, coded, and categorized into a body category, and questions were constructed to assess the nature and causality of the ADEs.

2.1.1 Adverse Drug Event (ADE) Selection and Naming in Lay-Terms

We aimed to include a wide range of common symptomatic ADEs. We identified possible ADEs from the Common Terminology Criteria for Adverse Events version 4.0 [22], and existing symptom and ADE checklists [3, 13, 18, 23–29]. Patient-reported data about ADEs from the Lareb Intensive Monitoring System of The Netherlands Pharmacovigilance Centre Lareb [30] were used to translate ADEs into lay-terms. We excluded ADEs based on laboratory results (e.g., hyperkalemia) and those related to specific devices (e.g., uncomfortable pressure of the mask). The first selection included 252 possible ADEs with an open-ended option for reporting "other" experienced ADEs.

2.1.2 Coding of ADEs

Two researchers (SdV and PD) independently coded each lay-term ADE to a lowest level term of the Medical Dictionary for Regulatory Activities (MedDRA®) terminology version 13.0, making use of codings suggested by pharmacovigilance experts from Lareb. MedDRA® is the international medical terminology developed under the auspices of the International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH). The two codings agreed for 74 % of the ADEs. Discrepancies were resolved by discussion, and translation of the Dutch lay-terms into English by a professional translator was used to reach agreement on all MedDRA® terms. Two ADEs, "Bone fracture or fractures" and "Stroke," were classified at a higher hierarchical ADE group definition because of their nonspecific nature. One ADE (dry teeth) showed overlap in the MedDRA® terminology with another included ADE (dry mouth), and they were therefore combined.

2.1.3 Categorization of ADEs

To increase the efficiency of completing the questionnaire, the ADEs were classified in body categories. By first checking body categories in which patients experienced ADEs, they were directed to short checklists of specific ADEs within that body category. These lists with specific ADEs also include the option to report other ADEs. The body categories in the initial questionnaire were based on the classification used in the MedDRA® and in existing questionnaires [3, 19].

2.1.4 Assessing Nature and Causality of ADEs

Relevant known attributes of ADEs were duration, frequency, severity, and seriousness of the ADE; its impact on activities; and the patient’s benefit–risk assessment of the drug [24, 30–32]. Existing questionnaires were screened for questions covering these topics [26, 27, 33–35]. Questions regarding causality were included, based on medical [36] and patient-reported considerations [37].

2.2 Content Validation

The draft questionnaire was subjected to cognitive debriefing interviewing to eliminate ambiguity in questions and answer options. Cognitive debriefing is a qualitative interview method in which the patient’s understanding and interpretation of items and answer options of the questionnaire are assessed [38, 39]. A separate classification task was used to assess the appropriateness of the body categories.

2.2.1 Study Population

Patients included in the study were 18 years or older; diagnosed with type 2 diabetes, asthma, and/or chronic obstructive pulmonary disease (COPD); using drugs for these conditions; and able to speak, read, and write the Dutch language. Patients with these diagnoses were included to cover a population with a broad age range in which many different types of drugs are commonly used, both daily and as needed. Eligible patients were recruited by three general practitioners and two dieticians in the northern part of The Netherlands in 2011–2012.

2.2.2 Study Procedure

After signing informed consent, patients completed the questionnaire during which they were observed by a researcher (SdV) to detect any problems with completing the questionnaire. Immediately thereafter, a semi-structured interview was conducted using a topic list based on the "question-and-answer" model [38, 39]. A subset of patients was asked to do a classification task, for which all ADEs were randomly split into five lists. Patients were instructed to classify each ADE of one list into a body category. Each ADE was classified by at least four patients.

2.2.3 Analyses

The audio-recorded interviews were transcribed verbatim, and transcripts were screened by two researchers (SdV and PD) to identify problems in understanding the questions and answer options. The questionnaire was adapted in an iterative process by which changes were made addressing detected problems until no new problems were identified regarding understanding the questions and answer options (Fig. 1) [38].

Fig. 1 Iterative process in adapting the developed questionnaire to a content-validated questionnaire

Regarding the classification task, we considered an ADE classification as problematic when more than two patients classified the ADE in body categories different from our original classification, or when two patients were consistent in choosing a different category. These problematic ADEs were subsequently judged by four additional patients and a pharmacovigilance expert. Based on their judgements, revisions were made. This revised questionnaire was then translated from Dutch to English by a professional translator. The English version was screened for differences with the original Dutch version through informal back translation by the researchers, and final changes were made. A Web-based version of the content-valid questionnaire was then constructed using the Unipark Enterprise Feedback Suite 8.0 version 1.1 (http://www.unipark.de).

2.3 Feasibility and Reliability Testing

The Web-based version was used to assess the feasibility of completing the questionnaire, its ability to measure the ADEs in a consistent manner (test–retest reliability), and the impact of using body categories on feasibility and ADE reporting.

2.3.1 Study Population

Included patients were aged 18 years or older, had been dispensed an oral glucose-lowering drug, had an e-mail address, and were able to access the Internet. These patients were recruited via pharmacists in the northern part of The Netherlands in 2012.

2.3.2 Study Design and Procedure

In a test–retest design study, consenting patients received an e-mail message with the URL (uniform resource locator) to open the Web-based version. A personal login code was used to prevent multiple completions by the same patient [40]. After completion of the ADE part, questions were asked regarding feasibility, including self-reported time to complete the questionnaire and ease of use on a five-point Likert scale. In addition, the total time between opening and closing of the digital questionnaire was logged (registered time), as well as the proportion of patients completing the questionnaires, and the number of ADEs reported in the “other” category. One week after completion, patients received an e-mail with the second questionnaire for the reliability analysis.

Patients were randomly assigned to three groups using simple randomization [41] to receive: (A) the same questionnaire twice (the “test–retest group”); (B) a questionnaire with the body category structure at the first measurement (T1) and without these categories at the second measurement (T2) [the “group with body categories at T1”]; or (C) reversing the order used in B (the “group with body categories at T2”).
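
A minimal sketch of such a simple randomization is given below; the group labels and patient identifiers are hypothetical, and the study's actual allocation procedure followed reference [41].

```python
import random

# Hypothetical group labels mirroring the three study arms described above.
GROUPS = ["A: test-retest", "B: body categories at T1", "C: body categories at T2"]

def simple_randomization(patient_ids, seed=None):
    """Assign each patient to one of the three groups, independently and at random."""
    rng = random.Random(seed)
    return {pid: rng.choice(GROUPS) for pid in patient_ids}

# Example with made-up identifiers.
print(simple_randomization(["P001", "P002", "P003", "P004"], seed=1))
```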

One reminder was sent to the patients who did not complete the first questionnaire within a month. Patients who did not complete the second questionnaire were sent up to two reminders. We aimed to include about 50 patients per group, which has been reported as a reasonable number for reliability studies [42].

2.3.3 Analyses

Differences in sex and age between responders and nonresponders were assessed using Chi-square and Mann–Whitney U tests. Descriptive statistics were used for the feasibility parameters, including self-reported completion time, ease of use, proportion of patients completing the questionnaires, and number of ADEs reported in the “other” category. ADEs that were reported as “other” were evaluated and, if possible, classified by the researchers within the provided ADE lists. To assess the number of chronic diseases, we classified each self-reported disease into 1 of 12 chronic disease categories, excluding conditions of normal ageing (e.g., loss of hearing).
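
As an illustration only (the analyses themselves were run in SPSS), the responder versus nonresponder comparisons could, for example, be reproduced with SciPy; the counts and ages below are made up for the sketch.

```python
from scipy.stats import chi2_contingency, mannwhitneyu

# Hypothetical 2x2 table of sex (women, men) by response status.
sex_table = [[70, 110],   # responders: women, men
             [340, 430]]  # nonresponders: women, men
chi2, p_sex, dof, expected = chi2_contingency(sex_table)

# Hypothetical ages of responders and nonresponders for the Mann-Whitney U test.
age_responders = [58, 62, 65, 66, 70, 71]
age_nonresponders = [61, 64, 67, 68, 72, 75, 78]
u_stat, p_age = mannwhitneyu(age_responders, age_nonresponders, alternative="two-sided")

print(f"Sex: chi2 = {chi2:.2f}, P = {p_sex:.3f}")
print(f"Age: U = {u_stat:.1f}, P = {p_age:.3f}")
```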

We measured the agreement between ADE reporting at T1 and T2 at three levels: any ADE at “patient level,” similar ADEs at primary System Organ Class “MedDRA® level,” and the same ADE at the lowest description “ADE-specific level.” Cohen’s kappa coefficient and proportion of positive agreement were calculated as measures of agreement. Especially at the lowest level, where specific ADEs will be checked by few patients, the kappa statistic is negatively affected by the skewed distribution, and the proportion of positive agreement has been proposed as an alternative [43]. The proportion of positive agreement was calculated by the formula 2a/[N + (a − d)], in which N is the total number of observations, a is the number of patients reporting the ADE at both T1 and T2, and d is the number of patients reporting the ADE at neither T1 nor T2 [44]. Kappa and proportion of positive agreement values of >0.5 were considered to be acceptable [45]. We conducted additional analyses aggregating experienced ADEs using the patients’ own description of the ADEs. Based on these descriptions, two researchers (SdV and PD) clustered ADEs that were checked as separate ADEs but described by the patients as being one problem. Although one might expect that this clustering is similar to the aggregation at MedDRA® level, it is possible that patients use terms from different MedDRA® classes to describe one problem. For instance, goose bumps, shivering, and cold limbs can be seen as one problem by the patient but are coded in different primary MedDRA® System Organ Classes. Misclassification can also occur when patients check similar but not the same symptomatic ADEs at T1 and T2. Finally, we calculated how often patients checked a symptom only as a symptom at one time point but as a possible ADE at the other time point.
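
As an illustration only (not the authors' code), the two agreement measures can be computed for a single dichotomous ADE as sketched below, using the notation of the formula above: a and d are the concordant counts, b and c the discordant counts, and the example numbers are hypothetical.

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table of paired yes/no reports (T1 rows, T2 columns)."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance-expected agreement from the marginal totals of the two measurements.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)

def positive_agreement(a, b, c, d):
    """Proportion of positive agreement: 2a / [N + (a - d)], i.e., 2a / (2a + b + c)."""
    n = a + b + c + d
    return 2 * a / (n + a - d)

# Hypothetical counts: 10 patients report the ADE at both time points,
# 5 at T1 only, 4 at T2 only, and 116 at neither (N = 135).
print(round(cohens_kappa(10, 5, 4, 116), 2))        # 0.65
print(round(positive_agreement(10, 5, 4, 116), 2))  # 0.69
```

Because the proportion of positive agreement depends only on a, b, and c, it is not pulled down by a large number of patients who report no ADE at either time point (cell d), which is why it is less sensitive to the skewed distributions described above.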

The effect of including body categories was tested by comparing feasibility parameters and the number of reported ADEs between the questionnaire with body categorization and without at baseline, using Chi-square and Mann–Whitney U tests. Additionally, the agreement values of the group with the body categories at T1 and the group with the body categories at T2 were compared using the normal curve deviate statistic (Z value) [46].
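
In a standard large-sample formulation of this comparison (given here as a sketch of the general approach; the exact variant used in [46] may differ in detail), the two independent kappa estimates are compared as Z = (κ1 − κ2)/√(SE1² + SE2²), where SE1 and SE2 are the estimated standard errors of the two kappa values, and the absolute value of Z is referred to the standard normal distribution.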

Sensitivity analyses were conducted to investigate whether the number of days between completing the first and second questionnaire influenced the agreement values. All analyses were conducted using IBM SPSS Statistics version 20 (Armonk, New York, USA). P-values of <0.05 were considered to be statistically significant.

3 Results

3.1 Questionnaire Development

The initial version of the questionnaire contained 252 ADEs categorized in 21 body categories, and 11 questions regarding the nature and causality assessment for every ADE identified.

3.2 Content Validation

Twenty-eight patients, 54 % of them women, participated (Table 1). Ages ranged from 22 to 90 years, with a median of 61 years. Almost all patients used more than one drug.

Table 1 Patient characteristics of content validation

3.2.1 Content Validation, Cognitive Debriefing Interviews

Based on the cognitive debriefing interviews, the questionnaire was revised 14 times. These revisions included a change in the general structure of the questionnaire and a major revision in which patients were asked about symptoms as well as ADEs. The final revision was tested in five patients, and no major problems in the interpretation of questions and answer options were detected. Problems detected in the questionnaire are presented according to the domains of the question-and-answer model, with examples given in Table 2.

Table 2 Examples of issues mentioned by patients per topic during the cognitive debriefing

Wording of the body categories and ADEs was generally clear for the patients (Table 2: “Comprehension”). Several ambiguous interpretations, reading difficulties, and vague statements were reported by patients regarding specific questions and answer options, which were subsequently changed. Eight patients reported that the recall period of 4 weeks for the experienced ADEs was short (Table 2: “Retrieval”). Because this issue did not concern the content validity of the questionnaire, no changes regarding the recall period were made during the study period.

The initial questionnaire asked patients to indicate “experienced ADEs.” However, it became clear that patients, when confronted with a checklist of possible symptomatic ADEs, incorrectly started to check symptoms that they actually did not see as ADEs (Table 2: “Judgement”). Asking patients to check both experienced symptoms and ADEs solved this problem. The answer option “do not know” was added because some patients were not sure whether the experienced symptom was related to a drug they used. Almost half of the patients either skipped the body categories to go directly to the specific checklists (navigation) or had difficulties in deciding which body category their symptom might be classified into. Other patients who used the body categories found them helpful and easy to use. As a result, we kept the body category structure as a supportive step in the questionnaire, but patients no longer needed to check body categories before going to the specific checklists.

Answer options that did not fit with the judgements of the patients were detected and adapted, and answer options were added (Table 2: “Response”). The answer options of the question “how often did you experience this side effect during the past 4 weeks (on how many or which days)?” were changed multiple times. Problems remained especially for intermittently occurring ADEs, and this question was therefore adapted into an open-ended question (Table 3).

Table 3 Comparison of questions regarding nature and causality of ADEs between initial and last revision

Two questions were added to the initial questionnaire because they yielded additional information regarding causality (Table 3). One question was added to cover an additional attribute, namely actions taken (Table 3).

One patient reported difficulties with the sequence of the questions per ADE (Table 2: “Respondent burden”). This was improved by clustering the topics of the questions. One patient had some problems with the size of the letters in the questionnaire (font size 11, Arial), but none of the other patients reported such reading difficulties. Problems regarding navigation in the questionnaire, especially due to layout issues, were detected and resolved. After seven interviews, the questionnaire was split into two distinct parts, separating the specific questions about the ADEs from the first part of the questionnaire. Two patients mentioned that they felt that many questions per ADE were included but that this was not a problem for them. Comments on the length and number of answer options of a causality question led to shortening these phrases (Table 3).

3.2.2 Classification Task

Based on the classification task, where the patients had to assign ADEs to body categories, 51 problematic ADEs (20 %) were detected. As a consequence, we made the following adaptations: shifting the ADE to a more fitting body category (5 ADEs), renaming the ADE (2 ADEs), a combination of shifting and renaming of the ADE (2 ADEs), renaming a body category (8 ADEs), combining body categories (16 ADEs), and creating a new body category (6 ADEs). For 12 ADEs, no changes were made.

3.2.3 Final Revision

Based on the English translation, one ADE was detected that was considered ambiguous in the original Dutch version. To solve this, two ADE descriptions instead of one were introduced (“blood with feces” and “blood in feces”). The comparison of the English version with the Dutch version resulted in a few minor changes regarding the wording in both versions. Finally, after combining the ADEs with an overlapping MedDRA® term (dry teeth/mouth), the final questionnaire contained 252 ADEs categorized in 16 body categories with 14 questions per ADE regarding its nature and causality (Appendix I ADE questionnaire, Appendix II Questions per reported ADE, and Table 3).

3.3 Feasibility and Reliability

In total, 187 patients gave informed consent in response to an invitation that was mailed to 958 patients. These 187 patients were slightly younger (65 vs 67 years, Z = −2.653, P < 0.01) than patients not responding. There was no significant difference regarding sex (39.6 vs 44.7 % women, χ² = 1.638, P = 0.20). Of the consenting patients, 152 started the study by opening the questionnaire, and 137 completed both questionnaires (73.3 %). On four occasions, a patient reported an ADE in the “other” box; all of these could be classified by the researchers to one of the listed ADEs. One patient reported in the comments that the reported ADE was probably not due to a drug but due to surgery. This ADE was excluded from further analysis. One patient was excluded from the test–retest analysis because this patient reported having experienced the “same symptoms” at T2 as at T1 instead of checking the symptoms again. Another was excluded because of this patient’s comment that several symptoms had been wrongly checked. Further analyses were thus based on 135 patients, 45 in each group. The median age of this population was 65 years; on average, they used five prescription drugs (Table 4). The median number of days between completing the first and second questionnaires was 8 days (SD 4).

Table 4 Patient characteristics, number of adverse drug events (ADEs) reported per group (P-values for differences among the three groups)

At T1, 25.2 % (N = 34) of the 135 patients reported one or more ADEs, and 27.4 % (N = 37) at T2. In total, 173 ADEs were reported at T1, and 146 ADEs at T2. Gastrointestinal disorders were the most commonly reported type of ADE (Table 5). Less than 1 % of the questions about the nature and causality of the ADE were not completed (0.4 % missing at T1, and 0.2 % at T2). For most ADEs (124 at T1 and 96 at T2), patients checked only one reason for suspecting the ADE. The most common reason was that they did not experience the symptom before they took the drug. In three-quarters of the cases, the patients indicated which drug they thought caused the symptom, and in most of these cases they were quite sure about the relationship between the drug and the ADE (Table 5). Finally, there were 51 cases in which a symptom was reported only as a symptom at one time point but as a possible ADE at the other (22 times as symptom at T1 but ADE at T2, and 29 times as ADE at T1 but symptom at T2).

Table 5 Nature and causality reported at adverse drug event (ADE) level for the three groups at first (T1) and second (T2) measurement

Self-reported time for questionnaire completion was generally shorter than the registered time (Table 6). The median self-reported time was 15 min for patients not reporting any ADE (with three patients reporting >30 min), and 30 min for those reporting one or more ADEs (with four patients reporting >60 min). Differences observed in completion time between the questionnaire with and without body categorization were not significant (Table 6). Most of the patients agreed that the questionnaire was easy to use (74.4 % for the questionnaire with body categories; 75.6 % for the questionnaire without body categories), which did not significantly differ between the two versions of the questionnaire (χ² = 0.028, P = 0.986). Overall, this percentage was lower for patients reporting one or more ADEs than for patients not reporting any ADE (52.9 vs 82.2 %, χ² = 12.791, P = 0.002).

Table 6 Time in minutes needed to complete the questionnaire with and without body categories for reporting no adverse drug event (ADE) and one or more ADEs

Test–retest agreement of reported ADEs was acceptable at patient level and at MedDRA® level (κ > 0.5, proportion of positive agreement >0.5). At the ADE-specific level, the agreement was lower (κ = 0.38, proportion of positive agreement = 0.38, Table 7). By aggregating separately checked but related ADEs according to the patient’s own description, the 64 ADEs reported at T1 were reclassified as 34 distinct ADEs, and the 51 ADEs at T2 as 31 distinct ADEs. There was agreement for 16 of these ADEs, and the proportion of positive agreement was 0.49.

Table 7 Kappa values and proportion of positive agreement for test–retest reliability and body categories at patient level, MedDRA® level, and ADE-specific level

Agreement between the two measurements was slightly higher for patients who completed the questionnaire including body categories at the first measurement compared with those who first completed the questionnaire without this categorization. However, kappa values did not significantly differ between the group with the body categories at T1 and the group with the body categories at T2 (Table 7). The two-by-two tables of the agreement analyses are presented in Appendix III. The number of reported ADEs was similar between the questionnaire with and without body categories (Z = −0.049, P = 0.961). Sensitivity analyses including only those patients who completed the second questionnaire within 10 days did not lead to significant differences in agreement measures (Appendix IV).

4 Discussion

We developed and tested a generic questionnaire for patient reporting of ADEs. The questionnaire adds to the available questionnaires in that it is both generic and checklist-based and includes specific questions about causality, severity, duration, seriousness, and frequency of each experienced ADE. The questionnaire is intended for use in postmarketing studies and clinical trials.

Through cognitive debriefing interviews, significant problems were detected in several domains of the question-and-answer model that needed to be resolved. After initial adaptations, some problems recurred, underlining the relevance of an iterative process. The input of patients was found to be vital for the development and content validation. It became clear that directly asking for ADEs can lead to over-reporting because some patients accidentally checked symptoms as well as ADEs when confronted with a list of symptomatic ADEs. While going through the lists, patients sometimes forgot that they should only check symptoms perceived as being ADEs. This happened even though patients were able to distinguish ADEs from symptoms, as has been established before [37, 47]. Some of the available checklist-based ADE questionnaires use terms such as symptoms, problems, and ADEs interchangeably (e.g., see [18, 27]). We recommend clear differentiation between symptoms that could be related to the underlying disease and ADEs, as is done in other checklists [23], ensuring that respondents maintain the distinction while completing the questionnaire. This mechanism may explain in part why more ADEs are reported in checklists than in open-ended questionnaires [13].

Several patients reported that a recall period of 4 weeks was quite short, for instance, to capture ADEs that fluctuate over time, as has been identified before [48]. On the other hand, the period should not be too long when the aim is to collect information on symptomatic ADEs that can be mild in nature. The optimal recall period may depend on the nature of the ADE [48]. Although a recall period of 4 weeks is quite common, and even shorter recall periods have been used in ADE questionnaires [17], the reliability of various recall periods needs to be tested in further studies.

Reducing respondent burden is relevant for the feasibility of using the questionnaire. We identified problems in navigating the questionnaire and these were solved by formatting the questionnaire along principles of cognitive design [49]. Around half of the patients found the body category structure helpful, but we detected some difficulties with our initial ADE classification based on the MedDRA® System Organ Classes. We thus adapted this to a more patient-based classification system. The feasibility test showed, however, that the categorization structure only marginally decreased the time to complete the questionnaire for patients reporting at least one ADE. Only four ADEs were reported as “other,” indicating that most patients were able to identify their experienced ADE within the provided lists. For most of the patients reporting at least one ADE, the time needed to complete the questionnaire was <60 min. In our opinion, this time is acceptable for a questionnaire intended for research purposes, in which questions about general characteristics and drug use were included. It should, however, be noted that only a quarter of patients reported at least one ADE. The majority of the patients agreed that the questionnaire was easy to use, but this number was lower for those reporting an ADE than those reporting no ADEs. Of the patients who opened the questionnaire, around 10 % were lost to follow-up.

Although the test–retest reliability of the patient-reported ADE questionnaire was considered acceptable at patient level and at MedDRA® level, it was below the threshold of 0.6–0.8 recommended for reliability coefficients [50]. For ADE reporting, however, a skewed distribution is observed where many patients report no ADEs on both measurements, which decreases the kappa values used for the reliability assessment [51, 52]. Formulas to adjust for such effects have been proposed, for example, the prevalence-adjusted bias-adjusted kappa [53], but their inappropriateness has also been demonstrated [51]. We therefore calculated the proportions of positive agreement as an alternative agreement measure, which showed similar results. Future studies assessing the reliability of ADE reporting are advised to recruit a more balanced group of patients experiencing and not experiencing ADEs [51]. Based on a combined approach, that is, looking at kappa values, alternative agreement measures, and additional analysis of ADEs at patient level, we conclude that our questionnaire was not sufficiently reliable at the ADE-specific level. This result implies that the distinct symptoms reported by patients as ADEs using these checklists should not be used blindly to quantify rates at the lowest ADE-specific level. Part of the lack of reliability might be solved by improving the questionnaire, but some lack of reliability at the lowest ADE level could be inherent to patient reporting.

One can expect that patients’ uncertainty about a symptom being an ADE may lead to inconsistent answers. The finding that some patients checked a symptom as an ADE on one measurement but not on the other indicates such uncertainty. Furthermore, in around half of the cases the patients did not mention a potential drug that they believed was causing that specific ADE or were not very sure about the causal relationship. On the other hand, some of the inconsistency was caused by using a checklist that does not require differentiation between related and disparate ADEs. Patients often checked multiple related ADEs, but not exactly the same ADEs on the two measurements. When ADEs were aggregated at MedDRA® level or according to the patients’ own descriptions, reporting was therefore more consistent. This problem could be a consequence of direct patient reporting; that is, reporting without involvement of a healthcare professional who can interpret and cluster specific symptoms to a more general ADE description. However, a more intelligent questionnaire flow or an interactive questionnaire might solve this problem. For instance, an interactive questionnaire could require patients to cluster related symptoms that they consider to be one problem before they move on to more detailed questions. Such a questionnaire should incorporate a more flexible linkage to the MedDRA® System Organ Class by not only focusing on the primary MedDRA® class. This would prevent symptoms that describe one ADE but have different primary MedDRA® classes from being classified in different MedDRA® classes. Notwithstanding these possible improvements to the questionnaire, some patients clearly checked totally different ADEs at the two measurements. We chose a period of 1 week between the measurements to exclude memory effects, but this period may have been too long to exclude true changes in the experience of ADEs in the previous 4 weeks, especially for ADEs that might change from day to day [12].

The comparison between the questionnaire with and without the body category structure showed no significant differences in the number of reported ADEs or in agreement measures. From this we conclude that including a body category system did not influence the reliability of the ADE reporting. Because the cognitive debriefing showed that the body categories were helpful and increased the feasibility for some patients, we still recommend the use of such a categorization as a supportive element.

To our knowledge, this is the first study to validate a generic patient-reported questionnaire intended for systematic data collection of ADEs. We conducted a broad search for symptomatic ADEs, which we translated into lay-terms and linked to MedDRA® terms. The use of these standard terms makes it possible to compare ADE data across different studies, which is important in the evaluation of drug safety [54]. We included a heterogeneous population with respect to age and education level in the content-validation study. Patients were selected for having type 2 diabetes, asthma, or COPD, but many of them used multiple drugs and also used drugs for other diseases. We expect that the questionnaire is suitable for adult patients on a steady drug regimen who are able to read and write. We cannot, however, guarantee that all ADE terms are content valid. In addition, we tested the Dutch version of the questionnaire. The use of the questionnaire in other languages requires additional testing [55]. We expect that the reliability for ADE reporting of the Web-based version is comparable to that of the paper-based version. The navigation through the questionnaire and the time needed to complete it, however, may differ between the Web-based and paper-based versions [56]. We tested the questionnaire in an observational, postmarketing setting. We expect that the questionnaire is also applicable in clinical trials in which patients are initial drug users, but this should be confirmed in future studies. Further validation studies are needed (e.g., establishing the probability of a causal relationship between the reported ADEs and the drugs using an external reference) because content validation is essential but only a first step in providing evidence of full validity [12, 57, 58].

5 Conclusions

Participants in postmarketing studies and clinical trials can use multiple drugs that may interact and cause unexpected ADEs. Using a generic questionnaire in which all experienced ADEs can be reported by patients is therefore important. In terms of content validity, our patient-reported ADE questionnaire can be used for assessing the nature and causality of symptomatic ADEs as experienced by patients undergoing chronic drug therapy. The questionnaire is feasible for research purposes and reliable for identifying the number of patients experiencing ADEs, both in general and at MedDRA® System Organ Class level. To quantify specific patient-reported ADEs, improvements to the structure of the questionnaire are required.