Key Points

We developed a 43-item checklist, entitled the RIMES statement, to assess the reporting quality of risk minimization evaluation studies in order to support more standardized, transparent reporting of study results.

Our findings showed that the checklist had good inter-rater reliability, both overall and for the four subscales (Key information; Design; Implementation; and Evaluation).

We conclude with a proposal for further validating and refining the checklist to increase its practical appeal and usefulness.

1 Introduction

Ensuring the safe and appropriate use of medicines is an important public health priority, particularly in light of the rapid growth worldwide in prescription drug use [1]. Although product labeling serves as the basis for safe medication use, regulatory authorities may, in certain circumstances, mandate additional risk minimization measures for products with serious safety concerns [2, 3]. These risk minimization programs can be imposed either as a condition of marketing authorization approval (most commonly) or as a condition of continued marketing authorization.

Marketing authorization holders of medicinal products are responsible for designing, implementing and evaluating these programs. Typically, however, sponsors must rely on healthcare professionals (alone or in conjunction with other third parties such as continuing medical education providers) to implement the actual intervention components [4, 5]. Other defining hallmarks of risk minimization interventions include the fact that they target multiple audiences (e.g. healthcare professionals, patients, caregivers, lay audiences), feature multiple measures or ‘components’ (e.g. risk communication, training of healthcare professionals, prescriber certification), span a range of socioecological levels (e.g. individual patient, healthcare system), involve multiple different types of implementers (e.g. physician prescriber, pharmacist, informal caregivers), and require implementation across multiple settings (e.g. inpatient, outpatient, home) and geographic areas (e.g. regions, countries, urban, rural).

Collectively, these characteristics define what is known as a ‘complex’ intervention [6]. Evaluating complex interventions requires ascertaining not only whether the intervention achieved the desired impact, but also under what conditions it did so, for whom, and whether the impact was sustained over time [7]. Undesired or unanticipated impacts may also need to be captured, such as treatment discontinuation or channeling towards inappropriate or suboptimal treatments.

Evidence of risk minimization program effectiveness is critical for demonstrating to regulatory authorities that a product’s benefit-risk balance remains positive. Sponsors are encouraged to publish the results of their risk minimization program evaluations in order to build the risk minimization evidence base [3]. Additionally, the European Medicines Agency is legally required to make public both the protocols and abstracts of results of the post-authorization safety studies initiated, managed or financed by a marketing authorization holder, including those on risk minimization effectiveness [8].

Improving the effectiveness of risk minimization programs is a priority within the pharmacovigilance community [7,8,9]. However, the number of risk minimization evaluation studies reported in the peer-reviewed literature has lagged far behind the number that have been conducted to date [10]. Among the evaluations that have been published, methods and results have been inconsistently reported, making it difficult to judge their methodological quality and to interpret the results [11]. Common shortcomings in reporting include failure to specify the intervention’s purported causal mechanism(s) and their relation to short-, intermediate- and long-term intended outcomes; inadequate information on the implementation process and the healthcare context in which the intervention was delivered; limited correspondence between the stated intervention aim and the selected effectiveness measures; and the absence of predefined thresholds for determining effectiveness [12].

Over the past decade, numerous reporting checklists have been developed to standardize the reporting of results from different types of studies, thereby building the evidence base for clinical and public health practice. Such checklists include CONSORT (clinical trials), STROBE and GRACE (observational and epidemiological studies), TREND (public health intervention evaluation studies), SQUIRE (healthcare systems), WIDER (knowledge transfer), and GREET (educational interventions and teaching) [13,14,15,16,17,18,19,20]. Recently, there have been calls to improve the evidence base underlying risk minimization interventions as well [3, 11, 21]. Answering these calls requires a standard for gauging the reporting quality of risk minimization evaluation studies. Notably, however, existing reporting checklists have limited applicability for this purpose. First, such checklists focus either on randomized designs or on one particular type of non-randomized design. To date, experimental study designs (e.g. randomized controlled trials) have not been used to evaluate risk minimization programs because regulators have required that these programs be implemented across the entire targeted population. As a result, a variety of non-randomized designs have been used (e.g. observational, interrupted time series), including mixed methods approaches that combine qualitative (how, why) and quantitative (how much) research to gain a fuller understanding of the risk minimization program’s impact and the factors that contributed to its success or failure [11].

Second, extant checklists do not address why and how specific risk minimization program measures were selected, how they were designed, the process and context of program implementation, who was reached by the intervention, what ‘dosage’ was received (i.e. degree of exposure to program activities, such as completing all educational requirements), whether and to what extent different healthcare delivery settings adopted the program, and the degree to which intervention delivery was sustained over time. Both the process and the context of implementation are particularly important to assess because risk minimization interventions, unlike clinical trials, are conducted under ‘real-world’ conditions in which participants and participating settings are heterogeneous, implementers vary in their commitment and relevant skills or expertise, and time and other resources are constrained. Information on the process and context of implementation can shed light on the mechanism(s) of change, help identify the circumstances under which the intervention works best, and aid in interpreting evaluation results, whether the intervention effects are negative, inconclusive, or positive. Not least, it can also provide insight into unintended effects, whether negative or positive [7, 22].

To address this gap, the Benefit-Risk Assessment, Communication, and Evaluation (BRACE) Special Interest Group (SIG) of the International Society for Pharmacoepidemiology (ISPE) sought to develop a common set of criteria to assess the quality of information reported in risk minimization evaluation studies [22, 23]. These criteria, designated the Reporting recommendations Intended for pharmaceutical risk Minimization Evaluation Studies (RIMES) statement, were intended for use by regulatory bodies, industry, academia, and journal editors and reviewers. The goals of the checklist were to (1) assess the reporting quality of risk minimization evaluation studies; (2) improve the interpretation and usefulness of risk minimization evaluation study results; (3) increase awareness among key stakeholders regarding evidence-based standards in the field of risk minimization; (4) establish a reporting platform that bridges the relevant sciences, including public health, health communication science, behavioral medicine, health services research and pharmacoepidemiology; and (5) promote, via reporting standardization and quality improvement, the inclusion of published risk minimization evaluation studies in systematic reviews. The last goal is especially important given the paucity of published risk minimization evaluation studies for any given drug or drug class. In this regard, the RIMES statement could help facilitate systematic reviews of evaluations of specific categories of risk minimization interventions (e.g. controlled distribution systems or healthcare provider communication plans), similar to those conducted for different types of behavioral health interventions [24].

The RIMES statement was developed explicitly as a tool for assessing the quality of the information reported in risk minimization evaluation studies, not as a checklist for evaluating the quality of these studies themselves. Ultimately, however, widespread adoption of the RIMES statement could lead to improvements in the quality of evaluation study design and, in so doing, generate better evidence on the effectiveness of risk minimization interventions for regulatory decision making. In turn, better evidence regarding the effectiveness of risk minimization interventions—which programs, for example, work best for whom and under what circumstances—should enhance the quality of risk minimization programs themselves [16]. The purpose of this study was to develop an initial version of the RIMES statement and test its reliability in a sample of recently published risk minimization evaluation studies.

2 Methods

We convened a multidisciplinary team of professionals (authoring team) with expertise in therapeutic risk management, regulatory science, public health, pharmacoepidemiology and behavioral medicine to develop the checklist. The development process involved a series of four consecutive steps consisting of (1) initial development of a checklist; (2) piloting; (3) individual checklist item revisions; and (4) inter-rater reliability testing. These steps are described in greater detail below.

2.1 Development of the Initial Version of the Checklist

The initial development of the checklist was guided by a theoretical framework, a review of existing reporting checklists, and leading texts on public health and risk minimization intervention design, implementation and evaluation [14, 16, 25,26,27,28,29,30,31,32,33,34].

2.1.1 Theoretical Framework

To develop the RIMES statement, we adapted and combined relevant elements from existing program theory and process evaluation frameworks [7, 21, 35]. Our resulting framework (Fig. 1) emphasizes the stepwise contribution of design, implementation, and evaluation to the effectiveness of a complex intervention. Furthermore, it highlights the role of ecological context as an important contributor to intervention outcomes [32, 35]. Each of the items in our RIMES checklist falls within the elements of this framework. For example, a risk minimization program may be implemented in a range of outpatient care settings, each of which could differ in terms of leadership commitment to implementation, quality of staff training, and operating resources. An understanding of the role of, and interactions among, different contextual factors can also provide insight regarding how to optimize the fit of a risk minimization intervention to different delivery settings and how to improve its sustainability (i.e. long-term delivery) [36].

Fig. 1 Framework for risk minimization intervention evaluation study reporting criteria

2.1.2 Existing Checklists

Many of the items included in the RIMES checklist refer to general research standards and are common to existing reporting checklists [14, 16, 17] but have been tailored to apply specifically to risk minimization. Items common to such reporting checklists relate to key information (author names, affiliations, conflicts of interest, funding), descriptions of study methods (participant recruitment, sample size, details of interventions, description of measures, and statistical analyses) and reporting of results (main results, limitations, generalizability and conclusions). We also consulted a checklist for implementation (Ch-IMP) and incorporated similar concepts (process metrics, implementer training, fidelity, adoption) into the RIMES checklist [7].

2.1.3 Leading Texts in Public Health and Risk Minimization

There are a number of known challenges to risk minimization programs and an emerging consensus regarding ways to advance the science in this field [11, 33]. For example, experts suggest that the goals of the intervention should be clearly defined, specific, measurable and time-bound, and that thresholds of success should be determined a priori. When developing communication tools, content should be tested among stakeholders (including the intended audience) to ensure that the risk message is clearly conveyed. Furthermore, evaluations should address process outcomes (reach, adoption, implementation), as well as examine results in the short term (effectiveness) and the success of the message in the long term (maintenance and sustainability). Each of these concepts helped inform the contents of the draft RIMES checklist [9, 11, 14, 16, 25,26,27,28,29,30,31,32,33,34, 37].

2.2 Piloting

To pilot the draft checklist, we conducted an initial search of the peer-reviewed literature for articles pertaining to formal risk minimization programs and their evaluation. Specifically, we searched PubMed for English-language articles published between January 2000 and July 2016 using the following text words: (‘risk minimization plan’ OR ‘risk evaluation and mitigation strateg*’ OR ‘risk management plan’ OR ‘risk minimization’ OR ‘risk minimisation’ OR ‘direct healthcare professional communication*’ OR ‘dear doctor’ OR ‘risk communication’).
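As an illustration only, the Boolean search above can be assembled programmatically so that exactly the same string is reused when a search is repeated. The sketch below is a hypothetical Python helper, not the authors’ procedure; it assumes that “text words” corresponds to PubMed’s [tw] field tag, that the English-language restriction corresponds to English[lang], and that the January 2000 to July 2016 window is expressed as a month-precision [dp] publication-date range.

```python
# Hypothetical helper (not the authors' procedure) that assembles the PubMed
# search described above. Assumed mappings: "text words" -> [tw] field tag,
# English-language restriction -> English[lang], publication window -> [dp] range.

SEARCH_TERMS = [
    "risk minimization plan",
    "risk evaluation and mitigation strateg*",
    "risk management plan",
    "risk minimization",
    "risk minimisation",
    "direct healthcare professional communication*",
    "dear doctor",
    "risk communication",
]


def build_pubmed_query(terms, start="2000/01", end="2016/07"):
    """Return a Boolean query: OR-ed text words, AND language and date limits."""
    text_words = " OR ".join(f'"{t}"[tw]' for t in terms)
    date_range = f'("{start}"[dp] : "{end}"[dp])'
    return f"({text_words}) AND English[lang] AND {date_range}"


if __name__ == "__main__":
    print(build_pubmed_query(SEARCH_TERMS))
```

The printed string can be pasted into the PubMed search box; whether truncation (*) is honored inside quoted phrases should be checked against the current PubMed help before relying on it.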

Based on this literature search, we identified a convenience sample of 12 articles that met the following inclusion criteria: article relates to (1) a pharmaceutical product, (2) a risk communication or risk minimization intervention (including written, verbal or electronic), and (3) an assessment of the impact of the intervention [4, 5, 38,39,40,41,42,43,44,45,46,47]. Two raters (co-authors MYS and AR) separately reviewed and applied the draft checklist to each article. Of these two raters, MYS was an experienced researcher with extensive subject matter expertise. Conversely, AR had formal training and experience conducting health information communication research, but comparatively limited experience (less than 1 year) in designing and evaluating pharmaceutical risk minimization programs specifically.

2.3 Individual RIMES Checklist Item Revisions and Development of the Revised Checklist

Following the independent review of the 12 articles and application of the draft version of the RIMES checklist, the two raters met to discuss and compare item ratings. Based on that discussion, the checklist was further refined and the wording of several items was clarified to reduce ambiguity and to reflect single concepts only. Several examples were also added to guide future checklist application. The updated version of the checklist contained 45 items, with answer options scored as 0 (not reported or not applicable) or 1 (reported). These items were grouped into four domains:

  1. Key information: established reporting criteria items such as an adequate title, an appropriate summary of the study in the abstract, valid, evidence-based study conclusions, and reporting of limitations, as well as disclosures of funding and conflicts of interest.

  2. Description of the risk minimization program: items that adequately describe the risk minimization program, such as its objective, design, target population and other key program elements.

  3. Implementation of the risk minimization program: items that describe implementation planning considerations and how the program was actually implemented.

  4. Evaluation of the risk minimization program: items that describe the study rationale and methods; implementation process measures, in particular the extent to which the program was implemented according to plan and any factors that may have facilitated or impeded implementation; outcome measures; and study results.
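The 0/1 scoring scheme and the grouping of items into domains can be represented with a small data structure, for example when collecting ratings for later reliability analysis. The sketch below is hypothetical: the item identifiers (K1, E1, and so on) are placeholders rather than RIMES item numbers, the domain labels follow the short subscale names (Key information, Design, Implementation, Evaluation), and the per-domain tally is an unweighted count, since no total score or item weighting is defined here.

```python
from dataclasses import dataclass, field

# Short subscale names used for the four domains.
DOMAINS = ("Key information", "Design", "Implementation", "Evaluation")


@dataclass
class ArticleRating:
    """One rater's 0/1 scores for one article, keyed by (domain, item_id)."""
    article_id: str
    scores: dict = field(default_factory=dict)  # {(domain, item_id): 0 or 1}

    def rate(self, domain: str, item_id: str, reported: bool) -> None:
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        # 1 = reported; 0 = not reported or not applicable
        self.scores[(domain, item_id)] = int(reported)

    def domain_summary(self) -> dict:
        """Unweighted (items reported, items rated) per domain (hypothetical tally)."""
        summary = {d: [0, 0] for d in DOMAINS}
        for (domain, _item_id), score in self.scores.items():
            summary[domain][0] += score
            summary[domain][1] += 1
        return {d: (reported, rated) for d, (reported, rated) in summary.items()}


# Usage with placeholder item identifiers:
r = ArticleRating("article-01")
r.rate("Key information", "K1", True)
r.rate("Key information", "K2", False)
r.rate("Evaluation", "E1", True)
print(r.domain_summary())
```

Keeping ratings keyed by item rather than pre-summed leaves them usable for item-level agreement statistics as well as domain-level summaries.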

2.4 Inter-Rater Reliability Testing of the Revised Checklist

A second literature search of the published risk minimization evaluation literature was conducted approximately 6 months after the original search. The same search terms were employed. The inclusion criteria were the same as those used in the initial round of testing with two exceptions: (1) emphasis was placed on identifying only those articles evaluating risk minimization interventions formally required by a regulatory authority; and (2) the search timeframe was narrowed to include only those articles that had been published between January 2013 and January 2017. The purpose of restricting the timeframe was to focus the search on more recent studies that were expected to have higher reporting quality, given that they had been requested by a regulatory authority and had been published in the wake of European Union (EU) pharmacovigilance legislation that provided guidance on how sponsors were to evaluate formal risk minimization commitments. For the inter-rater reliability testing, we selected a convenience sample of the first 10 articles that met all the inclusion criteria (Table 1) [4, 5, 45,46,47,48,49,50,51,52].

Table 1 Description of articles reviewed with the RIMES statement

The two raters independently reviewed the 10 articles, applying the revised checklist. Inter-rater reliability was reported using Cohen’s kappa and Gwet’s AC1 statistics. Statistical analysis was conducted in R version 3.3.2 using the ‘irr’ and ‘lpSolve’ packages. Interpretation of both statistics was based on Cohen’s definition of agreement: poor (0), slight (0.01–0.20), fair (0.21–0.40), moderate (0.41–0.60), substantial (0.61–0.80), and almost perfect (0.81–1.0) [53]. The kappa statistic is an appropriate measure of inter-rater reliability when ratings vary across categories. However, it is sensitive to a high frequency of one score over another and may indicate low reliability even when the percentage agreement is high. This issue is known as the ‘kappa paradox’ and is described by Feinstein and Cicchetti [54]. In 2008, Gwet proposed and validated the AC1 statistic as a way to address the limitations of kappa [55]. This statistic is less influenced by skewed ratings and is based on an alternative chance adjustment, defined as the “conditional probability that two, randomly selected raters will agree, given that no agreement will occur by chance” [56]. Gwet’s AC1 has been used in other evaluations of the inter-rater reliability of checklists and is often presented in conjunction with the kappa coefficient [7, 57, 58].
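To show how the two statistics differ in their chance correction, the following Python sketch computes percentage agreement, Cohen’s kappa and Gwet’s AC1 for two raters giving binary (0/1) ratings, and maps a coefficient onto the agreement bands listed above. It is an illustrative re-implementation of the standard two-rater, two-category formulas, not the R code used in this study. The band boundaries (treating values of 0 or below as “poor” and using inclusive upper limits) are an assumption about how the stated ranges apply to values between the listed cut-points. Kappa is returned as NaN when both raters give every article the same score, because its chance term then equals 1.

```python
import math


def agreement_stats(r1, r2):
    """Percent agreement, Cohen's kappa and Gwet's AC1 for two raters, 0/1 ratings."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    p1 = sum(r1) / n  # proportion of articles rater 1 scored 1 ("Y")
    p2 = sum(r2) / n  # proportion of articles rater 2 scored 1 ("Y")

    # Cohen: chance agreement from the product of each rater's own marginals.
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (p_o - pe_kappa) / (1 - pe_kappa) if pe_kappa < 1 else float("nan")

    # Gwet AC1 (two categories): chance agreement 2*pi*(1 - pi), where pi is the
    # mean proportion of "Y" across raters; it never exceeds 0.5.
    pi = (p1 + p2) / 2
    pe_ac1 = 2 * pi * (1 - pi)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)
    return p_o, kappa, ac1


def band(value):
    """Map a coefficient to the agreement bands (assumed boundary handling)."""
    if math.isnan(value):
        return "undefined"
    if value <= 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if value <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings for one item across 10 articles (not data from Table 3):
rater1 = [1] * 9 + [0]
rater2 = [1] * 8 + [0, 1]
p_o, kappa, ac1 = agreement_stats(rater1, rater2)
print(p_o, round(kappa, 3), round(ac1, 3), band(kappa), band(ac1))
```

In this invented example the raters agree on 80% of articles, yet kappa is slightly negative while AC1 is about 0.76. This is the skew-driven divergence that motivates reporting both statistics.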

3 Results

The RIMES checklist was developed and then underwent two rounds of testing. Following the inter-rater reliability analysis in the second round, two items were deleted because ambiguous phrasing had led to differing interpretations and poor inter-rater reliability. These items were (1) ‘Explicit statement of causal assumptions linking intervention to a benefit for the recipient is provided’, and (2) ‘Upfront efforts to address potential sources of bias and confounding’. After further discussion, it was concluded that the first item substantially overlapped with an earlier item in the checklist (‘Theoretical basis of the risk minimization program’). The second item was judged to reflect the quality of the study design rather than the quality of reporting and, as such, to be covered by an earlier item (‘Internal validity. Evaluation limitations, degree to which sources of potential bias were addressed’). After these deletions, the final checklist consisted of 43 items (Table 2).

Table 2 RIMES statement: checklist of items that should be included in reports of risk minimization evaluation studies for medicinal products

Rater scoring, percentage agreement, and reliability statistics for individual items can be found in Table 3. The frequency of each rater’s scores is listed, where Y and N indicate the number of articles in which the rater determined that the item was adequately covered or absent, respectively. For example, for Item 2b, Rater 1 determined that nine articles fulfilled the criterion and one article did not. For individual items, inter-rater agreement ranged from 40 to 100%, kappa coefficients ranged from −0.15 to 1.00, and AC1 coefficients ranged from −0.20 to 1.00. Slightly more than half (n = 22) of the kappa coefficients ranged from moderate to almost perfect, and slightly less than half were either fair (n = 10), slight (n = 2) or poor (n = 7). Two items (9b and 17f) had negative kappa coefficients, indicating that agreement between the raters was lower than would be expected by chance.

Table 3 Inter-rater reliability testing: percentage agreement, Kappa and AC1 statistics by item

For a number of items (2b, 3b, 4, 5a, 8, 10a, 13a, 17f), the kappa coefficient was low despite high or moderate percentage agreement and diverged markedly from the AC1 statistic. For instance, for all items with kappa coefficients rated as poor (including negative values), percentage agreement ranged between 70 and 90% and the AC1 statistic was 0.59 or higher. For these items, the raters’ scores were highly skewed: the item was scored predominantly as present or predominantly as absent.
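A hypothetical worked example (invented numbers, not data from Table 3) makes this pattern concrete. Suppose that, across 10 articles, Rater 1 scores an item as reported (Y) in 9 articles and Rater 2 in all 10, so that the raters agree on 9 articles. Using the two-rater, two-category forms of the statistics, with $p_{1Y}$ and $p_{2Y}$ denoting the proportion of articles each rater scored Y:

```latex
\begin{aligned}
p_o &= \tfrac{9}{10} = 0.90 \\
p_e^{(\kappa)} &= p_{1Y}\,p_{2Y} + p_{1N}\,p_{2N} = (0.9)(1.0) + (0.1)(0.0) = 0.90 \\
\kappa &= \frac{p_o - p_e^{(\kappa)}}{1 - p_e^{(\kappa)}} = \frac{0.90 - 0.90}{1 - 0.90} = 0 \\
\pi_Y &= \tfrac{1}{2}\left(p_{1Y} + p_{2Y}\right) = 0.95, \qquad
p_e^{(\mathrm{AC1})} = 2\,\pi_Y\left(1 - \pi_Y\right) = 0.095 \\
\mathrm{AC1} &= \frac{p_o - p_e^{(\mathrm{AC1})}}{1 - p_e^{(\mathrm{AC1})}} = \frac{0.90 - 0.095}{1 - 0.095} \approx 0.89
\end{aligned}
```

Despite 90% agreement, kappa falls in the “poor” band because almost all agreement occurs on the dominant Y score, which kappa largely attributes to chance. AC1, whose chance term cannot exceed 0.5 for two categories, stays close to the raw agreement.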

One item had 40% agreement, a slight kappa (0.12) and a negative AC1 statistic. This item passed the initial piloting of the checklist but emerged as a source of discordant ratings during reliability testing, when the raters disagreed on the specificity required to give full credit on this item, with one rater being consistently more stringent than the other.

Summary statistics for the checklist overall and for subscales can be found in Table 4. We found the inter-rater reliability of the checklist overall to be substantial (κ = 0.65, AC1 = 0.65). Three of the four domains also showed substantial reliability based on kappa: key information (κ = 0.73, AC1 = 0.80), design (κ = 0.64, AC1 = 0.64), and evaluation (κ = 0.66, AC1 = 0.69). The implementation domain (κ = 0.17, AC1 = 0.61) showed only slight reliability based on the kappa coefficient, but substantial reliability based on AC1.

Table 4 Inter-rater reliability summary statistics

3.1 Respondent Burden

Initially, the average time raters spent reviewing and rating each article using the checklist was approximately 25 min; however, as familiarity with the checklist items increased, the average review time dropped to approximately 20 min per article.

4 Discussion

This article reports on the development of a set of criteria to describe the reporting quality of risk minimization intervention evaluation studies. Our results show that it is feasible to develop such a checklist despite the fact that these studies, by definition, must utilize non-randomized design types, may feature two or more substudies, and may employ a combination of both qualitative and quantitative research methods (‘mixed methods’) [4]. The checklist addresses important aspects of reporting that are vital to assessing the quality of a risk minimization evaluation study and that are under-represented in existing reporting checklists developed for other types of research studies and program evaluations. Examples of such key aspects include a description of the goals of the risk minimization program and the actual risk minimization measures used, how the program was implemented and whether implementation efforts were successful, and the inclusion of information regarding the external validity of evaluation results.

The RIMES statement is intended for use by a range of audiences, including regulators, industry, academic evaluators and journal editors. Standardized reporting of risk minimization evaluation studies, such as that provided by the RIMES statement, can facilitate systematic reviews and data synthesis, including meta-analyses. This is particularly important given that, to date, pharmaceutical risk minimization has been singularly uninformed by research findings from other relevant sciences, including public health, communication, behavioral medicine and health services research. In addition, the checklist can guide research planning and manuscript development in the first instance, and serve as a platform for bridging pharmaceutical risk minimization science with other relevant fields. Not least, it can also assist sponsors in designing higher-quality risk minimization evaluation studies and, through the resulting evidence, potentially enhance the quality and effectiveness of risk minimization programs themselves.

The main limitation of this study was the small sample of articles (n = 10), each reviewed by only two raters. With a larger sample of articles or additional raters, both the kappa and AC1 estimates would have been more precise, and the discrepancies between them might have been reduced. However, as noted previously, our sample size was limited by the relative paucity of risk minimization evaluation studies published in the peer-reviewed literature to date.

5 Conclusions

Results of preliminary reliability testing show that the RIMES statement has good inter-rater reliability among a small sample of articles. Important next steps in its development would include testing in a larger sample to confirm item reliability, particularly for items that had low kappa coefficients in this analysis. It is possible that some items are underperforming and may require adjustment. In addition, formal usability testing should be conducted, and both the content and construct validity of the checklist should be examined through a more comprehensive and systematic assessment of relevant publications, including those found in the grey literature. To enhance the checklist’s practicality, future research should also assess ways to streamline it further, potentially via factor analytic methods, and explore possible approaches to item weighting. Not least, to aid in standardizing the interpretation of checklist items, a user manual should be developed.

Although additional methodological work is planned, the current version of the checklist showed good reliability when tested by two raters among a small sample of articles. As such, this checklist represents an important step forward in improving the quality of reporting of risk minimization evaluation studies, one that can benefit both the science of pharmaceutical risk minimization and, ultimately, patient safety.