Key Summary Points

Why carry out this study?

Standard US labels for Food and Drug Administration (FDA)-approved drugs summarize the incidence of adverse events occurring in randomized controlled trials in 2% frequency tables, providing useful information about treatment safety. However, these tabular summaries do not include data that are important to fully understand the safety side of a drug’s benefit-risk ratio

In this study, we propose a means to improve clinical trial statistical analyses for safety. To this end, we analyzed adverse event data from randomized controlled trial databases, allowing us to derive novel drug safety metrics, including adverse event absolute prevalence and expected duration, as well as drug-placebo difference in adverse event prevalence

What was learned from this study?

By calculating these novel metrics, the drug’s benefit-risk ratio can more easily emerge and be placed in clear perspective

We propose the adoption of these metrics by sponsors and peer-reviewed publications, as they provide meaningful information on the side effect burden of a drug relative to placebo that is relevant to patients and healthcare professionals. These metrics are not currently reported for each randomized controlled trial or described in standard US labels for FDA-approved drugs

Introduction

The analysis and reporting of adverse event (AE) data gathered throughout the development, testing, and use of a drug are key to establishing the drug’s safety profile. An AE is defined by the US Food and Drug Administration (FDA) as “any untoward medical occurrence associated with the use of a drug in humans, whether or not considered drug related” [1]. Prior to marketing approval, information on AEs is derived from the drug’s preclinical studies and randomized controlled trials (RCTs) [2, 3]. In the US, RCTs are subject to regulatory safety reporting requirements by the FDA [4]. After a newly approved drug enters the marketplace, post-marketing surveillance systems can reveal AEs not detected during the pre-approval review [5]. These systems are useful for generating hypotheses about potential drug-associated AEs but do not allow the quantification of risk; AE incidence cannot be calculated, as the number of events reported (numerator) is not representative of the actual number that occurred, and the number of subjects exposed to treatment or “safety population” (denominator) is not known [6, 7]. In RCTs, the presence of a clearly defined safety population and a placebo comparator allows for both the calculation of AE incidence and the controlled comparison of AE rates [8].

AEs detected in RCTs are presented in drug labels, as detailed in the FDA Guidance Document for Adverse Reactions Section of Labeling for Human Prescription Drug and Biological Products [9]. The standard US label for FDA-approved drugs lists all the AEs identified in RCTs that occur with an incidence ≥ 2% in the drug treatment group and for which the rate for drug exceeds the rate for placebo [10, 11]. AEs are presented in tabular summaries with counts and percentages of the number of subjects who experienced the event by treatment, thus enabling side-by-side comparison of AE incidence for drug and placebo. AEs are initially recorded by clinical investigators in their own words (verbatim terms). To provide a meaningful estimate of the proportion of individuals experiencing AEs and to avoid diluting or obscuring the true effect of the AEs, events reported under different terms but representing the same phenomenon are coded using the Medical Dictionary for Regulatory Activities (MedDRA). The coded terms are then summarized and analyzed by System Organ Class and Preferred Term [12, 13]. The drug label lists AEs as Preferred Terms or groups of Preferred Terms.

There is increasing interest in enhancing the retrieval, analysis, and reporting of AEs in both pre- and post-marketing settings. Recently, regulators, including the FDA, have issued new guidance. MedDRA not only maintains the list of Preferred Terms but also provides a large number of Standardized MedDRA Queries (SMQs). SMQs are routinely used to facilitate retrieval of MedDRA-coded data as a first step in investigating drug safety issues in pre- and post-marketing settings. In 2022, a group of FDA medical experts presented the FDA Medical Queries (FMQs), developed specifically to assess the safety of new drugs in clinical development. FMQs consolidate medical conditions that are scattered across Preferred Terms, making safety signals in RCT datasets easier to detect [14]. This same group of experts discussed standard safety tables and figures and provided new statistical considerations for the analysis of AEs. The FDA will implement the use of FMQs and these analytical methods and presentations for drug safety data [14].

The above guidance to improve statistical analysis of AEs can be considered a follow-up to a 2015 FDA initiative, the Safety Research Interest Group, aiming at identifying gaps in safety reporting and addressing these gaps through targeted actions [15]. The Group recognized, among several areas of safety-related needs, the necessity to (1) improve clinical trial statistical analyses for safety, including benefit-risk assessments, and (2) improve access to post-marketing data and explore the feasibility of their use in analyzing safety signals.

Here, we propose a means to improve clinical trial statistical analyses for safety. This work expands our efforts to implement drug safety signal detection, with a shifted focus from post- to pre-marketing data; previously, Hopkins et al. piloted a new method to evaluate the novelty of the safety profile of a drug in a new pharmacological class in development against the safety profile of drugs in an established pharmacological class by using Bayesian disproportionality analysis of post-market FDA Adverse Event Reporting System (FAERS) data [16]. They showed that in lurasidone clinical trials in schizophrenia, half of subjects had AEs specific to atypical antipsychotics, whereas ulotaront, a novel trace amine-associated receptor 1 (TAAR1) agonist with 5-HT1A agonist activity, which does not act via blockade of D2 or 5-HT2A receptors [17,18,19,20], presented a lower cumulative rate of antipsychotic class-specific AEs [16]. In an ensuing paper [21], the authors demonstrated that the class-specific AEs in RCT data for risperidone, calculated as a cumulative function of the AEs’ disproportional reporting derived from FAERS data, were comparable to those first reported for other atypical antipsychotic drugs. Therefore, it can be postulated that cumulative AE curves represent a more objective approach to describe the qualitative similarities or, vice versa, differences in AE profiles between drugs.

In the current paper, we acknowledge that the traditional 2% incidence tables of the drug label presuppose identical AE duration and usually ignore recurrent events and competing risks occurring in the study population. If AE durations are different across treatment groups, then comparisons based on simple incidence produce biased results [22]. Regulators have underlined the limitations of defining and measuring AEs using frequency tables and have called for caution in drawing any robust conclusion [7]. Moreover, we highlight that tabular summaries do not include data that may be relevant to patients and healthcare professionals. Specifically, AE prevalence and duration can impact patients’ treatment satisfaction, adherence to medication, employment status, social activities, and, ultimately, quality of life. AE prevalence and duration are also crucial to fully understand the safety side of a drug’s benefit-risk ratio. To illustrate this point, we analyzed pooled data from five RCTs of the dopamine D2 antagonist lurasidone [23,24,25,26,27] and one RCT of the novel TAAR1 agonist ulotaront [18] in acutely psychotic patients with schizophrenia: first, we calculated incidence, absolute prevalence, and expected duration of AEs; then, we developed and tested a new metric, the drug-placebo difference in AE prevalence.

Methods

Randomized Placebo-Controlled Trials of Lurasidone and Ulotaront

AE data, coded as FDA MedDRA Preferred Terms and reported from five RCTs of the dopamine D2 antagonist lurasidone, were used to develop and test our novel metric. The 42-day trials evaluated different doses of lurasidone (40, 80, 120, or 160 mg/day, N = 1042) for the treatment of patients with an acute exacerbation of schizophrenia. These trials have been included in previous analyses of AE data by our group [16, 21] and have been selected based on the availability of reported AEs and the adequateness of the study design, which were suitable for submission of a New Drug Application to the FDA. They include study D1050006 (ClinicalTrials.gov identifier, i.e., NCT number, not available) [23], study D1050196 (NCT00088634) [24], study D1050229 (NCT00549718) [25], study D1050231 (NCT00615433) [26], and study D1050233 (NCT00790192) [27]. Placebo was used as reference (N = 503). Due to its nature, placebo was expected to have a lower side effect burden profile compared with active treatment.

Similarly, we used AE data coded as FDA MedDRA Preferred Terms and collected from one RCT of ulotaront, a novel TAAR1 agonist with 5-HT1A agonist activity, which acts via a non-D2 mechanism. The 28-day trial, study SEP361201 (NCT02969382) [18], evaluated two doses of ulotaront (50 or 75 mg/day, N = 120) for the treatment of patients with an acute exacerbation of schizophrenia. Placebo was used as reference (N = 125).

In each RCT, the safety population used for AE data analyses consisted of the subjects who received the study treatment, i.e., drug or placebo. The subjects who received the drug constituted the safety population in the drug arm, those who received placebo constituted the safety population in the placebo arm.

Ethical Conduct

Analyses were conducted on de-identified data from the six previously conducted RCTs; thus, they did not require IRB review.

Statistical Analysis

Incidence, Absolute Prevalence, and Expected Duration of AEs

In a first step, we describe how to calculate for each AE of drug compared to placebo: (1) incidence with the corresponding difference in proportions of drug vs placebo with 80% confidence interval (CI); (2) absolute prevalence with the difference in proportions with 80% CI; (3) expected duration (Table 1). To this end, we utilized pooled AE data from the lurasidone arm (all doses) and placebo arm in the five trials, and AE data from the ulotaront arm (all doses) and placebo arm in the single trial. An 80% CI is considered acceptable to detect drug-placebo differences in AE rates in RCTs with enough confidence [28].

Table 1 AE variables used in the study with the corresponding definitions

Incidence was defined as the number of subjects who developed the AE in the arm (C, new cases) divided by the total number of subjects who could have developed the AE in the arm (N), i.e., the safety population in the arm. Incidence is a ratio that can be expressed as a percentage (C × 100/N). We calculated the difference between incidence of drug and placebo with 80% CI using a previously defined formula based on the normal approximation to the binomial distribution with continuity correction, where the CI is adjusted by 0.5 × [1/Ndrug + 1/Nplacebo] [29,30,31].
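
The difference in incidence and its 80% CI can be illustrated with a short Python sketch. It assumes the usual normal-approximation (Wald) interval for a difference in proportions, with the half-width widened by the continuity correction named above; the function name `diff_in_proportions_ci` is hypothetical, and the exact formula in [29,30,31] may differ in detail. The counts are the akathisia counts reported in the Results for the pooled lurasidone trials.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions_ci(c_drug, n_drug, c_placebo, n_placebo, level=0.80):
    """Difference in proportions (drug minus placebo) with a two-sided CI.

    Assumed form: normal approximation to the binomial, with the half-width
    increased by the continuity correction 0.5 * (1/n_drug + 1/n_placebo).
    """
    p_drug = c_drug / n_drug
    p_placebo = c_placebo / n_placebo
    diff = p_drug - p_placebo
    # Two-sided critical value of the standard normal for the chosen level
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(p_drug * (1 - p_drug) / n_drug
              + p_placebo * (1 - p_placebo) / n_placebo)
    half_width = z * se + 0.5 * (1 / n_drug + 1 / n_placebo)
    return diff, diff - half_width, diff + half_width


# Akathisia: 149/1042 subjects on lurasidone vs 13/503 on placebo
diff, lower, upper = diff_in_proportions_ci(149, 1042, 13, 503)
print(f"{100 * diff:.2f}% (80% CI {100 * lower:.2f}% to {100 * upper:.2f}%)")
```

The same function could be applied to absolute prevalence by passing SUBAEDAYS and SUBTRTDAYS in place of the subject counts, if that is how the formula was reused.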

Absolute prevalence was defined as the subject AE days in the arm (SUBAEDAYS) divided by the total number of subject treatment days in the arm (SUBTRTDAYS). SUBAEDAYS is the number of subject-days spent with an AE in the arm, regardless of the study day of the AE onset (i.e., the sum of days an AE was experienced by each subject in the arm). SUBTRTDAYS is the total subject-days spent in treatment in the arm (i.e., the sum of days of treatment of each subject in the arm), which is the safety population in the arm. By plotting these data, we obtained one line plot for SUBAEDAYS and one for SUBTRTDAYS, with their corresponding area under the curve (AUC) values. Their ratio (SUBAEDAYS × 100/SUBTRTDAYS) is the absolute prevalence. The x-axis of the line plot corresponds to treatment days: 42 days for most subjects in the lurasidone studies and 28 days for most subjects in the ulotaront study, except for those subjects who withdrew before the planned end of the study. The difference in the prevalence of drug and placebo with 80% CI was obtained using the formula described above. For AEs with a recorded end date, AE days were calculated as ([Date of AE end − Date of AE start] + 1). For AEs ongoing at the end of the study without a recorded end date, AE days were estimated as ([Date of study end − Date of AE start] + 1). If a subject experienced multiple occurrences of the same AE, AE days were the sum of each occurrence. If a subject withdrew from the study, treatment days were calculated as ([Date of study discontinuation − Date of study start] + 1).
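
The day-counting rules above can be sketched in Python as follows. This is a minimal illustration, not the analysis code used in the study: the function names and the example subject are hypothetical, and an ongoing AE is represented by a missing end date (`None`).

```python
from datetime import date


def inclusive_days(start, end):
    """Days from start to end, counting both endpoints: (end - start) + 1."""
    return (end - start).days + 1


def subject_ae_days(occurrences, study_end):
    """Sum AE days over all occurrences of one AE in one subject.

    Each occurrence is (start, end); end is None when the AE was still
    ongoing, in which case the study end date is used instead.
    """
    return sum(
        inclusive_days(start, end if end is not None else study_end)
        for start, end in occurrences
    )


def absolute_prevalence(sub_ae_days, sub_trt_days):
    """SUBAEDAYS x 100 / SUBTRTDAYS, as a percentage."""
    return 100.0 * sub_ae_days / sub_trt_days


# Hypothetical subject: two episodes of the same AE, the second still ongoing
episodes = [(date(2020, 1, 3), date(2020, 1, 5)), (date(2020, 1, 20), None)]
days = subject_ae_days(episodes, study_end=date(2020, 1, 22))

# Arm-level totals reported in the Results for akathisia on lurasidone
prevalence = absolute_prevalence(3725, 33401)
```

Summing `subject_ae_days` over all subjects in an arm gives SUBAEDAYS; summing each subject's inclusive treatment days gives SUBTRTDAYS.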

Expected duration of an AE was established from the clinical trial data as days to 50% incidence (AE50), which corresponds to the number of days it takes for the incidence of an AE (C × 100/N) to be reduced by half. AE50 was calculated for the drug arm and placebo arm of the lurasidone and ulotaront trials, respectively.
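
The text does not spell out how AE50 is computed. One possible reading, sketched below in Python, takes each affected subject's AE days and returns the first day by which at least half of those subjects no longer have the AE. The function name `ae50` and the example durations are hypothetical, and the study may have derived AE50 differently (for example, from the plotted curves).

```python
def ae50(durations):
    """Smallest day d by which at least half of the affected subjects
    no longer have the AE (their AE days are <= d).

    Assumed reading of "days to 50% incidence"; durations holds one
    AE-day total per affected subject.
    """
    if not durations:
        raise ValueError("no subjects with this AE")
    n = len(durations)
    for d in range(1, max(durations) + 1):
        still_affected = sum(1 for x in durations if x > d)
        if still_affected <= n / 2:
            return d
    return max(durations)


# Hypothetical AE-day totals for five subjects with the same AE
hypothetical_durations = [2, 3, 5, 8, 20]
example_ae50 = ae50(hypothetical_durations)
```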

Drug-placebo Difference in AE Prevalence

In the second step, we discuss the procedure to calculate our new metric, the drug-placebo difference in AE prevalence with 95% CIs (Table 1). We chose a 95% CI because of the higher desired confidence level. We computed, for each day, the number of subjects in the arm who experienced any AE that lasted for that duration or more, regardless of the study day on which the subject first developed the AE. This was obtained by summing up the SUBAEDAYS for all AEs in the drug and placebo arms, respectively. We then divided this sum by the safety population in the arm, i.e., the total number of subjects in the arm (TRTARM), to normalize the number of subjects with any AE lasting ≥ 1 day on a 0–100 scale. This ratio is the SUBJECT-AE duration. Calculating this ratio separately for each TRTARM yielded one line plot for drug and one for placebo. These plots were defined as incidence by duration. The incidence (percentage) of subjects out of the safety population who experienced one (or more) AEs by increasing duration is shown on the y-axis, while the duration (days) is shown on the x-axis: 1 day, 2 days, 3 days, etc. The AUC for each line plot can be calculated using the formula below:

$$\mathrm{AUC}=\sum_{\mathrm{TRTARM}}\mathrm{\%}.$$

TRTARM can be drug or placebo. The difference between AUCdrug and AUCplacebo is ∆AUCO:

$${\Delta \mathrm{AUC}}_{\mathrm{O}}= {\mathrm{AUC}}_{\mathrm{drug}}-{\mathrm{AUC}}_{\mathrm{placebo}}.$$
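
The incidence-by-duration curve, its AUC, and ∆AUCO can be sketched in Python as follows. This is a minimal illustration on toy data, assuming the AUC is the sum of the daily percentages, as the formula above suggests; the data layout (subject → AE → AE days) and all names are hypothetical.

```python
def incidence_by_duration(arm, n_subjects, max_days):
    """Percent of the safety population with any AE lasting >= d days,
    for d = 1..max_days. arm maps subject -> {AE term: AE days}."""
    curve = []
    for d in range(1, max_days + 1):
        affected = sum(
            1 for aes in arm.values()
            if any(ae_days >= d for ae_days in aes.values())
        )
        curve.append(100.0 * affected / n_subjects)
    return curve


def auc(curve):
    """Area under the curve, taken here as the sum of daily percentages."""
    return sum(curve)


# Toy arms: 4 subjects each; subjects without AEs are simply absent
drug = {"s1": {"akathisia": 3}, "s2": {"nausea": 1, "akathisia": 2}}
placebo = {"p1": {"headache": 1}}

auc_drug = auc(incidence_by_duration(drug, n_subjects=4, max_days=3))
auc_placebo = auc(incidence_by_duration(placebo, n_subjects=4, max_days=3))
delta_auc_o = auc_drug - auc_placebo
```

In the toy data, the drug curve is 50%, 50%, 25% over days 1 to 3 and the placebo curve is 25%, 0%, 0%, so the area between them is positive, as expected when the drug carries the greater AE burden.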

∆AUCO is the area between the two AUCs of drug and placebo and corresponds to the drug-placebo difference in AE prevalence. To compute the mean percent contribution of each AE to ∆AUCO, we removed one AE at a time from the AUCs of drug and placebo and obtained a new AUC for drug (AUCi-drug) and one for placebo (AUCi-placebo). This allowed us to recalculate ∆AUCO and obtain a new ΔAUCi corresponding to (AUCi-drug − AUCi-placebo). ΔAUCi no longer reflects all AEs but all AEs minus the one removed. It differs from the initial ∆AUCO because the removal of the AE changes the AUCs for both drug and placebo. Specifically, ∆AUCO decreases or increases depending on whether the AE is predominantly associated with drug or placebo, respectively. The extent of the increase or decrease corresponds to the difference between the initial ∆AUCO and the new ΔAUCi divided by ∆AUCO ([ΔAUCO − ΔAUCi] × 100/ΔAUCO) and reflects how much of the drug-placebo difference in AE prevalence is accounted for by that AE alone. To calculate the 95% CI of the mean percent contribution of each AE to ΔAUCO, we removed individual subjects with that AE one at a time (SUBJECT-AE pairs), each time obtaining new AUCs for drug (AUCij-drug) and placebo (AUCij-placebo) and thus new drug-placebo differences in AE prevalence with a particular SUBJECT-AE pair removed (ΔAUCij = [AUCij-drug − AUCij-placebo]). We repeated this process for all distinct SUBJECT-AE pairs (e.g., SUBJECT1-AE, SUBJECT2-AE) in the clinical trial data and obtained 95% CIs using the scipy.stats.t.interval function in Python [32]. This approach relies on the central limit theorem, under which the sampling distribution of the mean is approximately normal in a large sample. The CI is a range of values (from lower CI to upper CI) that is likely to contain the mean being estimated with a certain level of confidence.

All statistical analyses in our study were carried out using SAS 9.4 and Python 3.10.4, with the packages SciPy (1.8.1), NumPy (1.22.4), and Pandas (1.4.2) to assist in calculations.
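
The leave-one-AE-out step described above can be sketched in Python as follows. This is a minimal illustration on toy data, not the study code: the data layout (subject → AE → AE days) and all names are hypothetical, and the sketch drops the records of the removed AE rather than whole subjects, which is one reading of the procedure. The SUBJECT-AE pair step and the CI calculation are not shown.

```python
def delta_auc(drug, placebo, n_drug, n_placebo, max_days, drop=None):
    """Drug-minus-placebo AUC of the incidence-by-duration curves,
    optionally ignoring every record of the AE term given in drop."""
    def arm_auc(arm, n_subjects):
        total = 0.0
        for d in range(1, max_days + 1):
            affected = sum(
                1 for aes in arm.values()
                if any(days >= d for ae, days in aes.items() if ae != drop)
            )
            total += 100.0 * affected / n_subjects
        return total

    return arm_auc(drug, n_drug) - arm_auc(placebo, n_placebo)


def percent_contribution(drug, placebo, n_drug, n_placebo, max_days, ae):
    """(dAUC_O - dAUC_i) x 100 / dAUC_O for one AE term."""
    original = delta_auc(drug, placebo, n_drug, n_placebo, max_days)
    without_ae = delta_auc(drug, placebo, n_drug, n_placebo, max_days, drop=ae)
    return (original - without_ae) * 100.0 / original


# Toy arms: 4 subjects each; subjects without AEs are simply absent
drug = {"s1": {"akathisia": 3}, "s2": {"nausea": 1, "akathisia": 2}}
placebo = {"p1": {"headache": 1}}

akathisia_pct = percent_contribution(drug, placebo, 4, 4, 3, "akathisia")
headache_pct = percent_contribution(drug, placebo, 4, 4, 3, "headache")
```

In the toy data, removing the drug-only AE shrinks the area between the curves (positive contribution), while removing the placebo-only AE enlarges it (negative contribution), matching the sign convention described in the text.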

Results

In the pooled data from the lurasidone trials, we identified 408 distinct AEs in the drug arm and 277 in the placebo arm. In the ulotaront trial, the subjects in the drug arm were affected by 59 distinct AEs, while the subjects in the placebo arm were affected by 67 AEs.

Incidence, Absolute Prevalence, and Expected Duration of AEs

We calculated incidence, absolute prevalence, and expected duration of all AEs from the clinical trial data and decided to focus on five AEs in the lurasidone trials (akathisia, nausea, agitation, insomnia, anxiety) and one AE (akathisia) in the ulotaront trial. Indeed, the ulotaront phase 2 trial is characterized by a relatively small sample size compared to the five lurasidone trials (N = 245 vs N = 1545), and our aim is to integrate its AE data with the AE data that will derive from the ongoing phase 3 trials.

The incidence of akathisia in the lurasidone trials was 14.3% for the drug arm, with 149 subjects developing akathisia as an AE out of the 1042 subjects who could have developed that AE in the drug arm (safety population for lurasidone) vs 2.58% in the placebo arm, with 13 subjects developing akathisia as an AE out of the 503 subjects who could have developed that AE in the placebo arm (safety population for placebo) (Table 2). The incidence of the other four AEs in lurasidone vs placebo is reported hereafter: nausea (12.86%, N = 134/1042 vs 7.36%, N = 37/503); agitation (7.1%, N = 74/1042 vs 7.55%, N = 38/503); insomnia (14.68%, N = 153/1042 vs 14.51%, N = 73/503); anxiety (8.35%, N = 87/1042 vs 6.56%, N = 33/503) (Table 2).

Table 2 AE preferred terms in a hypothetical, revised lurasidone label listed with their: (1) incidence and respective difference in proportions between drug and placebo with 80% CI; (2) absolute prevalence and difference in proportions with 80% CI; (3) expected duration, corresponding to the days to 50% incidence

We then calculated the absolute prevalence of AEs in the pooled data from lurasidone and placebo. The absolute prevalence of akathisia in the lurasidone arm was 11.15% (N = 3725/33,401), meaning that 3725 subject treatment days had this AE (SUBAEDAYS) out of the 33,401 subject treatment days in which this AE could have expressed itself (SUBTRTDAYS) (Table 2; Fig. 1a). SUBAEDAYS (pink area in Fig. 1a) assumes that each subject has a certain number of days on akathisia and represents the sum of these days. SUBTRTDAYS (gray area in Fig. 1a) is the sum of the days during which each of the subjects in the lurasidone arm was on treatment (most subjects were on treatment for 6 weeks, which is the duration of the trial). The SUBAEDAYS/SUBTRTDAYS ratio is the absolute prevalence. The absolute prevalence of akathisia in the placebo arm was 1.54% (N = 238/15,454) (Table 2; Fig. 1a). The absolute prevalence of the other AEs in drug vs placebo is as follows: nausea (3.79%, N = 1267/33,401 vs 2.53%, N = 391/15,454); agitation (2.73%, N = 912/33,401 vs 3.1%, N = 479/15,454); insomnia (8.78%, N = 2931/33,401 vs 10.33%, N = 1597/15,454); anxiety (4.52%, N = 1510/33,401 vs 3.46%, N = 535/15,454). For the absolute prevalence of akathisia in the ulotaront trial (Fig. 1b), we determined the ratio of SUBAEDAYS (green area in Fig. 1b)/SUBTRTDAYS (gray area in Fig. 1b) in the drug and placebo arms (most subjects were on treatment for 4 weeks): the prevalence of akathisia in ulotaront vs placebo was 0.68% (N = 20/2920) vs 0.54% (N = 17/3169).

Fig. 1

Absolute prevalence and expected duration (days to 50% incidence) of adverse events for the five lurasidone trials and the ulotaront trial. a The five plots on the top show the safety population for lurasidone (SUBTRTDAYS, in gray; N = 33,401), as well as the absolute prevalence (SUBAEDAYS, in pink), i.e., the number of subjects on lurasidone who had akathisia, nausea, agitation, insomnia, or anxiety multiplied by the number of days during which each subject had these AEs. The expected duration of each AE in days (AE50) is also shown. The five plots on the bottom display the safety population for placebo (in gray; N = 15,454), as well as the absolute prevalence (in pink) and AE50 of the five AEs in the placebo arm. b The plot on the top shows the safety population for ulotaront (in gray; N = 2920), as well as the absolute prevalence (in green) and AE50 of the AE of akathisia. The plot on the bottom displays the safety population for placebo (in gray; N = 3169), as well as the absolute prevalence (in green) and AE50 of the AE of akathisia in the placebo arm

We then computed the expected duration (AE50) of the AEs in the lurasidone trials, corresponding to the days required for AE incidence to go down by 50%. We observed the following AE50 for akathisia in drug vs placebo (Fig. 1a; Table 2): after 18 days, 50% of subjects with akathisia in the drug arm no longer had it, while after 23 days, 50% of subjects with akathisia in the placebo arm no longer had it. The AE50 of the other AEs in drug vs placebo was the following: nausea (5 vs 4 days); agitation (7 vs 7 days); insomnia (15 vs 14 days); anxiety (15 vs 9 days). We also computed AE50 of akathisia in ulotaront vs placebo (8 vs 18 days) (Fig. 1b).

For both incidence and absolute prevalence of lurasidone and ulotaront with respective placebos, we calculated the difference in the two proportions with 80% CI (Table 2).

Drug-placebo Difference in AE Prevalence

By plotting the incidence by duration of AE data (SUBJECT-AE duration) from pooled drug and placebo in the lurasidone trials, we generated AUCs for drug (in pink in Fig. 2a) and placebo (in gray in Fig. 2a), respectively: the y-axis corresponds to the incidence of subjects, normalized on a scale from 0 to 100%, who experienced one or more AEs by increasing duration, which is shown on the x-axis: from 1 to 42 days. The difference between the AUCs for drug and placebo is a pink area, which represents the drug-placebo difference in AE prevalence (ΔAUCO) (Fig. 2a). We compared, in lurasidone vs placebo, the percentage of subjects experiencing any AE with duration of 1 day (80% in lurasidone vs 76% in placebo), 7 days (60% vs 51%), 14 days (46% vs 39%), 28 days (28% vs 23%), and 42 days (14% vs 11%). As expected, the placebo curve showed a lower side effect burden compared with active treatment.

Fig. 2

Procedure to calculate the drug-placebo difference in AE prevalence for the pooled data from the lurasidone trials. a We determined the AUC of AE prevalence for drug (area under the pink curve, representing the percent subjects in the drug arm that have any AE of duration of ≥ 1 days), as well as the AUC of AE prevalence for placebo (area under the gray curve, representing the percent subjects in the placebo arm that have any AE of duration of ≥ 1 days). The AUCs are expressed as subject AE days from 0 to 100%. Then, we calculated the difference between the two AUCs of drug and placebo, which corresponds to the pink area in between (ΔAUCO). b We estimated the mean percent contribution of each AE to the drug-placebo difference in AE prevalence. On the left-hand side, we removed akathisia, a drug-related AE, from ΔAUCO, meaning that we removed all the subjects with that AE and recalculated the pink area (ΔAUCi = AUCi-drug − AUCi-placebo). We observed a decrease in the pink area. On the right side, we removed schizophrenia, a placebo-related AE, from ΔAUCO and recalculated the pink area (ΔAUCi = AUCi-drug − AUCi-placebo). The pink area increased. The difference between the original ΔAUCO and the  new ΔAUCi gives us the extent of change of the pink area (ΔAUCO − ΔAUCi) × 100/ΔAUCO. c To compute the 95% CI of the mean percent contribution of each AE to ΔAUCO, we took out one SUBJECT-AE pair at a time from the AUCs of drug and placebo and recalculated the pink area each time, thus obtaining new ΔAUCs (ΔAUCij = AUCij-drug − AUCij-placebo). d We computed and listed all the AEs by mean percent contribution with 95% CI from the AEs related to drug (positive percentage) to those related to placebo (negative percentage)

We then quantified the percent contribution of each AE to the AUCs for drug and placebo (Fig. 2b). To this end, via a subtractive process, we removed one AE at a time and recalculated the area. We obtained each time a different ΔAUCi and measured the change compared to the original ∆AUCO: ([∆AUCO − ΔAUCi] × 100/ΔAUCO). When removing a drug-related AE such as akathisia from the lurasidone and placebo AUCs, the pink area decreases and the white portion that appears in ∆AUCO represents the portion of the drug-placebo difference accounted for by akathisia, whose mean percent contribution is + 28.44%. This happens because the removal of akathisia moves the drug curve more than the placebo one (Fig. 2b). When removing a placebo-related AE such as schizophrenia from the lurasidone and placebo AUCs, the opposite happens and the pink area increases, with schizophrenia contributing − 3.35%. In this case, the removal of schizophrenia moves the placebo curve more than the drug one (Fig. 2b). After having removed each AE (i.e., all subjects with that AE) one at a time from the drug and placebo AUCs, we removed each individual subject with that AE (SUBJECT-AE pair) one at a time to determine how the pink area would change (ΔAUCij = [AUCij-drug − AUCij-placebo]) (Fig. 2c) and calculate the 95% CI of the mean percent contribution of each AE to the drug-placebo difference in AE prevalence (Fig. 2d). We also applied our metric to the ulotaront trial and compared the percentage of subjects experiencing any AE with duration of 1 day in ulotaront vs placebo (53% in ulotaront vs 56% in placebo), 7 days (34% vs 34%), 14 days (15% vs 23%), and 28 days (4% vs 7%). We then calculated the mean percent contribution of each AE with 95% CI by assessing the contribution of each SUBJECT-AE pair to ∆AUCO. Akathisia contributed 0.6%, while schizophrenia contributed 38.7%. Notably, the ulotaront curve lay at or below the placebo curve at each of these durations.

It is worth noting that in the lurasidone trials, 45.9% (N = 478/1042) of AEs in the drug arm and 35% (N = 176/503) of AEs in the placebo arm were ongoing at the end of the 6 weeks. In the ulotaront trial, 17.5% (N = 21/120) of AEs in the drug arm and 28.8% (N = 36/125) of AEs in the placebo arm were ongoing at the end of the 4 weeks. We reproduced the drug-placebo differences in AE prevalence for both the lurasidone and ulotaront trials after excluding the AEs without an end date (Fig. 3a, b). We did not find these new ΔAUCs to be substantially different from the initial ones, where we imputed the AE end date for those AEs that did not have one. The AUCs for lurasidone and placebo, as well as the ones for ulotaront and placebo, continued to be parallel, showing that there are fewer AEs for all durations in both drug and placebo rather than more long-lasting AEs in drug and more short-lasting AEs in placebo, which would have caused the placebo curve to drop abruptly before the drug curve. Hence, we can consider the drug-placebo difference in AE prevalence to be reliable regardless of the inclusion or exclusion of ongoing AEs.

Fig. 3

Comparison of drug-placebo difference in AE prevalence when including and excluding ongoing AEs, respectively. a The two top curves (black and gray) demarcate a pink area representing the drug-placebo difference in AE prevalence (ΔAUCO) of lurasidone vs placebo when including the AEs without an end date and imputing their end date to the study end. The light pink area outlined by the two bottom dotted curves (black and gray) corresponds to the ΔAUCO of lurasidone vs placebo after excluding the AEs without an end date. b The two top curves (green and gray) delimit a pink area representing the drug-placebo difference in AE prevalence (ΔAUCO) of ulotaront vs placebo when including the AEs without an end date and imputing their end date to the study end. In between the two bottom dotted curves (green and gray), there is a light pink area corresponding to the ΔAUCO of ulotaront vs placebo after excluding the AEs without an end date

Our novel metric of drug-placebo difference in AE prevalence was also tested for applicability to other studies. We were able to replicate our findings for pooled drug and placebo data in RCTs for other psychiatric and neurological disorders (insomnia, bipolar disorder, attention-deficit hyperactivity disorder, binge eating disorder, partial-onset seizures), as well as for other compounds (eszopiclone, eslicarbazepine, dasotraline) (results unpublished) [33, 34]. This suggests that our methodology can potentially be applied to all clinical trials, making it possible to calculate the percent contribution of each AE to the drug-placebo difference in AE prevalence. The executable is available upon request.

Discussion

The drug label is a publicly available document that is uniquely placed to be an invaluable source of information for patients, healthcare providers, researchers, and regulators. Per FDA guidance [35], the drug label must meet two criteria: (1) it must contain details and directions for healthcare providers to prescribe the drug safely and effectively, including the approved uses for the drug, contraindications, potential adverse reactions, available formulations and dosage, and how to administer the drug [21 CFR 201.56(a)(1)] and (2) it must be informative and accurate and neither promotional in tone nor false or misleading [21 CFR 201.56(a)(2)]. This information aligns with the FDA’s historic mission to protect consumers and the congressional mandates that the drug label “must be truthful” (the 1906 Wiley Act) [36] and provide complete information of “the risks as well as the benefits” (the 1962 Drug Amendments) [37].

Despite these efforts, current drug labels only report the incidence of the drug’s AEs (percent of subjects who reported the AE out of the total subjects in the RCT) and not the prevalence (percent of subject-days spent with an AE out of the total subject-days spent in the RCT) or duration (days required for the AE incidence to be reduced by half), which can be valuable for both patients and physicians. A recent review of adverse drug reaction data from 24 publicly available drug labels for antidepressants and anticonvulsants marketed in the USA showed that only one drug label out of the 24 contained information about AE duration [38]. In this study, when we analyzed the pooled lurasidone data, we found that the two AEs “akathisia” and “nausea” in the drug arm have similar incidence (14.3% and 12.86%, respectively) but dissimilar duration (18 and 5 days, respectively). Physicians can hypothesize that “nausea” is short-lived compared to “akathisia” but cannot be certain, because the duration, despite being collected in RCTs, is usually neither analyzed nor reported. Consideration of side effect duration may help determine reversibility of AEs in safety analyses and enhance evaluation of drug safety signals for those AEs whose incidence rates are similar between the drug and placebo arms of RCTs. We also showed that the two AEs “akathisia” and “nausea” in the drug arm of the pooled lurasidone data had dissimilar prevalence (11.15% and 3.79%, respectively). Clearly, information on prevalence and duration, if reported for each RCT, would inform a patient-physician discussion not only on how likely the patient will be to experience a specific side effect but also on how much and how long the patient will suffer from it. For the FDA, additional information regarding AE prevalence and duration may inform inclusion of a side effect with low incidence (below the 2% threshold) where prevalence and/or duration in drug is greater than placebo.

We then calculated the drug-placebo difference in AE prevalence. This metric is the difference between the areas under the curve (AUCs) of drug and placebo (∆AUCO). Each curve plots incidence by duration: the y-axis shows the incidence (percentage) of subjects who experience one or more AEs by increasing duration (days) on the x-axis. The quantification of this ∆AUCO offers more information than the difference in incidence between drug and placebo, thus improving safety signal detection. The evaluation of the contribution of individual AEs to ∆AUCO reveals a continuum of risk, from most drug-associated AEs to most placebo-associated AEs, based on whether the specific AE increases or decreases the drug-placebo difference in AE prevalence. For example, in pooled lurasidone data, akathisia contributes +28.44% to (increased) drug-placebo difference in AE prevalence, while schizophrenia contributes −3.35% to (decreased) drug-placebo difference in AE prevalence. From a clinical perspective, this means that in lurasidone trials almost one third of the drug-placebo difference in AE prevalence is attributable to one single AE, akathisia, an extrapyramidal symptom specific to D2 antipsychotics, while schizophrenia, an AE commonly related to the underlying disease, plays a marginal role in this difference. In ulotaront data, akathisia contributes 0.6%, while schizophrenia contributes 38.7%. Notably, with this metric, individual AEs that do not appear in the 2% incidence tables can, nevertheless, contribute substantially to the drug-placebo difference in AE prevalence, regardless of whether they are associated with drug treatment or lack thereof (placebo). Overall, these findings show that AE prevalence is generally greater in the drug arm than in the placebo arm, though with ulotaront AE prevalence is lower than with placebo. This is attributable, in part, to a relatively lower incidence and shorter duration of AEs in the ulotaront arm, as well as the emergence of disease-related AEs in the placebo arm. These results underline how the reliability of detecting drug-associated AEs in clinical trials may be augmented by incorporating information on drug-placebo difference in AE prevalence.
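The incidence-by-duration curve and the drug-placebo difference between AUCs can be sketched as follows. This is a minimal illustration under stated assumptions: the curve is sampled once per day and summed as a step function, each arm is given as a hypothetical mapping from AE term to the number of AE days of each affected subject, and per-term differences are reported in raw percent-days rather than as the percentage contributions quoted above, whose normalization is not reproduced here. All function names are illustrative.

```python
def incidence_by_duration(ae_days_per_subject, n_subjects, max_days):
    """Return [I(1), ..., I(max_days)], where I(d) is the percentage of all
    subjects in the arm with at least d days of AE."""
    return [
        100.0 * sum(1 for x in ae_days_per_subject if x >= d) / n_subjects
        for d in range(1, max_days + 1)
    ]


def auc(curve):
    """Area under a step curve sampled once per day (unit: percent-days)."""
    return sum(curve)


def delta_auc(drug_days, n_drug, placebo_days, n_placebo, max_days):
    """AUC of the drug curve minus AUC of the placebo curve."""
    return auc(incidence_by_duration(drug_days, n_drug, max_days)) - auc(
        incidence_by_duration(placebo_days, n_placebo, max_days)
    )


def rank_terms(drug, n_drug, placebo, n_placebo, max_days):
    """Order AE terms from most drug-associated (largest positive difference)
    to most placebo-associated (most negative difference)."""
    terms = set(drug) | set(placebo)
    deltas = {
        t: delta_auc(drug.get(t, []), n_drug, placebo.get(t, []), n_placebo, max_days)
        for t in terms
    }
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
```

The first point of each curve, I(1), equals the conventional incidence, so the 2% frequency tables correspond to a single point on the curve, whereas the AUC also reflects how long the AEs last.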

The main reason that the analysis of AE information in RCTs is less than robust is that these trials are designed and statistically powered to establish the efficacy rather than the benefit-risk relationship of a drug [22]. RCTs are often underpowered to evaluate the harm profile of a drug, which includes multiple, non-predefined AEs [11, 39]. Moreover, safety analyses are usually limited to descriptive statistics and basic statistical computations, which are not particularly informative. A recent paper reported that a search on Google Scholar for “new models to demonstrate efficacy in clinical trials” generated 1.2 million results, while the search for “new models to analyze safety” generated only 218,000 results [40].

The analysis of AEs requires the evaluation and reporting of data on timing, duration, and severity of AEs, among other variables, as highlighted by two recent studies on AE burden in schizophrenia trials [41, 42]. Over the years, a variety of methods have been proposed to analyze AEs, beginning as early as 1989, though most were published after 2004 [39]. These methods were summarized in a recent review and include graphical methods, hypothesis testing methods under the frequentist paradigm, estimation methods that quantify distributional differences in AEs between treatments without a formal test (e.g., risk differences, risk ratios, and odds ratios with CIs), and Bayesian methods that give the posterior probability of AEs [39]. The use of these methods, however, is limited. An online survey of public sector and industry statisticians working in RCTs showed that only 38% were aware of these methods, and even fewer (approximately 13%) had used such methods [43]. The most frequently cited reasons for this limited use were the unsuitability of trial sample sizes, multitude of different AEs experienced in trials, technical complexity of most statistical methods, significant resources and time needed to implement these methods, and uncertainty about the level of agreement of regulators on these methods.
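To make the estimation methods mentioned above concrete, the following sketch computes a risk difference, risk ratio, and odds ratio with CIs from the 2 × 2 counts of one AE. The choice of Wald-type intervals (normal approximation, on the log scale for the two ratios) and the function name `wald_estimates` are assumptions made for this illustration; the cited review covers other interval constructions, and this approximation fails with zero cells or very small counts.

```python
import math
from statistics import NormalDist


def wald_estimates(events_drug, n_drug, events_placebo, n_placebo, level=0.95):
    """Return {name: (estimate, (lower, upper))} for one AE, using Wald CIs."""
    a, b = events_drug, n_drug - events_drug           # drug: with / without AE
    c, d = events_placebo, n_placebo - events_placebo  # placebo: with / without AE
    if min(a, b, c, d) <= 0:
        raise ValueError("Wald intervals need non-zero counts in all four cells")

    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    p1, p0 = a / n_drug, c / n_placebo

    # Risk difference: normal approximation on the original scale.
    rd = p1 - p0
    se_rd = math.sqrt(p1 * (1 - p1) / n_drug + p0 * (1 - p0) / n_placebo)

    # Risk ratio and odds ratio: normal approximation on the log scale.
    rr = p1 / p0
    se_log_rr = math.sqrt(1 / a - 1 / n_drug + 1 / c - 1 / n_placebo)
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def log_ci(estimate, se):
        return (
            math.exp(math.log(estimate) - z * se),
            math.exp(math.log(estimate) + z * se),
        )

    return {
        "risk_difference": (rd, (rd - z * se_rd, rd + z * se_rd)),
        "risk_ratio": (rr, log_ci(rr, se_log_rr)),
        "odds_ratio": (odds_ratio, log_ci(odds_ratio, se_log_or)),
    }
```

For example, 20 of 100 subjects with the AE on drug versus 10 of 100 on placebo gives a risk difference of 0.10, a risk ratio of 2.0, and an odds ratio of 2.25. None of these estimates uses AE duration.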

Our new metric adds AE duration to incidence, yielding a two-dimensional plot for both drug and placebo and providing a novel way to look at AE data. This metric reflects more accurately the impact of AEs on patients, offers a more robust understanding of safety risks for drug relative to placebo, and enables the quantification of the drug-specific side effect burden as measured by the absolute prevalence of AEs and by the drug-placebo difference in AE prevalence. After piloting this metric in lurasidone and ulotaront RCTs in schizophrenia, we replicated our results in RCTs for other drugs and indications.

It must be noted that in RCTs, some subjects drop out from treatment or study follow-up because of lack of efficacy, AEs, or loss to follow-up, among other reasons [44]. Dropout from RCTs due to AEs impacts the estimation of a drug’s safety profile as it terminates AE data collection and generates missing values [45, 46, 47]. The typical conduct and duration of RCTs often result in incomplete information on AEs lasting longer than the study end date and do not allow for follow-up within the RCT setting to fully characterize drug tolerability. A recent analysis of clinical trial safety results in ClinicalTrials.gov for FDA-approved drugs revealed that one of the main challenges in using AE data from RCTs for drug safety monitoring is that approximately half of all RCTs have missing data in the published report [48]. The missing data are typically safety-related [49]. Good clinical and research practices require the collection of AE data to begin at the study start (initiation of drug or placebo intervention) and continue until resolution. Follow-up is required for AEs that cause interruption or discontinuation of the study drug or those that are present at the end of study treatment (ongoing AEs). It is important for sponsors, investigators, and medical monitors to strive for improved clinical trial reporting practices to increase data quality and limit missing data points, particularly for AE end dates that fall within the duration of the RCT. A limitation of our study is the high rate of AEs without an end date in the lurasidone and ulotaront trials, ranging from 17.5 to 45.9%, with the highest rates in the drug and placebo arms of the lurasidone trials. Despite this limitation, we demonstrated that the results of our study, including our new metric, were insensitive to missing data and did not differ substantially after excluding all AEs without an end date from the analysis. Another limitation of our study is that we did not investigate reasons for study discontinuation and could not assess whether treated subjects discontinued because of inadequate efficacy or AEs in a greater proportion than subjects in the placebo group. We also did not assess whether there were differential losses to follow-up between intervention arms. The presence of differential losses and the cessation of AE monitoring in patients who withdraw from the study may lead to imbalances in the rates of AEs between arms. Last, our study did not account for long-term or tardive AEs that may not have become apparent until after the RCT end date.

In summary, our results point to areas for potential improvement in the analysis and reporting of AE data that could benefit patients, physicians, researchers, and regulators. Enhanced focus by sponsors and clinical investigators is needed to ensure the completeness of safety outcome reporting, including information on prevalence and duration. Moreover, careful consideration of harm outcomes and implementation of appropriate statistical methods when designing clinical trials can help identify safety signals and provide a more accurate evaluation of a drug’s benefits and risks.

Conclusions

AE data collection is a standard regulatory requirement in a drug’s clinical trials. Here, we propose to compute, in addition to incidence, the absolute prevalence and expected duration of AEs, and advocate that these variables be reported by sponsors and adopted by peer-reviewed publications for each RCT, allowing the inclusion of these variables in drug labels, in addition to the 2% frequency tables. Ultimately, these variables may render drug labels more informative. We also propose a novel metric, the drug-placebo difference in AE prevalence, to analyze AE data collected in clinical trials. This metric provides a unique estimate of risk, visualized as a continuum from drug treatment to placebo. This metric was piloted in RCTs of the D2 antipsychotic lurasidone and the novel TAAR1 agonist ulotaront, as well as in a number of other CNS trials, offering a new way to quantify the side effect burden of a drug relative to placebo. Although further validation of this metric is needed, this initial proof-of-concept analysis suggests that the collection of additional information on AEs is both possible and clinically relevant to patients and physicians. The approach described here has the potential to be used in any dataset without the need for additional data collection and can be adapted to evaluate the tolerability of medications used to treat a variety of conditions.