Key Points

For the comparison of adverse events between patients, regression models adjusting for important covariables may be considered in both observational studies and randomized controlled trials.

Time-to-event models have been advocated in adverse events analysis. However, models like logistic regression (with rare event corrections) may be considered for rare events.

If possible, the absolute risk from the regression model should be presented systematically because it may help validation and interpretation, particularly in the competing risk settings, and comparison between studies with different follow-up times.

Due to the multiple facets of adverse events data, several risk measures may be relevant to accurately evaluate and compare patients’ toxicity profiles.

1 Introduction

Over the past few years, several studies have examined the collection, reporting and analysis of drug safety data in randomized controlled trials (RCTs). In a review, practices regarding the collection, reporting and analysis of adverse events (AEs) were described as inconsistent and sub-optimal [1]. The CONSORT harm extensions [2, 3] provided guidelines to cover the reporting of harms but have not been sufficiently adopted [1]. With regard to the statistical part of AE analysis, guidelines hardly tackle the question, but focus mainly on collection and reporting [4]. A scoping review of the statistical methods identified several methods designed specifically for AE data, but they have rarely been applied [5]. A recent survey conducted by statisticians from academia and industry to understand the current practice [6] identified several barriers that limit the application of those methods. Beyond the barriers related to the design and characteristics of the RCT (e.g., the overwhelming number of events and the underpowered sample size for harm outcomes), many participants indicated a lack of guidelines, of awareness of appropriate methods, and of training on the subject.

For treatment comparison, even in randomized designs, it may be worthwhile to consider multivariate regression models that adjust for important covariables to improve precision and reduce sample size [7,8,9]. However, they are barely used in practice [10]. In pharmacoepidemiology, i.e., the study of the risks and therapeutic effects of drugs in real-life populations [11], regression modeling is even more relevant, as those studies rely on observational data with no control for confounders, so adjusted analysis may be necessary to reduce biases. However, due to the multidimensional nature of AEs (e.g., timing, multiplicity, severity), modeling is challenging [12]. For chronic treatments or in oncology, the occurrence of intercurrent events (e.g., death, treatment interruption, treatment discontinuation) is unavoidable even in controlled designs and must be considered when choosing the model and the risk measures [13].

Over the last few years, several authors advocated the use of competing risk methods to avoid bias related to death in time-to-event analyses of AEs [14, ]. Non-parametric estimation of risk measures was discussed in the context of pharmacoepidemiology, but without considering competing events, and more recently in the context of RCTs [15]. Models were proposed for neglected dimensions such as recurrence [16] or severity [17]. To our knowledge, there is no specific overview of which regression model to choose depending on the research question and the data structure. The purpose of this article is (1) to provide an overview of regression models available to compare adverse events between patients, and (2) to discuss and give some clues regarding the choice of the model according to the characteristics of the event, the clinical framework, and the objective.

2 Models

2.1 Towards the Model

The definition of the event(s) and its (their) specificities highly affect the modeling choices. Figure 1 gives a general overview of the characteristics that may be considered when identifying a suitable model. The AE might occur early or late (situation 1 or 2) after treatment initiation, it might occur more than once (situation 3) or have various severities (situation 4). Furthermore, different AE types may be modelled together to capture a common effect (situation 5) (e.g., different AEs from the same body system [18]). Finally, when available, AE duration may be of interest because long durations may highly affect patients’ quality of life (situation 6).

Fig. 1 Event characteristics driving the modeling choices

In addition to the event characteristics, the therapeutic pathway of the patients may influence the characteristics of the model. In a chronic disease such as cancer, a patient’s follow-up may last several months or years so that, during the treatment, many incidents may happen; these are referred to as intercurrent events in the Addendum on estimands and sensitivity analysis in clinical trials of the ICH Guidelines [13, 19]. We will mention some of them that usually require the model to be adapted. Figure 2 depicts common successions of incidents as they may be seen during a patient’s follow-up.

Fig. 2 Typical patients’ therapeutic pathways

Patients A and B belong to the classical setting of survival analysis, with the event of interest (e.g., the AE) occurring at different times and potentially censored (patient B). A first type of intercurrent event is death (Patient C), which obviously prevents the AEs from occurring. When interested in comparing the occurrences at a given time for all patients, the model usually needs to account for a terminating event. Targeting the “direct” effect of the covariable on the occurrence of AEs (i.e., not mediated by the competing event) is much more complicated, as it requires the hypothetical scenario in which the intercurrent event would not occur. This is not achievable without making assumptions that are sometimes unrealistic or untestable [20, 21] (e.g., independence between the competing events and the events of interest). We will not address those methods in this article. Patient D discontinues the treatment, which usually reduces or even removes the risk of AEs. Hence, this phenomenon counts as an informative censoring event. Contrary to death, the outcome may not be defined because AE collection may be reduced in frequency or even discontinued, as for patient D bis (e.g., if the patient begins another treatment) [22]. Treatment interruption for a given period (Patient E), co-medication (Patient F), or dosage change (Patient G) modify the risk of exposure to AEs and raise important methodological issues, not to mention that collecting information about co-medication from patients is very difficult and that, when available, this variable may have thousands of modalities.

2.2 Single Event Models

The usual practice in AE analysis is to focus on one occurrence of an event (e.g., the first, or the most serious). This section details regression models for a single event outcome. For a formal definition of the outcome and associated risk measure in each case, the reader may refer to Table 1.

Table 1 Summary of the models and risk measures of the single-event models section

2.2.1 Logistic Regression

Logistic regression is the most obvious way of relating covariables to a single event, but it may not always be appropriate in AE analyses. Given the characteristics described above, we identified the following situations as appropriate for this model.

  • Early occurrence Estimates stemming from the logistic regression (e.g., probabilities or odds ratios [OR]) may be highly biased in the presence of censoring (loss to follow-up, administrative censoring); early AEs may be less sensitive to this issue (situation 1 in Fig. 1) [23]. Moreover, for those events, the times of occurrence may be quite homogeneous between patients so that modeling their timing may be of limited interest.

  • Rare events When considering rare events, the logistic regression may be a modeling option. As the event probabilities may be underestimated in those settings [24], it can be combined with a suitable type of penalization (e.g., Firth’s correction) to reduce biases [25]. Variants of Firth’s correction have also been developed to improve the estimates of ORs [26]; they may be particularly useful for analyses of rare but serious AEs [27, 28] with case-control designs [22].

  • Case-control designs The ORs directly stemming from the logistic regression are known to be useful for risk evaluation in case-control study designs, in which the baseline probability is not available.

Odds ratios are often interpreted as probability ratios but this should be done with caution when the probability of an event occurrence is greater than 0.1 [29]. Indeed, in that case, ORs tend to differ from probability ratios and may therefore lead to overestimation of the association between the event and the risk factor. Estimating the risk ratio or risk difference with non-rare outcomes may avoid misleading interpretations, although they require more complex methods (e.g., binomial model) [30].
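The divergence between odds ratios and probability ratios can be checked with simple arithmetic. The sketch below uses made-up AE probabilities (they are illustrative assumptions, not data from any study): both scenarios have a probability ratio of 2, but the OR stays close to 2 only when the event is rare.

```python
def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1.0 - p)

def odds_ratio(p_exposed, p_unexposed):
    return odds(p_exposed) / odds(p_unexposed)

def risk_ratio(p_exposed, p_unexposed):
    return p_exposed / p_unexposed

# Hypothetical AE probabilities (unexposed, exposed); both pairs have a risk ratio of 2.
scenarios = {"rare event": (0.01, 0.02), "common event": (0.20, 0.40)}

for label, (p0, p1) in scenarios.items():
    rr = risk_ratio(p1, p0)
    or_ = odds_ratio(p1, p0)
    print(f"{label}: risk ratio = {rr:.2f}, odds ratio = {or_:.2f}")
```

With the rare event the two measures nearly coincide (OR of about 2.02), whereas with the common event the OR (about 2.67) overstates the probability ratio.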

Handling terminal intercurrent events In case of competing events, all those risk measures stemming from the logistic regression provide the total effect of the covariables on the AE occurrence, meaning a combination of both the direct effect on the AE occurrence and the effect mediated by the competing events [31]. That is why indicators based on probabilities are sometimes criticized for not providing information regarding potential differences in follow-up durations between patients [32]. A classic example is the comparison between treatments with the same hazards of AEs when one of them increases survival or progression-free survival. Considering the probability of event, the conclusion is that the risk of toxicity is larger in the group with increased survival; however, the explanation is that, in the other treatment arm, patients who die rapidly do not have time to experience AEs [19].
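The classic example above can be reproduced numerically. The sketch below assumes constant cause-specific hazards for the AE and for death (a simplifying assumption made only for illustration; the hazard values are hypothetical), in which case the probability of experiencing the AE before death by time t has a closed form.

```python
import math

def cumulative_incidence_ae(t, hazard_ae, hazard_death):
    """P(AE occurs before time t and before death), constant cause-specific hazards."""
    total = hazard_ae + hazard_death
    return hazard_ae / total * (1.0 - math.exp(-total * t))

HAZARD_AE = 0.10  # per month, identical in both arms (hypothetical value)

# Arm A has a high death hazard; arm B improves survival (hypothetical values).
arm_a = cumulative_incidence_ae(12, HAZARD_AE, hazard_death=0.20)
arm_b = cumulative_incidence_ae(12, HAZARD_AE, hazard_death=0.05)

print(f"12-month AE probability, arm A: {arm_a:.3f}")
print(f"12-month AE probability, arm B: {arm_b:.3f}")
```

Although the AE hazard is the same in both arms, the 12-month AE probability is higher in the arm with better survival, simply because patients remain at risk for longer.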

2.2.2 Time-to-Event Models

To deal with censoring and to describe the occurrence of a single type of AE over time, time-to-event models have been advocated to improve the analysis of safety data [4, 19, 33]. The proportional hazards model (or Cox model) is the most commonly used regression time-to-event model. The following situations (non-exhaustive list) are well suited for time-to-event models:

  • Long-term toxicity analysis with censoring As long-term cohort analyses are usually affected by censoring (e.g., administrative), time-to-event models are the only ones to guarantee unbiased estimates of the probability of events in that situation.

  • Most non-recurrent AEs If the event of interest is not recurrent (e.g., serious events that may lead to treatment discontinuation), building a time-to-event model is usually valuable because, in addition to the treatment effect, it provides a useful description of the risk of toxicity over time (e.g., cumulative hazard or survival in the Cox model). The proportionality assumption may be questionable and should be systematically checked, particularly when comparing treatments with very different toxicity profiles (e.g., immunotherapy vs chemotherapy). If the proportionality assumption does not hold, flexible models allowing for time-varying effects may be useful [34].

  • Special interest in time-varying covariables such as drug exposure Another valuable aspect of time-to-event models is their ability to consider important time-dependent covariables, such as exposure. In 2019, Danieli and Abrahamowicz [34] tackled the modeling of drug exposure and treatment interruptions in a time-to-event model to be used in observational studies. They considered a weighted cumulative exposure to deal with both the time elapsed since the last exposure and the cumulative dose. Flexible estimates described the way past exposures affect the hazard of a specific event. This approach is interesting in case of late AEs, potentially triggered by drug accumulation (Fig. 1 situation 2).
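The idea of a weighted cumulative exposure mentioned in the last item can be sketched as a weighted sum of past doses, where the weight depends on the time elapsed since each dose. In the sketch below, the exponential-decay weight function, its half-life, and the dosing history are hypothetical choices made for illustration; the cited approach estimates the weight function flexibly from the data rather than fixing it in advance.

```python
def weighted_cumulative_exposure(doses, t, weight):
    """Weighted sum of the doses taken at or before time t.

    doses:  list of (time, dose) pairs
    weight: function mapping the time elapsed since a dose to a non-negative weight
    """
    return sum(dose * weight(t - u) for u, dose in doses if u <= t)

def exponential_decay(elapsed, half_life=30.0):
    # Hypothetical weight: a dose loses half of its influence every 30 days.
    return 0.5 ** (elapsed / half_life)

# Hypothetical histories: 10 units per day for 60 days, with or without
# a treatment interruption from day 20 to day 39.
continuous = [(day, 10.0) for day in range(60)]
interrupted = [(day, 10.0) for day in range(60) if not 20 <= day < 40]

for label, history in (("continuous", continuous), ("interrupted", interrupted)):
    wce = weighted_cumulative_exposure(history, t=60, weight=exponential_decay)
    print(f"{label}: weighted cumulative exposure at day 60 = {wce:.1f}")
```

The resulting value would then enter the time-to-event model as a time-dependent covariable, recomputed at each time at which the hazard is evaluated.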

Hazard ratios (see Table 1) measure the association between the event and the covariables, although we generally cannot use them to establish causality [35], even with a correctly specified Cox model and proper randomization at baseline. Indeed, by conditioning on survival, the risk set may be modified over time if the covariable of interest has an actual treatment effect or in the presence of an unmeasured covariable [36]. Differences or ratios of cumulative probabilities may be considered instead for causal inference. In both cases, reporting absolute risk measures (e.g., hazards or cumulative hazards, cumulative probabilities) is highly advised [37].

Handling terminal intercurrent events As previously mentioned, competing risks are often encountered in AE analysis, particularly in oncology [14], complicating time-to-event analyses and their interpretation [38]. Two main approaches may be used for that type of analysis. The first is to model cause-specific hazards based on the times of occurrence for a single cause of failure (see Table 1). A Cox model can be used to perform a regression on the cause-specific hazards. A distinct model for each cause of failure is needed to compute the cause-specific cumulative incidence function. The second approach relies on sub-distributional hazards, whose best-known associated regression model is the Fine-Gray model [39]. Despite its unclear biological interpretation [40], the strength of the model is to directly link the variables to the cumulative incidence function in a single model [41]. In both cases, covariable effect measures (e.g., difference or ratio) derived from the cumulative probabilities provide the same total effect we previously described for logistic regression [31]. The question of the direct effect of the treatment cannot be settled using the cause-specific or the sub-distributional hazards, which generally have no causal interpretation, even in randomized designs [31]. Causation in the competing risks setting constitutes an active field of statistical research and other estimands may be considered (e.g., survivor average causal effect [SACE]) [21].
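To make the link between cause-specific hazards and the cumulative incidence function concrete, the following sketch implements a non-parametric (Aalen-Johansen type) estimator without covariables. It is not one of the regression models discussed above; it only shows why every cause of failure is needed to obtain the cumulative incidence of one of them. The data and the cause coding (0 = censored, 1 = AE, 2 = death) are hypothetical.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Non-parametric cumulative incidence of one cause under competing risks.

    times:  observed time for each patient
    causes: 0 if censored, otherwise an integer code for the cause of failure
    Returns a list of (time, cumulative incidence) at each distinct event time.
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    event_free = 1.0  # probability of having had no event of any cause so far
    cif = 0.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_interest = sum(1 for u, c in zip(times, causes)
                         if u == t and c == cause_of_interest)
        n_any = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        # Increment: being event-free just before t, then failing from the cause of interest.
        cif += event_free * n_interest / at_risk
        # The event-free probability is reduced by failures from ANY cause.
        event_free *= 1.0 - n_any / at_risk
        curve.append((t, cif))
    return curve

# Hypothetical data: 0 = censored, 1 = AE, 2 = death without AE.
times = [2, 3, 3, 5, 6, 8, 9, 12]
causes = [1, 2, 0, 1, 2, 0, 1, 0]

curve_ae = cumulative_incidence(times, causes, cause_of_interest=1)
curve_death = cumulative_incidence(times, causes, cause_of_interest=2)
print("AE:", [(t, round(p, 3)) for t, p in curve_ae])
print("Death:", [(t, round(p, 3)) for t, p in curve_death])
```

In this toy dataset, the estimated cumulative incidence at the last event time is 0.5 for the AE and 0.275 for death; the remaining 0.225 is the estimated probability of still being event-free.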

2.2.3 Models Comparing Severities

The severity of AEs is usually rated using an ordered categorical or a numeric scale. The well-known National Cancer Institute’s Common Terminology Criteria for Adverse Events (CTCAE) uses a grade from 1 (mild) to 5 (death). We may then want to compare the impact of a covariable on the severity of an event (situation 4 of Fig. 1). For example, one may be interested in the level of toxicity of a treatment under various dose schemes [42] or in the risk factors of severity [43]. A natural option is to conduct an ordinal logistic regression (e.g., proportional odds model or continuation ratio model) [44]. The idea of those models is to associate the covariables with the probabilities of the levels of severity. However, as with the classical logistic regression, the ordinal logistic regression does not account for the timing of the AEs and it may be highly biased in the presence of censoring, which limits the situations in which those models apply. Hence, other modeling options may be considered according to the aim of the study.
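As an illustration of how a proportional odds model associates a covariable with the probabilities of the severity levels, the sketch below computes the category probabilities from cumulative logits. The parametrization (threshold minus effect), the thresholds, and the coefficient are hypothetical values chosen for illustration, not estimates from any dataset.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def grade_probabilities(thresholds, beta, x):
    """Proportional odds model: P(grade <= j | x) = expit(threshold_j - beta * x).

    thresholds must be increasing; returns one probability per severity level
    (len(thresholds) + 1 levels, ordered from least to most severe).
    """
    cumulative = [expit(th - beta * x) for th in thresholds] + [1.0]
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs

# Hypothetical parameters: four ordered severity levels, binary covariable x.
THRESHOLDS = [-1.0, 0.5, 2.0]
BETA = 0.8  # a positive value shifts probability mass towards higher severity

for x in (0, 1):
    probs = grade_probabilities(THRESHOLDS, BETA, x)
    print(f"x = {x}:", [round(p, 3) for p in probs])
```

Under this parametrization, the odds of being at or below any given severity level are divided by exp(BETA) when x goes from 0 to 1, whichever level is considered; this is the proportional odds assumption.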

  • The interest is in the evolution of the severity of the adverse events in patients over time In this case, repeated measurements of the level of severity are collected and longitudinal models may be considered. A first option is the ordinal logistic regression with a random effect at the patient level. For example, Augustin et al [45] built a longitudinal proportional odds model on an oral mucositis score, with the cumulative dose and the mouth sites as covariables, to improve the planning of radiation therapy. The second option is to consider the grade (e.g., CTCAE grades) as a repeated measure over time using a linear mixed model [46,47,48]. The latter approach captures the complete toxicity trajectory of the patients, including the burden of low-grade AEs. It may provide a comprehensive, visual description of the toxicity profile despite the questionable assumption that grades are normally distributed.

Handling terminal intercurrent events When the longitudinal outcome and the occurrence of a terminal event (i.e., informative dropout) are correlated, the linear mixed model will lead to biased estimates. To reduce the bias, a joint model may be considered [49], i.e., a mixed model for the longitudinal data and a survival model for the time to death. Shared parameters like random effects link the two outcomes.

  • The interest is in comparing the cumulative probabilities of occurrence for each level of severity over time (potentially the maximum grade per patient) To handle this issue, Berridge and Whitehead [47] built a two-component model to estimate the occurrence probabilities of AEs according to their severities. One component is a proportional hazards model, used to estimate the all-grade probability of events over time in an unbiased manner, and the second component is an ordinal logistic model, which distributes that probability over the levels of severity.
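The logic of such a two-component model can be sketched as the product of an all-grade probability over time and a split of that probability across severity levels. The code below is a deliberately simplified illustration, not the published model: it assumes a constant baseline hazard, a severity split that does not depend on time, and hypothetical parameter values throughout.

```python
import math

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def any_grade_probability(t, x, baseline_hazard=0.05, log_hazard_ratio=0.5):
    """Component 1: P(AE of any grade by time t | x) under proportional hazards.

    A constant baseline hazard is assumed here purely for simplicity.
    """
    return 1.0 - math.exp(-baseline_hazard * t * math.exp(log_hazard_ratio * x))

def severity_shares(x, thresholds=(0.0, 1.5), beta=0.6):
    """Component 2: P(severity level | AE occurred, x) under proportional odds."""
    cumulative = [expit(th - beta * x) for th in thresholds] + [1.0]
    shares = [cumulative[0]]
    shares += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return shares

def grade_specific_probabilities(t, x):
    """P(AE of each severity level by time t | x): component 1 times component 2."""
    p_any = any_grade_probability(t, x)
    return [p_any * share for share in severity_shares(x)]

for arm in (0, 1):
    probs = grade_specific_probabilities(t=6.0, x=arm)
    print(f"arm {arm}:", [round(p, 3) for p in probs])
```

By construction, the grade-specific probabilities sum to the all-grade probability at each time, so censoring is handled by the time-to-event component while the ordinal component only describes severity among patients who had the event.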

2.3 Recurrent Events Models

Harm studies tend to focus on severe life-threatening AEs (e.g., CTCAE grade 3–4), which are recommended to be systematically collected [4, 50]. As those events may lead to treatment discontinuation, single-event models seem rather suited to them. However, it has become increasingly common to include patient-reported outcomes (PRO) (e.g., PRO-CTCAE [51]) in clinical trials to complete AE collection because physicians tend to under-report symptoms in terms of frequency and/or severity, compared to patients themselves [52]. Those events may be mild but recurrent (situation 3 in Fig. 1) and may highly affect the quality of life. However, the statistical methods used to deal with these recurrent events in clinical trials have been repeatedly reported as inappropriate (e.g., restricting the analysis to the first or the worst-grade event) [1, 12, 16]. In this section, we discuss the use of some common recurrent event methods in the context of AEs. The reader may refer to Table 2 for a summary of the models and formal definitions of the outcomes and risk measures.

Table 2 Summary of the models and risk measures of the recurrent event models and multi-type events sections

Previously, in the time-to-event model section, the hazard was defined as the instantaneous probability per unit of time of experiencing an AE in patients who had not experienced an AE previously. Here, the counting process theory is used to extend the notion of hazard to recurrent events. The intensity of AEs is defined as the instantaneous probability per unit of time of experiencing an AE given the history of the process (e.g., the timing of the previous events) [53]. Additionally, ‘rate’ will stand for the instantaneous probability of experiencing an AE. We identified the following two clinical questions that may guide the choice of the recurrent event model.

  • Interest in finding associations between the covariables and the overall occurrence of AEs over time: The most natural modeling approaches are the Poisson and the Negative Binomial models [10]. However, the Andersen-Gill (AG) model [54] has been found to perform better in complex situations and should be preferred (unless the available data are aggregated counts) [55]. The latter is a marginal (i.e., based on quantities such as the rate or the cumulative mean) semi-parametric regression model on the rate (an extension of the Cox model). The AG model may be easily applied with standard statistical software handling the Cox model, by simply rearranging the dataset [53, 56]. A robust estimator of the variance is usually needed to account for correlation between the events of the same subject [57]. Because these are marginal quantities, covariable effects on the rate or on the cumulative mean number of events do have a causal interpretation (no selection of the population over time). Hence the rate is particularly interesting in randomized designs to compare treatment effects [58]. The cumulative mean number of events, which is easier to understand, may be obtained from the rate function.

    Handling terminal intercurrent events Terminal events may be managed similarly to the single event hazards by considering the modeling of the rate of AEs at a given time, conditioned on the survival from the terminal event at the same time (see Table 2) [53]. Due to this conditioning, the rate-based covariable effect estimates no longer have a causal interpretation [58]. Furthermore, the rate is no longer directly related to the cumulative mean function (it may also depend on the terminating event rate).

  • Interest is to compare the patient’s individual risks of AEs given their number of previous events The Prentice-Williams-Peterson (PWP) [59] model may be considered in that situation. It is another semi-parametric proportional model based on intensities, with the history of the process being merely summarized by the number of previous events. Hence, it can be seen as a multi-state model. Two formulations of the model are possible depending on the knowledge of the process. The first formulation is defined by the time of the events. The covariable effect on the occurrence of AEs is evaluated over the entire observation period and may be allowed to vary according to the number of previous events. In particular, the event occurrence does not depend on the delay since the previous event. The second formulation relies on inter-event times or gap times, i.e., the delay between two subsequent events. In that formulation, the occurrence of an event does not depend on the delay since the beginning of the follow-up (e.g., the beginning of the treatment). As with the AG model, the coefficient estimates may be obtained from standard statistical software handling the Cox model, by rearranging the dataset and stratifying on the number of previous events [53, 60]. As for the hazard, the covariable effects in PWP models do not have a causal interpretation (selection of the population) despite randomization at baseline [61]. Deriving the cumulative mean number of AEs from intensity models can be very complex, as can its interpretation [53].
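The dataset rearrangement mentioned for the AG and PWP models can be sketched as follows: each patient's follow-up is split into consecutive intervals, one row per interval, each ending with an AE or with the end of follow-up. The function name, field names, and example patient are hypothetical; the resulting layout is the kind of (start, stop, status) input that Cox routines supporting counting-process data typically expect, although the exact format depends on the software used.

```python
def counting_process_rows(patient_id, event_times, end_of_follow_up):
    """Split one patient's follow-up into consecutive (start, stop] intervals.

    Each row holds: start, stop, status (1 = interval ends with an AE,
    0 = interval ends with censoring), stratum (rank of the event the patient
    is at risk for) and gap (time elapsed since the previous event).
    Tied event times within a patient are not handled in this sketch.
    """
    rows = []
    start = 0.0
    for number, stop in enumerate(sorted(event_times), start=1):
        rows.append({"id": patient_id, "start": start, "stop": stop,
                     "status": 1, "stratum": number, "gap": stop - start})
        start = stop
    if end_of_follow_up > start:
        # Last interval: at risk for a further event, censored at end of follow-up.
        rows.append({"id": patient_id, "start": start, "stop": end_of_follow_up,
                     "status": 0, "stratum": len(event_times) + 1,
                     "gap": end_of_follow_up - start})
    return rows

# Hypothetical patient: AEs at days 30 and 75, followed up until day 120.
for row in counting_process_rows("P01", [30.0, 75.0], 120.0):
    print(row)
```

In this layout, an AG-type analysis would use the start, stop, and status columns for all rows together; a PWP total-time analysis would additionally stratify on the stratum column; and a PWP gap-time analysis would replace the interval by (0, gap] within each stratum.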

Handling terminal intercurrent events The terminal event (e.g., death) may be related to the recurrent process (e.g., serious events). In that case, the joint modeling of the hazard of the terminating event and of the intensity of the process may be necessary (e.g., joint frailty model) [53, 62].

Comparing severities in recurrent event models The model of Berridge and Whitehead discussed in the single-event model section was extended to deal with recurrent events by replacing the proportional hazards model with a recurrent event model such as PWP [17].

2.4 Multi-type Event Models

As discussed in section 2, patients may experience multiple types of AEs (situation 5 in Fig. 1). It is always possible to model them independently but this would result in a loss of power. Modeling AEs jointly seems more attractive. The Wei, Lin, and Weissfeld (WLW) model [63] is a Cox model extension that is able to handle several types of events (by stratifying on the type). It provides a common measure of the covariable effect across all types of AEs considered. This comes with an important gain in power but at the cost of the strong hypothesis that the covariable effects are the same whatever the type of AE.

Comparing severities in recurrent event models An extension of the Berridge and Whitehead model was proposed [58] in which the two components are respectively replaced by a multinomial logistic regression and a recurrent event model (PWP or WLW). One difficulty here is to carefully define mutually exclusive categories of AEs.

3 Discussion

In this article, we discussed various regression models for AE outcomes. We considered various dimensions of those events, including severity. Models dealing with severity are more complex (two-component models) and most have been proposed recently. More practical examples to facilitate their interpretation, as well as the implementation of software, would be useful so that they can be used routinely. In most cases, models should be adapted in the presence of a terminal event like death (at least with regard to their interpretation).

Although we extensively searched the literature to illustrate the methods of this overview, this article does not claim to be representative of the actual practice nor to be an exhaustive list of the methods used in that context. We first identified methodological articles dealing with the statistical issues in AE analysis in PubMed and Google Scholar using the keywords “toxicities”, “adverse events” or “drug safety” with “statistical analysis”. We did not exclude any period of publication. From those articles, we built a list of regression models. To enrich the discussion, we identified articles that apply the models in practice with drug safety data by searching for the name of the model with the keywords “toxicities”, “adverse events” or “drug safety” in PubMed and Google Scholar, as previously. Hence, this article provides some methodological tools that may suit common situations and answer some clinical questions. Most of the references we provided dealt with the comparison of AEs between treatment arms in RCTs but their usage may be extended to observational studies, like the Qualitop project [64], which motivated this article, and to various covariables of interest.

Often, risk measures are used to “map the AE data to a single value” [19] for the purpose of safety evaluation. However, unlike efficacy, AE comparisons may not rely on a single value due to the complex dimensionality of those data. For example, providing both absolute and relative risk measures is commonly advised [37]. For non-parametric estimation, the incidence rate is often preferred to the overall probabilities to account for studies with various follow-up durations (e.g., due to different durations of two treatment arms) [19]. However, by considering the overall cumulative incidence function over time instead of an overall probability, quantities are more comparable. Graphical representations of the absolute risk stemming from the regression model should be produced systematically, as they may help to validate the adequacy of the model (e.g., comparison with non-parametric estimates) and to interpret the model, particularly in the presence of competing risks. Moreover, they should facilitate further meta-analyses mixing studies with various follow-up durations.

Throughout the article, we considered the outcome (AEs) of the models to be clearly defined. However, the number of AEs collected may be huge. For example, the CTCAE has narrowed the keyword field used to describe AEs but its version 4.0 still includes more than 1000 terms. Hence, the analyses and comparisons then have to focus on a small number of events of interest, whereas the criteria for their selection are often unclear, ill documented or based on arbitrary rules (e.g., frequencies \(\ge 5\%\)). Some authors considered grouping AEs according to the body systems [65] but assigning types of AEs to body systems is not always as easy as it may seem and the grouping choices may highly influence the conclusions of the study [18]. Selecting the AEs according to their attributability to the treatment is a more difficult task. In their 2016 recommendations, Lineberry et al did not insist on this kind of selection because of its inherent subjectivity and limited value in clinical trials [4].

One common limitation of all the models we discussed is the reliability of data collection. Although mild AEs may be of interest regarding the quality of life of the patients, they are often under-reported by clinicians [52]. Therefore, using Patient Reported Outcomes (PRO) may be more relevant in that situation. Furthermore, we discussed models using severity, which may be difficult to collect reliably over the whole follow-up, particularly in observational studies. For severe events, the patient is most likely to come for a consultation or may be hospitalized, so the collection of the AE is close to the time of occurrence. Otherwise, AE collection is usually performed when the patient meets the clinician for a follow-up visit (e.g., once a month). The time of occurrence is therefore not precisely collected, which leads to interval censoring and modeling issues. Moreover, some AEs may arise and be resolved in between visits (e.g., transient hyperthyroidism during immunotherapy), so they may not be collected (truncation). Hence, comparing the occurrence of such AEs in patients with various visit frequencies may be misleading.

4 Conclusion

Comparing adverse events between groups of patients is a recurring task with drug safety data. Regression models adjusting for important covariables may be considered in both observational studies and RCTs. Time-to-event models are advocated for AE analysis; however, the interpretation of those models is complicated because of competition with death or treatment discontinuation. Hence, the absolute risk from the regression model should be presented systematically because it may help validation and interpretation, particularly in the competing risk settings, and comparison between studies with different follow-up times. Flexible time-to-event models dealing with baseline risks (unlike semi-parametric models) as well as non-linear and time-dependent covariate effects have proven to be useful and should be explored further in this context. Rare events are a recurrent issue in drug safety data and few models may suit rare outcomes. Hence, the logistic regression (with rare event corrections) may be a useful option. Recent articles proposed models accounting for severity; however, their interpretation may be difficult and real-life applications are needed to extend their use.