Introduction

The virus responsible for COVID-19, SARS-CoV-2, binds to angiotensin-converting enzyme 2 (ACE2) [1, 2]. ACE2 is a key regulator of the renin-angiotensin system (RAS), responsible for degrading angiotensin II and thus preventing the downstream effects of the angiotensin II type 1 receptor (AT1R). Binding of SARS-CoV-2 to ACE2 may result in the downregulation of the host receptor, causing the RAS to shift towards a pro-inflammatory, pro-fibrotic state [3, 4]. AT1R has been shown to facilitate ACE2 receptor-mediated endocytosis in the non-COVID-19 setting [5] and so may also play a more direct role in COVID-19 pathophysiology. Angiotensin receptor blockers (ARBs) are widely available medications originally developed as blood pressure-lowering agents that reduce the risk of cardiac events and the progression of kidney disease. By inhibiting the binding of angiotensin II to AT1R, it is hypothesised that ARBs may reduce the inflammatory effects of SARS-CoV-2 infection, thereby reducing the risk of severe COVID-19. ARBs have been in clinical use for over 30 years and are extensively used in the treatment of chronic conditions involving a dysregulated RAS, and so could provide a safe and readily available adjunct to current treatments of COVID-19. Furthermore, ARBs are known to protect against lung injury in animal studies, including injury from viruses such as SARS-CoV-1 [6]. Whether similar protection is observed in humans is unknown.

The CLARITY trial is a two-arm, multi-centre, comparative effectiveness phase III randomised controlled Bayesian adaptive trial, conducted in India and Australia. It is designed to evaluate whether ARBs reduce the severity of COVID-19 among high-risk patients. The investigational arm of the study is ARB delivered with standard of care. In India, the control arm is placebo plus standard care and participants are blinded. In Australia, the control arm is standard care and neither participants nor clinicians are blinded.

At the time of writing, participants are being recruited at clinical sites in India and Australia and the first planned analysis is imminent. In both countries, participants are recruited from sites providing inpatient care. In addition, in Australia, participants are recruited from sites that are managing community-based patients in monitored settings.

The trial design addresses the safety and feasibility challenges of running a clinical trial during a pandemic. To address a priori uncertainty about the extent of potential benefit, an adaptive sample size design has been selected that allows the trial to continue until pre-specified levels of evidence of effectiveness or futility are met. This approach reduces the risk of an indeterminate outcome and ensures the trial does not continue recruiting participants beyond the point where conclusive evidence of benefit is found, meaning that results can be expeditiously reported and adopted into clinical practice. Such an approach is particularly appealing during a pandemic. In light of the well-established profile of ARBs, infection control risks and the potential demands on health services, the burden on clinical staff is reduced through a number of measures, including limiting data collection largely to information contained in the health record. Trial-specific in-person encounters are also avoided and phone consent is employed to reduce opportunities for infection transmission.

The CLARITY trial statistical analysis plan (SAP) was developed following the Guidelines for the Content of Statistical Analysis Plans in Clinical Trials [7] and includes pre-specified decision rules for continuing or stopping the trial based on effectiveness or futility. None of the authors has been unblinded or has observed any data other than aggregated baseline characteristics. The final study report will follow the Consolidated Standards of Reporting Trials guidelines for reporting on randomised controlled adaptive trials [8–10]. The study protocol has been accepted for publication [11].

Study design

Overview

CLARITY is a two-arm Bayesian adaptive randomised controlled trial with the sole adaptation relating to sample size. Frequent planned analyses will be performed to evaluate whether a treatment benefit exists up to a maximum sample size of 2200. Pre-specified decision rules are defined that allow the trial to be stopped at planned analyses if there is sufficient evidence of treatment effectiveness or futility.

The pre-defined decision rules for early stopping are:

  1. Stop for effectiveness — if the predictive probability of trial success at the current sample size exceeds a threshold of 0.95

  2. Stop for futility — if the predictive probability of trial success at the maximum sample size is below a threshold of 0.02

The first planned analysis will be triggered 14 days after enrolment of the 700th trial participant and conducted as close to that time as is practical given logistical constraints. If neither the effectiveness nor the futility decision rule is met, successive planned analyses will be conducted 14 days after every additional 300 participants are enrolled. However, if a trial decision rule is met and the decision is subsequently recommended by the Data Safety Monitoring Board (DSMB), all enrolled participants will be followed up to 28 days after enrolment and the final analyses of the primary and secondary outcomes will be undertaken and reported.
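For reference, the implied schedule of planned analyses can be computed directly. A trivial illustration in R (the trial's stated analysis environment):

```r
# Implied schedule of planned analyses: triggered 14 days after the 700th
# enrolment and after every further 300 enrolments, up to the cap of 2200.
trigger_n <- seq(from = 700, to = 2200, by = 300)
trigger_n  # 700 1000 1300 1600 1900 2200
```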

Intervention

In India, the ARB used is telmisartan, supplied as 40 mg tablets. In Australia, supply chain interruptions prevented the sourcing of placebo for these hygroscopic medications at the commencement of the trial. The trial was therefore commenced open-label in Australia, with the intention of sourcing placebo as the trial progressed; this intention was later abandoned as cases in Australia abated and recruitment rates remained low. Trial principal investigators in Australia are permitted to select an ARB from the local hospital formulary.

The ARB or placebo is taken according to the treating clinician prescription for 28 days. Guidance is provided on the initial dose, dose titration, and associated monitoring. Final decisions on ARB dosing and management are determined by treating clinicians in line with their familiarity with these medications. Participants are followed daily between day 0 and day 28, and at day 90, primarily through information recorded in the medical record supplemented by phone calls.

Standard care is at the discretion of the treating team, who are encouraged to manage participants according to local best practice throughout the course of the study. Agents used for treatment of COVID-19 are determined by local practice and will be recorded.

Randomisation

Participants are randomised (1:1) to the investigational arm (ARB plus standard care) or control. As already noted, in India, the control arm is placebo plus standard care, and, in Australia, the control arm is standard care. Participants are randomised within 10 days after a confirmed SARS-CoV-2 diagnosis. The randomisation procedure uses permuted blocks with sizes of 4 and 6 (with equal probability). The randomisation schedule was generated in SAS for the whole trial by Dr Qiang Li, a senior biostatistician at The George Institute for Global Health. The randomisation was stratified by country, and in Australia by hospital and community-based settings. The randomisation list and seed are held in a REDCap database and in restricted access folders that are accessible only by Qiang Li and unblinded database administrators.

Trial population

A flow diagram of the trial and participant progression is presented in Fig. 1. Participants enrolled in the trial are those aged ≥18 years with a recent diagnosis of SARS-CoV-2 infection (<10 days prior to randomisation). Additional eligibility criteria identify participants at high risk of severe disease, namely those requiring hospital admission due to COVID-19 or those managed in the community with at least one of the following risk factors: age ≥60 years, body mass index ≥30 kg/m2, diagnosis of diabetes (HbA1c ≥7% and/or use of glucose-lowering medication), history of cardiovascular disease, history of chronic respiratory disease or current treatment with immunosuppressive therapy (as defined by the treating clinician). Additional inclusion and exclusion criteria are defined in the trial protocol [11].

Fig. 1 Trial overview and participant schedule [11]

The number of participants who are screened and randomised (% of screened), who complete 28 days of the trial (% of randomised) and who complete follow-up (% of 28 days completed) will be reported. The number of participants who meet the eligibility criteria but are not enrolled, and the reasons for non-enrolment, will be reported where available. For missing data, we will report the number of participants from whom no data have been collected for >7 consecutive days between day 0 and day 28, the number of withdrawals (% of randomised) and the number lost to follow-up (% of randomised).

Primary analysis population

The primary analysis population will include all participants who were randomised and have passed the primary outcome assessment time of 14 days after randomisation, with their primary outcome status either known or known to be missing. This analysis set will follow the intention-to-treat principle in that all randomised participants will be included and analysed according to the treatment to which they were initially allocated, irrespective of any deviations from this or any other protocol deviations. This analysis population will inform the primary estimand. Any missing data for the primary outcome will be reported, and the occurrence of missingness will be explored to identify any potential patterns. In the decision procedures, missing values will be handled via Bayesian imputation [12, 13].

Primary, secondary and exploratory outcomes

The primary outcome for this trial is a 7-point ordinal scale of clinical health status (formed by adapting the WHO Clinical Progression score [14]) assessed on day 14. Each level in the scale is scored as shown in Table 1.

Table 1 Seven-point ordinal scale of clinical health status [14]

Table 1 also shows the percentage of participants expected to be in each state at days 14 and 28 under standard care (empirical estimate, based on a small convenience sample of accessible study results available at the time of writing). The secondary and exploratory outcomes are listed in Table 2.

Table 2 Secondary and exploratory outcomes for the CLARITY trial. AKI is defined as any documentation of the following: increase in serum creatinine by ≥0.3 mg/dl (≥26.5 μmol/l) within 48 h; increase in serum creatinine to ≥1.5 times baseline, which is known or presumed to have occurred within the prior 7 days; or urine volume <0.5 ml/kg/h for 6 h [35]

ARBs have been extensively researched and used widely in standard practice for around 30 years and have a well-established safety profile. The trial endpoints capture the safety events commonly associated with ARBs. No additional safety reporting of adverse events or serious adverse events is required given the well-understood safety profile of these agents.

Data management

The Data Management Plan for CLARITY has been developed according to the standard operating procedures of the NHMRC Clinical Trials Centre at the University of Sydney, and the Trial Master File is held centrally by the Clinical Trials Centre. This documentation is not currently publicly available.

Statistical analysis framework

All outcomes will be analysed within a Bayesian framework based on joint posterior distributions. Sampling from the posterior distribution will use Markov chain Monte Carlo (MCMC) methods [15–17], and, where necessary, model choice will be determined via an appropriate information criterion [18]. Convergence of MCMC chains will be assessed via trace plots and the Gelman-Rubin diagnostic [19], and goodness-of-fit will be assessed via posterior predictive checks. All computation will be performed in R, interfacing with Stan [20] or an equivalent should the need arise.

Across all analyses, weakly informative prior information calibrated by prior predictive checks will be used such that trial conclusions will be predominantly data-driven. For all parameters of interest, posterior means will be reported along with 95% credible intervals and the posterior probability that the parameter is greater than or less than 0.

Descriptive summaries

Baseline characteristics

Baseline characteristics will be summarised for each treatment arm and overall. Discrete variables will be summarised by frequencies and percentages. Percentages will be calculated according to the number of participants for whom data are available. Where values are missing, the denominator (which will be less than the number of participants assigned to the treatment group under consideration) will be stated. Continuous variables will be summarised using standard measures of central tendency and dispersion: mean and standard deviation and/or median and first and third quartiles. Free text entries for fields collecting both categorical and free text information (e.g. ethnicity) will be assessed and assigned to a category if appropriate. No testing will be performed for differences in baseline characteristics between treatment arms, as per CONSORT principles.

Table 3 shows the baseline characteristics that will be summarised.

Table 3 Baseline characteristics

Delivered treatment

Medication adherence will be reported for each treatment arm as adherence days (median, first and third quartiles, minimum and maximum), number of days of follow-up (median) and percentage of days medication was administered (mean ± SD), both for those who complete follow-up and those who do not. We will also report the mean dose of treatment for participants randomised to ARB plus standard care, the use of other COVID-19 directed care, other antihypertensives and the use of oxygen.

Analysis of primary outcome

The primary outcome for this trial is a 7-point ordinal scale of clinical health status (Table 1) assessed on day 14, which will be modelled using proportional odds cumulative logistic regression [21]. Under such a proportional odds model, the treatment effect (along with the parameters associated with other variables) is common across the response categories.

The model is specified as follows. Let Yi∈{1,…,K} denote a random variable for the outcome of the ith participant, where the ordering is natural and K=7. A proportional odds model can be constructed based on the categorisation of a latent continuous variable Y. Ordered cut-points \(c\in \mathbb{R}^{K-1}\) are defined such that ck<ck+1 for k∈{1,…,K−2}; then, for k∈{1,…,K}:

$$\begin{aligned} \pi_{ik} &= P(Y_{i} = k | \eta_{i}, c)\\ &= \left\{ \begin{array}{ll} 1-\text{logit}^{-1}(\eta_{i}-c_{1}) & \text{if }k=1 \\ \text{logit}^{-1}(\eta_{i}-c_{k-1}) - \text{logit}^{-1}(\eta_{i}-c_{k}) & \text{if }1< k<K \\ \text{logit}^{-1}(\eta_{i}-c_{K-1}) & \text{if }k=K \end{array}, \right. \end{aligned} $$

where Yi∼MN(n=1,(πi1,…,πi7)) and ‘MN’ denotes the multinomial distribution. The linear predictor \(\eta_{i} = \beta x_{i} + \sum_{p=1}^{P} w^{\mathsf{T}}_{ip} \gamma_{p}\) comprises a parameter β that characterises the treatment effect, with xi=1 denoting membership of the ARB treatment group and xi=0 otherwise, and a vector of parameters γ modelling variation in the response associated with the baseline variables defined in \(w^{\mathsf{T}}_{i}\). Note that this implementation will be such that exp(β)<1 corresponds to a reduction in the probability of death.

The model is parameterised with the cuts ck derived from a simplex with a Dirichlet prior where all concentration parameters are set to 1. All fixed effect terms will use weakly informative normal priors centred on zero with a standard deviation of σβ=10. If issues with convergence or sampling arise, variations to these priors are permissible at the discretion of the analyst; however, all model variations and their justification will be reported to the DSMB and will be disclosed in all internal and external publications.
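As a concrete illustration, a model of this form could be fitted in R using the brms interface to Stan. This is a minimal sketch under stated assumptions: the data frame trial_data and its column names are hypothetical, and brms's default threshold parameterisation is used in place of the SAP's Dirichlet prior on the simplex underlying the cut-points, which would require custom Stan code.

```r
library(brms)

# Hypothetical analysis data: y is the 7-point ordinal outcome (1 = best,
# 7 = death), arm the treatment indicator (1 = ARB, 0 = control), and
# age_grp, sex, cmb, o2 the baseline covariates of Table 4.
fit <- brm(
  y ~ arm + age_grp + sex + cmb + o2,
  data   = trial_data,
  family = cumulative(link = "logit"),
  prior  = prior(normal(0, 10), class = "b"),  # weakly informative, per SAP
  chains = 4, iter = 2000, seed = 1234
)

# Posterior probability of benefit, P(beta < 0), i.e. P(OR < 1), where
# exp(beta) < 1 corresponds to a reduced probability of death.
draws <- as.data.frame(fit)
mean(draws$b_arm < 0)
```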

The motivation to adjust the model for baseline covariates stems from the understanding that mortality rates associated with COVID-19 infection vary by age, sex, ethnicity and the presence of some co-morbid diseases [2224]. Specifically, the case mortality rate is estimated as being very low (0.06% or less) in age groups under 50, 0.5% in 50–59 year olds, 2.9% in 60–69 year olds before steeply rising to 40% in those aged 90 and over [22]. Additionally, disease severity appears to be worse in males, in those with pre-existing co-morbidities, such as hypertension, diabetes, heart failure, chronic kidney disease, and chronic respiratory illness [23], and in the presence of obesity [24].

Since age, sex, co-morbid disease and oxygen requirement are the most strongly supported prognostic factors, baseline values for these variables will be included in the linear predictor as fixed effects, with levels as summarised in Table 4. Incorporating these terms in the model allows us to account for potential baseline imbalance, obtain a stratified estimate of the response and potentially increase power. All decisions arising from successive planned analyses will be based on the adjusted model; however, both unadjusted and adjusted results will be reported.

Table 4 Prognostic baseline characteristics/pre-specified subgroups for analyses of the primary, secondary and exploratory outcomes

Sensitivity to assumptions

Sensitivity analyses for the primary outcome will be undertaken to determine how the results of the analysis may change depending upon particular modelling assumptions. For this trial, this will include:

  • Choice of priors, where, instead of a Dirichlet prior, an ordered Normal prior with mean 0 and standard deviation 4 will be placed on the cuts ck. Results from fitting an analogous frequentist model will also be reported.

  • Relaxing the proportional odds assumption

Additionally, a subset of the data will be modelled, constrained to participants who were randomised to receive either an ARB or a placebo (i.e. those randomised in India). Technically, this is a subgroup analysis, but it is included here because the results from this subset of the data represent an important consideration for interpreting the results for the primary outcome.

Weakly informative priors have been pragmatically chosen to regularise estimates and minimise problems with posterior sampling. The sensitivity analysis will evaluate the impact of uninformative, sceptical and enthusiastic priors [25], keeping in mind the comments of [26] on the influence of the likelihood when interpreting the contribution of the prior.

Relaxing the proportional odds assumption accommodates category-specific effects. For example, an intervention could be potentially useful for healthier participants but harmful to participants who are already very sick. The cumulative ordinal model is not well suited for this purpose (category-specific effects in this model can lead to negative probabilities for some scores), so an alternative modelling strategy is required. One approach is to collapse the response into several binary comparisons, fit separate logistic regression models (response scores 1 versus 2–7, 1–2 versus 3–7, etc.) and qualitatively compare the odds ratios obtained from these models to the more parsimonious proportional odds model; a sketch of this is given below. However, this becomes unwieldy for a large number of response categories. Therefore, the sensitivity analysis will be undertaken using an adjacent category model with category-specific effects to gain insight into heterogeneity in the treatment effect; see Chapter 4 of [27] for further details. Metrics for relative goodness-of-fit will be obtained via leave-one-out cross-validation.
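For illustration, the binary-splits heuristic could be implemented as follows (a sketch assuming a data frame trial_data with the ordinal outcome y scored 1–7 and treatment indicator arm; a frequentist glm suffices for this quick qualitative check):

```r
# Binary-splits heuristic for the proportional odds assumption: dichotomise
# the 7-point scale at each cut and compare the per-split odds ratios with
# the single proportional odds estimate.
splits <- lapply(1:6, function(k) {
  fit <- glm(I(y > k) ~ arm, data = trial_data, family = binomial)
  c(cut = k, or = exp(coef(fit)[["arm"]]))
})
do.call(rbind, splits)  # roughly constant ORs support proportional odds
```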

The category-specific effects model will only be evaluated at the final analysis, but the potential for violations in the proportional odds assumption will be monitored heuristically at each planned analysis.

Analysis of secondary and exploratory outcomes

The analysis of all secondary and exploratory outcomes detailed in Table 2 will be model-based within a Bayesian inference framework. An outline of the modelling approach for each of the different data types is given below. Unless directed by the DSMB, the analysis of all secondary and exploratory outcomes will only be conducted as part of the final analysis. The priors on all regression parameters will be weakly informative (e.g. Normal with mean 0 and standard deviation 10). The only exception will be the binary endpoint, where a Uniform prior will be placed on the inverse logit of the intercept (i.e. a Uniform prior on the probability scale).

Binary outcome

The following outcome is represented as a binary random variable and will be summarised by the number and proportion of outcomes within each category. The relative odds of the outcome will be modelled by treatment assignment using standard logistic regression.

  • (All-cause) mortality at day 28 (participants discharged prior to day 28 will be considered alive unless noted otherwise)

For logistic regression, the parameter associated with the treatment term in the model reflects the change in the log-odds of the event relative to the log-odds of the event in the reference group holding all other terms constant.
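A minimal sketch of this endpoint model, assuming hypothetical column names (died28, arm). A Uniform(0, 1) prior on the inverse logit of the intercept is equivalent to a standard logistic prior on the intercept itself, which is how it is encoded here:

```r
library(brms)

# Day-28 mortality endpoint with the SAP's priors: normal(0, 10) on the
# treatment coefficient and, equivalently to Uniform(0, 1) on the inverse
# logit of the intercept, a standard logistic prior on the intercept.
fit_mort <- brm(
  died28 ~ arm,
  data   = trial_data,
  family = bernoulli(link = "logit"),
  prior  = c(prior(normal(0, 10), class = "b"),
             prior(logistic(0, 1), class = "Intercept")),
  chains = 4, iter = 2000, seed = 1234
)
exp(fixef(fit_mort)["arm", c("Estimate", "Q2.5", "Q97.5")])  # odds ratio scale
```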

Ordinal outcomes

The following outcomes can be represented as ordinal scales. Each will be summarised by the proportion in each category and modelled using the same approach described for the primary outcome. Specifically, the relative odds of the outcome will be modelled by treatment assignment using proportional odds logistic regression.

  • Seven-point ordinal scale score at day 28 (primary is at day 14)

  • Number of ventilator-free days between day 0 and 28

  • Number of oxygen support-free days between day 0 and 28

  • Number of dialysis-free days between day 0 and 28

  • Number of ICU-free days between day 0 and 90

For the latter four outcomes, the occurrence of death will be coded as the worst possible outcome, with the remaining possible outcomes coded via the natural ordering; an example construction is sketched below. In cases where some categories have only a few observations, categories may be aggregated to improve model stability.
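For example, ventilator-free days might be constructed as follows (a sketch with hypothetical column names; the value −1 for death is an arbitrary placeholder below the natural 0–28 range):

```r
# Hypothetical construction of ventilator-free days (VFDs) between day 0 and
# 28, with death coded as the single worst category before conversion to an
# ordered factor for the proportional odds model.
trial_data$vfd <- ifelse(trial_data$died28 == 1,
                         -1,                               # death: worst
                         28 - trial_data$days_ventilated)  # natural ordering
trial_data$vfd <- factor(trial_data$vfd, ordered = TRUE)
```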

The interpretation of the parameter associated with the treatment term in the proportional odds cumulative ordinal model is very similar to that of a logistic regression. For the ordinal model, one can conceptualise the event as successive splits of the scores demarcating success. For example, one might consider realising 1 or more ventilator-free days (VFDs) versus 0 or worse; equivalently, one can consider realising 2 or more VFDs versus 1 or worse, and so on. The parameter estimate associated with treatment characterises the change in the log-odds of the event (as defined above) on ARB relative to the log-odds of the event on control, holding all other terms constant. The proportional odds assumption holds that the transitions between each of the splits are equally impacted by the treatment, i.e. there are no category-specific effects.

Time-to-event outcomes

The following outcomes are considered in terms of the time until the occurrence of an event, all of which occur in the presence of a competing risk. For example, admission to ICU is precluded should the patient die before being admitted to ICU. The goal is to characterise the rate or intensity of events by treatment assignment.

  • Time to discharge alive from hospital between day 0 and 28

  • Time to discharge alive from hospital between day 0 and 90

  • Time to admission to ICU between day 0 and 28

  • Time to admission to ICU between day 0 and 90

  • Time to non-invasive or invasive mechanical ventilation between day 0 and 28

  • Time to requirement for dialysis between day 0 and 28

  • Time to AKI between day 0 and 28

  • Time to hypotension requiring vasopressors between day 0 and 28

  • Time to hyperkalemia between day 0 and 28

With the exception of time to discharge (where time to discharge is the event of interest), each of these events exists in the presence of the competing risk of death and/or discharge. Standard approaches for evaluating competing risk include models that focus on the cause-specific hazard and/or the subdistributional hazard [28, 29]. The former is preferred where interest is in the rate of occurrence rather than the risk and is generally recommended for evaluating direct causal effects. Therefore, we will adopt a piecewise exponential model for the cause-specific hazards (reporting results from all event types). The linear predictor will be of a similar form as described in the “Analysis of primary outcome” section; the exponential of which will estimate the multiplicative shift away from the baseline marginal hazard function.
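As an illustration of the piecewise exponential structure, such a model can be fitted via the standard equivalence with Poisson regression on interval-split data. The sketch below is a frequentist analogue for the time-to-ICU endpoint (the SAP's analysis is Bayesian); data and column names are assumptions, and competing events are assumed to have been coded as censoring for the cause-specific analysis.

```r
library(survival)

# Piecewise exponential cause-specific hazard model for time to ICU admission
# (day 0-28) via the Poisson equivalence: split follow-up at interval
# boundaries and include a log time-at-risk offset. Competing events (death
# before ICU) are assumed coded as censoring in icu_event.
cuts  <- c(7, 14, 21)                           # weekly interval boundaries
longd <- survSplit(Surv(time, icu_event) ~ arm, data = trial_data,
                   cut = cuts, episode = "interval")
longd$exposure <- longd$time - longd$tstart     # days at risk per interval

fit_pwe <- glm(icu_event ~ factor(interval) + arm + offset(log(exposure)),
               data = longd, family = poisson)
exp(coef(fit_pwe)[["arm"]])                     # cause-specific hazard ratio
```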

The cause-specific hazard is the instantaneous rate of failure from a specific cause given that no failure from any cause has happened. Interpretation of the parameter estimate associated with the treatment effect will be in terms of the effect on the cause-specific hazard of the event under consideration. Finally, in a competing risk setting, the cumulative incidence function (a generalisation of one minus the survival function to the competing risk setting) cannot be estimated naively using one minus the Kaplan-Meier estimator. Therefore, any visualisation of the survival and cumulative incidence will account for this.

Subgroup analyses

Rather than split the data into strata and proceed with independent analyses, subgroup analyses will be undertaken by extending the linear predictor in the specified models to account for treatment by group interactions. The pre-specified subgroups include age, sex, co-morbid disease and oxygen requirement at baseline as detailed earlier in Table 4.

Shrinkage through prior specification will be used to mitigate issues associated with multiplicity. Additionally, only first-order interactions will be considered, which implicitly assumes all higher-order interactions are zero. Specifically, for the subgroup treatment effects, a revised linear predictor extends the earlier specification in the “Analysis of primary outcome” section:

$${}\begin{aligned} \eta_{i} &= \beta x_{i} + \sum_{p=1}^{P} w^{\mathsf{T}}_{ip} \gamma_{p} + \delta_{1,\mathsf{sex}[i]} x_{i} + \delta_{2,\mathsf{age}[i]} x_{i} + \delta_{3,\mathsf{cmb}[i]} x_{i}\\ &\quad+ \delta_{4,\mathsf{O2}[i]} x_{i} \end{aligned} $$

where each δg parameter corresponds to a vector of parameters associated with each subgroup of interest, with δg∼Normal(0,σδ) and σδ∼Exponential(1), such that the interaction terms are estimated under the assumption of a common variance [30]. The abbreviations “cmb” and “O2” correspond to “comorbidity” and “oxygen requirement”, respectively. Unfortunately, little information about the heterogeneity in treatment effects was available to inform the choice of prior for σδ. Hence, this prior was selected on the basis that it should impose some level of shrinkage for moderately variable treatment effects.

From the above model, each covariate can be considered separately and standardised treatment effects obtained in each subgroup as weighted averages. For example, to obtain a standardised treatment effect for males, compute

$${}\begin{aligned} \beta_{\mathsf{M}} &= \beta + \delta_{1,\mathsf{sex[M]}} + \nu_{\mathsf{age}=1} \times \delta_{2,\mathsf{age}[1]} + \nu_{\mathsf{age}=2} \times \delta_{2,\mathsf{age}[2]} \\ &\quad+ \nu_{\mathsf{cmb}=1} \times \delta_{3,\mathsf{cmb}[1]} + \nu_{\mathsf{cmb}=2} \times \delta_{3,\mathsf{cmb}[2]} + \nu_{\mathsf{cmb}=3} \times \delta_{3,\mathsf{cmb}[3]} \\ &\quad+ \nu_{\mathsf{O2}=1} \times \delta_{4,\mathsf{O2}[1]} + \nu_{\mathsf{O2}=2} \times \delta_{4,\mathsf{O2}[2]} \end{aligned} $$

where βM denotes the log odds-ratio for the treatment effect in males, νage=1 denotes the observed relative frequency of males in the first age category, νage=2 denotes the observed relative frequency of males in the second age category and the other ν weights are defined similarly.
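Computationally, this weighted average is evaluated per posterior draw. The sketch below assumes a data frame draws of posterior samples with one (invented) column name per parameter, and computes the ν weights as observed relative frequencies among male participants:

```r
# Illustrative standardised treatment effect for males, per posterior draw.
males  <- subset(trial_data, sex == "M")
nu_age <- prop.table(table(males$age_grp))   # two age categories assumed
nu_cmb <- prop.table(table(males$cmb))       # three comorbidity categories
nu_o2  <- prop.table(table(males$o2))        # two oxygen categories

beta_M <- draws$beta + draws$delta_sex_M +
  nu_age[1] * draws$delta_age1 + nu_age[2] * draws$delta_age2 +
  nu_cmb[1] * draws$delta_cmb1 + nu_cmb[2] * draws$delta_cmb2 +
  nu_cmb[3] * draws$delta_cmb3 +
  nu_o2[1]  * draws$delta_o2_1 + nu_o2[2]  * draws$delta_o2_2

quantile(beta_M, c(0.025, 0.5, 0.975))  # posterior summary, log-OR scale
```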

Missing data

Among other possibilities, missing data are to be anticipated due to loss to follow-up and competing priorities on institutional resources to complete follow-up.

Under the missing completely at random (MCAR) and missing at random (MAR) assumptions, there will be a loss of precision due to the missingness, but no bias in the parameter estimates when appropriate statistical methods are used. However, under missing not at random (MNAR), the probability of missingness depends on the (unobserved) missing values, and there is both a loss in precision and bias; sensitivity analyses are therefore required. MAR is commonly assumed, although MNAR is arguably more applicable in most settings. In the CLARITY trial, there is potential for MNAR because differential loss to follow-up could occur across treatment groups; however, this is believed to be unlikely.

Patterns of missingness for the outcome and covariates will be reported by time and group. When applicable, missing outcome data will be imputed using the standard posterior predictive approaches within Bayesian inference [12]. The posterior distribution for imputation will be based on the analysis of all available data. Missing covariate information may also be imputed, and this will be based on, where possible, other available data (e.g. missing region or site).

If concerns arise regarding missingness (or a sensitivity analysis is requested by the DSMB), then single value imputation may be used implementing a best-worst-case and worst-best-case sensitivity analyses to evaluate the potential range of impact of missing values.
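As an illustration, such an analysis for the day-14 ordinal outcome could be implemented as follows (a sketch with hypothetical column names, where the best and worst categories are 1 and 7, respectively):

```r
# Best-worst-case / worst-best-case single-value imputation for the day-14
# ordinal outcome (1 = best category, 7 = death).
impute_case <- function(d, arb_score, ctl_score) {
  miss <- is.na(d$y)
  d$y[miss & d$arm == 1] <- arb_score
  d$y[miss & d$arm == 0] <- ctl_score
  d
}
best_worst <- impute_case(trial_data, arb_score = 1, ctl_score = 7)
worst_best <- impute_case(trial_data, arb_score = 7, ctl_score = 1)
```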

Exploratory analyses

Temporal treatment effect heterogeneity

As there is potential for changes in standard of care and variation in the circulating virus strains, we will explore temporal heterogeneity in the treatment effect for the primary outcome. To do so, indicator variables will be created that categorise trial epochs. These indicator variables will then be included (as described in the “Analysis of primary outcome” section) as main effects and as terms that interact with treatment. The time step will be selected pragmatically to obtain reasonably sized groups for the purposes of estimation.
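A sketch of this construction, assuming a hypothetical rand_date column and a three-month epoch (the actual time step will be chosen as described above):

```r
library(brms)

# Hypothetical epoch construction: three-month blocks of randomisation date,
# interacted with treatment in the primary outcome model.
trial_data$epoch <- cut(trial_data$rand_date, breaks = "3 months")
fit_epoch <- brm(
  y ~ arm * epoch + age_grp + sex + cmb + o2,
  data   = trial_data,
  family = cumulative(link = "logit"),
  prior  = prior(normal(0, 10), class = "b")
)
```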

Disease progression modelling

Given the potential for participants to manifest complex disease progression, state-space modelling will be undertaken to characterise transition probabilities and sojourn times.

Planned analyses

The first planned analysis will be conducted as soon as practicable after the 700th enrolled participant has been followed for 14 days. This timing was decided by exploring the sampling distribution of the treatment effect parameter under a variety of scenarios (“Specification of tuning parameters for decision rules” section), including different odds ratios, with the aim of obtaining a relatively stable estimate (see Fig. 2); a first planned analysis 14 days after 700 participants were enrolled was judged a reasonable compromise. After this, planned analyses will occur 14 days after every 300 additional participants have been enrolled. The decision also reflects practical constraints, such as needing enough time to complete the analyses and have them reviewed by the DSMB before the next planned analysis.

Fig. 2 Distribution of the posterior mean of β based on 1000 simulated data sets for a given number of enrolments and odds ratios of a 1.00, b 0.95, c 0.87 and d 0.80

The planned analyses will model the primary outcome (“Analysis of primary outcome” section) for which both the unadjusted (i.e. model only includes treatment status) and adjusted posterior treatment effects will be reported to the DSMB. Decision rules will be evaluated based on the results from the adjusted primary analysis.

Subgroup analyses (“Subgroup analyses” section) will only be undertaken at the final analysis as these are essentially exploratory in nature and need to be interpreted with great care.

Based on the results from the primary analysis, supplemented by external knowledge, recommendations will be made to the Trial Steering Committee by the DSMB to either halt or continue recruiting into the trial.

Decision rules

Pre-specified decision rules will be evaluated at each planned analysis, and the trial can be stopped based on the results of these analyses. This implies that the trial sample size is a random variable with an upper bound of 2200, dictated by the available resources. Two decision rules were adopted for this trial: one to assess treatment effectiveness and one to determine whether it is futile to continue the trial. Both decision rules are based on the primary outcome as described in the “Analysis of primary outcome” section, accounting for differences between treatment arms and baseline characteristics only. In the following sections, each decision rule is defined, including the methods by which each will be evaluated.

Effectiveness

At each planned analysis, the predictive probability that the treatment arm is effective compared to placebo or standard care alone (in Australia, standard care alone) will be assessed. To do so, one may consider that there are two types of participants in the interim data: (1) participants who have been enrolled and have had their day 14 outcome ascertained and (2) participants who have been enrolled but have not yet had their day 14 outcome ascertained. The expectation of the indicator that the posterior probability of β<0 exceeds the pre-specified level of evidence (Δ=0.975) will be evaluated; this level is nominally consistent with a standard one-sided frequentist type-I error of 0.025. The expectation is taken over the data from participants who have been enrolled but have not yet had their outcome ascertained and is assessed against a decision threshold δe, set at 0.95. Specifically, the expectation is defined:

$$\begin{aligned} &E[\mathcal{I}(P(\beta < 0 | y,Z,x,w) > \Delta)]\\ &\quad= \sum_{z\in\mathcal{Z}} \mathcal{I}(P(\beta < 0 | y,z,x,w)> \Delta)p(z|y,x,w), \end{aligned} $$

where \(\mathcal {I}()\) is an indicator function which equals one if the event is true and zero otherwise, y denotes data on the primary outcome for the participants who have been enrolled and had their day 14 outcome ascertained, x and w denote treatment allocations and baseline characteristics (respectively) for all participants enrolled, and \(Z\in \mathcal {Z}\) is the random variable associated with z which denotes supposed future data for the participants who have been enrolled but not yet had their day 14 outcome ascertained.

Evaluating the above expectation is unwieldy, so simulation will be used to form an estimate. The approach for this is outlined in Algorithm 1 where the data y, the treatment allocations x, baseline characteristics w and the prior on the parameters p(β,c) are initialised. For large B, a sample from the posterior distribution of the parameters is obtained based on data from participants who have been enrolled and have had their day 14 outcome ascertained. For those participants who have not yet had their day 14 outcome ascertained, their treatment allocation and baseline characteristics are known but their outcome is not, so their outcome is simulated (line 4). This forms a (partly simulated) data set for all participants enrolled in the trial from which a posterior distribution can be found.

To sample from this posterior distribution, MCMC could be used. However, given this step needs to be performed a large number of times (within a single simulation of a trial and across different trial settings), using MCMC here becomes computationally infeasible. Accordingly, a fast approach to form an approximation of the posterior distribution is needed. Consequently, the Laplace approximation [31] will be adopted, as has been used previously for a similar purpose [32, 33]. Based on this posterior, an indicator function is evaluated to determine whether treatment effectiveness would be concluded at the pre-specified level of evidence (Δ=0.975), if the outcomes from all enrolled participants were ascertained (line 6).

After repeating this procedure a large number of times, the probability the trial is expected to be successful is approximated (line 8). If this is larger than a nominal value of δe=0.95, then the trial will stop for expected treatment effectiveness.
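To illustrate the structure of Algorithm 1 without reproducing the ordinal model and its Laplace approximation, the sketch below substitutes a conjugate beta-binomial model of a binary “bad outcome”; the machinery (draw parameters from the current posterior, impute outcomes for pending participants, re-evaluate the success criterion, average the indicator) is the same. All names and simplifications here are ours, not the trial's implementation.

```r
# Structural sketch of Algorithm 1. y_trt/y_ctl are 0/1 outcome vectors for
# participants with ascertained outcomes; n_pending_* count enrolled
# participants whose day 14 outcome is not yet ascertained.
pred_prob_success <- function(y_trt, y_ctl, n_pending_trt, n_pending_ctl,
                              Delta = 0.975, B = 1000, M = 2000) {
  post_beat <- function(e_t, n_t, e_c, n_c) {
    # P(p_trt < p_ctl | data) under independent Beta(1, 1) priors
    mean(rbeta(M, 1 + e_t, 1 + n_t - e_t) <
         rbeta(M, 1 + e_c, 1 + n_c - e_c))
  }
  wins <- replicate(B, {
    # draw event probabilities from the current posterior
    p_t <- rbeta(1, 1 + sum(y_trt), 1 + length(y_trt) - sum(y_trt))
    p_c <- rbeta(1, 1 + sum(y_ctl), 1 + length(y_ctl) - sum(y_ctl))
    # line 4: impute outcomes for enrolled-but-pending participants
    z_t <- rbinom(1, n_pending_trt, p_t)
    z_c <- rbinom(1, n_pending_ctl, p_c)
    # line 6: would success be declared at level Delta with all outcomes in?
    post_beat(sum(y_trt) + z_t, length(y_trt) + n_pending_trt,
              sum(y_ctl) + z_c, length(y_ctl) + n_pending_ctl) > Delta
  })
  mean(wins)  # line 8: compare against the effectiveness threshold 0.95
}
```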

Futility

The purpose of the futility decision rule is to determine whether it would be futile to continue the trial based on the apparent likelihood of concluding effectiveness. To evaluate this, we consider the probability of concluding treatment effectiveness if the trial were to continue until the maximum sample size of N=2200. At each planned analysis, there will be three types of participants: (1) participants who have been enrolled and had their day 14 outcome ascertained; (2) participants who have been enrolled but have not had their day 14 outcome ascertained; and (3) participants who have not yet enrolled and therefore could not have had their day 14 outcome ascertained.

As in the effectiveness decision rule, there will be uncertainty about the responses for a number of participants. Given this, we consider the expectation of an indicator of success over the distribution of these unknowns. Specifically, the expectation is defined:

$${}\begin{aligned} &E[\mathcal{I}(P(\beta < 0 | y,Z,x,V,w,U) > \Delta)]\\ &\quad= \sum_{v\in\mathcal{V}} \sum_{u\in\mathcal{U}}\sum_{z\in\mathcal{Z}} \mathcal{I}(P(\beta<0|y,z,x,v,w,u) > \Delta)\\ &\quad\quad p(z|y,x,v,w,u)p(u)p(v), \end{aligned} $$

where V, U and Z are the random variables associated with v, which denotes supposed future treatment allocations, u, which denotes future baseline characteristics of participants, and z, which denotes supposed future outcome data, respectively. The distribution of V, denoted p(v), will be based on the 1:1 randomisation, and the distribution of U, denoted p(u), will be estimated non-parametrically by re-sampling w (independently and with replacement).

Simulation is again used to form an estimate, and the approach for this is outlined in Algorithm 2, where y, x, w and p(β,c) are initialised. For large B, a sample from the posterior distribution of the parameters based on data from participants who have responded is drawn (line 4). The remaining trial data are then simulated (line 5). For this, some treatment allocations and baseline characteristics are known and some are not; those that are not known are simulated from p(v) and p(u) as noted above. Data for the remainder of the trial (up to the maximum sample size of N=2200) are then simulated (again in line 5), and a sample from the posterior distribution of the parameters based on this partly simulated data set is obtained (line 6). Based on this posterior, an indicator function is evaluated to determine whether treatment effectiveness would be concluded at the pre-specified level of evidence (Δ=0.975) if the outcomes from all enrolled participants were ascertained (line 7).

After repeating this procedure a large number of times, the probability the trial will conclude that the intervention is effective is approximated (line 9). If this is less than a pre-specified value δf=0.02 (suggesting that concluding treatment effectiveness after N enrolments is very unlikely), then it will be concluded that it is futile to continue with the trial.
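The futility computation therefore has the same structure as the effectiveness sketch above; the additional step is the simulation of the not-yet-enrolled cohort up to the maximum sample size, sketched below (n_enrolled and a baseline covariate matrix w, with one row per enrolled participant, are assumed to exist):

```r
# Additional step for Algorithm 2: simulate the not-yet-enrolled cohort.
n_future <- 2200 - n_enrolled
v <- rbinom(n_future, 1, 0.5)                       # p(v): 1:1 randomisation
u <- w[sample(nrow(w), n_future, replace = TRUE), ] # p(u): resample observed w
```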

Specification of tuning parameters for decision rules

The specification of the two decision rules adopted in this trial requires pre-specifying the values of Δ, δe and δf. To do so, the statistical power and type-I error of the trial were explored through simulation, aiming to find appropriate values for these parameters. Running such a simulation study required a number of assumptions. For this trial, the following were assumed:

  • The data observed in the trial will follow a proportional odds logistic regression model where any average difference in outcomes can only be explained by differences in their assigned treatments

  • The data observed from each participant in the trial will be independent. That is, each participant yields a single observation from the 7-point ordinal scale, and these are independent between participants

  • Five values for the treatment effect were explored. These relate to odds ratio values (i.e. exp(β)) of 1.0, 0.95, 0.87, 0.8 and 0.67

  • Accrual rate of participants into the trial will be either 120, 100 or 80 per month. That is, three potential scenarios which can be defined as “best”, “expected” and “worst” (respectively) from the trial perspective

  • The probability of observing each category of the 7-point ordinal scale for a participant within the placebo plus standard care group is as shown in Table 5. Here, three potential scenarios (“best”, “expected” and “worst” from the patient perspective) have been defined

    Table 5 Expected distribution (as percentages) of 7-point ordinal scale at day 14 for placebo/standard of care group where “best”, “expected” and “worst” case is from the patient’s perspective
  • Planned analyses will not start until 14 days after 700 participants have enrolled into the trial. After this, planned analyses will occur 14 days after an additional 300 participants have been enrolled in the study

Based on these assumptions, 500 trials were simulated for every combination of the scenarios outlined above; that is, trials were simulated with five different odds ratios, three different accrual rates and three different expected outcome distributions in the control group, giving nine scenarios for each odds ratio (the grid is enumerated in the sketch below). It is of interest to determine the chance of stopping due to declaring effectiveness or futility across these settings, and also how variable trial outcomes are across the scenarios. To evaluate expected effectiveness and futility, data were simulated 500 times (i.e. B=500 in Algorithms 1 and 2).
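The scenario grid implied by these assumptions can be enumerated directly (a trivial sketch):

```r
# Simulation scenarios: five odds ratios, three accrual rates and three
# control-arm outcome distributions; 500 trials simulated per cell.
scenarios <- expand.grid(
  odds_ratio = c(1.00, 0.95, 0.87, 0.80, 0.67),
  accrual    = c(120, 100, 80),                # participants per month
  control    = c("best", "expected", "worst")  # Table 5 distributions
)
nrow(scenarios)  # 45 cells; 9 scenarios per odds ratio
```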

After extensive exploration, it was determined that Δ=0.975,δe=0.95 and δf=0.02 yielded acceptable results across a variety of scenarios. Figure 3 provides a summary where (a) shows the probability of concluding treatment success at the end of the trial, (b) shows the probability of stopping the trial early due to futility, and (c) shows the average sample size where this average is taken across all 500 trials regardless of whether it was stopped early or no conclusion was drawn.

Fig. 3 Estimated probability of the trial stopping for a effectiveness, b futility and c average sample sizes based on a variety of different assumptions about the odds ratio, recruitment rate and data that will be observed

We note that for each odds ratio, there are nine scenarios, so each plot actually shows nine points for a given odds ratio. Some of these points are difficult to see because there is very little variability in outcomes across the nine scenarios. This is reassuring as it means the outcomes of the proposed design appear robust to foreseen variations in accrual rate and expected outcomes within the control group. As would be expected, there is substantial variability between different assumed odds ratios. In particular, when the assumed odds ratio is 0.8 and below, the power of the trial to determine that the intervention is effective remains above 80%. When the assumed odds ratio is 0.87, the power reduces to around 60%, and there is very little power when the assumed odds ratio is 0.95. When there is no benefit on treatment, the type-I error remains around 5%.

The maximum average sample size is less than 1650 and occurred when the assumed odds ratio was 0.87. The average sample size decreases as the assumed odds ratio decreases because the effectiveness decision rule is, on average, met earlier; as the assumed odds ratio increases towards 1, the average sample size also decreases because the futility rule stops those simulations early. Given the uncertainty of the clinical importance of a benefit associated with an odds ratio of 0.95, this appears reasonable. For comparison, the required sample size was also computed for a variety of odds ratios and levels of statistical power based on a standard randomised controlled trial with two treatment arms and an ordinal outcome, using the analytic solution for sample size given by [34]. These results are shown in Table 6, demonstrating that larger sample sizes are needed to achieve statistical power equivalent to the adaptive trial. For example, to achieve a power of 60% for an odds ratio of 0.87, 2391 participants need to be enrolled, compared to an average of around 1650 in a trial with an adaptive sample size. This highlights one of the potential benefits of adopting an adaptive sample size approach.

Table 6 Required sample size for a standard, two-arm, randomised controlled trial for different odds ratios and levels of statistical power

Overall, the rules and design of this trial appear to achieve reasonable operating characteristics for moderate to large odds ratios across a variety of scenarios that might be observed. A variety of additional scenarios were also considered in terms of different values for δe and δf; across all of these, the values proposed here appeared to yield more desirable operating characteristics, given the trial settings and the assumptions made. In addition, alternative primary outcomes, such as the 7-point ordinal scale ascertained on day 28, a binary outcome formed from some combination of the 7-point ordinal scale ascertained on days 14 and 28, and time-to-ventilation/death, were also explored to determine whether a reduced sample size could be considered for this trial. However, no appreciable improvement in efficiency was observed. Details for the time-to-event outcome are given in the Appendix.

Discussion

The CLARITY trial investigates the effectiveness of ARBs with standard care, compared to placebo (where provided) with standard care, in reducing the duration and severity of COVID-19 in participants at high risk of severe disease.

The first patient was enrolled on 18 August 2020. At the time of writing, there are 14 active sites in India and 7 in Australia. Recruitment has been predominantly from India, reflecting the relative caseload during the period the trial has been active. Due to uncertainty about the effectiveness of ARBs for COVID-19 and the urgent need to find effective treatments for COVID-19 patients, the trial was designed with an adaptive sample size. The decision rules were constructed to maintain desirable levels of statistical power and type-I error, while maintaining a relatively small average sample size, particularly when compared to a traditional randomised controlled trial. Other primary outcomes were explored (including a binary and a time-to-event outcome); however, negligible gain in efficiency was observed.

In terms of the decision rules, the trial can be stopped due to sufficient evidence of effectiveness or futility. It is worth noting that an additional decision rule related to harm was also explored, that is, a rule that would stop the trial if there was sufficient evidence that the treatment plus standard care arm was performing worse than the placebo plus standard care arm (or less than a minimum important clinical difference). However, little to no benefit in terms of the trial operating characteristics was observed when this decision rule was included; this appeared to be because whenever the harm rule was triggered, the futility rule was also triggered. Thus, it was concluded that this additional decision rule was not required.

In designing this adaptive trial, we have considered the primary outcome and assumed it will be analysed within a Bayesian inference framework. This was motivated by the framework offering a number of advantages over frequentist alternatives, including the ability to coherently update an analysis with accumulating data (i.e. at each planned analysis), rigorous handling of uncertainty due to, for example, missing data, greater flexibility in the available modelling approaches, no need to adjust for multiple testing and interpretability of evidence in the case of “null” results. Thus, we would advocate considering such a framework in future adaptive trials and hope that this paper provides an accessible account of applying such methods to encourage further use.

Fig. 4 Empirical hazard function for patients with COVID-19 based on data from the Recovery trial (ClinicalTrials.gov, NCT04381936)

Fig. 5 Proposed hazard functions for simulation study

Appendix: Time-to-event outcome

To explore whether an alternative outcome might yield a reduced sample size, a simulation study was conducted for a primary outcome of time-to-ventilation/death. Prior information on this outcome was sourced from the Recovery trial (ClinicalTrials.gov, NCT04381936), which yielded an empirical hazard function as shown in Fig. 4. As can be seen, there is a fairly constant decline over the 28-day period, with some deviations from this trend. For example, there is a second “hump” in week 2, possibly associated with cytokine storm in a subset of patients (perhaps 30% of those hospitalised), although it may also be artefactual. In regard to the first “hump”, it is mechanistically plausible that enrolment involves a small “healthy volunteer” type selection bias, whereby anyone whose death is imminent is not randomised (hence the slight increase in hazard over the first few days). Overall, it was thought that these deviations were not large enough to materially affect the simulations when compared to a monotonically declining hazard, so the latter was considered in the simulation study.

The empirical hazard function derived from the results shown in Fig. 4 is shown in Fig. 5a. For comparison, a hazard function under the exponential distribution (i.e. constant hazard) and a Weibull distribution (i.e. monotonically decreasing) were also considered; these are shown in Fig. 5b and c. If proportional hazards are assumed (between the placebo plus standard care and treatment plus standard care groups), then the hazard for the intervention group can be defined via a proportionality parameter (ψ) reflecting the treatment effect, i.e. logψ=β. All three hazard functions for the treatment plus standard care group are also shown in Fig. 5 with ψ=0.87.

Once the hazard function is defined, all functions needed to simulate time-to-event data (e.g. cumulative hazard, survivor function) can be derived. The approach for estimating power given a particular sample size is outlined in Algorithm 3 and is a standard simulation approach. As can be seen, the hazard function (which may involve additional parameters ϕ), ψ and the sample size N are initially defined. Then, for a large number of iterations, treatment allocations are simulated and data are simulated based on a probability density function and these allocations (line 3). A survival model is then fitted to the data (line 4). Here, two types of survival model are fitted: a semi-parametric model in which the baseline hazard is left undefined, and a Weibull proportional hazards model. The appeal of the semi-parametric model is that the baseline hazard is quite flexible and should be able to capture the empirically defined hazard. The appeal of the Weibull model is that assuming a parametric hazard should yield gains in power; however, the hazard may not be flexible enough to capture the assumed form of the empirical hazard. Both models were fitted using maximum likelihood techniques. Based on the fitted model, a 95% confidence interval for logψ was found, and an indicator for whether this interval includes 0 was evaluated. Once this process was repeated a large number of times, the power was approximated by the proportion of times the 95% confidence interval for logψ did not include 0.
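A simplified version of Algorithm 3 for the constant-hazard case might look as follows (a sketch: a Cox model stands in for the semi-parametric fit, administrative censoring at day 28 is assumed, and the parameter values mirror those in Fig. 6):

```r
library(survival)

# Power simulation under an exponential hazard with proportional hazards;
# psi = 0.87 is the assumed hazard ratio and lambda = 1/26 the baseline rate.
power_sim <- function(N, psi = 0.87, lambda = 1/26, n_sim = 500) {
  hits <- replicate(n_sim, {
    arm <- rbinom(N, 1, 0.5)                  # 1:1 allocation
    t   <- rexp(N, rate = lambda * psi^arm)   # event times under PH
    obs <- pmin(t, 28)                        # administrative censoring, day 28
    ev  <- as.integer(t <= 28)
    fit <- coxph(Surv(obs, ev) ~ arm)
    ci  <- confint(fit)["arm", ]
    ci[1] > 0 || ci[2] < 0                    # 95% CI for log(psi) excludes 0
  })
  mean(hits)
}
power_sim(N = 1000)  # approximate power at N = 1000
```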

Fig. 6 Approximate power for given sample sizes for ψ=0.87 and the a empirical hazard function, b exponential hazard function λ=1/26 and c Weibull hazard function γ=0.8, λ=1/26 when the semi-parametric (−) and Weibull (− −) regression models were refit to the simulated data

Approximate power for given sample sizes is shown in Fig. 6 for the three hazard functions with ψ=0.87, where the simulated data were refit under the semi-parametric and Weibull regression models. As can be seen, there is little difference in power estimates between the semi-parametric and Weibull models. However, when data were generated based on the empirical hazard, the maximum likelihood estimates for the Weibull regression model often failed to converge; accordingly, only results for the semi-parametric model are shown for this case. Across all cases, a large number of participants need to be enrolled to achieve at least 60% or 80% power (particularly in the empirical and Weibull hazard function cases). Given this, it was concluded that it was not worth pursuing a full Bayesian adaptive sample size implementation to explore required sample sizes across a variety of different settings.

Trial status

The trial commenced recruitment on 18 August 2020 and the first planned analysis is now imminent.