Key Summary Points

Multiple factors lead to placebo/vehicle response in a clinical trial, and study design may play a key role in mitigating the impact of that response on the interpretation of the data.

Dry eye disease (DED) trials are especially prone to placebo/vehicle response, emphasizing the need to consider various aspects of clinical trial design to mitigate this risk.

Masking the timing of treatment transition, as in the masked treatment transition (MTT) design, can limit the impact of the vehicle response and help obtain more consistent results for DED signs and symptoms, thus enabling detection of a true treatment effect in DED trials.

The MTT design has been implemented in a limited number of DED trials and should be considered when designing future DED trials.

Introduction

Dry eye disease (DED), or keratoconjunctivitis sicca, is a multifactorial disease of the ocular surface characterized by a loss of homeostasis of the tear film and accompanied by ocular symptoms, in which tear film instability and hyperosmolarity, ocular surface inflammation and damage, and neurosensory abnormalities play etiological roles [1, 2]. Globally, the reported prevalence of DED ranges from 5% to 50%, and it continues to increase over time across all demographics. In DED, loss of tear film homeostasis, which may be caused by an imbalance in the normal ocular microbiome [3], leads to ocular surface inflammation, resulting in ocular discomfort and visual impairment and, if left untreated, eventually damage to the ocular surface [4, 5]. Despite extensive efforts to develop efficacious treatment options for DED, the number of drugs available to improve outcomes in patients with DED is limited [6]. Hence, there is an unmet need for effective treatments that address both the signs and symptoms of DED.

Many candidate drugs have been assessed over the years in pursuit of demonstrating efficacy in both signs and symptoms of DED. However, in most cases, efficacy has been observed in either signs or symptoms, or in both only across separate trials [7, 8]. Moreover, there are instances where efficacy in signs and symptoms observed in preclinical or early clinical studies could not be reproduced in confirmatory trials [9,10,11]. This could be due to incomplete understanding of the disease pathology, limitations in clinical study design, or failure to identify candidate drugs with precise mechanisms of action [6, 12, 13]. In addition, the lack of objective, quantifiable, and reproducible outcome measures increases the variability in results, which is further complicated by the inconsistency between signs and symptoms of DED [6].

A common challenge observed in DED clinical trials is the placebo or vehicle response, i.e., a therapeutic effect produced by a pharmacologically inert treatment [6]. A large vehicle response interferes with estimation of a drug’s treatment effect and may lead to failure of a clinical trial.

Similar challenges have been repeatedly observed in clinical trials in other heterogeneous diseases, such as pain and depression [14, 15]. However, vehicle responses in DED trials are of particular importance because there is no true “placebo” in trials evaluating topical ocular drugs. For this reason, to minimize confounding factors, it is important to use as the negative control a vehicle that has the same constituents as the active formulation except for the active pharmaceutical ingredient [16]. Even so, additional measures are required to reduce the vehicle response, and clinical trial design is one important element to consider. To address these concerns, the Tear Film and Ocular Surface Society International Dry Eye Workshop (TFOS DEWS) task force has recommended a few study design strategies for DED trials [6].

This review briefly presents the factors that lead to placebo/vehicle response and focuses on the aspects of clinical trial design that can be improved to mitigate vehicle response in DED trials.

What Factors Are Associated with Vehicle Response in DED Trials?

Selection of Negative Control

Placebo is defined as a substance or treatment designed to have no therapeutic value. In studies using pills, the placebo may be described as a “sugar pill,” and in studies of intravenous treatments a saline solution may be administered, as these are known to have no positive effect on the parameters being studied [17, 18]. In DED trials, different placebos have been used depending on the investigational dry eye treatment being evaluated (drug or device) [19]. However, when assessing the efficacy of drugs administered topically to the ocular surface, the instillation of liquid drops itself provides temporary benefit through hydration and decreases the concentration of inflammatory molecules at the ocular surface [16]. The excipients (components other than the active pharmaceutical ingredient) in the formulation, such as polymers (for viscosity), oils, or surfactants (for drug solubility or permeability), can positively or negatively affect both signs and symptoms of DED [20]. Therefore, to minimize confounding by the excipients and detect the true treatment effect of the active drug, DED trials should use a vehicle as the negative control.

Lack of Adequate Washout Period and Effect of Concomitant Medications

If enrolled patients were using medications that cause drying or adversely affect the ocular surface (e.g., preservative-containing medications) before participating in the trial, stopping these medications may lead to recovery from the adverse effects of cumulative preservative toxicity. In the absence of an adequate washout period, this recovery may improve DED outcomes during the trial in both the treatment and the vehicle arms, producing a vehicle effect and falsely indicating that the drug and vehicle have similar efficacy [6]. Furthermore, adverse effects of a newly initiated concomitant non-DED medication may worsen outcomes regardless of the assigned treatment arm, making it harder to differentiate the treatment effect of the active drug from that of the vehicle [21]. Similarly, the dosing and timing of artificial tears, whether used as a control or as concomitant therapy, should be planned carefully because their frequency directly affects outcomes.

Fluctuation and Lack of Correlation of Signs and Symptoms

DED signs and symptoms are highly variable and often fluctuate in severity within a day or from day to day over the course of the disease. Fluctuations in signs do not always coincide with those in symptoms, i.e., one is not necessarily the direct cause or result of the other [22]. A few studies have reported consistency between signs and symptoms in small subgroups of patients, but such consistency is difficult to observe in large cohorts that include patients with different degrees of severity [6]. Consequently, patients who enter a trial with high sign and symptom scores at baseline may improve during the natural course of the disease, and this improvement may coincide with the primary outcome assessment (regression toward the mean) (Table 1). The same improvement may not be observed in a subsequent trial that includes a different population. This improvement might be incorrectly interpreted as a therapeutic effect even though it also occurs in the vehicle-treated group, making it difficult to demonstrate reproducible efficacy [6]. To minimize the placebo response arising from improvement during the natural course of the disease, multiple visits should be scheduled to establish consistency of signs and symptoms before randomization [6].
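The regression-toward-the-mean mechanism described above can be illustrated with a small simulation. The sketch below is not taken from any trial: it assumes a hypothetical symptom score centered on 50 (not clipped to any scale), a stable underlying severity for each patient, independent visit-to-visit fluctuation, and an arbitrary entry cutoff of 60. No treatment is given at all, yet patients enrolled because their screening score was high should, on average, score lower at the next visit.

```python
import random
import statistics


def simulate_regression_to_mean(n_patients=10_000, entry_cutoff=60.0, seed=1):
    """Simulate two visits with NO treatment effect (hypothetical model).

    Each patient has a stable underlying severity; each visit adds
    independent fluctuation. Only patients whose screening score meets
    entry_cutoff are enrolled. Returns (mean screening score, mean
    follow-up score, number enrolled) for the enrolled patients.
    """
    rng = random.Random(seed)
    screening_scores, follow_up_scores = [], []
    for _ in range(n_patients):
        true_severity = rng.gauss(50, 10)             # stable underlying severity
        screening = true_severity + rng.gauss(0, 10)  # day-to-day fluctuation
        if screening < entry_cutoff:
            continue                                  # fails the entry criterion
        follow_up = true_severity + rng.gauss(0, 10)  # nothing given in between
        screening_scores.append(screening)
        follow_up_scores.append(follow_up)
    return (statistics.mean(screening_scores),
            statistics.mean(follow_up_scores),
            len(screening_scores))


baseline, later, enrolled = simulate_regression_to_mean()
print(f"enrolled={enrolled} screening mean={baseline:.1f} follow-up mean={later:.1f}")
```

Calling the function with `entry_cutoff=float("-inf")` (enrolling everyone) should make the two means nearly equal, which indicates that the apparent improvement comes from selecting on a noisy baseline score rather than from anything given to the patients.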

Table 1 Study design and placebo effect in various clinical trials

Subjectivity of Outcome Measures

To assess the efficacy of a candidate drug or treatment, DED trials use various patient-reported and clinician-reported outcome measures. Symptoms of DED are purely subjective assessments made by patients through patient-reported outcomes and are subject to high variability depending on each patient’s tolerance level, mechanisms for coping with symptoms, and perception of improvement [31, 32]. If patients who differ in these respects are unevenly distributed between treatment arms, the estimated treatment effect can be skewed and difficult to reproduce in a subsequent trial. Likewise, for clinician-reported endpoints, although there are attempts to standardize assessments using study-specific scales or grading systems, most endpoints considered objective have a high degree of subjectivity because they depend on clinician judgment, which is influenced by experience, personal knowledge of the patient and their condition, and interpretation of the scoring definitions (e.g., ocular surface staining, hyperemia, tear film breakup time) [33]. Therefore, even objective measures are susceptible to placebo effects, albeit to a lesser extent than symptom report indices or rater indices [34].

Hence, to remove the influence of the aforementioned confounding factors and mitigate vehicle response, development and implementation of more automated or digital objective measures could be beneficial.

Expectations of Therapeutic Benefit

Study participants’ and study staff’s expectation of treatment benefit is an important factor leading to the vehicle response, irrespective of whether the patient is randomized to an active treatment or a vehicle group [35,36,37]. As in pain studies, patients may overendorse symptoms and feel helpless about them, especially when they have been suffering for a long period of time [38, 39]. In addition, overendorsing symptoms to ensure inclusion in the study can drive vehicle effects on self-reported measures because, after randomization, regression to the mean can produce improvement that is falsely interpreted as a treatment effect [34, 40]. Moreover, being more closely monitored during a clinical trial may influence patients’ perception of disease symptoms owing to an expectancy of therapeutic benefit [13, 41]. These expectations may lead to changes in patients’ behavior or responses driven by conformity and social desirability [41]. A cordial patient–physician relationship may drive vehicle response by influencing patients’ satisfaction, perception of health care quality, and treatment adherence [36, 42, 43].

In addition, more recent publications and industry-sponsored randomized controlled trials (RCTs) have shown a larger placebo effect. Jones et al. proposed, although without supporting data, that this could be because industry-sponsored trials often investigate new therapeutic agents, which may increase the expectancy of treatment benefit [44]. Higher expectancy of receiving an active drug reduces self-defeating thoughts and may increase activity in the brain’s reward circuitry, as shown by increased dopaminergic and opioid activity in the nucleus accumbens of strong placebo responders [45, 46]. All of these factors are also potentially relevant to the conduct of DED trials. Neutralizing expectation may limit the placebo response without reducing the response to active treatment, thus improving assay sensitivity. This can be achieved by educating participants and study staff about the factors that drive placebo response, training them to improve the accuracy of patient-reported outcomes, and reducing external factors that influence participants’ outcome ratings [35, 47, 48].

Treatment Adherence and Compliance

Treatment of DED typically involves artificial tears or other prescription medications used as needed or at a set frequency. However, for multiple reasons, such as patients’ experience with the treatment (e.g., instillation events) and patient preference, adherence to these treatments may be challenging [49]. Failure to adhere to a treatment for long enough for it to take effect can lead to the incorrect conclusion that the treatment is ineffective [50]. Participation in a DED trial, however, can motivate patients to adhere to treatment for the duration required to observe a therapeutic effect. In patients receiving a vehicle with non-pharmacological constituents, this improved adherence may improve outcomes over time through the hydrating effect of the drops and the resulting decrease in the concentration of inflammatory mediators at the ocular surface, despite the absence of any biological effect of an active ingredient [51].

TFOS DEWS II Recommendations for Design of DED Trials

As in other diseases such as pain and depression, where trials are plagued by high placebo effects, technological advances may help mitigate high placebo response. A set of clinical and biological markers that can predict placebo response with a fair degree of accuracy could help identify placebo responders in trials [52]. Some studies have explored algorithms based on predictive factors to address placebo response in depression trials [52,53,54]. However, this field is still at a nascent stage and will continue to evolve [55].

In the absence of such tools for DED trials, TFOS DEWS II made recommendations to maximize the quality and interpretability of pivotal trials in DED. TFOS DEWS II recommended considering the disease mechanism and the drug’s mode of action when selecting the outcome measures for a trial [6]. It emphasized a prospective, randomized, double-masked, placebo- or vehicle-controlled, parallel-group trial as the optimal study design for a DED clinical trial [6].

TFOS DEWS II considered crossover clinical trials acceptable if the initial treatment is not a cure (i.e., improvements are temporary, and signs and symptoms are expected to return after the initially assigned treatment is stopped), there are no carryover effects from the first treatment, and all patients complete the trial. Recognizing the prominent vehicle effect in DED clinical trials, the committee recommended masking the time of treatment initiation from both patients and investigators to mitigate the factors contributing to that effect (Table 2) [6]. In addition, TFOS DEWS II recommended withdrawal trial designs to minimize placebo response (Table 2) [6].

Table 2 Mitigating placebo or vehicle response in DED trials: TFOS DEWS II recommendations and potential study design strategies

Withdrawal Study Design

A withdrawal trial design is an approach suggested by TFOS DEWS II (Fig. 1) to minimize vehicle response: active medication is given to all patients in the first phase, and patients are then randomized to either the vehicle or the drug group in the second phase (Table 1) [6, 56, 57]. This design reduces the duration of placebo treatment because, after the active treatment phase, only responders are randomized to receive placebo or active drug in the second phase. This enrichment increases the statistical power of the study, and the design also helps determine the optimal duration of treatment. Nevertheless, the design has a few disadvantages, including carryover effects of the drug (if the washout period is insufficient), reduced external validity, and overestimation of the treatment effect (Table 2). A few clinical trials in other therapy areas have used this design to reduce placebo response [56, 58,59,60,61,62,63]. However, to date, only one DED trial (an extension study) has implemented a withdrawal trial design [64].
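The allocation logic of a randomized withdrawal design can be sketched in a few lines. In this illustration, the `Patient` fields, the score direction (higher = worse), the improvement threshold `min_improvement`, and the simple 1:1 split are all hypothetical assumptions; a real protocol would prespecify its own responder definition and randomization scheme.

```python
import random
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    baseline_score: float       # score at entry (hypothetical; higher = worse)
    end_of_active_score: float  # score after the open active-treatment phase


def randomized_withdrawal(patients, min_improvement, seed=0):
    """Sketch of the second-phase allocation in a randomized withdrawal design.

    Every patient received active drug in phase 1. Only phase 1 responders
    (improvement >= min_improvement, a hypothetical threshold) are randomized
    to continue active drug or switch to vehicle; everyone else is not randomized.
    """
    rng = random.Random(seed)
    responders = [p for p in patients
                  if p.baseline_score - p.end_of_active_score >= min_improvement]
    rng.shuffle(responders)                 # random order, then split in two
    half = len(responders) // 2
    allocation = {p.patient_id: "active" for p in responders[:half]}
    allocation.update({p.patient_id: "vehicle" for p in responders[half:]})
    not_randomized = [p.patient_id for p in patients if p.patient_id not in allocation]
    return allocation, not_randomized


cohort = [
    Patient("P1", baseline_score=70, end_of_active_score=40),  # improved by 30
    Patient("P2", baseline_score=65, end_of_active_score=62),  # improved by 3
    Patient("P3", baseline_score=80, end_of_active_score=50),  # improved by 30
    Patient("P4", baseline_score=72, end_of_active_score=45),  # improved by 27
]
allocation, not_randomized = randomized_withdrawal(cohort, min_improvement=10)
print(allocation, not_randomized)
```

With an odd number of responders, this naive split leaves one arm larger by one patient; an actual trial would normally use a validated randomization procedure rather than a shuffle.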

Fig. 1 Withdrawal study design

Crossover Trial

In a crossover trial, patients sequentially receive both active and placebo treatments in a random order (Fig. 2). The first treatment is followed by a washout period, after which the next treatment is initiated. The treatment order is randomized to avoid bias arising from presumptions about treatment assignment, which can affect study outcomes. Because patients act as their own controls, between-subject variability is reduced and confounding by patient-level factors is minimized (Table 2). This design helps avoid overestimating the therapeutic benefit of the drug under investigation and hence makes results more likely to reflect actual therapeutic benefit [65]. Crossover trials have higher statistical power and require a smaller sample size to detect treatment effects. Disadvantages of the crossover design include the possibility of an unforeseen treatment-by-period interaction (e.g., carryover), longer duration, and a greater impact of missing data (Table 2) [65]. In addition, patients’ expectations in the second phase of the trial may be influenced by their experience of, and response to, the treatment received in the first phase. To date, crossover designs in DED trials have been most useful for evaluating short-term treatment or immediate assessments, such as drop instillation comfort and blurring profile. Crossover designs are not recommended for trials of long-term DED treatment.
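The within-patient comparison can be made concrete for the common two-sequence, two-period (AB/BA) crossover. The sketch below uses the standard estimator for this layout: each patient's period 1 minus period 2 difference is averaged within each sequence, and half the difference between the two sequence means estimates the active-minus-vehicle effect, with a common period effect cancelling out. It assumes no carryover, which is why an adequate washout matters; the function name and the example scores are hypothetical.

```python
from statistics import mean


def crossover_effect(ab_pairs, ba_pairs):
    """Active-minus-vehicle effect in a 2x2 (AB/BA) crossover, assuming no carryover.

    ab_pairs: (period1, period2) scores for patients given active, then vehicle
    ba_pairs: (period1, period2) scores for patients given vehicle, then active
    Within-patient differences remove between-patient variability; halving the
    difference of the two sequence means also cancels a common period effect.
    """
    d_ab = mean(p1 - p2 for p1, p2 in ab_pairs)  # (active - vehicle) + period effect
    d_ba = mean(p1 - p2 for p1, p2 in ba_pairs)  # (vehicle - active) + period effect
    return (d_ab - d_ba) / 2


# Hypothetical scores built as: patient level + treatment (active -5, vehicle 0)
# + period (period 1: 0, period 2: -2). Lower scores are better.
ab = [(55, 58), (35, 38)]  # patient levels 60 and 40
ba = [(70, 63), (50, 43)]  # patient levels 70 and 50
print(crossover_effect(ab, ba))
```

Although the four hypothetical patients have very different levels (60, 40, 70, and 50) and everyone improves by 2 points in period 2, the estimate recovers the built-in effect of −5 points because both sources of variation cancel.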

Fig. 2 Crossover trial design

Enrichment Designs to Minimize Vehicle Effects in DED Trials

To avoid high placebo/vehicle response rates, vehicle nonresponder enrichment can be done using several approaches [66, 67].

Sequential Parallel Comparison Design

To our knowledge, a sequential parallel comparison (SPC) design has not been used in a DED trial to date, although the concept has been applied in other disease areas [68]. In an SPC design, patients are first randomized between placebo and drug in an unbalanced ratio, with more patients assigned to placebo (Fig. 3). In the second phase, placebo nonresponders, identified on the basis of an early endpoint, are re-randomized to either drug or placebo; only these patients are included in the second phase. Because they have already failed to respond to placebo, their placebo response is expected to be very limited. Data from both phases are pooled in the analysis to maximize statistical power. SPC designs require smaller sample sizes and can help reduce the placebo effect (Table 2) [69].
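The pooling step can be illustrated with a weighted combination of the kind described in the SPC literature: a phase 1 drug-minus-placebo difference and a phase 2 difference among placebo nonresponders, combined with a prespecified weight w. The sketch below uses response rates, an assumed weight of 0.5, and hypothetical counts, and it computes only the point estimate. Valid inference must also account for phase 1 placebo patients reappearing in phase 2, which is part of why SPC analyses are considered complex.

```python
from dataclasses import dataclass


@dataclass
class ArmResult:
    responders: int
    n: int

    @property
    def rate(self) -> float:
        return self.responders / self.n


def spc_pooled_difference(phase1_drug, phase1_placebo,
                          phase2_drug, phase2_placebo, weight=0.5):
    """Weighted drug-minus-placebo difference in response rates for an SPC design.

    phase1_*: all randomized patients (unbalanced, more on placebo).
    phase2_*: phase 1 placebo nonresponders, re-randomized to drug or placebo.
    Returns weight * diff1 + (1 - weight) * diff2 (point estimate only).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    diff1 = phase1_drug.rate - phase1_placebo.rate
    diff2 = phase2_drug.rate - phase2_placebo.rate
    return weight * diff1 + (1 - weight) * diff2


# Hypothetical counts: 100 drug vs 200 placebo in phase 1 (1:2 allocation);
# the 130 phase 1 placebo nonresponders are split 65/65 in phase 2.
estimate = spc_pooled_difference(
    ArmResult(40, 100), ArmResult(70, 200),  # phase 1 rates 0.40 vs 0.35
    ArmResult(26, 65), ArmResult(13, 65),    # phase 2 rates 0.40 vs 0.20
)
print(f"pooled difference = {estimate:.3f}")
```

Here the phase 1 difference is 5 percentage points and the phase 2 difference is 20 percentage points, so the pooled estimate with equal weights is 12.5 percentage points; the phase 2 contrast carries more signal because placebo responders have been removed.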

Fig. 3 Sequential parallel comparison design

Hence, SPC designs could be the preferred strategy for diseases with a high placebo response in which a substantial treatment effect can be confirmed within a relatively short follow-up period [68]. Furthermore, SPC designs can be combined with adaptive designs to improve the efficiency of clinical trials and limit patient exposure to ineffective or unsafe treatments [70]. Boessen et al. suggested that SPC designs can increase the success rate of depression trials, which was supported by the results of FORWARD-5, a phase 3 trial [68, 71]. This depression trial used an SPC design to mitigate high placebo response, increasing statistical power with a small sample size and improving signal detection [71].

However, SPC designs pose a few challenges. The biggest is deciding the estimand, i.e., what exactly the treatment difference (active vs control) evaluates [72]. Because of the adaptive and pooling nature of the design, it also carries a high potential for type I error (a false-positive outcome). Because of these limitations, SPC designs are used more often in exploratory settings than in pivotal trials [73]. Other disadvantages include reduced external validity due to the criteria used for population enrichment, a complicated study design, and longer study duration compared with conventional RCTs (Table 2). SPC designs have not been widely used owing to the challenges of designing and analyzing these trials (e.g., the unavailability of an early endpoint that can predict efficacy), as well as uncertainty about how regulatory agencies will evaluate them [73, 74]. Although many of these limitations apply to DED trials, if they can be mitigated and the remaining risks accepted, these designs could be of value for reducing the vehicle effect in exploratory studies.

Run-in Design

A single-blind run-in strategy can be implemented to identify and exclude vehicle responders, thereby enriching the study for vehicle nonresponders. Single-blind or open-label run-in designs are the most common designs in DED trials, in which topical treatments are administered to the ocular surface. In this design, after meeting the initial entry criteria, all patients receive the same intervention (e.g., vehicle or a preselected artificial tear) during a run-in phase, followed by a qualification visit to assess whether the patient still meets the entry criteria before randomization (Fig. 4; Table 1) [75]. Patients who no longer meet the criteria are usually discontinued from the trial. Therefore, run-in designs using vehicle can help identify and exclude patients who show a predefined degree of improvement with vehicle alone and thus improve the probability of detecting a treatment effect [50, 67]. However, the performance of open-label run-in designs depends on the patient inclusion criteria and the overall study design [76,77,78]. All patients and investigators know that the patient is receiving vehicle during the run-in phase. Hence, there is less expectation of improvement, which limits the vehicle response during this phase. Consequently, a high probability of placebo response remains in the randomized phase of the trial, when both physicians and patients have a greater expectation of therapeutic benefit. Notably, meta-analyses comparing depression trials with and without a single-blind placebo run-in period found that this approach did not significantly reduce placebo responses or increase assay sensitivity [15].

Fig. 4 A standard run-in design

One common challenge with run-in designs is choosing a run-in duration that allows adequate washout of previous medications and enables detection of vehicle responders (Table 2). Among nine DED trials conducted with a run-in design, the run-in period ranged from 7 to 20 days, and the treatments given during the run-in period included placebo, vehicle, artificial tears, and saline [79,80,81,82,83,84,85,86]. In DED trials, the most common and recommended run-in duration is 1 to 2 weeks [16]. Furthermore, cutoff values for excluding vehicle responders and for including patients in randomization and the primary analysis must be set very carefully. To account for the patients excluded during enrichment, a sufficiently large number of patients should be screened at the first screening visit. On average, DED studies with a single-blind run-in design exclude approximately 25–30% of patients after the run-in phase [79,80,81,82,83,84, 86].
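The screening arithmetic behind this recommendation is simple: if a fraction f of run-in entrants is expected to be excluded, about n / (1 − f) patients must enter the run-in to leave n for randomization. The sketch below assumes a hypothetical target of 300 randomized patients and ignores screen failures before the run-in and later dropouts, both of which would raise the numbers further.

```python
import math


def patients_to_screen(n_randomized, exclusion_fraction):
    """Patients who must enter the run-in so that, on average, n_randomized
    remain after the expected fraction of vehicle responders is excluded.
    Ignores pre-run-in screen failures and later dropouts (assumption)."""
    if not 0.0 <= exclusion_fraction < 1.0:
        raise ValueError("exclusion_fraction must be in [0, 1)")
    # Round up: patients come in whole numbers.
    return math.ceil(n_randomized / (1.0 - exclusion_fraction))


for fraction in (0.25, 0.30, 0.64):
    print(fraction, patients_to_screen(300, fraction))
```

For 300 randomized patients, this gives 400 and 429 run-in entrants at 25% and 30% exclusion, but 834 at the 64% exclusion reported for an MTT study later in this review, which is why designs that remove more vehicle responders need substantially larger screening pools.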

Double-Blind Run-in or Masked Treatment Transition Design

In most DED trials, post hoc analyses have led to hypotheses that the investigational drug may work better in certain subgroups of patients (e.g., patients with more severe disease). Hence, many DED development programs have modified their target population, and subsequent trials have set entry criteria based on such subgroup analyses. However, the subsequent results are often inconsistent with those of the subgroup analyses, which may be due in part to the presence of vehicle responders in these subgroups. Excluding vehicle responders efficiently may help avoid such false-positive results. Various strategies are used to circumvent the impact of placebo/vehicle response; however, they can introduce issues such as overestimation of the intention-to-treat effect and the possibility of an unforeseen treatment-by-period interaction, and they may affect the generalizability of results [66, 75]. These challenges can plausibly be managed by masking the timing of treatment transition from both patients and investigators, thereby reducing the bias that drives vehicle response.

In DED studies, it is common to use an open-label vehicle run-in period. Even so, improvement in all treatment arms after randomization can still be observed as a result of the factors described in “What Factors Are Associated with Vehicle Response in DED Trials?” Vehicle responders are difficult to exclude completely because, during the run-in phase, patients and investigators know that the actual treatment has not started. In the double-blind run-in design, both patients and investigators are masked to the existence of the run-in period, the time of transition to active drug (masked treatment transition, MTT), and the criteria for inclusion in the primary analysis (Fig. 5) [78]. Masking the timing of treatment transition (control to active drug or active drug to control) from investigators and patients can reduce the vehicle response observed in trials. This approach is recommended by the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) and by TFOS DEWS II to minimize placebo response [5, 78, 88]. Patients who do not meet the entry criteria after the run-in (vehicle responders) are either retained in the study but excluded from the primary analysis, or discontinued from the study. In two studies carried out over a similar time frame, about 28% of patients were identified as placebo responders during a double-blind placebo run-in phase, compared with less than 10% during a single-blind placebo run-in phase [78].

Fig. 5 Clinical trial with masked treatment transition design

An important aspect of these trials is that the investigator and the patient are unaware of the actual timing of and criteria for randomization, as knowing them may subconsciously influence behavior and scoring (e.g., artificial inflation of scores to meet inclusion criteria) [89]. Masking the actual criteria for entering the randomization phase helps exclude from the primary analysis the vehicle response resulting from regression to the mean, which may occur because signs or symptoms were overendorsed at screening (to ensure inclusion in the study) or because of the natural course of the disease [34, 90]. Moreover, when the timing of randomization is masked, improvement in symptoms arising from the expectation of treatment benefit, i.e., vehicle response, occurs before the randomized treatment begins and thus does not affect the primary efficacy analysis. This improves the chance of detecting a meaningful treatment effect in patients who need more than just an artificial tear.

Recent Example of MTT Design in DED Trial

As mentioned earlier, trials with MTT designs have been conducted in various therapy areas to minimize the extent of placebo response [76, 78]. Recently, the phase 2b trial of ECF843, a recombinant human lubricin, implemented this design (Fig. 6) to assess its efficacy in improving the overall Symptom Assessment in Dry Eye (SANDE) score and the composite corneal fluorescein staining (CFS) score in patients with moderate to severe DED [9]. The study design attempted to address several factors that contribute to the vehicle response by including a double-masked run-in as well as a double-masked withdrawal phase at the end. Treatment transitions and the criteria for entering the primary analysis were masked to both investigators and patients. To enrich the population for both signs and symptoms of DED, patients who were vehicle responders in CFS score, SANDE score, or both were excluded from the primary analysis (Fig. 6). The criteria used to identify vehicle responders were predefined and applied through an automated system to ensure efficient implementation of the study design while maintaining masking. Patients and investigators were informed that the duration of treatment in the trial would be 56 days and that all patients would receive vehicle at some point during the trial.

Fig. 6 ECF843 phase 2b study design with vehicle run-in, masked treatment transition, and withdrawal phase. *Patients who had an average SANDE score of ≥ 40 from visit 1 to visit 2, and total CFS score of ≥ 3 at visit 2. Treatment sequence of ECF843 (up to 28 days) or vehicle is random. Vehicle responders were excluded from primary analysis. They continued to receive vehicle and were included in safety analysis. BID twice a day, CFS corneal fluorescein staining score, N total number of patients, n number of patients under the category, SANDE Symptom Assessment in Dry Eye, TID thrice a day
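The automated, masked application of predefined criteria can be sketched as follows. The qualification rule mirrors the footnote to Fig. 6 (average SANDE score ≥ 40 across visits 1 and 2 and total CFS score ≥ 3 at visit 2), reading "average from visit 1 to visit 2" as the mean of the two visit scores; that reading, the function names, the kit-code format, and the sequence labels are hypothetical assumptions, not the trial's actual system. The point illustrated is that the site receives an identically formatted kit code whatever the outcome, while the allocation and the primary-analysis flag stay in the unblinded system.

```python
import random
from statistics import mean

# Hypothetical labels for the two randomized treatment sequences.
SEQUENCES = ("sequence A", "sequence B")


def is_vehicle_nonresponder(sande_v1, sande_v2, cfs_v2, sande_cutoff=40, cfs_cutoff=3):
    """Assumed reading of the Fig. 6 footnote: mean SANDE of visits 1 and 2
    >= 40 AND total CFS >= 3 at visit 2. Failing either criterion (or both)
    marks the patient as a vehicle responder."""
    return mean([sande_v1, sande_v2]) >= sande_cutoff and cfs_v2 >= cfs_cutoff


def masked_visit2_step(patient_id, sande_v1, sande_v2, cfs_v2, rng):
    """Hypothetical central (unblinded) step at the end of the vehicle run-in.

    Returns (kit_code, hidden_record). Only kit_code goes to the site; it has
    the same format for every patient, so it does not reveal the outcome.
    """
    if is_vehicle_nonresponder(sande_v1, sande_v2, cfs_v2):
        arm = rng.choice(SEQUENCES)   # randomized treatment sequence
        primary_analysis = True
    else:
        arm = "vehicle (continued)"   # vehicle responder stays on vehicle
        primary_analysis = False      # excluded from primary efficacy analysis
    kit_code = f"KIT-{rng.randrange(10**6):06d}"
    hidden_record = {"patient": patient_id, "arm": arm,
                     "primary_analysis": primary_analysis}
    return kit_code, hidden_record


rng = random.Random(2024)
for pid, scores in {"P1": (55, 50, 4), "P2": (55, 20, 5), "P3": (60, 58, 1)}.items():
    kit, record = masked_visit2_step(pid, *scores, rng)
    print(kit, record)
```

In this hypothetical run, P1 meets both criteria and is randomized, P2 fails the SANDE criterion (mean 37.5), and P3 fails the CFS criterion; all three receive kit codes of the same form, so neither the patient nor the investigator can tell which path was taken.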

In this trial, 970 patients were screened, and 717 met the entry criteria at visit 1. Visit 1 was presented as the randomization visit because patients were randomized to dosing frequency at that visit, although all patients received vehicle for the first 2 weeks. After the 2-week run-in, 159 patients were identified as vehicle responders and 558 as vehicle nonresponders, who met the criteria for inclusion in the primary efficacy analysis. Although the number of vehicle responders was not exceptionally high, many of them showed a striking magnitude of improvement in both CFS and SANDE scores during the run-in phase. The vehicle responders were not randomized; they were excluded from the primary efficacy analysis and continued receiving vehicle until the end of the study. As shown in Table 3, some patients had almost complete resolution of signs and/or symptoms after 2 weeks of vehicle treatment despite having been identified as patients with chronic DED and significant symptoms and staining at study entry (Fig. 6). This could have been due to high fluctuations in DED severity, although the entry criteria were not markedly different from those of other trials in patients with moderate-to-severe DED. The sharp improvement during the run-in could also have been due to these patients’ expectation of therapeutic benefit, because they believed they were being randomized at visit 1. Moreover, quantifying the magnitude of the placebo effect relative to the inclusion criteria at multiple visits can help in anticipating the treatment benefit [91,92,93].

Table 3 Representative examples of vehicle responders from ECF843 phase 2b study

With these strategies of selecting vehicle nonresponders and masking both the selection criteria and the treatment transitions, the resulting outcome data were consistent for both signs and symptoms, which is very unusual in DED studies (Fig. 7) [12, 22, 94]. Although ECF843 was not superior to vehicle in efficacy, patients did improve while on treatment, which indicates that the enrichment did not select patients who were simply unable to improve (treatment nonresponders). Hence, an MTT design can limit vehicle response, help obtain more consistent results for DED signs and symptoms, and thus enable detection of a true treatment effect in DED trials.

Fig. 7 Impact of masked treatment transition in ECF843 phase 2b study on (a) overall SANDE score and (b) composite corneal staining. BID twice a day, CFS corneal fluorescein staining score, SANDE Symptom Assessment in Dry Eye, SE standard error, TID thrice a day, VNR vehicle nonresponder, VR vehicle responder

Notably, in this study, no clear response was observed following withdrawal of the active treatment. This is most likely due to the lack of efficacy of the active treatment, in which case no deterioration would be expected after it was stopped. Importantly, because the initiation of the withdrawal phase was masked, any expectation of a change in treatment effect was removed. This further supports the hypothesis that masking treatment transition points and more effectively eliminating vehicle responders enables clear and consistent interpretation of study data.

In studies using the MTT design, as demonstrated in the ECF843 study, greater improvement is observed during the run-in period than with a single-blind run-in design, and hence a higher proportion of patients (deemed vehicle responders) are excluded from the primary analysis. In a recent study using an MTT design, more than half of the patients (64%) were excluded following a 4-week vehicle run-in period [85]. Therefore, it is important to screen a larger number of patients initially to ensure that the study achieves adequate power. MTT designs are not commonly used, primarily because of the complexity of such trials. Extensive planning is required to ensure that all criteria for inclusion in the primary analysis are prespecified and that masking is maintained throughout the trial.

Summary

DED is highly prevalent, and its prevalence will likely continue to increase as the population ages and the use of video displays and digital devices grows. To date, although many topical ocular DED therapies have been studied, very few have demonstrated significant benefit over vehicle, and the efficacy observed in trials has been difficult to reproduce. Various reasons have been proposed for these challenges, including the vehicle response observed in DED trials, which itself can arise from many factors. One is that DED trials of topical ocular drops have no true placebo comparator (one with no therapeutic benefit), because topical formulations themselves can provide some therapeutic benefit to the desiccated ocular surface. Because vehicle alone induces substantial improvement in signs and symptoms, the window for additional improvement by the active pharmaceutical ingredient is reduced.

Once patients are enrolled in a trial, adherence improves in both the vehicle and active treatment arms, leading to improvement in both arms. By including an MTT design, this improvement can be shifted to before the randomized treatment period, so that the treatment effect can be clearly observed. This trial-associated gain in adherence is also, in part, why real-world evidence on treatment adherence is gaining attention as a source of perspective on the efficacy and safety of new medications in real-world settings.

The physical vehicle effect is further complicated by the psychological vehicle effect (a true placebo effect). Patients and physicians have the best of intentions when participating in clinical trials. However, subconscious bias arising from the expectation of treatment benefit, which can be present in any arm of a masked trial, can influence clinical assessments that rely heavily on subjective input. This also applies to nominally objective physician-assessed endpoints, such as corneal staining, whose grading depends on the physician's subjective assessment and interpretation. Therefore, factors such as expectation of therapeutic effect and regression to the mean cannot be fully mitigated.

An imbalance of vehicle responders between treatment groups could lead to incorrect interpretation of the study results and hence limit the reproducibility of any observed improvement in a subsequent trial. Moreover, when artificial tears are used as a control or concomitantly, their frequency of use should be prespecified, because it affects the outcomes and the treatment difference. Similarly, many DED trials have reported efficacy of a drug in a subgroup of patients; however, subsequent studies did not reproduce these subgroup findings, possibly because of the presence of vehicle responders and the small size of the subgroups. Notably, the efficacy of most approved DED drugs is based on the totality of data rather than on efficacy replicated in more than one pivotal trial.

Masking the study through an MTT design partially mitigates many of the above factors. As observed in the ECF843 trial, a substantial improvement in outcomes occurred during the initial phase of the study (before randomization). However, some patients did not improve with vehicle alone, suggesting that they had more severe disease requiring more than a lubricant to improve their condition. These patients would be the most likely to demonstrate a clear benefit from an effective active ingredient. In other words, although patients continued to improve after randomization, any effect of the drug itself would have been more clearly observed. Importantly, the results were consistent for DED signs and symptoms in this trial.

Implementing MTT designs can be challenging because they require extensive planning, predefined criteria to identify vehicle responders, and a large number of patients at initial screening to ensure an adequate sample size for the primary analysis. Whether run-in study designs, including double-masked run-in or MTT designs, improve the probability of success of a trial remains debatable. However, the goal in research is to design a study that provides clear results. With the MTT design, as seen in the ECF843 phase 2b study and the study by Shettle et al., the results were noticeably clear and consistent for DED signs and symptoms, which is essential for reliable interpretation of DED trials.
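To make "predefined criteria to identify vehicle responders" concrete, the sketch below encodes a responder rule as a frozen object that is fixed before unmasking and then applied mechanically to run-in data. Everything here is an assumption for illustration: the class and function names are hypothetical, the thresholds (30% symptom improvement, a 1-point drop in staining) are placeholders rather than values from ECF843 or any cited trial, and the "either sign or symptom" logic is one of several possible rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponderCriteria:
    """Hypothetical prespecified rule, fixed before unmasking.
    Threshold values are placeholders, not values from any cited trial."""
    min_symptom_improvement_pct: float = 30.0  # % drop in symptom score (e.g., SANDE)
    min_staining_drop: float = 1.0             # absolute drop in corneal staining score


def is_vehicle_responder(baseline_symptom: float, runin_symptom: float,
                         baseline_staining: float, runin_staining: float,
                         criteria: ResponderCriteria) -> bool:
    """Return True if the patient improved enough on vehicle during the
    masked run-in to be excluded from the primary analysis."""
    if baseline_symptom <= 0:
        raise ValueError("baseline symptom score must be positive")
    symptom_pct = 100.0 * (baseline_symptom - runin_symptom) / baseline_symptom
    staining_drop = baseline_staining - runin_staining
    # Assumption: improvement in EITHER a sign OR a symptom marks a responder.
    return (symptom_pct >= criteria.min_symptom_improvement_pct
            or staining_drop >= criteria.min_staining_drop)


if __name__ == "__main__":
    rule = ResponderCriteria()
    # (patient id, baseline SANDE, run-in SANDE, baseline CFS, run-in CFS); made-up data
    patients = [("P01", 60, 40, 3.0, 3.0), ("P02", 60, 55, 3.0, 2.5)]
    analysis_set = [pid for pid, *scores in patients
                    if not is_vehicle_responder(*scores, rule)]
    print(analysis_set)  # ['P02']
```

Freezing the dataclass mirrors the design intent that the rule cannot be altered after run-in data are seen; in a real trial this role is played by the statistical analysis plan, not by code.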

Future Recommendations

TFOS DEWS II made several recommendations for the design of DED clinical trials. Multiple DED trials have used a run-in period before randomization. However, the TFOS DEWS II recommendations of masking treatment transitions from investigators and patients and of using a withdrawal study design have been implemented in only a very limited number of DED trials and should be considered when designing future DED trials. The timing and frequency of vehicle instillations (as artificial tears) should be planned to correspond to the instillations of the active formulation being administered in the trial. Neutralizing patients' and physicians' expectations by educating them about the outcomes and scoring/grading methods could reduce subjectivity in outcome reporting. The mechanism of action of a candidate drug should be considered when selecting the outcome measures in a trial. The development and use of validated, reproducible outcome measures, including biomarkers or digital assessments for tests such as corneal staining, and of technology enabling real-time outcome reporting might help minimize vehicle response in DED trials. To establish the signs and symptoms of DED and limit the probability of regression toward the mean during the study, assessments should be performed at multiple visits before randomization. Moreover, further research is warranted on the early detection of vehicle responders using artificial intelligence-based predictive algorithms to mitigate vehicle response.
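Why multiple pre-randomization visits limit regression toward the mean can be shown with a small simulation. In this sketch, assumed purely for illustration, each patient has a stable true severity (higher is worse), every visit adds independent measurement noise, patients are enrolled only if their baseline exceeds a severity threshold, and nobody is treated. Any mean drop from baseline to a later visit is therefore pure regression to the mean. Averaging k baseline visits shrinks the baseline noise variance from sd_noise² to sd_noise²/k, so fewer patients are enrolled on a falsely high reading. The function name and all parameter values are hypothetical.

```python
import random


def apparent_improvement(n_visits: int, n_patients: int = 100_000,
                         threshold: float = 1.0, sd_true: float = 1.0,
                         sd_noise: float = 1.0, seed: int = 0) -> float:
    """Mean (baseline - follow-up) score among enrolled patients when NO ONE
    is treated, so any 'improvement' is pure regression to the mean.
    Higher scores mean worse disease; all parameters are illustrative."""
    rng = random.Random(seed)
    drops = []
    for _ in range(n_patients):
        true_severity = rng.gauss(0.0, sd_true)
        # Baseline = average of n_visits noisy pre-randomization assessments.
        baseline = sum(true_severity + rng.gauss(0.0, sd_noise)
                       for _ in range(n_visits)) / n_visits
        if baseline >= threshold:  # enroll only patients who look severe enough
            follow_up = true_severity + rng.gauss(0.0, sd_noise)  # one later visit
            drops.append(baseline - follow_up)
    if not drops:
        raise ValueError("no patients met the enrollment threshold")
    return sum(drops) / len(drops)


if __name__ == "__main__":
    for k in (1, 3):
        print(f"{k} baseline visit(s): apparent improvement "
              f"{apparent_improvement(k):.2f}")
```

Under these assumptions, selecting on a single baseline visit produces a noticeably larger spurious "improvement" than selecting on the average of three visits, even though no treatment is given in either scenario.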

In patients with severe DED or meibomian gland dysfunction (MGD) who do not respond to first-line therapy, newer device-based treatments can be considered. Although several device-based treatments are available for the management of MGD, these therapies warrant extensive evaluation in randomized clinical trials [95].