Key Points

This study investigated the amount of additional sleep time people with insomnia disorder would consider to be a meaningful improvement.

Blinded anchor- and distribution-based analyses of subjective total sleep time (sTST) data from two clinical trials of drugs used to treat insomnia (zolpidem and daridorexant) were used to derive estimates of meaningful within-patient change.

Triangulation of the different estimates indicated that an increase in sTST of approximately 55 min per night was meaningful to patients with moderate to severe insomnia.

1 Introduction

Regulatory guidelines recommend that the efficacy of pharmacotherapies for insomnia disorder be evaluated using both objective and subjective sleep parameters [1, 2]. Sleep diaries completed by patients are a useful tool for assessing subjective symptoms and treatment effects in people with insomnia disorder [3, 4]. The Consensus Sleep Diary (CSD) was developed using extensive patient and expert input to capture the most relevant aspects of the patient experience of insomnia [3, 5, 6]. The CSD assesses key sleep parameters including subjective total sleep time (sTST), subjective wake time after sleep onset (sWASO), and subjective latency to sleep onset (sLSO).

The Sleep Diary Questionnaire (SDQ) is a modified version of the CSD, with minor changes to optimize it for use in clinical trials. Notably, the SDQ includes two questions on the use of study medication; a minor change to question 9, clarifying for respondents that they should estimate their total sleep time ‘last night’; three visual analog scales (VASs) to be completed in the morning (quality of sleep last night, depth of sleep last night, feeling in the morning); and two VASs to be completed in the evening (daytime alertness, ability to function) [Table 1 in the electronic supplementary material (ESM)]. The SDQ omits CSD question 8 (“How would you rate the quality of your sleep?”). Content validity of the SDQ was evaluated in a US-based web survey of 100 adults with self-reported insomnia, which found that the SDQ was easy to understand and relevant to patient experiences of insomnia [7]. In-depth one-on-one telephone interviews conducted with 17 participants from the web survey provided additional evidence of the importance of sTST from the patient perspective and supported the clarity and comprehensiveness of the SDQ items [7] (data on file, Idorsia Pharmaceuticals Ltd). However, the importance of sTST as an outcome measure and meaningful within-patient change for sTST have not been established as part of the validation of either the CSD or the SDQ.

To properly evaluate treatment effects, it is important to understand the amount of additional sleep time patients with insomnia would consider to be a meaningful improvement. In this study, we estimated meaningful within-patient changes for sTST using SDQ data from interventional clinical trials in adults with insomnia disorder. Patient-estimated sTST was also compared with calculated sTST to explore the accuracy of the patient report.

2 Methods

2.1 Study Design

Quantitative assessment of meaningful within-patient change was performed using data from an open-label, 2-week trial of zolpidem conducted in Germany and the US (NCT03056053) [8] and a phase III, randomized, double-blind trial of daridorexant conducted in 10 countries (Australia, Canada, Denmark, Germany, Italy, Poland, Serbia, Spain, Switzerland, and the US), in which subjects were randomized 1:1:1 to receive one of two daridorexant doses (25 or 50 mg) or placebo daily for 3 months (NCT03545191) [9]. These trials are summarized in Table 1. In both trials, ethical approval was obtained for each study site, and all subjects provided written informed consent.

Table 1 Overview of the included trials

2.2 Subjects

Participants in both the open-label and phase III trials were adults aged ≥ 18 years with insomnia disorder according to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) and an Insomnia Severity Index (ISI) score ≥ 15 at screening. Potential participants were excluded if they had a psychiatric disease that might interfere with study participation or confound the results, periodic limb movement disorder, restless legs syndrome, circadian rhythm disorder, rapid eye movement sleep behavior disorder, narcolepsy, or history of a sleep-related breathing disorder.

2.3 Patient-Reported Outcome Assessments

The SDQ includes 10 questions and three VASs on the previous night’s sleep that are completed in the morning. It also includes two questions on daytime napping and two VASs that are completed in the evening (ESM Table 1). Subjects in the open-label and phase III trials completed the SDQ daily throughout screening and treatment. Question 9 of the CSD, which assesses sTST (“In total, how long did you sleep?”), is slightly reworded in the SDQ as “In total, how long did you sleep last night?”. The accompanying instructions (“This should just be your best estimate, based on when you went to bed and woke up, how long it took you to fall asleep, and how long you were awake. You do not need to calculate this by adding and subtracting; just give your best estimate”) remain unchanged from the CSD (ESM Table 1). sTST was assessed every morning. The prespecified scoring rule for calculating weekly average sTST (in minutes/day) required subjects to have ≥ 2 days of sTST data during a given week.
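The weekly scoring rule can be illustrated with a minimal Python sketch. The function name, the use of `None` for a missing diary day, and the example values are assumptions made for illustration; they are not taken from the trial analysis programs, which were written in SAS.

```python
from statistics import mean
from typing import Optional, Sequence

MIN_DAYS_PER_WEEK = 2  # scoring rule: at least 2 days of sTST data in the week


def weekly_average_stst(daily_minutes: Sequence[Optional[float]]) -> Optional[float]:
    """Return the weekly average sTST (minutes/day), or None if the rule is not met.

    `daily_minutes` holds one entry per diary day; None marks a missing day
    (an illustrative convention, not the trial data format).
    """
    observed = [m for m in daily_minutes if m is not None]
    if len(observed) < MIN_DAYS_PER_WEEK:
        return None  # too few days: weekly average is treated as missing
    return mean(observed)


# Hypothetical diary week with three completed mornings
week = [300.0, None, 330.0, 360.0, None, None, None]
print(weekly_average_stst(week))  # 330.0
```

A week with a single completed diary entry returns `None` rather than a one-day "average".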

sTST was also calculated using morning questionnaire responses that comprised components of sTST. Calculated sTST was defined as time of final awakening in the morning (question 8) minus time of beginning to try falling asleep (question 4) minus length of time taken to fall asleep (question 5) minus total duration of night-time awakenings (question 7). Calculated sTST was set to missing if the value derived for a given night was negative (nonsensical) or >16 h (implausible).
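As a sketch of this derivation, the following Python function applies the subtraction and the two missing-value rules. Representing the clock times for questions 4 and 8 as full `datetime` values (so that nights crossing midnight are handled) is an assumption of this example, as are the function and parameter names.

```python
from datetime import datetime
from typing import Optional

MAX_PLAUSIBLE_MINUTES = 16 * 60  # values above 16 h are treated as implausible


def calculated_stst(
    tried_to_sleep: datetime,   # question 4: time of beginning to try falling asleep
    sleep_latency_min: float,   # question 5: time taken to fall asleep (min)
    night_awake_min: float,     # question 7: total duration of night-time awakenings (min)
    final_awakening: datetime,  # question 8: time of final awakening
) -> Optional[float]:
    """Return calculated sTST in minutes, or None when the value is set to missing."""
    minutes_in_window = (final_awakening - tried_to_sleep).total_seconds() / 60
    stst = minutes_in_window - sleep_latency_min - night_awake_min
    if stst < 0 or stst > MAX_PLAUSIBLE_MINUTES:
        return None  # negative (nonsensical) or > 16 h (implausible)
    return stst


# Hypothetical night: 23:00 to 07:00, 45 min to fall asleep, 60 min awake
night = calculated_stst(
    tried_to_sleep=datetime(2024, 1, 1, 23, 0),
    sleep_latency_min=45,
    night_awake_min=60,
    final_awakening=datetime(2024, 1, 2, 7, 0),
)
print(night)  # 375.0
```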

Subjects also completed other patient-reported outcome (PRO) assessments (Table 1). The ISI is a 7-item measure for evaluating the severity and functional and emotional impacts of insomnia disorder over the previous month [10]. Each item is scored on a scale from 0 to 4, and the scores for individual items are summed to give a total score of 0–28 points. An ISI total score of 15–21 points indicates moderate insomnia, and a score of 22–28 points indicates severe insomnia. Subjects completed the ISI at baseline and day 15 in the open-label trial and at baseline, month 1, and month 3 in the phase III trial.
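The ISI scoring described above amounts to a sum and a lookup, sketched below in Python. Only the moderate and severe bands given in the text are labeled; the label for totals below 15 and the function names are placeholders chosen for this example.

```python
from typing import Sequence


def isi_total(item_scores: Sequence[int]) -> int:
    """Sum the seven ISI items, each scored 0 to 4, into a total of 0 to 28."""
    if len(item_scores) != 7:
        raise ValueError("The ISI has exactly 7 items")
    if any(score not in range(5) for score in item_scores):
        raise ValueError("Each ISI item must be scored from 0 to 4")
    return sum(item_scores)


def isi_severity(total: int) -> str:
    """Map an ISI total score to the severity bands named in the text."""
    if not 0 <= total <= 28:
        raise ValueError("ISI total score must lie between 0 and 28")
    if total >= 22:
        return "severe"
    if total >= 15:
        return "moderate"
    return "below 15"  # placeholder label: bands under 15 are not described here


# Hypothetical respondent
total = isi_total([3, 3, 2, 3, 2, 3, 2])
print(total, isi_severity(total))  # 18 moderate
```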

The Patient Global Assessment of Disease Severity (PGA-S) is a single-item measure assessing the severity of daytime symptoms and impacts of insomnia disorder over the previous 7 days on a 6-point scale from ‘none’ to ‘very severe’. The Patient Global Impression of Severity (PGI-S) is another single-item measure that assesses the severity of night-time insomnia symptoms over the previous 7 nights on a 4-point scale from ‘none’ to ‘severe’. Two Patient Global Impression of Change scales (PGI-C) were used to assess changes in the severity of night-time insomnia symptoms over the previous 7 nights (phase III trial only) and of daytime sleepiness and impacts due to insomnia over the previous 7 days (open-label and phase III trials). For the PGI-C scale, changes in severity are scored on a 7-point scale from ‘very much better’ to ‘very much worse’, as compared with the week before starting treatment. In the open-label trial, the PGA-S was completed at baseline, day 8, and day 15, and the PGI-C was completed at day 8 and day 15; the PGI-S was not used. In the phase III trial, the PGA-S, PGI-S, and PGI-C were completed at baseline, month 1, and month 3.

2.4 Clinician Assessments

The open-label trial included Clinician Global Impression of Severity (CGI-S) and Clinician Global Impression of Change (CGI-C) as additional single-item measures of insomnia disorder (Table 1). The CGI-S assesses the severity of a patient’s daytime symptoms and impacts of insomnia disorder over the previous 7 days on a 6-point scale from ‘none’ to ‘very severe’. The CGI-C assesses changes in the severity of a patient’s daytime symptoms and impacts due to insomnia over the previous 7 days on a 7-point scale from ‘very much better’ to ‘very much worse’, as compared with the week before starting treatment. The CGI-S was completed at baseline and day 15 and the CGI-C was completed at day 15.

2.5 Psychometric Analysis

Analyses were performed using SAS® version 9.4 or later (SAS Institute, Cary, NC, USA) on the full analysis set (FAS). In the open-label trial, the FAS comprised all subjects who received at least one dose of zolpidem; in the phase III trial, the FAS comprised all randomized subjects. Descriptive statistics were calculated for subject demographics and PRO scores. For sTST, baseline was a weekly average value, and day 8, day 15, month 1, and month 3 were weekly average values for the week preceding the given time point. For the other PROs and the clinician assessments, baseline, day 8, day 15, month 1, and month 3 were a single score/rating at the given time point.

Meaningful within-patient change estimates were derived using recommended anchor-based and distribution-based approaches [11, 12]. In the anchor-based analysis, Spearman correlation coefficients were first calculated for changes in weekly average sTST and the corresponding changes in scores for potential anchors. Anchors showing at least a moderate negative correlation with change in sTST (Spearman coefficient ≤ − 0.30) [11, 13] were considered acceptable for inclusion in the anchor-based analysis of meaningful within-patient change. For weekly average sTST, mean and median (95% confidence interval [CI]) changes from baseline to day 8 and day 15 in the open-label trial and from baseline to month 1 and month 3 in the phase III trial were calculated for subjects who met prespecified thresholds defining clinically relevant score changes for the anchors. These score changes were a 1- or 2-point decrease (improvement) from baseline for the PGA-S, PGI-S, and CGI-S; ‘a little better’ or ‘moderately better’ for the PGI-C (daytime and night-time symptoms) and CGI-C; and a 6-point decrease from baseline for ISI total score [14, 15].
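The two steps of the anchor-based approach (screening an anchor by its Spearman correlation, then summarizing sTST change in subjects who met an anchor threshold) can be sketched in plain Python as follows. This is an illustrative re-implementation with made-up data and hypothetical function names, not the SAS code used for the analysis; confidence intervals and medians are omitted for brevity.

```python
from math import sqrt
from statistics import mean
from typing import List, Optional, Sequence

CORRELATION_THRESHOLD = -0.30  # anchors need a coefficient at or below this value


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: Pearson correlation of the rank-transformed data.

    Assumes neither variable is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def mean_change_for_anchor_group(
    stst_change: Sequence[float], anchor_change: Sequence[int], target: int
) -> Optional[float]:
    """Mean sTST change among subjects whose anchor change equals `target`."""
    group = [c for c, a in zip(stst_change, anchor_change) if a == target]
    return mean(group) if group else None


# Hypothetical data: change in weekly average sTST (min) and change in PGA-S score
stst_change = [70, 55, 10, 90, -5, 60]
pga_s_change = [-1, -1, 0, -2, 0, -1]

rho = spearman_rho(stst_change, pga_s_change)
if rho <= CORRELATION_THRESHOLD:
    # Anchor accepted: summarize subjects with a 1-point improvement
    print(round(rho, 2), mean_change_for_anchor_group(stst_change, pga_s_change, -1))
```

The coefficient is negative because a decrease in an anchor severity score (improvement) accompanies an increase in sTST.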

Similar correlation analyses for sWASO and sLSO yielded only weak to moderate correlations with potential anchors. It was therefore decided not to pursue meaningful within-patient change estimates for sWASO and sLSO.

In distribution-based analyses, the standard error of measurement (SEM) was calculated for weekly average sTST at day 8 and day 15 (open-label trial) and month 1 and month 3 (phase III trial). The 0.25, 0.33, and 0.50 standard deviations (SDs) were calculated for weekly average sTST at baseline and at each post-baseline assessment.
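A minimal Python sketch of these distribution-based quantities is given below. The text does not state how the SEM was computed or which reliability estimate was used; the sketch assumes the common definition SEM = SD × sqrt(1 − reliability) and takes the reliability coefficient as a user-supplied input. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import stdev
from typing import Dict, Sequence


def sd_fractions(
    values: Sequence[float], fractions: Sequence[float] = (0.25, 0.33, 0.50)
) -> Dict[float, float]:
    """Return the requested fractions of the sample standard deviation."""
    sd = stdev(values)
    return {fraction: fraction * sd for fraction in fractions}


def standard_error_of_measurement(values: Sequence[float], reliability: float) -> float:
    """SEM under the assumed definition SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return stdev(values) * sqrt(1 - reliability)


# Hypothetical weekly average sTST values (min) and an assumed reliability of 0.75
weekly_stst = [300.0, 360.0, 420.0]
print(sd_fractions(weekly_stst))
print(standard_error_of_measurement(weekly_stst, reliability=0.75))
```

Under this definition the SEM shrinks toward zero as reliability approaches 1, so the resulting threshold depends heavily on the reliability estimate chosen.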

The anchor- and distribution-based analyses were repeated in a subsample of 200 subjects who participated in a second randomized, double-blind, placebo-controlled trial of daridorexant (NCT03575104) [9]. The subsample comprised the first 120 subjects aged 18–64 years and the first 80 subjects aged ≥65 years who were randomized in the trial and who had baseline data for the PGA-S (daytime symptoms), PGI-S (night-time symptoms), and PGI-C (daytime and night-time symptoms).

The estimates obtained from the anchor- and distribution-based approaches were compared or ‘triangulated’ to obtain a final estimate of meaningful within-patient change [11, 16, 17]. This involved evaluating the different estimates in a descriptive, non-inferential way to identify a set of similar values where they converged. This set of values was then used to identify a narrow range of values representing a meaningful within-patient change.

In addition, cumulative distribution function (CDF) curves for sTST were plotted using data from subgroups of subjects categorized based on score changes on the PGA-S, PGI-S, and CGI-S, and ratings for the PGI-C (daytime and night-time symptoms) and CGI-C. In these CDF curves, the x-axis represents a continuous plot of subjects’ change from baseline in weekly average sTST, from the greatest worsening to the greatest improvement, and the y-axis shows the cumulative percentage of subjects who attained that level of change.
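The points behind one such curve can be computed as in the Python sketch below, which returns, for each observed change, the cumulative percentage of subjects with a change at or below that value. Grouping by a dictionary keyed on anchor category, the function names, and the data are assumptions made for illustration; plotting is left out.

```python
from typing import Dict, List, Sequence, Tuple


def empirical_cdf(changes: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (change, cumulative % of subjects with change <= that value) pairs."""
    ordered = sorted(changes)  # greatest worsening first, greatest improvement last
    n = len(ordered)
    return [(value, 100.0 * (i + 1) / n) for i, value in enumerate(ordered)]


def cdf_by_anchor_group(
    changes_by_group: Dict[str, Sequence[float]]
) -> Dict[str, List[Tuple[float, float]]]:
    """Compute one CDF curve per anchor category (one line per subgroup in a plot)."""
    return {group: empirical_cdf(values) for group, values in changes_by_group.items()}


# Hypothetical change in weekly average sTST (min) by anchor rating
curves = cdf_by_anchor_group(
    {
        "no change": [30, -10, 60, 0],
        "a little better": [45, 80, 20, 65],
    }
)
print(curves["no change"])
# [(-10, 25.0), (0, 50.0), (30, 75.0), (60, 100.0)]
```

Curves for subgroups with greater anchor improvement are expected to lie further to the right, which is the separation the CDF plots are meant to show.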

2.6 Concordance Analysis

A concordance analysis was conducted using data from the phase III trial to evaluate agreement between sTST (directly reported in SDQ question 9) and calculated sTST (based on separate SDQ items; described in Sect. 2.3). The distributions of sTST and calculated sTST at baseline, month 1, and month 3 were summarized using descriptive statistics. Individual differences at baseline, month 1, and month 3 (calculated sTST minus sTST) were also evaluated using descriptive statistics, as well as histograms and Bland–Altman analysis plots.
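The summary quantities behind a Bland–Altman analysis can be sketched in Python as follows: the mean difference (bias), limits of agreement taken as the mean difference ± 1.96 SD, and the per-subject points (mean of the two values against their difference) that would be plotted. The function name and data are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple, Union

Summary = Dict[str, Union[float, List[Tuple[float, float]]]]


def bland_altman(calculated: Sequence[float], reported: Sequence[float]) -> Summary:
    """Summarize agreement between calculated sTST and directly reported sTST."""
    if len(calculated) != len(reported):
        raise ValueError("Both series must have one value per subject")
    # Differences follow the convention in the text: calculated sTST minus sTST
    diffs = [c - r for c, r in zip(calculated, reported)]
    pair_means = [(c + r) / 2 for c, r in zip(calculated, reported)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "bias": bias,
        "lower_limit": bias - 1.96 * sd,
        "upper_limit": bias + 1.96 * sd,
        "points": list(zip(pair_means, diffs)),  # x = pair mean, y = difference
    }


# Hypothetical weekly average values (min) for three subjects
result = bland_altman(calculated=[350.0, 400.0, 410.0], reported=[360.0, 400.0, 420.0])
print(round(result["bias"], 1), round(result["lower_limit"], 1), round(result["upper_limit"], 1))
```

A negative bias under this convention means that calculated sTST is, on average, shorter than directly reported sTST.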

3 Results

3.1 Subjects

Demographics and baseline clinical characteristics of subjects in the open-label trial (N = 114) have been reported elsewhere [8]. Briefly, most subjects were female (65%) and were White (80%) or Black/African American (20%). Baseline insomnia severity was moderate (ISI total score 15–21) in 75% of subjects and severe (ISI total score 22–28) in 25% of subjects [8]. Subjects in the phase III trial (N = 930) had a mean (SD) age of 55.4 (15.3) years (range 18–88). Most subjects were female (67%) and White (90%) or Black/African American (8%). Baseline insomnia severity based on ISI total score was moderate in 57% of subjects and severe in 26% of subjects [9]. The amount of missing sTST data was low: almost all subjects (> 98% in the open-label trial and > 90% in the phase III trial) had 5–7 days of data that contributed to their weekly average sTST at each time point (baseline, month 1, month 3).

3.2 Correlation Analysis

PGA-S (daytime symptoms), CGI-S and CGI-C (daytime symptoms), PGI-C (daytime and night-time symptoms), PGI-S (night-time symptoms), and ISI total score were evaluated as potential anchors for estimating meaningful within-patient change in sTST. In the open-label trial, score changes/ratings at day 8 and day 15 for each potential anchor had correlations with change in weekly average sTST that met the prespecified threshold of ≤ − 0.30 (Table 2). Individual correlations in the open-label trial were moderate to strong, ranging from − 0.30 for CGI-S and − 0.31 for CGI-C (daytime symptoms) to − 0.54 for PGA-S. In the phase III trial, correlations of score changes/ratings at month 1 and month 3 for potential anchors with change in weekly average sTST ranged from − 0.37 to − 0.49 (Table 2). Given the strengths of the relationships observed, all of the tested measures were used in the anchor-based estimation of meaningful within-patient change in sTST.

Table 2 Correlations of change in weekly average sTST and scores/score changes for potential anchors in the open-label and phase III trials

The corresponding correlations for the phase III trial subsample are shown in ESM Table 2.

3.3 Anchor-Based Analyses

In the open-label trial, mean change in sTST from baseline to day 8 for subjects with a 1-point or 1-step improvement on the anchors was + 60.1 min in subjects who rated the PGI-C (daytime symptoms) as ‘a little better’ and + 83.2 min in subjects with a 1-point decrease in PGA-S score (Table 3). At day 15, the mean change in sTST from baseline for subjects with a 1-point or 1-step improvement on the anchors ranged from + 55.5 min in subjects with a 1-point decrease in PGA-S score to + 68.2 min in subjects with a 1-point decrease in CGI-S score. For subjects with a 2-point or 2-step improvement on the anchors, mean change in sTST from baseline to day 8 was + 79.6 min in subjects with a 2-point decrease in PGA-S score and + 81.4 min in subjects who rated the PGI-C (daytime symptoms) as ‘moderately better’. At day 15, the mean change from baseline ranged from + 80.1 min in subjects with a 2-point decrease in CGI-S score to + 93.5 min in subjects with a 2-point decrease in PGA-S score.

Table 3 Anchor-based analysis of weekly average change in sTST (min) from baseline to day 8 and day 15 in the open-label trial

In the phase III trial, for subjects with a 1-point or 1-step improvement on the anchors, mean changes in sTST from baseline to month 1 and month 3 were lowest in subjects with a 1-point decrease in PGA-S score (+ 39.3 min at month 1 and + 47.3 min at month 3) and highest in subjects with a 1-point decrease in PGI-S score (+ 46.7 min at month 1 and + 58.3 min at month 3) [Table 4]. For subjects with a 2-point or 2-step improvement on the anchors, the mean change in sTST at month 1 ranged from + 60.7 min in subjects who rated the PGI-C (daytime symptoms) as ‘moderately better’ to + 76.2 min in subjects with a 2-point decrease in PGI-S score. Mean changes at month 3 ranged from + 70.1 min in subjects with a 2-point decrease in PGA-S score to + 87.7 min in subjects with a 2-point decrease in PGI-S score.

Table 4 Anchor-based analysis of weekly average change in sTST (min) from baseline to month 1 and month 3 in the phase III trial

Results of the anchor-based analysis of the phase III trial subsample are shown in ESM Table 3.

3.4 Distribution-Based Analysis

In accordance with US FDA recommendations [12], distribution-based analyses were conducted to obtain supportive evidence for the anchor-based meaningful within-patient change estimates for sTST. One SEM [18, 19] and 0.50 SD [20] have been proposed as useful measures of meaningful within-patient change. In the open-label trial, one SEM was 51.1 min at day 8 and 55.5 min at day 15, and 0.5 SD was 29.4 min at day 8 and 31.7 min at day 15 (ESM Table 4). In the phase III trial, one SEM was 43.2 min at month 1 and 53.3 min at month 3, and 0.5 SD was 34.3 min at month 1 and 37.4 min at month 3. While the SEM values were within the ranges derived from the anchor-based analysis, the 0.5 SD values were generally lower than the anchor-derived ranges. Results of the distribution-based analysis of the phase III trial subsample are shown in ESM Table 5.

3.5 Triangulation of Data from the Anchor- and Distribution-Based Analyses

When the various anchor-based estimates for meaningful within-patient change from baseline to day 8 and day 15 were considered (Fig. 1), the results from the open-label trial were supportive of a meaningful within-patient change threshold for sTST of approximately 60 min for subjects with a 1-point or 1-step improvement on the anchors and 80 min for subjects with a 2-point or 2-step improvement on the anchors. Results from the phase III trial for change from baseline to month 1 and month 3 were supportive of a meaningful within-patient change threshold of approximately 55 min for subjects with a 1-point or 1-step improvement on the anchors and 75 min for subjects with a 2-point or 2-step improvement on the anchors. Distribution-based estimates were supportive of findings using anchor-based methods.

Fig. 1 Triangulation of sTST changes from anchor- and distribution-based analyses for (a) the open-label trial and (b) the phase III trial. Data are the mean (95% confidence interval) change in weekly average sTST (min) from baseline to day 8 (red) and baseline to day 15 (black) in the open-label trial, and from baseline to month 1 (red) and baseline to month 3 (black) in the phase III trial. CGI-C Clinician Global Impression of Change, CGI-S Clinician Global Impression of Severity, ISI Insomnia Severity Index, PGA-S Patient Global Assessment of Disease Severity, PGI-C Patient Global Impression of Change, PGI-S Patient Global Impression of Severity, SD standard deviation, SEM standard error of measurement, sTST subjective total sleep time

3.6 Cumulative Distribution Function Curves

Meaningful within-patient change was also explored by generating CDF curves. Overall, CDF curves generated using open-label trial data indicated a consistent pattern of greater increases in sTST among subjects with greater improvements in daytime symptoms based on the PGA-S, PGI-C, CGI-S, and CGI-C (Fig. 2). Similarly, CDF curves for the phase III trial indicated greater increases in sTST among subjects with greater improvements in daytime symptoms based on the PGA-S and PGI-C and with greater improvements in night-time symptoms based on the PGI-S and PGI-C (Fig. 3).

Fig. 2 CDF plots of weekly average change in sTST (min) from baseline to day 15 in the open-label trial by score changes and ratings for different anchors: (a) PGA-S; (b) PGI-C (daytime symptoms); (c) CGI-S; and (d) CGI-C. CDF cumulative distribution function, CGI-C Clinician Global Impression of Change, CGI-S Clinician Global Impression of Severity, PGA-S Patient Global Assessment of Disease Severity, PGI-C Patient Global Impression of Change, sTST subjective total sleep time

Fig. 3 CDF plots of weekly average change in sTST (min) from baseline to month 3 in the phase III trial by score changes and ratings for different anchors: (a) PGA-S; (b) PGI-C (daytime symptoms); (c) PGI-S; and (d) PGI-C (night-time symptoms). CDF cumulative distribution function, PGA-S Patient Global Assessment of Disease Severity, PGI-C Patient Global Impression of Change, PGI-S Patient Global Impression of Severity, sTST subjective total sleep time

3.7 Concordance Analysis

In the phase III trial, mean and median sTST and calculated sTST values were higher at month 1 and month 3 compared with baseline, indicating increased sleep duration (ESM Table 6). Distributions of the individual differences between calculated sTST (based on separate SDQ items) and sTST (directly reported in SDQ question 9) at baseline, month 1, and month 3 were approximately normal and were centered around 0 (ESM Fig. 1). Mean calculated sTST was slightly smaller than sTST at all time points; the greatest mean difference was observed at baseline (− 6.5 min), with smaller differences at month 1 (− 4.5 min) and month 3 (− 5.5 min). The median difference between calculated sTST and sTST values was 0 at all time points. In a Bland–Altman analysis (ESM Fig. 2), bias was minimal, with mean differences close to 0. In addition, the majority of differences were located within the limits of agreement (upper and lower 95% CI bounds based on the mean difference ±1.96 SD). This indicates good agreement between reported and calculated sTST, although the Bland–Altman plots suggest that there may be more variability in the lower range of differences.

4 Discussion

Using data from two clinical trials, we estimated meaningful within-patient change for sTST. In conducting this quantitative analysis, we followed FDA guidance that recommends using an anchor-based approach, complemented by distribution-based analyses [12]. To evaluate existing PRO measures as potential anchors, we first selected measures that were easier to interpret than the PRO measure itself, as recommended in the FDA guidance [12]. We then identified moderate to strong relationships between changes in scores for the selected PRO measures and changes in weekly average sTST, supporting the chosen anchor-based approach to derive change estimates.

The meaningful within-patient change estimates for sTST we derived from analyses using open-label and phase III trial data were largely consistent with each other. Despite differences in sample size, duration of follow-up, and study design that may have contributed to variation in the meaningful within-patient change estimates for sTST derived from the open-label and phase III trials, the consistency of results across the two trials and the various follow-up time points supports a meaningful within-patient change threshold starting at 55 min. CDF curves also showed a consistent pattern of changes from baseline in sTST, where greater improvement in daytime and night-time symptoms of insomnia disorder corresponded to larger increases in sTST. This supports the use of the selected anchors in the analysis of meaningful within-patient change.

Values for sTST calculated using individual SDQ items showed good agreement with sTST values derived directly from SDQ question 9. Moreover, the distributions of individual differences between sTST and calculated sTST do not suggest systematic bias in the estimation of sTST. These observations indicate that people are able to accurately estimate the total time they slept the previous night (using question 9 in the morning questionnaire of the SDQ). However, it should be noted that the concordance analysis was included post hoc as a way of comparing two approaches to asking the same question; it did not validate sTST against an objective measurement.

Previous research has explored the associations of subjectively estimated and objectively measured sleep duration [21–25]. There is evidence that healthy sleepers overestimate their sleep duration, whereas people with insomnia disorder tend to underestimate it [22, 25]. One recent study compared polysomnography readings and self-assessments of sleep duration made with a daily sleep diary and a morning sleep questionnaire in patients with insomnia disorder attending a sleep disorders clinic or participating in a research study. The study found that subjective measures yielded lower estimates of sleep duration than polysomnography. However, rather than declaring subjective measures of sleep duration inaccurate or invalid, the study authors speculated that subjective and objective methods might measure different constructs and concluded that both provide useful information [26].

Strengths of the present study include our use of data from trials with different durations of follow-up (2 weeks in the open-label trial and 3 months in the phase III trial). Moreover, the analyses incorporated multiple anchors reflecting both daytime and night-time symptoms. Limitations include deriving the threshold for meaningful change in sTST using data from patients who participated in clinical trials. The trials had specific inclusion criteria for sleep parameters (≥ 30 min to fall asleep, wake time during sleep ≥ 30 min, and sTST ≤ 6.5 h at least 3 nights per week) and insomnia severity (ISI total score ≥ 15 indicating moderate to severe insomnia) that may not be representative of all individuals who experience insomnia. Therefore, real-world data would be valuable to assess improvements in sTST in broader populations of people with insomnia disorder and to evaluate the relevance and applicability of the meaningful within-patient change threshold identified in this study. It should be noted that the minimum increase in sTST that is meaningful to an individual patient may depend on their particular circumstances or lifestyle. Moreover, it should be acknowledged that increasing sTST might not be the most important treatment outcome for all patients with insomnia disorder. Heterogeneity in the patient experience of insomnia suggests that the treatment plan for an individual patient should consider their unique circumstances and be tailored to address the disease aspects that most bother them and most affect their daily life.

5 Conclusions

Despite being potentially less accurate than objective measurement, subjective assessments such as sTST provide important information from the patient perspective and can be helpful in tracking treatment responses in clinical practice. The present analysis based on independent interventional studies provides evidence that an increase in sleep time of approximately 55 min per night is meaningful to patients. This study adds to the body of literature supporting the importance of sleep duration to patients and highlights that even an extra hour of sleep time can be beneficial. This information could be useful in identifying treatment responders and non-responders, and could inform treatment decisions in clinical practice. Additional research using longer-term data would be helpful to confirm whether the magnitude of meaningful increases in sTST is maintained or changes (in either direction) with longer follow-up. Examining patients’ perceptions of improvement with long-term use of an insomnia medication could also help to better understand the patient experience of treatment and adherence to therapy.