Introduction

Measuring habitual sleep duration is essential in observational studies as it is correlated with many other health outcomes [1,2,3]. Three methods can be used to measure habitual sleep: a sleep questionnaire (a single-point self-report of habitual sleep duration), a sleep diary that records time in bed and wake-up time over a representative period (usually 7 consecutive days), and accelerometry, which uses an electronic device to measure the movement pattern of the subject under investigation and determines the wake-sleep pattern from prolonged non-movement. All these methods have been validated against polysomnography, the gold standard of sleep measurement under laboratory conditions [4,5,6], although discrepancies have been observed between these sleep measures and polysomnography, and validity varies among subjects with sleep disorders [7]. Among the three, accelerometry is the only objective measurement, and it has become more popular in sleep research owing to its decreasing cost and its ability to collect additional data such as physical activity and exposure to both electric light and outdoor sunlight [8]. Accelerometry is therefore also regarded as a standard of sleep measurement in free-living conditions [9, 10], albeit with an overestimation of sleep duration and an underestimation of wake after sleep onset and sleep onset latency [11].

Given its low cost and ease of administration, self-reported sleep duration remains a popular choice in observational studies. The sleep diary has better validity than the sleep questionnaire because it is completed on a day-by-day basis and can capture variation in the respondents' sleep. The sleep diary has been shown to differ from accelerometry by less than 15 min in measured sleep duration, although the two were only moderately correlated (r = 0.4–0.6) [12, 13]. However, the sleep diary places a burden on its respondents, and systematic bias arises from the unavoidable gap between the actual sleep onset and wake-up times and the time at which the diary is completed. Therefore, self-reported questionnaires are still widely used, albeit with questionable validity. For instance, the National Health and Nutrition Examination Survey (NHANES), which surveyed a US-representative sample of around 5,000 individuals each year, used a single-item question “How much sleep do you usually get at night on weekdays or workdays (hours)?” from wave 2005–2006 to wave 2017–2018 to measure sleep duration. Data collected using this question have been widely used as a measure of habitual sleep duration, and a number of studies have correlated it with other health outcomes in NHANES [1, 2]. The single-item question “On average, how many hours of sleep do you get in a 24-h period?” used in the Behavioral Risk Factor Surveillance System (BRFSS) has been validated in a subsample of 300 participants [14], but the NHANES question has not been validated. As a quality assurance procedure, this study aimed to validate the single-item total sleep duration question used in NHANES against a wrist-worn accelerometer (ActiGraph GT3X+) in waves 2011–2012 and 2013–2014 among an adult population aged 20 or above. The results may help evaluate the methodological quality of existing studies using NHANES data.

Participants and methods

Participants

The complete details of the NHANES recruitment procedure can be found on the NHANES official website, https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/overview.aspx?BeginYear=2011. A total of 11,329 participants aged 20+ were recruited in NHANES 2011–2012 and 2013–2014, and only those who provided valid data on self-reported and accelerometer-measured sleep duration (defined in the “Measurement” section) were included in the present analysis.

Measurement

Self-reported sleep duration

Participants were asked “How much sleep do you usually get at night on weekdays or workdays (hours)?”. Participants responded with an integer value between 2 and 11, and responses of 12 h or more were coded as 12. Self-reported sleep durations of 2 h/day were regarded as outliers and removed from the analysis (n = 31).

Accelerometer-measured sleep duration

The complete details of the accelerometer procedure can be found on the NHANES official website (https://wwwn.cdc.gov/nchs/data/nhanes/2011-2012/manuals/2012-Physicial-Activity-Monitor-Procedures-Manual-508.pdf). In short, participants were invited to wear an accelerometer (ActiGraph GT3X+, https://actigraphcorp.com/) on their non-dominant wrist from the day of the examination, to continue wearing it 24 h a day for 7 consecutive days, and to remove it on the morning of the 9th measurement day. The accelerometer measured acceleration at a frequency of 80 Hz, and the epoch length was set at 1 min. Each measured minute was classified as wake, sleep, non-wear, or unknown according to the signal power, the variance of the orientation, and the change of the orientation using a machine learning algorithm [15]. For the current analysis, sleep onset was defined as the start of a consecutive sleeping period of at least 15 min, and a sleep period ended when a consecutive waking period of at least 15 min was recorded. Accelerometer data with non-wear or unknown status were not used in the current analysis. Sleep duration was calculated as the difference between sleep onset and sleep offset. A sensitivity analysis was conducted to test the robustness of this parameter by computing the total sleep duration using 5-min, 10-min, and 20-min criteria. To align with the self-reported sleep duration, accelerometer data at weekends (i.e., Friday–Saturday and Saturday–Sunday nights) were removed from the analysis.
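The onset/offset rule described above can be sketched in a few lines. This is a minimal illustration in Python, not the R syntax provided in the supplementary material; the function names, the label strings, and the toy record are hypothetical. Two simplifying assumptions are made: non-wear and unknown minutes neither open nor close an episode, and an episode still open at the end of the record is closed there.

```python
from itertools import groupby


def sleep_episodes(labels, min_run=15):
    """Return (onset, offset) minute indices for each detected sleep episode.

    labels  : per-minute states, e.g. "sleep", "wake", "nonwear", "unknown"
    min_run : minimum consecutive minutes needed to open (sleep) or close (wake)
              an episode; 15 in the main analysis, 5/10/20 in the sensitivity runs
    """
    episodes = []
    onset = None  # start index of the currently open episode, if any
    pos = 0       # index of the first minute of the current run
    for state, run in groupby(labels):
        length = sum(1 for _ in run)
        if onset is None and state == "sleep" and length >= min_run:
            onset = pos                    # sustained sleep opens an episode
        elif onset is not None and state == "wake" and length >= min_run:
            episodes.append((onset, pos))  # sustained wake closes it
            onset = None
        pos += length
    if onset is not None:
        # Assumption: an episode still open at the end of the record is closed there.
        episodes.append((onset, pos))
    return episodes


def durations_hours(episodes):
    """Sleep duration per episode: offset minus onset, converted to hours."""
    return [(offset - onset) / 60 for onset, offset in episodes]


# Toy record: a 5-min awakening does not end the episode under the 15-min rule.
labels = ["wake"] * 30 + ["sleep"] * 200 + ["wake"] * 5 + ["sleep"] * 100 + ["wake"] * 20
episodes = sleep_episodes(labels)
hours = durations_hours(episodes)
```

Calling `sleep_episodes(labels, min_run=5)` on the same record splits the night at the brief awakening, which is the kind of difference the sensitivity analysis is meant to probe.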

Data analysis

All accelerometer-measured sleep durations of < 3 or > 12 h/day were regarded as outliers and removed from the analysis. A paired-sample t-test and Pearson correlation were used to examine the difference and the correlation, respectively, between self-reported and accelerometer-measured sleep duration. The mean and SD of accelerometer-measured sleep duration were reported for each level of self-reported sleep duration. The self-reported sleep duration was classified as an underestimation, an accurate estimation, or an overestimation if its difference from the corresponding accelerometer-measured sleep duration (self-report minus accelerometer) was smaller than −0.5 h/day, between −0.5 and 0.5 h/day, or larger than 0.5 h/day, respectively. A Bland–Altman plot was used to evaluate the agreement between self-reported and accelerometer-measured sleep duration. All statistical analyses were conducted using R 4.0. The R syntax for accelerometer data processing is available as supplementary material.
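The agreement statistics named above can be illustrated as follows. This is a hedged sketch in Python using only the standard library, not the R code used in the study; the function name `agreement`, the returned keys, and the input values are hypothetical, and the limits of agreement are assumed to follow the conventional mean ± 1.96 SD definition.

```python
import math
import statistics


def agreement(self_report, accel, tol=0.5):
    """Agreement summary for paired sleep durations in h/day."""
    diffs = [s - a for s, a in zip(self_report, accel)]  # self-report minus accelerometer
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)                         # sample SD of the differences
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)           # assumed 95% limits of agreement
    t_stat = bias / (sd / math.sqrt(n))                  # paired t statistic, df = n - 1

    # Pearson correlation computed from its definition.
    ms, ma = statistics.mean(self_report), statistics.mean(accel)
    cov = sum((s - ms) * (a - ma) for s, a in zip(self_report, accel))
    r = cov / math.sqrt(sum((s - ms) ** 2 for s in self_report)
                        * sum((a - ma) ** 2 for a in accel))

    # Differences within +/- tol h/day count as accurate estimation.
    classes = ["underestimation" if d < -tol else
               "overestimation" if d > tol else
               "accurate" for d in diffs]
    return {"bias": bias, "sd": sd, "loa": loa, "t": t_stat, "r": r, "classes": classes}


# Made-up values for illustration only, not NHANES data.
result = agreement([7, 8, 6, 5, 7], [6.0, 6.5, 6.2, 6.0, 7.4])
```

A positive `bias` means that self-report exceeds the accelerometer measure on average, i.e., over-reporting.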

Results

A total of 8,438 participants (mean age 49.7, SD 17.6) were included in the present analysis. On average, participants provided 2.8 (SD 1.2) valid accelerometer-measured sleeping episodes (i.e., sleep duration between 3 and 12 h), and the intra-class correlation coefficient was 49.5%. Table 1 shows the demographic characteristics and sleep duration of the participants. The sample was uniform across age and gender, and most participants were Non-Hispanic Whites (40.8%) or Blacks (23.4%). More than half had at least some college or an AA degree (56.0%) and were married (50.8%). A large over-reporting was observed in the average daily sleep duration. The accelerometer-measured and self-reported sleep durations were 6.01 (SD 1.48) and 6.88 (SD 1.40) h/day, respectively, representing an over-reporting of 0.87 h/day (SD 1.90, p < 0.001). Such over-reporting was observed in all subgroups, ranging from 0.72 h/day (those aged 41–50) to 1.13 h/day (those aged 71 or above). The correlation between accelerometer-measured and self-reported sleep duration was small but positive and statistically significant (ρ = 0.14, p < 0.001). A similar pattern was observed in all subgroups, with correlations ranging from 0.02 (separated) to 0.19 (those who graduated from college or above).

Table 1 Accelerometer-measured sleep duration (h/day) of the participants (n = 8,438)

Table 2 shows the distribution of the self-reported sleep duration, as well as the accelerometer-measured sleep duration across all levels of self-reported sleep duration (3–12 h/day). While there was a positive association between the sleep duration measured by accelerometer and by self-report, the association was weak, and the mean accelerometer-measured sleep duration (h/day) differed by less than 2 h across the groups. Participants who self-reported a sleep duration of 5 h or less tended to underestimate their objectively measured sleep duration, whereas those who self-reported 6 h or more tended to overestimate it. Figure 1 shows the Bland–Altman plot for these two measurements, where their large discrepancy was revealed by the wide 95% limits of agreement (−2.84, 4.59 h/day for self-report minus accelerometer).
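As a rough consistency check (not part of the original analysis), the limits of agreement can be recovered from the mean difference and its SD reported above. The sketch assumes the conventional definition, mean ± 1.96 SD, with the difference taken as self-report minus accelerometer:

```latex
\[
\begin{aligned}
\text{LoA} &= \bar{d} \pm 1.96\, s_d \\
           &= 0.87 \pm 1.96 \times 1.90 \\
           &= 0.87 \pm 3.72 \\
           &\approx (-2.85,\ 4.59)\ \text{h/day}
\end{aligned}
\]
```

Up to rounding, this matches the reported limits in magnitude; taking the difference as accelerometer minus self-report instead reverses both signs.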

Table 2 Accelerometer-measured sleep duration (h/day) of the participants across different levels of self-reported sleep duration (h/day) (n = 8,438)
Fig. 1

Bland–Altman plot for the agreement of self-reported sleep duration and accelerometer-measured sleep duration. The mean bias and 95% limits of agreement (self-report minus accelerometer) were 0.87 and (−2.84, 4.59) h/day, respectively

Table 1 shows the results of the sensitivity analysis. The accelerometer-measured sleep durations using the 5-min, 10-min, and 20-min criteria were 5.07 (SD 1.39), 5.68 (SD 1.41), and 5.49 (SD 1.59) h/day, respectively. They were moderately to strongly correlated with one another (ρ = 0.37–0.79), and all had small but significant correlations with self-reported sleep duration (ρ = 0.07–0.12). Similar patterns were thus found across the alternative definitions of accelerometer-measured sleep onset and awakening (5-min, 10-min, and 20-min definitions). These results indicate that the conclusions drawn from the main analysis were robust to the criterion used to define sleep onset and awakening.

Discussion

This study shows that self-reported single-item total sleep duration was only weakly associated with the sleep duration measured by a wrist-worn accelerometer (ActiGraph GT3X+), and participants over-reported their sleep duration by approximately 52 min per day, with wide 95% limits of agreement (−2 h 50 min, 4 h 35 min). Therefore, the validity of this single-item sleep duration measurement, and of studies examining the associations between sleep duration and other health outcomes using NHANES data, is questionable. Results obtained from research assessing sleep duration using this single-item question should be further tested using more accurate and valid measures of sleep duration. For analyses of sleep using NHANES data, the accelerometer data should be used instead, given their validity and ability to measure other sleep parameters, including sleep efficiency and wake after sleep onset.

Results of this study are not without limitations. The reference measure of sleep duration in the current study, actigraphy, has its own limitations. Accelerometers have been found to overestimate sleep duration by about 5–15 min in adult populations [5, 10, 11], indicating that the over-reporting of sleep duration might be greater than the 52 min found in the current study. Note that the machine learning algorithm used to detect sleep in the current study has not been validated in free-living conditions among a general population. However, visual inspection of the accelerometer data from several participants by the authors supported the validity of this algorithm. There were no data on the time lag between self-reported and objectively measured sleep duration; thus, its effect on the validity of self-reported sleep duration could not be evaluated. Furthermore, as no data were available on the participants’ working pattern, a Monday-to-Friday pattern was assumed, and the accelerometer-measured sleep duration extracted may not have represented participants who worked on weekends.

The single-item question on sleep duration was used in NHANES from waves 2005–2006 to waves 2015–2016. However, only waves 2011–2012 and 2013–2014 of NHANES were used here, as these were the only two waves in which respondents concurrently wore an accelerometer, making it feasible to correlate the two measures. The same sleep duration question was implemented in waves 2015–2016. In NHANES 2017–2018, another question was added to collect self-reported sleep duration during weekends. For sleep duration, two types of questionnaire are used in the literature: the first is a single-item question that asks respondents for their average sleep duration per night (e.g., NHANES and BRFSS [14]), and the second is a two-item questionnaire that asks for the sleep onset time and wake-up time, with sleep duration determined by their difference (e.g., the Pittsburgh Sleep Quality Index, PSQI [16]). In NHANES 2019–2020, the second type of questionnaire was used instead of asking for the total sleep duration. However, no concurrent objective measurement of sleep duration was available from 2015–2016 onwards, and the validity of the new sleep questions could not be evaluated.

In BRFSS [14], a US community sample comparable to NHANES, both self-reported and accelerometer-measured sleep duration were 7 h/day. In the current sample, the self-reported sleep duration was 7 h/day and the accelerometer-measured sleep duration was 6 h/day. Assuming that BRFSS and NHANES had a similar target population, it is possible that the accelerometer algorithm was biased and underestimated the actual sleep duration. However, no other measurements of sleep duration were available for the NHANES sample, and this postulation could not be examined. Also, the BRFSS subsample analyzed in the aforementioned study may not be comparable to NHANES, as it was small (n = 300) and geographically limited (Upstate New York region).

With NHANES surveying a US-representative sample of n ~ 5,000 each year, much population-level research could be conducted, for example on sleep patterns in subgroups and on longitudinal change in sleep patterns. A simple analysis was performed here on the average sleep duration across different age groups, genders, ethnic groups, and education levels (Table 1). The current research serves as a starting point for these possibilities by assessing the validity of the single-item sleep duration question.