Introduction

Sleep health has become an important public health concern and is composed of six dimensions: regularity, satisfaction or quality, alertness or sleepiness, timing, efficiency or continuity, and duration [1]. “Sleep quality” is defined as an individual’s self-satisfaction with all aspects of sleep, and is used to refer to subtle aspects of the sleep experience that cannot be captured by objective sleep parameters [2, 3]. To date, the concept of sleep quality has no consensus definition, but it is generally believed to comprise several factors: sleep continuity, sleep efficiency, sleep latency, sleep duration, awakening, and wakefulness after falling asleep [4, 5]. Although polysomnographic parameters are considered the gold standard for measuring sleep [5, 6], the common view is that sleep quality is the “subjective perception of sleep” [7]. In other words, a subjective evaluation of sleep quality may better explain everyday psychological and behavioral performance, which is not captured by objective measurements [8].

Poor sleep quality has become a widespread issue, with nearly one-third of adults reporting dissatisfaction with their sleep [9, 10]. Numerous previous studies have found that university students and healthcare workers are at increased risk for sleep problems [11, 12]. Healthcare students, a subgroup of university students, have a significantly higher prevalence of poor sleep quality than non-medical students and the general population [13, 14]. Healthcare students experience noticeable sleep issues, which are primarily related to their intense academic requirements and high achievement expectations [15]. The COVID-19 pandemic and associated lockdowns may have contributed to a higher prevalence of sleep problems in healthcare students [13, 16, 17]. Although healthcare students, as a clinical or research sample, are at significant risk for sleep disorders, they have not received adequate attention from the academic community [18].

Quality sleep is an essential part of a healthy lifestyle. Factors that influence sleep quality in healthcare students include physiological factors (e.g., age, BMI), psychological factors (e.g., stress, anxiety, and depression) [19,20,21], environmental factors (e.g., bedroom light, room temperature, and noise), and family/social expectations [4]. Importantly, psychological distress and sleep problems demonstrate a bidirectional relationship [17]. To mitigate the negative effects of poor sleep quality on psychosomatic health, accurate monitoring and prompt diagnosis are vital, particularly in healthcare students [22, 23].

As noted previously, objective sleep measures are difficult to apply on a wide scale [6]. Scales for subjective measurement of sleep quality are widely used in research and clinical practice, with common scales including the Pittsburgh Sleep Quality Index (PSQI) [24], the Sleep Quality Scale [25], the Sleep Quality Index [26], and the Sleep Satisfaction Tool [27]. In China, existing sleep-related scales have focused on assessing nighttime sleep or were designed explicitly for clinical settings and patients. However, the assessment of sleep quality is not limited to nocturnal symptoms and includes many daytime consequences of poor nighttime sleep [28]. The Sleep Quality Questionnaire (SQQ) is a concise, psychometrically sound instrument developed specifically for evaluating sleep quality among non-clinical populations across two core dimensions (Sleep Difficulty and Daytime Sleepiness) via 10 items [29].

The Chinese version of the SQQ (SQQ-C) has been translated, adapted, and validated, demonstrating equivalence between the two versions while maintaining the original measurement properties [30,31,32]. The SQQ demonstrated a two-factor structure in both the original Japanese and Chinese versions. Credibility and applicability of the SQQ have been established via the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines [33, 34]. Nonetheless, discriminant validity of the SQQ-C has not been fully established, and some discrepancies exist regarding factor loadings. The original version suggested that sleep quality measured by the SQQ was a strong predictor of general health; however, less is known about its association with anxiety and depressive symptoms. This study aims to further establish measurement properties of the SQQ among Chinese healthcare students. We examined discriminant validity of the SQQ-C with other constructs including anxiety, depression, and self-rated health. Furthermore, we performed multiple linear regression analysis to examine incremental validity of the SQQ with the Generalized Anxiety Disorder-2 (GAD-2), Patient Health Questionnaire-2 (PHQ-2), and Self-Rated Health Questionnaire (SRHQ).

Methods

Participants and procedures

This study focused on freshman to junior undergraduate and freshman postgraduate students who were attending regular classes at university. College seniors are at a developmentally distinct phase of their educational journeys. Because seniors face more stressful life events, such as graduation, job search, and clinical internship, as they shift from student to worker, we excluded this population. More specifically, for undergraduate students, first- to third-year students majoring in health management, nursing, pharmacy, clinical medicine, preventive medicine, and health services management were selected by teaching class; for postgraduate students, all first-year students majoring in health management, health services management, and public health were selected. All study data were collected via short paper-and-pencil questionnaires in two waves between December 2020 and January 2021 at a university in Hangzhou, China (an area at low risk for COVID-19 infection during data collection). Data collection was conducted during breaks between classes. Only full-time students attending classes on-site were recruited; students on suspension or long-term sick leave were excluded. Following the instructions of the study protocol, informed consent was obtained from each participant.

The baseline assessment (Time 1, T1) included a brief sociodemographic questionnaire and a small packet of scales, as described below. A follow-up assessment (Time 2, T2) was conducted approximately one week after T1 and included all T1 measures except for the demographic survey. The average time interval between T1 and T2 was 7 days and 16.62 hours. The reproducibility of health measurements is optimized at intervals of 1–2 weeks [35]. Six hundred and thirty-seven (N1 = 637) and six hundred and sixteen (N2 = 616) valid responses were obtained at T1 and T2, respectively. Both assessments recorded the unique student ID of each respondent. The last author manually matched the IDs across the two assessment questionnaires in Microsoft Excel, resulting in a matched sample (N = 595).
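The ID-matching step described above can be illustrated with a minimal sketch. The matching in this study was done manually in Microsoft Excel; the function name `match_waves`, the student IDs, and the scores below are hypothetical and serve only to show the logic of keeping respondents who appear in both waves.

```python
def match_waves(t1, t2):
    """Keep only respondents whose student ID appears in both waves.

    t1 and t2 map a student ID to that wave's score (hypothetical layout).
    Returns a dict: ID -> (T1 score, T2 score).
    """
    shared_ids = sorted(set(t1) & set(t2))
    return {sid: (t1[sid], t2[sid]) for sid in shared_ids}


# Hypothetical example data, not taken from the study.
t1_scores = {"S001": 18, "S002": 22, "S003": 15}
t2_scores = {"S002": 21, "S003": 16, "S004": 19}

matched = match_waves(t1_scores, t2_scores)
print(len(matched), matched)
```

Respondents present in only one wave (here S001 and S004) drop out, which is why the matched sample is smaller than either wave.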

The study protocol was approved by the Institutional Review Board of Hangzhou Normal University Division of Health Sciences, China (Reference No. 20190076). The process was carried out following the principles of the Declaration of Helsinki [36].

Measures

Sociodemographic questionnaire

At baseline, participants were asked to provide demographic information, including gender, age, grade, academic stage (undergraduate, postgraduate), home location (urban, rural, suburban), being an only child (yes, no), monthly household income (1 CNY ≈ 0.160 US dollars), part-time job (yes, no), physical exercise [purposeful exercise with the goal of improving health (yes, no)], hobby [regularly and frequently engaging in activities or projects of their preference (yes, no)], frequency of visiting home (once per week, twice per week, once per month, once per quarter, once per semester, and once per academic year), and stress coping strategy [the most customary way of coping when faced with stress (emotion-focused, solution-focused, avoidance coping)].

To facilitate meaningful data analysis, we grouped age, monthly household income, and frequency of visiting home as follows: 1) Age: < 20 years and ≥ 20 years, 2) Monthly household income: below and above the average household income (10 000 CNY) of a three-member family in China in 2021 [37], 3) Frequency of visiting home: frequently versus occasionally.

Sleep Quality Questionnaire (SQQ)

The original SQQ is a 10-item scale consisting of the Sleep Difficulty Subscale (SDS, e.g., “I had trouble sleeping”) and Daytime Sleepiness Subscale (DSS, e.g., “I sometimes felt sleepy during the day”), which is used to assess subjective sleep quality over the past month in non-clinical samples [29]. Responses are recorded on a five-point Likert scale ranging from 0 to 4 (with 0 indicating “strongly disagree” and 4 indicating “strongly agree”). Global scores range from 0 to 40 with a higher score indicating poorer subjective sleep quality. Previous studies have demonstrated that the SDS was stably loaded by three items (items 1, 4, 9), while the DSS was loaded by six items (items 3, 5, 6, 7, 8, 10). Item 2 of the SQQ showed cross-loading in both the original and Chinese versions. The strong psychometric properties observed in previous studies have promoted wide usage of the SQQ-C as a practical measurement instrument in research and survey sites [30, 32].
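The scoring described above can be sketched as follows. This is an illustration under stated assumptions rather than an official scoring routine: the function name `score_sqq` is hypothetical, no reverse-scored items are assumed, and the subscale assignment follows the stable loadings reported above (items 1, 4, 9 for the SDS; items 3, 5, 6, 7, 8, 10 for the DSS), with the cross-loading item 2 left out of both subscales.

```python
SDS_ITEMS = (1, 4, 9)              # Sleep Difficulty Subscale items
DSS_ITEMS = (3, 5, 6, 7, 8, 10)    # Daytime Sleepiness Subscale items


def score_sqq(responses):
    """Score one respondent's 10 SQQ answers (each 0-4, item 1 first)."""
    if len(responses) != 10 or any(r not in range(5) for r in responses):
        raise ValueError("expected 10 responses, each an integer from 0 to 4")
    item = dict(enumerate(responses, start=1))  # item number -> response
    total = sum(responses)
    return {
        "SDS": sum(item[i] for i in SDS_ITEMS),
        "DSS": sum(item[i] for i in DSS_ITEMS),
        "SQQ-10": total,            # global score, range 0-40
        "SQQ-9": total - item[2],   # item 2 excluded, range 0-36
    }


# Hypothetical respondent.
scores = score_sqq([2, 1, 3, 0, 4, 2, 1, 3, 2, 0])
print(scores)
```

Higher values on any of these scores indicate poorer subjective sleep quality.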

Patient Health Questionnaire‑4 (PHQ‑4)

The PHQ is a self-report version of the Primary Care Evaluation of Mental Disorders (PRIME-MD) [38]; both the original and derived scales are available from the ‘Patient Health Questionnaire (PHQ) Screeners website’ [39]. The PHQ-4 [40] consists of the PHQ-2 (e.g., “Little interest or pleasure in doing things”) and the GAD-2 (e.g., “Feeling nervous, anxious, or on edge”), two extensively used ultra-short screening tools for depressive and anxiety symptoms. The PHQ-4 was originally designed for patients, yet subsequent studies have continued to advance its validation and applicability in the general population [41, 42]. Each item of the PHQ-4 is scored on a four-point Likert scale ranging from 0 (not at all) to 3 (nearly every day), with higher global scores indicating higher levels of depressive and anxiety symptoms. For the PHQ-2 and GAD-2, a scale score of ≥ 3 is recommended as a cut-off point between the normal range and possible cases of depression or anxiety, respectively.
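As a minimal sketch of the scoring and cut-off rule described above, the following function sums the two PHQ-2 items and the two GAD-2 items and applies the ≥ 3 threshold to each subscale. The function name `screen_phq4` and the example responses are hypothetical; the two subscales are passed separately so that no item order within the PHQ-4 is assumed.

```python
CUTOFF = 3  # subscale score >= 3 flags a possible case


def screen_phq4(phq2_items, gad2_items):
    """Score the PHQ-4 from two PHQ-2 items and two GAD-2 items (each 0-3)."""
    for pair in (phq2_items, gad2_items):
        if len(pair) != 2 or any(r not in range(4) for r in pair):
            raise ValueError("each subscale needs two responses from 0 to 3")
    phq2, gad2 = sum(phq2_items), sum(gad2_items)
    return {
        "PHQ-2": phq2,
        "GAD-2": gad2,
        "PHQ-4": phq2 + gad2,                     # global score, range 0-12
        "possible_depression": phq2 >= CUTOFF,
        "possible_anxiety": gad2 >= CUTOFF,
    }


# Hypothetical respondent.
result = screen_phq4(phq2_items=(1, 2), gad2_items=(0, 1))
print(result)
```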

Self-Rated Health Questionnaire (SRHQ)

The SRHQ is a simple questionnaire consisting of two items that assess self-rated physical health and mental health, respectively [43]. The SRHQ utilizes a five-point Likert scale from 1 (excellent) to 5 (extremely poor), with higher total scores implying poorer self-rated health status.

Statistical Analyses

All statistical analyses were completed with R (4.1.3) and JASP (0.16.1). The R packages “lavaan (0.6–11)” [44], “MBESS (4.9.1)” [45], “irr (0.84.1)” [46], and “semTools (0.5–6)” [47] were used. The rate of missing data was below 5% (range: 0.168% to 1.681%); therefore, the mean (continuous variables) or median (categorical variables) was used to impute missing values, consistent with the method used in our previous publications. Differences in sleep quality levels based on sociodemographic characteristics were analyzed using t-tests and ANOVA. Multiple regression models were used to examine incremental validity of the SQQ, with the PHQ-4 and SRHQ as criterion variables. Following the COSMIN guidelines, different metrics were applied to assess measurement properties of the SQQ [33, 34]; a brief description of each is provided in the subsections that follow.
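The single-imputation rule described above (mean for continuous variables, median for categorical variables) can be sketched in a few lines. The analyses in this study were run in R and JASP; the Python function `impute` below, its `kind` argument, and the example values are hypothetical and only illustrate the rule.

```python
from statistics import mean, median


def impute(values, kind):
    """Replace missing entries (None) with the mean or the median.

    kind: "continuous" -> fill with the mean of the observed values;
          "categorical" -> fill with the median of the observed values.
    """
    if kind not in ("continuous", "categorical"):
        raise ValueError("kind must be 'continuous' or 'categorical'")
    observed = [v for v in values if v is not None]
    fill = mean(observed) if kind == "continuous" else median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical data with one missing entry each.
ages = impute([19, None, 21, 20], "continuous")
item_scores = impute([1, None, 2, 2], "categorical")
print(ages, item_scores)
```

With a low missing rate such as the one reported here, this simple approach changes the data very little; with higher missing rates, it would shrink the variance of the imputed variables.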

Structural validity

Based on a previous study [30], we conducted CFA on three alternative models of the SQQ: item 2 loading on the SDS (Factor 2), item 2 loading on the DSS (Factor 1), and the SQQ-9 (item 2 excluded). The weighted least squares mean and variance adjusted (WLSMV) method was used for all CFA analyses, given that the data were ordinal [48]. The goodness-of-fit indexes used were the chi-square (χ²) value, P value, comparative fit index (CFI), Tucker-Lewis index (TLI), Akaike information criterion (AIC), Bayesian information criterion (BIC), and root mean square error of approximation (RMSEA). For interpretation purposes, smaller values of AIC and BIC suggest a better model fit; CFI and TLI ≥ 0.90 with RMSEA ≤ 0.08 are considered adequate, and CFI and TLI ≥ 0.95 with RMSEA ≤ 0.05 are considered good [49,50,51]. Because χ² is sensitive to large samples and tends to reject the optimal model as the sample size increases, we did not use it as the only criterion for judging goodness of fit.
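The interpretation thresholds above amount to a simple decision rule, sketched below. The function name `classify_fit` and the fit values in the example are hypothetical and are not results from this study; the sketch also assumes that all three indexes must meet a threshold for a model to earn that label.

```python
def classify_fit(cfi, tli, rmsea):
    """Label model fit using the CFI/TLI/RMSEA thresholds stated in the text."""
    if cfi >= 0.95 and tli >= 0.95 and rmsea <= 0.05:
        return "good"
    if cfi >= 0.90 and tli >= 0.90 and rmsea <= 0.08:
        return "adequate"
    return "inadequate"


# Hypothetical fit indexes for three candidate models.
examples = {
    "model A": classify_fit(cfi=0.97, tli=0.96, rmsea=0.04),
    "model B": classify_fit(cfi=0.93, tli=0.92, rmsea=0.07),
    "model C": classify_fit(cfi=0.88, tli=0.85, rmsea=0.10),
}
print(examples)
```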

Cross-cultural validity/measurement invariance

Multi-group CFA and longitudinal CFA were conducted to test measurement invariance (MI) of the SQQ-C across sociodemographic variables and across time intervals, respectively. In cross-population and cross-time applications of the scale, establishing MI means that the construct measured by the instrument does not change with population heterogeneity or over time, so that latent constructs can be compared across groups and time without bias. Increasingly constrained, nested models are fitted and tested against each other. When MI holds, the SQQ has the same interpretation across groups and time: the measurement structure and domain are relatively stable [52], with strict model invariance being well validated [53].

Four nested models were tested: configural, metric, scalar, and strict. Configural invariance requires only the same basic structural relations between observed variables. Metric invariance implies similar measurement constructs and constrains factor loadings to be equal across groups. Scalar invariance constrains factor loadings and intercepts, meaning that the systematic bias of measurement content is consistent across groups and time. Strict invariance additionally constrains item residuals, indicating that group differences are caused by the latent variables [54]. The fit indexes CFI, TLI, and RMSEA were used to evaluate the goodness of model fit, and changes (Δ) in CFI were used to assess invariance between progressively constrained models, where ΔCFI ≤ 0.010 was considered satisfactory [51, 55].
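The stepwise comparison described above can be illustrated with a short sketch that computes ΔCFI between consecutive nested models and checks it against the 0.010 criterion. The function name `delta_cfi_steps` and the CFI values are hypothetical, not results from this study. The sketch also assumes that the criterion applies to the size of the drop in CFI when constraints are added, so a ΔCFI below -0.010 is flagged.

```python
MODEL_ORDER = ("configural", "metric", "scalar", "strict")
THRESHOLD = 0.010


def delta_cfi_steps(cfi_by_model):
    """Return (model, delta CFI vs. previous model, within threshold?) per step."""
    steps = []
    for previous, current in zip(MODEL_ORDER, MODEL_ORDER[1:]):
        # Round to 3 decimals so floating-point noise cannot flip the decision.
        delta = round(cfi_by_model[current] - cfi_by_model[previous], 3)
        steps.append((current, delta, delta >= -THRESHOLD))
    return steps


# Hypothetical CFI values for one subgroup comparison.
steps = delta_cfi_steps(
    {"configural": 0.990, "metric": 0.985, "scalar": 0.972, "strict": 0.970}
)
for model, delta, ok in steps:
    print(model, delta, "satisfactory" if ok else "exceeds threshold")
```

In this made-up example the scalar step would be flagged, because CFI drops by 0.013 when intercepts are constrained.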

Construct/discriminant validity

Spearman correlation coefficients were calculated to evaluate construct/discriminant validity of the SQQ total and subscale scores with the GAD-2, PHQ-2, and SRHQ [56]. Considering the bidirectional association between sleep quality and psychosomatic health, we hypothesized that the SQQ would have a moderate correlation (0.30 < r ≤ 0.50) with the GAD-2, PHQ-2, and SRHQ [57].
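For readers unfamiliar with the statistic, a minimal sketch of the Spearman coefficient follows: scores are converted to ranks (tied values share their average rank), and the Pearson correlation of the ranks is computed. The function names and the example scores are hypothetical, and the `is_moderate` helper simply encodes the hypothesized range 0.30 < r ≤ 0.50.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def is_moderate(r):
    """True if r falls in the hypothesized moderate range (0.30, 0.50]."""
    return 0.30 < r <= 0.50


# Hypothetical sleep and symptom scores for six respondents.
rho = spearman([12, 18, 9, 25, 15, 20], [2, 1, 3, 6, 4, 5])
print(round(rho, 3), is_moderate(rho))
```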

Internal consistency and test–retest reliability

Ordinal Cronbach’s alpha and McDonald’s omega were calculated to estimate internal consistency of the SQQ and were considered good when equal to or greater than 0.70 [45, 55]. Test–retest reliability of the scale was assessed using the intraclass correlation coefficient (ICC), with values between 0.50 and 0.75 indicating moderate reliability and values between 0.75 and 0.90 indicating good reliability [58]. In addition, the standard error of measurement (SEM) was used as a supplementary index of measurement accuracy in the evaluation of test–retest reliability [59].
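The relationship between the ICC and the SEM can be illustrated with a short sketch. It assumes the commonly used formula SEM = SD × √(1 − ICC); the text does not state which formula was applied, so this is an assumption. The function names and the example SD are hypothetical, and the boundary value 0.75 is assigned to the "good" band by assumption.

```python
from math import sqrt


def standard_error_of_measurement(sd, icc):
    """SEM under the assumed formula SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * sqrt(1.0 - icc)


def icc_band(icc):
    """Label an ICC using only the two bands stated in the text."""
    if 0.50 <= icc < 0.75:
        return "moderate"
    if 0.75 <= icc <= 0.90:
        return "good"
    return "outside the bands stated in the text"


# Hypothetical scale: SD of 5.0 points and ICC of 0.84.
sem = standard_error_of_measurement(sd=5.0, icc=0.84)
print(round(sem, 3), icc_band(0.84))
```

The SEM is expressed in the units of the scale itself, which is why it serves as an absolute index alongside the relative ICC.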

Multivariate regression analyses

Sleep difficulty and daytime sleepiness were entered as potential predictors of negative symptoms (anxiety and depression) and self-rated health in multiple linear regression analysis. Baseline sleep quality scores were used to predict anxiety, depression, and self-rated health scores at follow-up to examine the incremental validity of the SQQ. The linear relationship between variables was checked, and autocorrelation was assessed with the Durbin-Watson test. The variance inflation factor (VIF) was used to evaluate multicollinearity between variables, and values below 10 were taken to indicate the absence of collinearity [60].
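The two diagnostics named above can be sketched as follows. With exactly two predictors, as in a model with a sleep difficulty score and a daytime sleepiness score, the VIF of each predictor reduces to 1 / (1 − r²), where r is the Pearson correlation between the two predictors. The Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals. The function names and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest little autocorrelation."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical predictor scores and regression residuals.
vif = vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
dw = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(round(vif, 3), vif < 10, dw)
```

With more than two predictors, r² would be replaced by the R² from regressing each predictor on all the others.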

Results

Participants

The final 595 participants (Table 1) ranged in age from 17 to 31 years, with an average age of 19.857 ± 1.625 years. Among respondents, 554 (93.109%) were undergraduates and 41 (6.891%) were postgraduates. The mean scores of the SQQ-9, SQQ, PHQ-4, and SRHQ were 15.968 ± 5.449, 18.111 ± 6.102, 3.726 ± 2.390, and 4.598 ± 1.210 at T1, and 16.114 ± 5.438, 18.237 ± 6.095, 3.508 ± 2.107, and 4.418 ± 1.120 at T2, respectively. Table S1 (Additional file) lists descriptive statistics for the item and factor scores of the main variables.

Table 1 Sociodemographic variables and comparison of sleep quality (N = 595)

Structural validity

We tested three alternative CFA models (i.e., item 2 in the SDS, item 2 in the DSS, and item 2 excluded; Table 2). The model with item 2 in the SDS (original model) showed an inadequate fit, while the SQQ-9 model (item 2 excluded) slightly outperformed the model with item 2 in the DSS in terms of fit indices (except RMSEA). The smaller AIC and BIC values of the SQQ-9 model imply greater stability and applicability of the SQQ-9 among Chinese healthcare students.

Table 2 Fit indices for three alternative CFA models of the SQQ (N = 595)

Sociodemographic factors related to sleep quality

There were no significant differences in sleep quality scores between genders, but there were differences in SQQ-9 scores across age, grade, and academic stage groups (Table 1). Students who had a hobby and responded to stressful events with a positive attitude had better sleep quality. Meanwhile, students with anxiety symptoms, depressive tendencies, or poor self-rated health had high scores on the SQQ-9.

Cross-cultural validity/measurement invariance

Table 3 presents the results of examining cross-sectional measurement invariance (CMI) among different subgroups. The four models with stepwise invariance constraints fitted well in all subgroups for both the baseline and follow-up data: CFI ranged from 0.953 to 0.997, TLI ranged from 0.958 to 0.998, and all RMSEA values were less than 0.080. Further, except for the ΔCFI of the metric model in the baseline SRHQ subgroup (-0.011), all ΔCFI values were within the threshold. The complete fitting information can be accessed in Additional file: Table S2. Δχ² indicated no significant difference between the invariance models in nearly all subgroups.

Table 3 Cross-sectional measurement invariances of the SQQ-9 (N = 595)

Longitudinal measurement invariance (LMI) across time intervals (Table 4) also showed an excellent fit, with all fit indexes within critical values (CFI: 0.981–0.990, TLI: 0.982–0.989, RMSEA: 0.037–0.047, and SRMR: 0.039–0.042). The changes in fit indexes were also in the acceptable range.

Table 4 Longitudinal measurement invariance of the SQQ-9 (N = 595)

Construct/discriminant validity

Parts i and ii of Fig. 1 show the results of Spearman correlation analysis between the SQQ, PHQ-4, and SRHQ items, total scores, and subdomains at T1 and T2, respectively. The coefficients on the left side of the black line are the item, factor, and total correlations within the SQQ, and the coefficients on the right side of the black line are discriminant validity estimates between the SQQ and the PHQ-4 and SRHQ. For both the 9-item and 10-item versions, the correlation coefficients between the SQQ and the GAD-2, PHQ-2, PHQ-4, and SRHQ were all moderate in strength (0.30–0.50). Table S3 (Additional file) displays the correlation coefficients combining the two assessments.

Fig. 1 Item–factor, factor–total, and discriminant correlations between the SQQ, PHQ-4, and SRHQ (N = 595). Note: Spearman correlations, T1 Time 1, T2 Time 2, SQQ Sleep Quality Questionnaire, SQQ01-10 item 1–10, SDS Sleep Difficulty Subscale, DSS Daytime Sleepiness Subscale, DSS (-) item 2 excluded from the DSS, SQQ-9 item 2 excluded from the SQQ, GAD Generalized Anxiety Disorder, PHQ Patient Health Questionnaire, SRHQ Self-Rated Health Questionnaire

Internal consistency and test–retest reliability

Although the previous CFA suggested that the SQQ-9 was a better model than the original SQQ-10, we still estimated reliability of the original scale and its subscales (Table 5). All ordinal Cronbach’s alpha and McDonald’s omega values are greater than 0.800, and ICC values suggested that the subscales and total scale have excellent reliability except for the SDS (ICC = 0.741). In this study, the GAD-2, PHQ-2, PHQ-4, and SRHQ all showed good reliability (T1: 0.814, 0.747, 0.844, 0.704; T2: 0.784, 0.743, 0.831, 0.710).

Table 5 Internal consistency and test–retest reliability of the SQQ (N = 595)

Multivariate regression analyses

Multiple linear regression analysis (Table 6) was performed with the SDS and DSS scores as predictors of the PHQ-4 and SRHQ scores. There was a linear relationship between the independent and dependent variables, and the VIF values indicated little multicollinearity. Sleep quality as measured by the SQQ explained 20.7% of the variance in anxiety and depressive symptoms and 10.8% of the variance in self-rated health. Overall, β weights for both daytime sleepiness and sleep difficulty scores were significant at the 0.001 level.

Table 6 Multiple linear regression analysis predicting negative symptoms and self-rated health (N = 595)

Discussion

This study further examined measurement properties of the SQQ-C in healthcare students. Our findings indicated good structural, cross-cultural, and discriminant validity, adequate internal consistency, and stability of the SQQ-9. CMI was established based on sociodemographic variables that may affect sleep quality. LMI models across time intervals further suggested that the SQQ-9 is a promising and practical measurement instrument for assessing sleep quality. Multiple linear regression results demonstrated that sleep quality measured by the SQQ can be used to predict short-term negative symptoms (anxiety and depression) and self-rated health status.

CFA results suggested that the two-factor SQQ-9 structure fits the data best. Interestingly, although this factor structure is inconsistent with the originally proposed factor structure [29], the same structure appeared in our previous study using the SQQ to measure sleep quality in Chinese samples [30]. That both studies yielded the SQQ-9 suggests that the discrepancy in comprehension might be due to cross-cultural differences and translation issues.

MI of the SQQ-9 held for most cross-sectional subgroups and across the longitudinal interval, as demonstrated by configural, metric, scalar, and strict invariance regardless of gender, age, grade, academic stage, hobby, stress coping strategy, anxiety symptoms, depressive tendencies, and self-assessed health status, and across time intervals. A previous study of healthcare students found that age and gender were not significantly associated with sleep quality or daytime sleepiness [61]. This conclusion was supported by our invariance tests, although this may be due to sampling homogeneity and the narrow age range. Lastly, we showed that the SQQ-9 has strict LMI across the two time points, indicating that the SQQ-9 is a stable scale with high reliability.

Our results, as in previous studies, revealed that sleep quality in healthcare students was moderately associated with negative mental states (anxiety and depression) [62, 63]. The findings were also consistent with a previous study on medical students in Malaysia, where daytime sleepiness was more pronounced among medical students who reported poor sleep quality and psychological distress [64].

Cronbach’s alpha, McDonald’s omega, ICC, and SEM were used to measure the reliability of the SQQ. Given that Cronbach’s alpha as an indicator of reliability is still contested [65,66,67], we further demonstrated high internal consistency of the SQQ and its subscales through McDonald’s omega. Both indicators demonstrated high homogeneity among the SQQ items. Retest intervals of 1 or 2 weeks are considered typical intervals for validating the reproducibility of health status measures used longitudinally [35, 68]. ICC is a relative measure of reliability and SEM is an indicator of absolute reliability; both showed good test–retest reliability [59]. Collectively, the evidence indicated that the SQQ has good reliability.

Multiple linear regression analysis demonstrated that poor sleep quality was a predictor of negative mood (anxiety and depression) and self-rated health status. Sleep problems and mental health issues are common in modern society. Generally, good sleep can relieve psychological problems (e.g., depression, anxiety, and stress), while poor sleep may have negative effects on quality of life and academic performance, and vice versa [69, 70]. The relationship between sleep quality and mental health is well documented; thus, in terms of primary prevention, sleep improvement represents a viable therapeutic goal that can provide significant benefits to mental health [71].

The importance of sleep to public health and the contribution of insufficient sleep to health disparities deserve to be emphasized. Healthy sleep education, especially early intervention on campus, contributes to the development of good sleep habits [72]. Cognitive behavioral therapy for insomnia is considered a first-line treatment for sleep improvement, and standardized protocols have been developed [73]. Moreover, this therapy has been found to be effective in reducing psychological problems and improving sleep-related quality of life [74, 75].

The SQQ-9 is a slightly shortened form derived from the factor analysis results. However, the development of ultrashort measurement instruments is a priority and has been recognized as necessary for large-scale population screening (e.g., PHQ-2, GAD-2). In terms of sleep quality assessments, the previously developed single-item Sleep Quality Scale [76], the two-item Sleep Condition Indicator (SCI) [77, 78], and the six-item PSQI [79] suggest that short questionnaires are feasible.

Strengths and limitations

This study supplemented the evidence for discriminant validity and incremental validity of the SQQ-C, and we were able to retest almost all participants to examine the stability and establish LMI of the SQQ-C. This study also has some limitations. First, generalizability may be limited by the type of university and the overrepresentation of female respondents. As such, extrapolating the conclusions to other samples requires consideration and caution. Second, since the survey was conducted during the COVID-19 epidemic and the university was in lockdown (students were not allowed to enter or leave the campus unless necessary), students may have been experiencing higher levels of psychological distress and sleep problems. Third, objective measures were not used in the study, and thus results are limited by the self-report nature of the data.

Future directions

The stability and reliability of the SQQ-C should be examined across samples (e.g., occupational population, floating population), locations (e.g., multicenter or multilocation), and settings (e.g., communities, worksites). Similarly, the SQQ-C should be administered with other sleep assessment scales (e.g., PSQI, SCI) that yield cutoff values for good and poor sleep quality, allowing estimation of cutoff points for the SQQ-C. Additionally, including objective sleep measurements (e.g., wristwatch activity recorder, portable sleep monitoring device) in future studies could yield additional insights about sleep quality. Applying the SQQ-C in routine monitoring and exploring response shifts in multi-wave studies would help to further investigate risk or preventive factors related to sleep health [1, 80, 81].

Conclusions

The SQQ-C is a practical, psychometrically sound instrument to assess sleep quality, including sleep difficulty and daytime sleepiness, among healthcare students. This study demonstrates that the SQQ-9-C has good psychometric properties and measurement invariance. Sleep quality as measured by the SQQ-9-C was associated with short-term negative symptoms (i.e., anxiety and depression) and self-rated health status. Our previous and current findings combined suggest that the SQQ-C can be used to accurately measure sleep quality in community and research settings.