A well-known self-report measure of excessive daytime sleepiness (EDS) is the Epworth Sleepiness Scale (ESS) (Appendix, items 1–8) [1]. The ESS has been translated into several languages and validated in populations with different lifestyles and economic backgrounds [2–11]. Two Chinese-language versions have previously been validated in test groups from Hong Kong [12] and Taiwan [13]. However, the lifestyles and economic backgrounds of the populations in these two regions differ from those in mainland China.

Given the behavioral differences between regions and ethnic groups, as well as evidence of the inaccuracy of the ESS from previous studies [14–16], we believe that modification of the questionnaire is necessary before it can be applied as a routine clinical tool for EDS assessment in mainland China.

Materials and methods

The eight situations described in the original English-language ESS were translated into the official Chinese language (Mandarin) as suggested by previous guidelines [17, 18]. The first author of this article, a native Chinese speaker, made the initial English–Chinese translation. The Chinese version was then verified via backward translation into English by two bilingual physicians. Each item was discussed until agreement was reached on an appropriate translation, always striving to keep the sentences conceptually understandable and simple (Appendix, items 1–8). The final Chinese translation of the questionnaire is nearly the same as the translation made by a group of physicians in Taiwan [13], where Mandarin Chinese is also the official language. Consent for the use of the ESS for validation in China was obtained from its creator, Dr. Murray Johns.

Two newly designed backup items, chosen to reflect the living habits of people in central China (Appendix, items 9 and 10), were added to the original ESS. The resulting Ten-item Sleepiness Questionnaire (10-ISQ) and the original ESS were used as the basis for identifying eligible items.

Selection of the two backup items was based on the following reasons:

  1. Games such as mahjong, poker, and chess are very popular in the region studied;

  2. The majority of people in this region habitually take a nap after lunch;

  3. It has long been noticed in our clinical practice that patients with sleep-disordered breathing (SDB) are likely to doze off in these two situations.

If the multivariate exploratory statistics detailed below showed an item in the original ESS to be unreliable, it would be replaced by an eligible backup item, yielding a modified ESS (mESS) with the same scale as the original.

All subjects, whether patients or normal subjects, came from Hubei and Henan, two provinces in central China that share a common border. The patients were recruited consecutively from those referred to our sleep laboratory for sleep evaluation because of suspected sleep-disordered breathing (SDB). Their ages ranged between 18 and 65 years. The chief complaints of most patients were feeling sleepy during the daytime, loud snoring at night, or longstanding abnormal pauses in breathing during sleep. None of the subjects reported chronic use of medicines with hypnotic effects, such as antihistamines, benzodiazepines, and barbiturates. The normal subjects recruited were healthy and aged between 18 and 65 years. They did not snore, did not do regular shift work, and did not have any medical conditions requiring chronic treatment.

Prior to the nocturnal sleep study, each patient was asked to rate, on a scale of 0 to 3, how likely he/she would be to doze off in the situations described in the 10-ISQ, based on his/her usual way of life in recent months (roughly the past 1 to 3 months). If the subject had not experienced some of the situations, he/she was asked to imagine how each would affect him/her. Patients were also asked to note their experience with driving. Refusal to cooperate was rare and occurred in only two or three cases of very late arrival; lack of time was the main reason given for not participating.

Multivariate exploratory techniques, including reliability/item analysis and factor analysis, were used for item validation, and the mESS was then built. Patients diagnosed with severe obstructive sleep apnea (OSA) [apnea and hypopnea index (AHI) ≥ 30 h−1] who tolerated long-term treatment with nasal continuous positive airway pressure (nCPAP) were administered the mESS again 3 months later in order to assess the interpretability of score changes.
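As a rough illustration of the reliability/item analysis step, raw Cronbach's alpha and the "alpha if item deleted" statistic can be sketched as follows. The function names and the response data are hypothetical and are not taken from the study; the study reports standardized alpha values, which this raw formula will not reproduce exactly.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Raw Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])                          # number of items
    items = list(zip(*rows))                  # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)

def alpha_if_deleted(rows, item_index):
    """Alpha recomputed after dropping one item (0-based index)."""
    reduced = [r[:item_index] + r[item_index + 1:] for r in rows]
    return cronbach_alpha(reduced)

# Hypothetical 0-3 ratings from five respondents on four items;
# the last item is deliberately unrelated to the other three.
responses = [
    [3, 3, 2, 0],
    [2, 2, 2, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [3, 2, 3, 0],
]
alpha_all = cronbach_alpha(responses)
alpha_without_last = alpha_if_deleted(responses, 3)
print(round(alpha_all, 3), round(alpha_without_last, 3))
```

An item whose removal raises alpha, as the unrelated last item does in this toy data, is a candidate for replacement.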

Results

The 10-ISQ and the original ESS performed well for patients but not for normal subjects (Table 1). However, item 8 showed a very low item-to-total correlation in the patients' responses. Deleting it would increase the standardized Cronbach's alpha from 0.86 to 0.87 for the 10-ISQ and from 0.83 to 0.85 for the original ESS. The items that loaded most heavily on the principal factor, which was presumed to represent EDS, were items 1, 2, 3, 4, 5, 6, 7, 9, and 10, but not item 8 (Table 2). Only 31 of 122 (25.4%) patients reported that they drive often.
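The item-to-total correlation mentioned above can be sketched in its common "corrected" form, in which each item is correlated with the sum of the remaining items. Whether the study used the corrected or uncorrected form is not stated, so this is an assumption; the data and function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_total(rows, item_index):
    """Correlate one item with the sum of all other items (0-based index)."""
    item = [r[item_index] for r in rows]
    rest = [sum(r) - r[item_index] for r in rows]
    return pearson(item, rest)

# Hypothetical 0-3 ratings: five respondents, four items.
responses = [
    [3, 3, 2, 0],
    [2, 2, 2, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [3, 2, 3, 0],
]
r_first = corrected_item_total(responses, 0)
r_last = corrected_item_total(responses, 3)
print(round(r_first, 3), round(r_last, 3))
```

In this toy data the first item tracks the rest of the scale closely, while the last item does not, which is the pattern that flags an item as unreliable.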

Table 1 Reliability/item analysis for the Ten-item Sleepiness Questionnaire, the original Epworth Sleepiness Scale (ESS), and the modified Epworth Sleepiness Scale (mESS)*
Table 2 Factor loadings (Varimax normalized) and eigenvalues for the Ten-item Sleepiness Questionnaire and the modified Epworth Sleepiness Scale (mESS)*

The mESS was thus built by substituting item 10 of the 10-ISQ for item 8 of the original ESS. In the patients' responses (but not in those of normal subjects), the mESS showed better internal consistency than the original ESS (Cronbach's alpha increased from 0.83 to 0.86) and assessed only one main factor (Tables 1 and 2).
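The substitution can be made concrete with a small scoring sketch. It assumes, as in the original ESS, that the total is the simple sum of eight items rated 0 to 3 (range 0 to 24); the function name and the example ratings are hypothetical.

```python
def score_questionnaire(answers):
    """Score the ESS and mESS from 10-ISQ ratings.

    answers maps each 10-ISQ item number (1-10) to a rating of 0-3.
    """
    for item in range(1, 11):
        if answers[item] not in (0, 1, 2, 3):
            raise ValueError(f"item {item} must be rated 0-3")
    # Original ESS: items 1-8.
    ess = sum(answers[i] for i in range(1, 9))
    # Modified ESS: items 1-7, with item 10 replacing item 8.
    mess = sum(answers[i] for i in range(1, 8)) + answers[10]
    return {"ESS": ess, "mESS": mess}

# Hypothetical respondent who never dozes in the item 8 situation
# but is very likely to doze in the item 10 situation.
example = {1: 3, 2: 2, 3: 1, 4: 2, 5: 3, 6: 1, 7: 2, 8: 0, 9: 2, 10: 3}
scores = score_questionnaire(example)
print(scores)
```

Because both totals span 0 to 24, mESS scores remain on the same scale as the original ESS.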

Among the 122 patients with suspected SDB, 119 met the minimal diagnostic criterion for obstructive sleep apnea (OSA) on sleep study, i.e., an apnea and hypopnea index (AHI) ≥ 5 h−1. Patients with more severe OSA had higher mESS scores (P < 0.01 by ANOVA and post hoc Scheffé's test, Table 3). Among the parameters relevant to SDB, namely age, body mass index (BMI), AHI, and percentage of sleep time with oxygen saturation below 90% (SpO2 < 90%), AHI was the best predictor of the mESS score, as demonstrated by multiple regression (P < 0.01, Table 4).

Table 3 Demographic data and mESS scores in patients and in normal subjects
Table 4 Correlation of mESS score to parameters of sleep study for patients with OSA, evaluated by multiple regression test*

The mESS scores of the 21 patients (aged 41.8 ± 11.9 years; 19 men, 2 women) with severe OSA and good compliance with nCPAP were 18.9 ± 2.1 before treatment and 10.3 ± 4.4 after 3 months of treatment, as shown in Fig. 1 (t test, t = 8.50, P < 0.001).

Fig. 1 For 21 patients with severe obstructive sleep apnea, the sum scores of the modified ESS before and after nasal CPAP treatment for 3 months are shown (P < 0.001). The two values for individual patients are connected with straight lines; mean values before and after the treatment period are represented by horizontal lines
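A before/after comparison of this kind is commonly analyzed with a paired t test; the text says only "t test", so the paired form is an assumption here. The sketch below shows the calculation on hypothetical scores for five patients, not on the study data, and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical mESS totals for five patients before and after treatment.
before = [20, 18, 19, 17, 21]
after = [12, 9, 14, 8, 10]
t_stat, df = paired_t(before, after)
print(round(t_stat, 2), df)
```

The P value would then be read from the t distribution with the returned degrees of freedom.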

Discussion

Previous validations of the Chinese versions of the ESS using multivariate exploratory techniques in Hong Kong [12] and Taiwan [13] consistently showed satisfactory internal homogeneity when the questionnaire was applied to OSA patients (Cronbach's alpha was 0.80 and 0.81, respectively). Unlike our results, the Taiwanese study [13] showed that item 8 was reliable and that the ESS measured one main factor (the Hong Kong study did not provide information about item reliability). Therefore, applying the ESS to Taiwanese patients without any modification seems reasonable.

Lack of driving experience might be one of the reasons for our patients' poor responses to item 8; nearly 75% of them did not drive or drove only rarely. Our study is not the only one indicating that item 8 is unreliable. A recent study of the ESS using confirmatory factor analysis in OSA patients in Australia also demonstrated that the original ESS was not an accurate measurement [16]. Of particular note in that study, item 8 had the lowest standardized regression weight, and 78% of patients with OSA responded to it that they "would never doze".

Given the poor performance of item 8, it needs to be replaced with a more reliable item. As a result of this substitution, the mESS not only improved internal consistency (increased Cronbach's alpha) but also consolidated the measurement into one dimension, i.e., EDS, as the original ESS did when applied to Australian patients [11]. When constructing the mESS, we chose item 10 rather than item 9 of the 10-ISQ to replace item 8 of the original ESS, because item 10 contributed slightly more than item 9 to the principal measurement of the 10-ISQ.

The mESS scores appeared capable of discerning the severity of OSA (Table 3). The interpretability of mESS score changes was good and comparable to that found in an investigation based on a validated German version of the ESS [10]; alleviation of sleep apnea by nasal CPAP treatment led to a significant reduction in scores in both studies.

The finding by multiple linear regression that our subjects' mESS scores correlated only with the frequency of apneas and hypopneas (R² = 0.50, P < 0.001) is very similar to what Jones found (R² = 0.23, P < 0.001) [19], except for the higher coefficient of determination in our study. Sampling bias might be one of the reasons for this discrepancy. Our patients had much more severe disease: 64% of the OSA patients in our study had an AHI of more than 25 h−1, compared with just 36% in Jones's study. We postulate that the more severe a patient's OSA, the less likely it is that his or her EDS is attributable to confounding factors or overlapping comorbidities.

However, the mESS is not reliable for normal subjects because of low internal consistency and more than one measurement dimension (Tables 1, 2). Studies of Australian [11], German-speaking Swiss [10], and Hong Kong Chinese [12] subjects also indicated that the internal consistency of the ESS was lower in normal subjects than in patients (Cronbach's alphas for normal subjects vs. SDB patients were 0.73 vs. 0.88, 0.60 vs. 0.83, and 0.69 vs. 0.80, respectively), suggesting that the questionnaire has little relevance to subjects without EDS. Although the Australian study obtained an acceptable Cronbach's alpha (0.73) for medical students, the students could not be regarded as normal. They were sleepier (ESS 7.6 ± 3.9) than the healthy Australian subjects (ESS 5.9 ± 2.2) in another investigation [1, 11].

We conclude that modification of the ESS is necessary and that the mESS improves the validity of the ESS in assessing EDS in patients with SDB in central China.