Gender- and age-bias in CES-D when measuring depression in China: A Rasch analysis


Practitioners often screen for depression in the general population with the Center for Epidemiologic Studies Depression (CES-D) scale (Radloff, 1977). However, few studies have tested the scale for bias (by gender and age) across a wide age range. This study does so with a partial credit Rasch model via Winsteps® (Linacre, 2018) on 34,762 Chinese people, 10–99 years old. As low positive affect was not a good indicator of depression, its four items were excluded, yielding a 16-item CES-D (CES-D16). Results showed one gender Differential Item Functioning (DIF) item (cry) and six age DIF items. At the same level of depression, females reported crying more often than males did. Compared to young people at the same depression level, older people felt less fearful, cried less, were bothered less often, had more sleep problems, needed more effort to do things, and could not get going more often. When using the CES-D16 to examine the general population of Chinese people across different genders and ages, researchers should pay special attention to these DIF items.

This study used secondary data of the 2012 wave of the China Family Panel Studies (CFPS), which can be accessed by visiting Peking University Open Research Data Platform (


The work was partially supported by grants from the Central Reserve Allocation Committee and the Faculty of Education and Human Development of The Education University of Hong Kong (Project No. 03A28).

Jinxin Zhu: Conceptualization, Methodology, Writing- Original draft preparation.

Ming Ming Chiu: Conceptualization, Supervision, Writing- Reviewing and Editing.

Correspondence to Jinxin Zhu.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical approval was obtained from The Education University of Hong Kong.




Shorter, Gender- and Age-Unbiased Depression Scale (CES-D10)


Deleting the six DIF items from the CES-D16 yielded a shorter CES-D10. Further analysis supported the unidimensionality of the CES-D10. The PCA of residual variances yielded a first-contrast eigenvalue of 1.52 (less than 2), and the Rasch measures accounted for 35.8% of the raw variance, close to the 36.5% for the CES-D16. The Infit and Outfit MNSQ values were between 0.71 and 1.32, showing good fit (within 0.5 – 1.5; see Table A1). All ten items were in the same direction as the scale (all item point-measure correlations were positive). Also, the correlation between person measures obtained from the CES-D16 and the CES-D10 was high (0.95).

Table A1 Item statistics

Cronbach’s alpha of the CES-D10 was 0.83, and the item reliability was 1.00, close to those for the CES-D16. The Rasch person reliability was relatively low (0.63), due to the few items (N = 10) relative to the large sample (N = 32,082).

The Rasch-Thurstone thresholds increased in order (-1.50; 0.38; 1.15) as did the category measures (-2.57; -0.54; 0.80; 2.16). The Infit and Outfit MNSQs for the four categories were between 0.87 and 1.49, showing a good fit (within 0.5 – 1.5). Although respondents selected every category, few chose categories “3-4 Days” (5%) or “5-7 Days” (3%).
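The ordering claim can be checked directly from the reported values. The following is a minimal sketch, not part of the original analysis (which used Winsteps); the helper name strictly_increasing is introduced here for illustration only.

```python
def strictly_increasing(values):
    """Return True when every value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Values reported in the paragraph above.
thurstone_thresholds = [-1.50, 0.38, 1.15]
category_measures = [-2.57, -0.54, 0.80, 2.16]

thresholds_ordered = strictly_increasing(thurstone_thresholds)
categories_ordered = strictly_increasing(category_measures)
print(thresholds_ordered, categories_ordered)
```

Both checks pass for the reported values. A disordered sequence would suggest that some response category is never the most likely one at any depression level.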

As with the CES-D16, the CES-D10 items positioned at the Rasch-Thurstone thresholds covered only about half of the person measures, missing both the highest extremes and many respondents with low depression estimates (Figure A1). Although the person ability estimates ranged between -4.96 and 4.22, the item difficulty estimates only ranged from -0.57 to 0.65, with the lowest and highest Rasch-Thurstone threshold edges at -2.07 (= -0.57 + -1.50) and 1.80 (= 0.65 + 1.15), respectively. Gender and age group DIF showed no substantial bias (all < 0.64). Table A2 shows the transformation of raw scores (0 to 30) to Rasch scores.
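The threshold-edge arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the values reported in the text; the variable names are illustrative. The ratio it computes is the share of the person-measure range spanned by the thresholds, which is not the same as the share of respondents covered (the latter depends on the person distribution in Figure A1).

```python
# Values reported in the text.
item_difficulty_range = (-0.57, 0.65)   # lowest and highest item difficulty
thurstone_thresholds = (-1.50, 0.38, 1.15)
person_range = (-4.96, 4.22)            # lowest and highest person estimate

# Lowest edge: easiest item plus the first threshold.
lowest_edge = item_difficulty_range[0] + thurstone_thresholds[0]
# Highest edge: hardest item plus the last threshold.
highest_edge = item_difficulty_range[1] + thurstone_thresholds[-1]

# Share of the person-measure RANGE spanned by the thresholds
# (not the share of respondents).
range_share = (highest_edge - lowest_edge) / (person_range[1] - person_range[0])

print(round(lowest_edge, 2), round(highest_edge, 2), round(range_share, 2))
```

The edges come out at -2.07 and 1.80, matching the text, and the thresholds span roughly 42% of the range of person estimates.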

Fig. A1

Wright Item-person Map: Rasch-Thurstone Thresholds for CES-D10. Note: ‘#’ stands for N = 473, ‘.’ stands for N = 1 to 472; N.M stands for the threshold between the (M-1)th category and the Mth category for Item N.

Table A2 The transformation of raw scores to Rasch scores

Higher CES-D10 scores were linked to (a) lower positive self-esteem (-0.143), (b) higher negative self-esteem (0.299), and (c) lower life satisfaction (-0.237).


Excluding the four positive-affect items forming a different dimension and the six items showing gender or age bias yielded a short version of the CES-D (CES-D10). The CES-D10 had good psychometric properties, including unidimensionality, good Infit and Outfit MNSQ, good reliability, ordered category structure, and no gender or age group DIF items. Also, the person measures obtained using the CES-D10 and the CES-D16 were highly correlated. Unlike past studies on shortened versions of the CES-D that focused on specific samples (e.g., Andresen et al., 2013; González et al., 2017), the CES-D10 proposed in this study can be applied to a general sample covering ages 10 to 99 years, without gender or age bias.

However, the CES-D10 did not cover respondents at the highest extremes or with low estimates of depression. To create a broader diagnostic tool that accurately captures higher or lower levels of depression, items with extremely high and low difficulties can be added. For instance, thoughts of death or suicide are one of the criteria for the clinical diagnosis of depression (American Psychiatric Association, 2013; Folse et al., 2006; Stoewen, 2015), and such an item can be added to the CES-D to detect those with high levels of depression. If accurately measuring low levels of depression is desirable, researchers can likewise test items that capture them, such as poor attention or executive functions (Maalouf et al., 2010).

Although ten items were excluded, the CES-D10 still covered important criteria for Major Depressive Disorder (depressed mood, feelings of worthlessness, fatigue, and diminished ability to work) defined by the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5; American Psychiatric Association, 2013). Although positive affect was included in the full CES-D, it does not exactly capture diminished pleasure. Also, some items related to retardation and insomnia were included in the CES-D and the CES-D16, but they are not necessarily symptoms of Major Depressive Disorder. As suggested by DSM-5, agitation (as well as retardation) and hypersomnia (as well as insomnia) might be symptoms of Major Depressive Disorder (American Psychiatric Association, 2013). As a result, including only retardation and insomnia items in the rating scale might not yield accurate results. This can be a direction for future studies. Nonetheless, the CES-D should be reserved for non-clinical use, and clinical examination is suggested for depression symptoms lasting over 2 weeks.

Although results of this study showed that, conceptually, positive affect and depression yield two different dimensions, practitioners might still use responses to positive affect items as proximal indicators of diminished pleasure. Future studies can include items that better capture diminished pleasure.

Also, a table transforming the raw total score to a Rasch score was provided. It can help practitioners better understand the CES-D results or aid researchers in further statistical analysis of the interval data.
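To illustrate how such a raw-to-Rasch conversion table can be built in principle, the sketch below inverts the expected-score curve of a rating-scale Rasch model by bisection: the measure for a raw score is the point where the model's expected total equals that score. This is an assumption-laden illustration, not the procedure or the output of Winsteps. The item difficulties and step thresholds are hypothetical values chosen for the example (NOT the estimates behind Table A2), and the function names are introduced here only for illustration.

```python
import math


def expected_score(theta, difficulties, taus):
    """Expected raw total at person measure theta under a rating-scale
    Rasch model in which all items share the step thresholds `taus`."""
    total = 0.0
    for d in difficulties:
        # Cumulative log-numerators for categories 0..len(taus).
        logits = [0.0]
        for tau in taus:
            logits.append(logits[-1] + (theta - d - tau))
        peak = max(logits)  # subtract the peak for numerical stability
        weights = [math.exp(v - peak) for v in logits]
        total += sum(k * w for k, w in enumerate(weights)) / sum(weights)
    return total


def raw_to_measure(raw, difficulties, taus, lo=-15.0, hi=15.0, tol=1e-8):
    """Bisection for the theta whose expected score equals `raw`."""
    max_score = len(difficulties) * len(taus)
    if not 0 < raw < max_score:
        # Zero and perfect scores have no finite estimate; software
        # handles them with a separate convention not sketched here.
        raise ValueError("extreme scores have no finite estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_score(mid, difficulties, taus) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical parameters for illustration only (NOT the paper's estimates):
# ten items, four response categories scored 0 to 3.
difficulties = [-0.5, -0.4, -0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 0.4, 0.5]
taus = [-1.0, 0.0, 1.0]

table = {raw: raw_to_measure(raw, difficulties, taus) for raw in range(1, 30)}
for raw in (1, 15, 29):
    print(raw, round(table[raw], 2))
```

Because the expected score increases monotonically with the measure, each non-extreme raw score maps to exactly one measure, which is what lets a single lookup table stand in for the full model when all ten items are answered.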

Zhu, J., Chiu, M.M. Gender- and age-bias in CES-D when measuring depression in China: A Rasch analysis. Curr Psychol (2021).

