Practitioners often screen for depression in the general population with the Center for Epidemiologic Studies Depression (CES-D) scale (Radloff, 1977). However, few studies have tested the scale for bias (by gender and age) across a wide age range. This study does so with a partial credit Rasch model via Winsteps® (Linacre, 2018) on 34,762 Chinese people, 10–99 years old. Results showed one gender Differential Item Functioning (DIF) item (cry) and six age DIF items. As low positive affect was not a good indicator of depression, its four items were excluded, yielding a 16-item CES-D (CES-D16). At the same level of depression, females reported crying more often than males did. Compared to young people at the same depression level, older people felt less fearful, cried less, were bothered less often, had more sleep problems, needed more effort to do things, and could not get going more often. When using the CES-D16 to examine the general population of Chinese people across different genders and ages, researchers should pay special attention to these DIF items.
This study used secondary data from the 2012 wave of the China Family Panel Studies (CFPS), which can be accessed through the Peking University Open Research Data Platform (https://doi.org/10.18170/DVN/45LCSO).
Carleton, R. N., Thibodeau, M. A., Teale, M. J. N., Welch, P. G., Abrams, M. P., Robinson, T., & Asmundson, G. J. G. (2013). The Center for Epidemiologic Studies Depression Scale: A review with a theoretical and empirical examination of item content and factor structure. PLoS One, 8(3), 1–11. https://doi.org/10.1371/journal.pone.0058067.
Cole, S. R., Kawachi, I., Maller, S. J., & Berkman, L. F. (2000). Test of item-response bias in the CES-D scale: Experience from the New Haven EPESE study. Journal of Clinical Epidemiology, 53(3), 285–289. https://doi.org/10.1016/S0895-4356(99)00151-1.
Covic, T., Pallant, J. F., Conaghan, P. G., & Tennant, A. (2007). A longitudinal evaluation of the Center for Epidemiologic Studies-Depression scale (CES-D) in a rheumatoid arthritis population using Rasch analysis. Health and Quality of Life Outcomes, 5(41), 1–8. https://doi.org/10.1186/1477-7525-5-41.
Dagevos, H. (2005). Consumers as four-faced creatures. Looking at food consumption from the perspective of contemporary consumers. Appetite, 45(1), 32–39. https://doi.org/10.1016/j.appet.2005.03.006.
El-Den, S., Chen, T. F., Gan, Y.-L., Wong, E., & O’Reilly, C. L. (2018). The psychometric properties of depression screening tools in primary healthcare settings: A systematic review. Journal of Affective Disorders, 225, 503–522. https://doi.org/10.1016/j.jad.2017.08.060.
Erikson, E. H., & Erikson, J. M. (1998). The life cycle completed (extended version). WW Norton & Company.
Fischer, A., & Lafrance, M. (2015). What drives the smile and the tear: Why women are more emotionally expressive than men. Emotion Review, 7(1), 22–29. https://doi.org/10.1177/1754073914544406.
Gay, C. L., Kottorp, A., Lerdal, A., & Lee, K. A. (2016). Psychometric limitations of the Center for Epidemiologic Studies-Depression Scale for assessing depressive symptoms among adults with HIV/AIDS: A Rasch analysis. Depression Research and Treatment, 2016. https://doi.org/10.1155/2016/2824595.
Hartmann, C., Shi, J., Giusto, A., & Siegrist, M. (2015). The psychology of eating insects: A cross-cultural comparison between Germany and China. Food Quality and Preference, 44, 148–156. https://doi.org/10.1016/j.foodqual.2015.04.013.
Jackson, P. B., & Finney, M. (2002). Negative life events and psychological distress among young adults. Social Psychology Quarterly, 65(2), 186–201.
Linacre, J. M. (2018). A user’s guide to Winsteps® Rasch-model computer programs: Program manual 4.1.0. Winsteps.com.
Lindquist, K. A., Satpute, A. B., Wager, T. D., Weber, J., & Barrett, L. F. (2016). The brain basis of positive and negative affect: Evidence from a meta-analysis of the human neuroimaging literature. Cerebral Cortex, 26(5), 1910–1922. https://doi.org/10.1093/cercor/bhv001.
Ma, G. (2015). Food, eating behavior, and culture in Chinese society. Journal of Ethnic Foods, 2(4), 195–199. https://doi.org/10.1016/j.jef.2015.11.004.
Mander, B. A., Winer, J. R., & Walker, M. P. (2017). Sleep and human aging. Neuron, 94(1), 19–36. https://doi.org/10.1016/j.neuron.2017.02.004.
Masters, G. N., & Wright, B. D. (1993). The partial credit model. In: W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of Polytomous Item Response Theory Models (pp. 101–121). https://doi.org/10.4324/9780203861264.ch5.
Molton, I. R., Terrill, A. L., Smith, A. E., Yorkston, K. M., Alschuler, K. N., Ehde, D. M., & Jensen, M. P. (2014). Modeling secondary health conditions in adults aging with physical disability. Journal of Aging and Health, 26(3), 335–359. https://doi.org/10.1177/0898264313516166.
Nelson, L. J., & Chen, X. (2007). Emerging adulthood in China: The role of social and cultural factors. Child Development Perspectives, 1(2), 86–91. https://doi.org/10.1111/j.1750-8606.2007.00020.x.
O’Doherty Jensen, K., & Holm, L. (1999). Preferences, quantities and concerns: Socio-cultural perspectives on the gendered consumption of foods. European Journal of Clinical Nutrition, 53(5), 351–359. https://doi.org/10.1038/sj.ejcn.1600767.
Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1(3), 385–401. https://doi.org/10.1177/014662167700100306.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests (Expanded ed.). University of Chicago Press.
Schirda, B., Valentine, T. R., Aldao, A., & Prakash, R. S. (2016). Age-related differences in emotion regulation strategies: Examining the role of contextual factors. Developmental Psychology, 52(9), 1370–1380. https://doi.org/10.1037/dev0000194.supp.
Stansbury, J. P., Ried, L. D., & Velozo, C. A. (2006). Unidimensionality and bandwidth in the Center for Epidemiologic Studies Depression (CES–D) scale. Journal of Personality Assessment, 86(1), 10–22.
Sun, X., Li, Y., Yu, C., & Li, L. (2017). Reliability and validity of depression scales of Chinese version: A systematic review. Chinese Journal of Epidemiology, 38(1), 110–116. https://doi.org/10.3760/cma.j.issn.0254-6450.2017.01.021.
Wang, W.-C., Chen, P.-H., & Cheng, Y.-Y. (2004). Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9(1), 116–136. https://doi.org/10.1037/1082-989X.9.1.116.
Watson, D. C. (2012). Gender differences in gossip and friendship. Sex Roles, 67(9–10), 494–502. https://doi.org/10.1007/s11199-012-0160-4.
World Health Organisation. (2018). Depression. Retrieved May 11, 2018, from World Health Organisation website: http://www.who.int/en/news-room/fact-sheets/detail/depression
Wright, B. D., & Masters, G. (1982). Rating scale analysis. Mesa Press.
Zumbo, B. D., Gelin, M. N., & Hubley, A. M. (2001). Psychometric study of the CES-D: Factor analysis and DIF. International Neuropsychological Society’s 29th Annual Meeting, 1–11. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.506.6756&rep=rep1&type=pdf. Accessed 18 Jun 2021.
This work was partially supported by grants from the Central Reserve Allocation Committee and the Faculty of Education and Human Development of The Education University of Hong Kong (Project No. 03A28).
Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethical approvals were gained from The Education University of Hong Kong.
Shorter, Gender- and Age-Unbiased Depression Scale (CES-D10)
Deleting the six DIF items from the CES-D16 yielded a shorter CES-D10. Further analysis showed the unidimensionality of the CES-D10. The PCA of residual variances yielded a first-contrast eigenvalue of 1.52 (less than 2) and accounted for 35.8% of the raw variance, close to the 36.5% for the CES-D16. The Infit and Outfit MNSQ values were between 0.71 and 1.32, showing good fit (within 0.5–1.5; see Table A1). All ten items were in the same direction as the scale (all item point-measure correlations were positive). Also, the correlation between person measures obtained from the CES-D16 and CES-D10 was high (0.95).
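The fit criterion above can be illustrated with a minimal sketch. It assumes the standard Rasch definitions (Outfit MNSQ as the mean of squared standardized residuals, Infit MNSQ as its information-weighted counterpart) and uses made-up observed scores, model expectations, and model variances for a single item. The function names are hypothetical, and this is not the Winsteps implementation.

```python
def outfit_mnsq(observed, expected, variance):
    """Unweighted mean of squared standardized residuals."""
    z2 = [(x - e) ** 2 / w for x, e, w in zip(observed, expected, variance)]
    return sum(z2) / len(z2)


def infit_mnsq(observed, expected, variance):
    """Information-weighted mean square: squared residuals over summed variances."""
    squared_residuals = sum((x - e) ** 2 for x, e in zip(observed, expected))
    return squared_residuals / sum(variance)


def acceptable_fit(mnsq, low=0.5, high=1.5):
    """Apply the 0.5-1.5 range used in the text."""
    return low <= mnsq <= high


# Hypothetical data: five persons answering one item scored 0-3.
observed = [0, 1, 0, 2, 3]
expected = [0.5, 0.9, 0.6, 1.3, 2.1]   # model-expected scores (assumed)
variance = [0.4, 0.6, 0.5, 0.7, 0.6]   # model variances (assumed)

item_outfit = outfit_mnsq(observed, expected, variance)
item_infit = infit_mnsq(observed, expected, variance)
print(acceptable_fit(item_infit), acceptable_fit(item_outfit))
```

Values near 1 indicate that responses vary about as much as the model predicts; Outfit is more sensitive to unexpected responses from persons far from the item's difficulty because it is not weighted by the model variance.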
Cronbach’s alpha of the CES-D10 was 0.83, and the item reliability was 1.00, close to those for the CES-D16. The Rasch person reliability was relatively low (0.63), due to the few items (N = 10) relative to the large sample (N = 32,082).
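For readers who want to reproduce a reliability coefficient of this kind on their own data, the following is a minimal sketch of the textbook Cronbach's alpha formula. The source does not state how alpha was computed, so the function name, the data layout (one list of item scores per respondent), and the toy responses are all assumptions for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent, all of equal length."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: five respondents, three items scored 0-3.
rows = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 1, 2],
    [3, 2, 3],
    [1, 0, 0],
]
alpha = cronbach_alpha(rows)
print(round(alpha, 4))
```

Alpha is computed from raw scores, whereas Rasch person reliability is computed from person measures and their standard errors, so the two coefficients need not agree.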
The Rasch-Thurstone thresholds increased in order (-1.50; 0.38; 1.15) as did the category measures (-2.57; -0.54; 0.80; 2.16). The Infit and Outfit MNSQs for the four categories were between 0.87 and 1.49, showing a good fit (within 0.5–1.5). Although respondents selected every category, few chose the categories “3-4 Days” (5%) or “5-7 Days” (3%).
As with the CES-D16, the CES-D10 items positioned at the Rasch-Thurstone thresholds covered only about half of the person measures, failing to cover the highest extremes and many respondents with low depression estimates (Figure A1). Although the person ability estimates ranged from -4.96 to 4.22, the item difficulty estimates ranged only from -0.57 to 0.65, with the lowest and highest Rasch-Thurstone threshold edges at -2.07 (= -0.57 + -1.50) and 1.80 (= 0.65 + 1.15), respectively. Gender and age group DIF showed no substantial bias (all < 0.64). Table A2 shows the transformation of raw scores (0 to 30) to Rasch scores.
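The threshold-edge arithmetic in this paragraph can be sketched as follows. The code mirrors the calculation reported in the text (lowest item difficulty plus lowest threshold, highest item difficulty plus highest threshold). The function names and the list of person measures are hypothetical; only the difficulty range, the thresholds, and the two extreme person estimates come from the text.

```python
def threshold_range(item_difficulties, thresholds):
    """Lowest and highest threshold edges, in logits, rounded to 2 decimals."""
    low = min(item_difficulties) + min(thresholds)
    high = max(item_difficulties) + max(thresholds)
    return round(low, 2), round(high, 2)


def share_covered(person_measures, low, high):
    """Fraction of person measures that fall inside [low, high]."""
    inside = [m for m in person_measures if low <= m <= high]
    return len(inside) / len(person_measures)


# Reported extremes of item difficulty and the reported thresholds.
low, high = threshold_range([-0.57, 0.65], [-1.50, 0.38, 1.15])

# Hypothetical person measures; only -4.96 and 4.22 are reported values.
persons = [-4.96, -3.0, -1.0, 0.0, 1.0, 4.22]
coverage = share_covered(persons, low, high)
print(low, high, coverage)
```

Applied to the full set of person measures, the same calculation would quantify the targeting gap shown graphically in Figure A1.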
Higher CES-D10 scores were linked to (a) lower positive self-esteem (-0.143), (b) higher negative self-esteem (0.299), and (c) lower life satisfaction (-0.237).
Excluding the four positive-affect items forming a different dimension and the six items showing gender or age bias yielded a short version of the CES-D (CES-D10). The CES-D10 had good psychometric properties, including unidimensionality, good Infit and Outfit MNSQ, good reliability, ordered category structure, and no gender or age group DIF items. Also, the person measures obtained using the CES-D10 and CES-D16 were highly correlated. Unlike past studies on shortened versions of the CES-D that focused on specific samples (e.g., Andresen et al., 2013; González et al., 2017), the CES-D10 proposed in this study can be applied to a general sample covering ages 10 to 99 years, without gender or age bias.
However, the CES-D10 did not cover respondents at the highest extremes or with low estimates of depression. To create a broader diagnostic tool that accurately captures higher or lower levels of depression, items with extremely high and low difficulties can be added. For instance, thoughts of death or suicide are one of the criteria for clinical diagnosis of depression (American Psychiatric Association, 2013; Folse et al., 2006; Stoewen, 2015) and can be added to the CES-D to detect those with high levels of depression. If accurately measuring low levels of depression is desirable, researchers can likewise test items that capture them, such as poor attention or executive functions (Maalouf et al., 2010).
Although ten items were excluded, the CES-D10 still covered important criteria for Major Depressive Disorder (depressed mood, feeling of worthlessness, fatigue, and diminished ability to work) defined by the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5; American Psychiatric Association, 2013). Although positive affect was included in the full CES-D, it does not exactly capture diminished pleasure. Also, some items related to retardation and insomnia were included in the CES-D and CES-D16, but they are not necessarily symptoms of Major Depressive Disorder. As suggested by the DSM-5, agitation (as well as retardation) and hypersomnia (as well as insomnia) might be symptoms of Major Depressive Disorder (American Psychiatric Association, 2013). As a result, including only retardation and insomnia items in the rating scale might not yield accurate results. This can be a direction for future studies. Nonetheless, the CES-D should be reserved for non-clinical use, and clinical examination is suggested for depression symptoms lasting over 2 weeks.
Although results of this study showed that, conceptually, positive affect and depression yield two different dimensions, practitioners might still use responses to positive affect items as proximal indicators of diminished pleasure. Future studies can include items that better capture diminished pleasure.
Also, a table transforming the raw total score to a Rasch score was provided. It can help practitioners better understand the CES-D results or aid researchers in further statistical analysis of the interval data.
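To show in principle how a raw-score-to-Rasch-score table of this kind can be built, here is a minimal sketch. It assumes a partial credit model with complete responses, where each non-extreme raw score maps to the measure at which the model-expected total score equals that raw score. The item difficulties and step parameters below are hypothetical (three items, not the ten CES-D10 items), the function names are invented for illustration, and this is not the procedure Winsteps uses to handle extreme scores.

```python
import math


def category_probs(theta, delta, steps):
    """Partial credit category probabilities for one item.

    delta: item difficulty; steps: step parameters relative to delta (hypothetical).
    """
    logits = [0.0]
    for tau in steps:
        logits.append(logits[-1] + (theta - delta - tau))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_total(theta, items):
    """Model-expected raw score at measure theta; items is a list of (delta, steps)."""
    return sum(
        k * p
        for delta, steps in items
        for k, p in enumerate(category_probs(theta, delta, steps))
    )


def measure_for_raw_score(raw, items, lo=-10.0, hi=10.0, tol=1e-8):
    """Bisection on the monotone expected-score curve."""
    max_score = sum(len(steps) for _, steps in items)
    if not 0 < raw < max_score:
        raise ValueError("extreme scores have no finite measure in this sketch")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_total(mid, items) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical three-item scale, each item scored 0-3.
items = [(-0.5, [-1.0, 0.0, 1.0]), (0.0, [-1.0, 0.0, 1.0]), (0.5, [-1.0, 0.0, 1.0])]
table = {raw: round(measure_for_raw_score(raw, items), 2) for raw in range(1, 9)}
for raw, measure in table.items():
    print(raw, measure)
```

Because the expected total score increases strictly with the measure, each raw score corresponds to exactly one logit value, which is what makes a lookup table of this kind possible.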
Cite this article
Zhu, J., Chiu, M.M. Gender- and age-bias in CES-D when measuring depression in China: A Rasch analysis. Curr Psychol 42, 8186–8196 (2023). https://doi.org/10.1007/s12144-021-01991-2