Use of CES-D among 56–66 year old people of Dutch, Moroccan and Turkish origin: Measurement invariance and mean differences between the groups

When assessing depressive symptoms across ethnic populations, it is important to ensure that items from a questionnaire are valued and interpreted similarly across groups. We aimed to examine measurement (in)variance of the Center for Epidemiological Studies Depression Scale (CES-D) among people of Dutch, Moroccan and Turkish origin in the Netherlands and to compare the level of depressive symptoms across these three groups. Data were used from the Longitudinal Aging Study Amsterdam, including 269 people of Turkish, 209 of Moroccan and 618 of Dutch origin (aged 55–65 years). A multi-group confirmatory factor analysis (MGCFA) was performed to test measurement invariance of the four-factor CES-D across the three cohorts. To compare scores across ethnic groups, we performed ANCOVA. The four subscales of the CES-D (depressed affect, positive affect, somatic symptoms, and interpersonal problems) appeared measurement invariant in people of Dutch, Moroccan and Turkish origin. Turkish and Moroccan participants reported more depressive symptoms on all four domains. The four subscales of the CES-D measure the same constructs in people of Dutch, Moroccan and Turkish origin. Higher levels of depressive symptoms in the migrant groups are therefore not due to measurement non-invariance, but point to increased mental health problems in these groups.


Introduction
In the Netherlands the number of Turkish and Moroccan older adults is increasing. Most of these immigrants are first-generation immigrants who came to the Netherlands in the 1960s and 1970s for work or family reunification. Immigrants are considered vulnerable groups in society due to an accumulation of risk factors, including language barriers (Pot et al. 2018; Suurmond et al. 2016), acculturative stress (Denktaş 2011), and experiences of discrimination and segregation (Pettigrew et al. 1997). Moreover, these risk factors are usually experienced on top of traditional risk factors for depression, including being female (van de Velde et al. 2010), poverty (Bruce and Hoff 1994) and physical health deterioration (Cole and Dendukuri 2003). Because these characteristics are known to be linked to mental health problems (Van Der Zwaan and Tolsma 2013), questions arise about the mental health of these immigrants. Previous research showed that depressive symptoms as measured with the Center for Epidemiological Studies Depression Scale (CES-D) were more prevalent among older immigrants than among older adults of Dutch origin. The prevalence of self-reported depressive symptoms was 33.6% for people of Moroccan origin and 61.5% for people of Turkish origin, whereas the prevalence of depressive symptoms in people of Dutch origin was 14.5%. However, it is questionable whether the scores on the CES-D reflect real differences in depressive symptoms among these cultural groups. Depression might be expressed in different ways in different cultures, and items of the CES-D might be valued and interpreted differently across cultural groups (Guarnaccia et al. 1989; Kim et al. 2011; Kirmayer et al. 2017).
Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s12144-018-9977-5) contains supplementary material, which is available to authorized users.
In non-Western cultures the attitude towards mental illness seems more stigmatized than in Western cultures, and the expression of depressed mood might be more devalued (Zhang et al. 2011). In some cultures there are no words for some of the key symptoms of depression, which illustrates that there may be fundamental cultural divides with regard to the experience and expression of depressive symptoms. As a result, migrants might emphasize somatic symptoms, such as sleep problems, having difficulty getting started, or feeling tired, more than the typically psychological symptoms, such as feeling down, self-deprecation, and suicidal ideation. Given the importance of cross-cultural comparisons of depression, it is important to have access to instruments with minimal measurement variance, that is, instruments that show measurement invariance (MI) (Van Den Berg and Lance 2000).
The CES-D (Radloff 1977) has been translated into many different languages and is widely used across the world. Radloff (1977) examined the factor structure of the CES-D and identified four factors: depressed affect, positive affect, somatic symptoms/retarded activity and interpersonal problems. This four-factor structure of the CES-D has been replicated many times (Kim et al. 2011; Chin et al. 2015). However, several other studies found a three-factor or even a two-factor structure (see Carleton et al. 2013). In addition, bifactor or higher-order models were evaluated and were found to be invariant across different cultural groups (Gomez and McLaren 2015; Yang et al. 2009). In the Netherlands, Beekman et al. (1997a) confirmed the four-factor structure of the CES-D. More recently, a study compared the factor structure in a native Dutch sample with that in a Chinese sample, and showed that in both samples the four-factor model resulted in the best fit. The authors concluded that the four dimensions (somatic symptoms, depressed affect, positive affect, and interpersonal problems) of the CES-D seemed to be the most informative when assessing depressive symptoms in older adults compared to the other factor models (Zhang et al. 2011). In a meta-analysis, Kim et al. (2011) replicated the original four-factor structure in four out of five ethnic groups, and concluded that the original four-factor structure may not be the best fit for all ethnic groups.
The four-factor model was the most informative when assessing depressive symptoms in older adults compared to the other factor models (Zhang et al. 2011), but one previous study showed violations of MI across ethnic groups (Kim et al. 2011). In addition, the sum score on the CES-D is widely used in research and practice. Therefore we considered both the unidimensional and the four-factor model for measurement invariance testing.
When comparing different cultural groups, it is important that the CES-D measures depression the same way across cultural groups. In other words, the CES-D has to be measurement invariant. Measurement variance may be caused by differences in the interpretation of items and by differences in response style due to cultural differences in conceptualization, meaning, and symptom expression of depression (Kim et al. 2011). For instance, Zhang et al. (2011) found several items of the CES-D (for instance 'feeling depressed', 'feeling fearful', and 'feeling good') to be measurement non-invariant in a Chinese and Dutch sample of older adults. In addition, for different measures of depression, Smits and colleagues (Smits et al. 2005) found that questions about depressive symptoms were prone to interpretation problems among Turkish and Moroccan immigrants. When interviewing these immigrants, they found that depressive symptoms are often a source of shame among individuals or families and are therefore more likely to be concealed. Physical symptoms, by contrast, were more easily reported. Given the potential variety of interpretations, it is important to study measurement invariance before comparing groups with different cultural backgrounds.
The first aim of the present study was to examine measurement invariance of the CES-D regarding ethnicity. Provided that the CES-D proved measurement invariant, the second aim was to compare the scores on the CES-D between people of Dutch, Moroccan and Turkish origin. When comparing the CES-D (subscale) scores across groups, we aimed to eliminate the potential effect of other important risk factors for depression. More specifically, we included three potential risk factors, namely sex, income level and physical limitations. Each of these risk factors is known to be associated with depressive symptoms and ethnicity (Bruce and Hoff 1994; Beekman et al. 1997b; Cole and Dendukuri 2003; Van Der Wurff et al. 2004; Schellingerhout 2004; Reijneveld et al. 2007; Van De Velde et al. 2010; Denktaş 2011).

Sample
The present study included two samples collected in the context of LASA. The Longitudinal Aging Study Amsterdam (LASA) investigates determinants and consequences of ageing in social, cognitive, physical and emotional domains of functioning (Huisman et al. 2011). The people from Dutch origin (n = 1023) were included in 2012-2013. Respondents were drawn from the population registers of 11 Dutch municipalities that differ with regard to the degree of urbanization in the Netherlands. The respondents were born between 1948 and 1957 (cooperation rate 63%). The second sample included 269 people from Turkish and 209 from Moroccan origin, also born between 1948 and 1957 (cooperation rate Total: 45%, Turkish immigrants: 50%, Moroccan immigrants: 40%) and was collected in 2013-2014. LASA focussed on these immigrant groups because they comprise the largest groups of labour immigrants that have settled in the Netherlands. There are a number of reasons why the people of Moroccan and Turkish origin are considered to be at risk of more rapid physical deterioration, social loneliness and depression (Schellingerhout 2004). Because people of Moroccan and Turkish origin predominantly live in the larger cities in the Netherlands, data collection took place by drawing from the registers of fifteen Dutch cities with a population size between 85 and 805 thousand.
In order to enhance the comparability between the samples, we selected only those Dutch respondents who lived in urbanized areas of the Netherlands. To this end, 317 native Dutch respondents to LASA were excluded from the original sample. Furthermore, 88 Dutch respondents were excluded because they were not born in the Netherlands. Similarly, 5 immigrants were excluded because they were not born in Turkey or Morocco. In total 618 native Dutch, 267 Turkish and 209 Moroccan respondents were included in the analysis.
The CES-D is part of the face-to-face interview. The interviews were conducted by trained interviewers, who were of the same ethnic background and gender as the respondent. If needed, translated interviews in Turkish, Tarifit or Moroccan dialect were available for the immigrant sample. The Turkish and Arabic versions were previously translated according to standard procedures. First, all questions of the CES-D were translated into the Turkish language and the Moroccan Arabic dialect by certified translators. Second, all translated questions were translated back into Dutch by other certified translators. Third, experts reconciled both translations and decided upon a final translation of the CES-D. Afterwards, an analysis was conducted to validate the translation of the CES-D. Translations into Tarifit, and translations of all other questions in the interview, were done by professional translators and were subsequently evaluated through several pilot interviews.

Covariates
Country of origin (Dutch, Turkish or Moroccan) was defined according to the country of birth of the participants. Income level was based on the level of income in the household, and was dichotomized at the Dutch poverty line: below €1425 for respondents with a partner and below €1040 for respondents without a partner. The number of physical limitations was based on the respondent's ability to perform seven daily tasks: going up and down a staircase, using own or public transportation, cutting toenails, dressing and undressing, sitting down and standing up from a chair, walking outside, and taking a shower or bath. Five response options were included: no difficulty, some difficulty, much difficulty, only with help, and unable, coded as 0 to 4. Sum scores were calculated, with a range from 0 to 28. Cronbach's alpha was .81, .87, and .79 for Dutch, Turkish, and Moroccan respondents, respectively.
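The covariate coding described above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the income unit is assumed to match the unit of the quoted poverty-line thresholds, and Cronbach's alpha is computed with the standard formula based on sample variances.

```python
from statistics import variance

# Response codes taken from the text: 0 = no difficulty ... 4 = unable.
N_TASKS = 7
MAX_CODE = 4


def limitation_sum_score(item_codes):
    """Sum the seven task codes; the result lies between 0 and 28."""
    if len(item_codes) != N_TASKS:
        raise ValueError("expected one code per daily task")
    if any(code not in range(MAX_CODE + 1) for code in item_codes):
        raise ValueError("codes must be integers from 0 to 4")
    return sum(item_codes)


def below_poverty_line(household_income_eur, has_partner):
    """Dichotomize income at the thresholds quoted in the text.

    Assumption: the income is expressed in the same unit as the thresholds.
    """
    threshold = 1425 if has_partner else 1040
    return household_income_eur < threshold


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

In this sketch a respondent who is unable to perform any of the seven tasks receives the maximum score of 28, and alpha equals 1 when all items are perfectly correlated.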

Statistical Analysis
To examine measurement invariance of the CES-D, confirmatory factor analysis (CFA) was first performed using polychoric correlations and a weighted least squares estimator with mean and variance adjustment (WLSMV) in Mplus version 7, to assess how well the data fit the competing models. First, the unidimensional model was tested in each cohort (i.e. the Dutch, Turkish and Moroccan group), because in many studies the CES-D is used as a unidimensional scale. Next, the original four-factor model was tested. Model fit was evaluated using the Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), and Standardized Root Mean Square Residual (SRMR). Generally, CFI and TLI > 0.95 and SRMR < 0.08 are considered good (Hu and Bentler 1999). RMSEA between 0.05 and 0.08 is considered reasonable, and RMSEA ≥ 0.1 suggests poor fit (University of Connecticut 2014). Local independence was investigated by inspecting the residual correlations between items obtained from the confirmatory factor analysis. In addition, the internal consistency was computed for each unidimensional (sub)scale.
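The fit criteria listed above can be written down as a small helper. This is a sketch only: the function name is hypothetical, and the labels for RMSEA below 0.05 and for RMSEA between 0.08 and 0.1 are assumptions, because the text does not name those ranges.

```python
def describe_fit(cfi, tli, rmsea, srmr=None):
    """Label fit indices using the cut-offs quoted in the text."""
    labels = {
        "cfi": "good" if cfi > 0.95 else "below cut-off",
        "tli": "good" if tli > 0.95 else "below cut-off",
    }
    if rmsea >= 0.10:
        labels["rmsea"] = "poor"
    elif rmsea > 0.08:
        # Assumption: the text gives no label for this range.
        labels["rmsea"] = "mediocre"
    elif rmsea >= 0.05:
        labels["rmsea"] = "reasonable"
    else:
        # Assumption: values below 0.05 are conventionally read as close fit.
        labels["rmsea"] = "close"
    if srmr is not None:
        labels["srmr"] = "good" if srmr < 0.08 else "below cut-off"
    return labels


# Indices reported for the final strict model in the Results section.
print(describe_fit(cfi=0.958, tli=0.960, rmsea=0.056))
```

Applied to the indices reported for the final strict model, the helper labels CFI and TLI as good and RMSEA as reasonable, which matches the interpretation given in the Results.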
After having found one model that showed reasonable fit in each ethnic group, we incorporated this model into a multi-group confirmatory factor analysis (MGCFA). Four nested models were tested, i.e. configural invariance, metric invariance, scalar invariance, and strict invariance, by comparing each more constrained model to the previous model. In the configural invariance model the same factor structure is imposed on the three groups, indicating that the clustering of items and the factors that they represent are similar across groups (Gregorich 2006). In the metric invariance model the factor loadings are constrained to be equal, implying that the magnitude of the loadings is similar across the groups (Gregorich 2006). In the scalar invariance model, both the factor loadings and the item thresholds are constrained to be equal across groups, indicating that item responses are not systematically higher or lower in one group compared to the other group(s) (Gregorich 2006). In the strict invariance model, in addition to the factor loadings and item thresholds, the residual variances, or error terms, of each item are also constrained to be equal across the groups (Gregorich 2006).
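The four nested models can be summarized in a compact notation. The notation below is an assumption introduced for illustration and is not taken from the source: $i$ indexes items, $g$ groups, $c$ response categories, $y^{*}_{ig}$ is the latent response underlying an ordinal item, and $\eta_{g}$ is the factor on which item $i$ loads.

```latex
% Notation assumed for illustration: i = item, g = group, c = response category.
\begin{align*}
y^{*}_{ig} &= \lambda_{ig}\,\eta_{g} + \varepsilon_{ig},
  \qquad \operatorname{Var}(\varepsilon_{ig}) = \theta_{ig},\\
y_{ig} &= c \quad \text{if } \tau_{ig,c} < y^{*}_{ig} \le \tau_{ig,c+1}.
\end{align*}

\begin{align*}
\text{Configural:} &\quad \text{same pattern of free and fixed loadings in every group}\\
\text{Metric:}     &\quad \lambda_{ig} = \lambda_{i} \ \text{for all } g\\
\text{Scalar:}     &\quad \lambda_{ig} = \lambda_{i},\ \tau_{ig,c} = \tau_{i,c} \ \text{for all } g\\
\text{Strict:}     &\quad \lambda_{ig} = \lambda_{i},\ \tau_{ig,c} = \tau_{i,c},\ \theta_{ig} = \theta_{i} \ \text{for all } g
\end{align*}
```

Each model adds one set of equality constraints to the previous one, which is what makes the models nested and allows the stepwise comparison described above.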
Following recommendations by Chen (2007) for comparing two nested models, cut-off values of ΔCFI < 0.01 and ΔRMSEA < 0.015 were used for testing metric invariance, scalar invariance, and strict invariance. In the present study, a model was considered acceptable on condition that both indices met these criteria.
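The decision rule for comparing two nested models can be sketched as follows. The function name and the example values are hypothetical, and the direction of the differences (a drop in CFI and a rise in RMSEA count as worsening fit) is an assumption about how the deltas are computed.

```python
def invariance_step_acceptable(cfi_less, cfi_more, rmsea_less, rmsea_more,
                               max_cfi_drop=0.01, max_rmsea_rise=0.015):
    """Compare a more constrained model with the less constrained one.

    Assumption: delta CFI is the decrease in CFI and delta RMSEA is the
    increase in RMSEA when constraints are added. Both must stay below
    the cut-off values quoted in the text.
    """
    delta_cfi = cfi_less - cfi_more
    delta_rmsea = rmsea_more - rmsea_less
    return delta_cfi < max_cfi_drop and delta_rmsea < max_rmsea_rise


# Hypothetical fit values for two consecutive models, for illustration only.
print(invariance_step_acceptable(0.960, 0.958, 0.049, 0.056))
```

Requiring both conditions mirrors the statement that a model is considered acceptable only when both indices meet the criteria.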
Finally, in order to compare scores across ethnic groups, we performed univariate Analysis of Covariance (ANCOVA), adjusting for the effect of gender, income and physical limitations. The analysis for each of the four sub-scales was done separately. In addition, pairwise comparisons were performed through post-hoc testing.
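The ANCOVA for one subscale can be written as a linear model. The notation is an assumption introduced for illustration, and the sketch presumes that the covariate effects are the same in all three groups, which the text does not state explicitly.

```latex
% Y_i: subscale score of respondent i; g(i): origin group of respondent i.
\begin{align*}
Y_{i} &= \mu + \alpha_{g(i)}
        + \beta_{1}\,\text{sex}_{i}
        + \beta_{2}\,\text{income}_{i}
        + \beta_{3}\,\text{limitations}_{i}
        + \varepsilon_{i},\\
H_{0} &: \alpha_{\text{Dutch}} = \alpha_{\text{Turkish}} = \alpha_{\text{Moroccan}}.
\end{align*}
```

Under this formulation the adjusted group means are compared at common values of the three covariates, and the post-hoc tests compare the $\alpha$ terms pairwise.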

Sample Characteristics
Table 1 summarizes the sample characteristics including age, sex, household income level, and physical functioning for all origin groups separately. People from Turkish and Moroccan origin had lower income levels than people from Dutch origin. Turkish immigrants had the lowest physical functioning levels followed by Moroccan immigrants and lastly by older adults from Dutch origin.

Confirmatory Factor Analysis and Internal Consistency
First, we determined goodness-of-fit indices of the two models, i.e. the unidimensional model and the original four-factor model (Table 2), for each origin group separately. In all three groups the four-factor model showed the best fit. The goodness-of-fit indices for the unidimensional model were largely insufficient. Table 3 depicts the Cronbach's alpha of all four domains of the CES-D for each group separately. The internal consistency of the somatic symptoms, the depressed affect and the positive affect subscales is sufficient in all three groups. The internal consistency of the interpersonal problems subscale is insufficient, but this subscale consists of only two items.
Table 4 shows the results from the multi-group CFA. Adding constraints for equal factor structure, equal factor loadings, equal item thresholds and equal residual variances, respectively (i.e. configural invariance, metric invariance, strong (scalar) invariance and strict invariance), did not lead to a reduced model fit when testing measurement invariance for the group variable. Although the RMSEA value for country of origin increased after adding constraints in the metric and strong models (ΔRMSEA = 0.007 and ΔRMSEA = 0.000, respectively), it did not exceed the critical value of 0.08. Moreover, while the RMSEA for country of origin indicated acceptable fit, the CFI indicated good fit. The final strict model had an RMSEA of 0.056, a CFI of 0.958 and a TLI of 0.960, indicating reasonable and good fit, respectively. Measurement invariance regarding country of origin was confirmed for the four factors of the CES-D.

Comparison of CES-D Scores between the Migrant and Dutch Groups
After adjusting for income, sex and physical limitations, descriptive statistics indicate that the native Dutch and immigrant groups differ on all four domains of the CES-D (Table 5). On all four domains, average scores of people of Turkish and Moroccan origin indicated more symptoms than those of native Dutch respondents. The Turkish and Moroccan groups differed from each other only on positive affect: Turkish immigrants had lower levels of positive affect than Moroccan immigrants.

Discussion
The four subscales of the CES-D (depressed affect, positive affect, somatic symptoms, and interpersonal problems) are measurement invariant in older adults of Dutch, Turkish and Moroccan origin. This implies that the four subscales of the CES-D measure the same constructs in the three groups. Therefore we are able to make meaningful comparisons between the three groups. It has to be noted that the internal reliability of the interpersonal subscale was insufficient, but this subscale only contains two items. Nevertheless, we should be careful in interpreting this subscale. When comparing the three groups, our results showed that the Turkish and Moroccan immigrants reported more depressive symptoms on all four domains, compared to adults of Dutch origin. These results are in line with prior studies (Schrier et al. 2010) using the CES-D as a unidimensional scale or the Symptom Checklist-90-revised (SCL-90-R), which did not examine measurement invariance of the scales. Except for positive affect, people of Turkish and Moroccan origin scored similarly on the subscales. The Turkish group had the lowest levels of positive affect of all groups. Although we expected more depressive symptoms in the immigrant groups than in the Dutch origin group, we had expected them to score especially high on the somatic subscale and lower on the depressed affect subscale, as was suggested by Acartürk and colleagues (Acartürk et al. 2011).
The MGCFA showed that the original four-factor structure of the CES-D had the best fit in all three groups. The goodness of fit of a unidimensional model was insufficient in the three groups, meaning that the use of a total score is not supported in our study sample. Our results show the importance of studying the factor structure and measurement invariance before comparing groups with a different cultural background. In addition to the unidimensional and four-factor models, we fitted a bifactor model, because Chen and colleagues (Chen et al. 2006) argued that there are mathematical advantages of the bifactor model, which include estimating fewer parameters and reduced model complexity. However, in the Turkish sample this model could not be identified, and two items (item 7, 'I felt that everything I did was an effort', and item 18, 'I felt sad') showed extremely high factor loadings on both the general factor and the group factor compared to other factor loadings. It might be that for Turkish immigrants item 7 signifies physical health problems, which would explain why they were more likely to endorse item 7. In the Dutch sample the model could be identified, but item 18 also showed disproportionately high factor loadings compared to the other factor loadings. The fact that item 18 showed high factor loadings for both the Dutch and the Turkish group might indicate similarity rather than difference between the structure of the CES-D in both origin groups. Because of these difficulties, we decided not to use this model.
Our study has several strengths and limitations. A strength is that we included a group of older adults of Turkish, Moroccan and Dutch origin from urban areas in the Netherlands, where most of the immigrant groups live. Also, we were able to include persons who did not speak Dutch or who were illiterate by using interviewers who spoke the language of the respondent, and by using questionnaires translated by professional translators into Moroccan dialects and Turkish that were administered verbally. A limitation is that the cooperation rate among the immigrant groups was low. This seems to be a common problem: studies across Europe report challenges with cooperation rates among older migrants (Lipson and Meleis 1989), as do studies in the Netherlands (Reijneveld et al. 2007). Little is known about the characteristics of the persons who did not participate in this study, which makes it difficult to estimate how non-participation has affected our results. The use of only self-report measures may also be considered a limitation. Some ethnic minority groups may be more inclined to give socially desirable answers (Reijneveld 1998; Reijneveld and Stronks 1999), which may have resulted in underreporting of depressive symptoms. It is suggested that underreporting could be based on socio-economic factors, health problems, trust issues and time constraints.
It is worrying that the immigrant groups report many more depressive symptoms than their native Dutch peers. A previous Dutch study among adult immigrant groups already showed in 2008 higher prevalence rates for depressive disorders among Turkish and Moroccan immigrants as compared to their native Dutch peers (De Wit et al. 2008). Nonetheless, it has been shown that Turkish and Moroccan immigrants make relatively little use of mental health care facilities (Kamperman et al. 2007). It is therefore questionable whether these symptoms are adequately recognised by health care practitioners or by the Turkish and Moroccan immigrants themselves. One reason might be that language barriers occur between health practitioners and immigrants (Fransen et al. 2013; Rademakers 2014). Language barriers go beyond the ability to speak Dutch and include cultural differences in the interpretation of mental illness (i.e. the taboos surrounding mental illness) as well as an individual's own knowledge and understanding of mental illness (Fransen et al. 2013). Based on the results of this study, it is important that health professionals are aware of the high prevalence rates of depression among older immigrants and that they systematically ask about individual depressive symptoms in these migrants when depression is suspected. In addition, the finding that Turkish immigrants might display lower positive affect than Moroccan immigrants might be an important lead for health professionals to pay extra attention to.
Table note: [...] 2, 5, 7, 11, 13, 20; depressed affect includes items 3, 6, 9, 10, 14, 17, 18; positive affect includes items 4, 8, 12, 16; interpersonal problems includes items 15, 19.

Compliance with Ethical Standards
Conflict of Interest On behalf of all authors, the corresponding author states that there is no conflict of interest.
Ethical Approval All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Informed Consent Informed consent was obtained from all individual participants included in the study.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.