Introduction

The physical and mental health of left-behind children (LBC), a disadvantaged group created by China's rapid urbanization and economic development, has long been a concern (Shao et al., 2018). LBC are children under 18 years of age who remain in their hometown while one or both parents migrate elsewhere for work (Lan et al., 2019; Milton et al., 2010). As industrialization progresses and the economic gap between urban and rural areas widens in developing countries, more and more workers are leaving their registered place of residence for better-paid jobs. Because of poor living conditions in the cities, these rural-to-urban migrants are often unable to bring their children with them and must leave them in the countryside. As a result, these children live without their parents and receive less parental care and education (Tang et al., 2019; Wu et al., 2019). Studies have shown that parental migration has a negative effect on children's mental health, resulting in low self-esteem, loneliness, anxiety, depression, and even a high prevalence of suicidal ideation (Ai & Hu, 2014; Ding et al., 2019; Tang et al., 2018; Xiao et al., 2019). Compared with non-left-behind children (NLBC), LBC are more likely to develop emotional or behavioral problems and show significantly more emotional symptoms, more psychopathological behaviors, and fewer prosocial behaviors (Fan et al., 2010; Wang et al., 2019). One possible explanation is that LBC and NLBC differ in their capacity to regulate emotions. It is therefore worthwhile to examine emotion regulation in LBC closely.

Emotion regulation, an important aspect of healthy psychological development, is the process by which individuals influence their emotions, including which emotions they experience and how they express and respond to them (Gross, 1999). Research indicates that effective emotion regulation is linked to better affective functioning, social relations, and cognitive processing (Gross, 2001; Gullone & Taffe, 2012; Ochsner & Gross, 2008). In contrast, emotion dysregulation is implicated in depression, anxiety, maladaptive functioning, and many other mental disorders (Aldao et al., 2010; Cisler et al., 2010; Moore et al., 2008).

Emotion Regulation Strategies

According to the process model of emotion regulation, regulation strategies can be distinguished by the point in the unfolding of an emotion at which they act. Gross (1999) differentiated antecedent-focused strategies, which operate before an emotion has been generated, from response-focused strategies, which operate after it has been generated (see also Gross & John, 2003). Cognitive reappraisal (CR) is an antecedent-focused strategy: a cognitive change that occurs before the emotion or a related response has fully developed and that modifies the situation's emotional impact. In contrast, expressive suppression (ES) is a response-focused strategy that occurs after the emotion has been generated and involves inhibiting emotional expression (Gross, 2001). Because these two strategies act at different points, they may also affect individuals differently (Gross, 2002; Lopes et al., 2003). Studies have shown that emotion regulation can be adaptive or harmful, depending on the strategy individuals use to manage or change their responses to emotional situations. Numerous studies suggest that adaptive emotion regulation strategies are related to better social relationships and higher life satisfaction, whereas frequent use of maladaptive strategies is negatively associated with emotion release and mood repair efforts and positively associated with stress symptoms and trait anxiety (English et al., 2012; Gómez-Ortiz et al., 2016; John & Gross, 2004; Spaapen et al., 2014; Teixeira et al., 2014).

Factor Structure of the Emotion Regulation Questionnaire

To assess individual differences in the use of these two strategies, Gross and John (2003) developed the Emotion Regulation Questionnaire (ERQ), which consists of 10 items measuring the use of reappraisal and suppression. Model testing in the original study supported a two-factor structure, and the scale showed good reliability and validity. Subsequent studies have tested the psychometric properties of the ERQ in different ethnic and cultural groups, with mixed results. The ERQ has mainly been used in American (John & Gross, 2004; Moore et al., 2008) and French (D'Argembeau & Van der Linden, 2006) samples, where the original two-factor structure was confirmed. Matsumoto et al. (2008) then used the ERQ in a cross-cultural study of participants from 23 countries, and their results also supported the two-factor structure. Since then, the ERQ and its revised or age-specific versions have been widely used in German (Abler & Kessler, 2009), Australian (Gullone & Taffe, 2012), Chinese (Li & Wu, 2018; Liu et al., 2017), Portuguese (Teixeira et al., 2014), Italian (Balzarotti et al., 2010; Sala et al., 2012), American (Melka et al., 2011; Preece et al., 2021), Japanese (Namatame et al., 2020), and Spanish (Cabello et al., 2013; Gómez-Ortiz et al., 2016) samples, and all of these studies supported the two-factor structure. However, two other studies, conducted in Germany (Wiltink et al., 2011) and Australia (Spaapen et al., 2014), found more complicated results. Preece et al. (2019) reviewed the existing research and found that factor loadings and structural results differed between university student and general community samples. They then examined the psychometric properties of the ERQ in general community samples and replicated the original factor structure (see Table 1).

Table 1 Psychometric Properties of the ERQ and Its Revised Versions in Different Languages and Samples

The Present Study

Previous studies have mainly focused on healthy samples, and few have tested the reliability and validity of the ERQ in specific groups such as LBC. Moreover, the factor structure of the ERQ differs across samples: the original two-factor structure has produced mixed results in general community samples (Preece et al., 2021; Spaapen et al., 2014; Wiltink et al., 2011) but has proved stable in student samples (Abler & Kessler, 2009). In addition, because LBC and NLBC differ significantly in emotional and behavioral problems, they may also differ in emotion regulation. Establishing that a questionnaire measures the same construct in each group is a prerequisite for meaningful comparisons across groups (Dimitrov, 2010). It is therefore important to determine whether the ERQ shows consistent measurement properties across these two groups. To further verify the applicability and generalizability of the ERQ and to understand differences in emotion regulation strategies between LBC and NLBC, this study tested the psychometric properties and measurement invariance of the ERQ among Chinese LBC.

Materials and Methods

Participants

Our study was conducted in Guizhou Province, which has a high proportion of left-behind children in China. The target participants were secondary school students, recruited by cluster random sampling. We adopted the All-China Women's Federation definition of LBC: minors under the age of 18 with one or both parents working away from home and a continuous separation of more than 3 months. Thus, LBC in this study were defined by the following criteria: 1) under 18 years old; 2) one or both parents migrating for work; 3) left behind for three or more consecutive months. Although we adopted the under-18 criterion, some children in rural areas start school late or repeat grades, so we also included students over 18 who were still in secondary school. Children neither of whose parents had migrated for work were defined as NLBC.
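The inclusion rules above can be written as a small decision function. This is a minimal sketch, not the authors' screening procedure: the function names `classify_child` and `migration_pattern` and their inputs are hypothetical, age is deliberately not used as a filter (because students over 18 were retained), the threshold follows the "three or more consecutive months" wording, and returning `None` for a child whose parent had been away for less than three months is an assumption, since the text does not say how such children were handled.

```python
from typing import Optional


def classify_child(father_away: bool, mother_away: bool,
                   months_separated: float) -> Optional[str]:
    """Return 'LBC', 'NLBC', or None for one student (hypothetical helper).

    father_away / mother_away: whether each parent has migrated for work.
    months_separated: length of the current continuous separation in months.
    """
    if not (father_away or mother_away):
        return "NLBC"  # neither parent has migrated for work
    if months_separated >= 3:
        return "LBC"   # one or both parents away for >= 3 consecutive months
    # A parent is away, but for less than 3 months: fits neither definition
    # (assumption; the source does not specify how these cases were treated).
    return None


def migration_pattern(father_away: bool, mother_away: bool) -> str:
    """Label the LBC subgroups reported in the sample description."""
    if father_away and mother_away:
        return "both parents"
    if father_away:
        return "father only"
    if mother_away:
        return "mother only"
    return "neither"
```

For example, `classify_child(True, False, 6)` returns `"LBC"`, and `migration_pattern(True, True)` returns `"both parents"`.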

Sample 1: The initial test was completed in April 2019. Participants were 2960 students from 7 middle schools in three cities in Guizhou Province, China. Of these, 1365 (46.11%) were LBC and 1595 (53.89%) were NLBC. Participants' ages ranged from 12 to 20 years (M = 15.74, SD = 2.50; 1.41% missing); 44.39% were male and 55.27% were female (.34% missing). For the data analyses, the sample was divided into two groups: LBC and NLBC.

The 1365 LBC ranged in age from 12 to 20 years (M = 15.80, SD = 2.45; 1.31% missing); 42.30% were male and 57.26% were female (.44% missing). Among them, the parent working away from home for more than three consecutive months was the father in 32.24% of cases, the mother in 12.55%, and both parents in 55.22%. The 1595 NLBC ranged in age from 12 to 20 years (M = 15.69, SD = 2.54; 1.50% missing); 46.11% were male and 53.58% were female (.31% missing).

Sample 2: Three months later, 300 participants were retested, and 273 valid responses were retained. Of this sample, 41.76% were male and 57.14% were female (1.10% missing). Ages ranged from 12 to 16 years (M = 13.66, SD = 1.01); 43.23% were LBC and 54.21% were NLBC (2.56% missing).

Instruments

The Emotion Regulation Questionnaire (ERQ; Gross & John, 2003) is a 10-item questionnaire measuring two emotion regulation strategies, ES (4 items) and CR (6 items), rated on a 7-point Likert scale (1 = strongly disagree, 7 = strongly agree). Before developing the Chinese version, we contacted the developer of the original scale by email, obtained permission to use the scale in China, and later sent back the translated version. The Chinese ERQ was developed through back-translation by two independent groups. The first comprised eight people with master's-level training and one with a PhD in psychology; the second consisted of a psychology master's graduate who had passed the Test for English Majors-Band 8 (TEM-8). A committee consensus approach was used to ensure accuracy. The nine-member group first translated the scale independently, and the second group then translated it back into English. We compared the back-translated version with the original English version to identify errors and inconsistencies, and resolved them in repeated iterations until the two versions were semantically identical and members of both groups agreed. The final version was obtained through back-translation and discussion between the two groups.
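For readers scoring the ERQ, the sketch below computes the two subscale scores for one respondent. It assumes that the item key of the original English ERQ (reappraisal: items 1, 3, 5, 7, 8, 10; suppression: items 2, 4, 6, 9) carries over unchanged to the Chinese translation, and that subscale scores are item means on the 1-7 scale, which matches the range of the means reported in the Results; the function name `score_erq` is hypothetical.

```python
# Item key of the original ERQ (Gross & John, 2003); assumed unchanged
# in the Chinese translation.
CR_ITEMS = (1, 3, 5, 7, 8, 10)  # cognitive reappraisal
ES_ITEMS = (2, 4, 6, 9)         # expressive suppression


def score_erq(responses: dict[int, int]) -> dict[str, float]:
    """Return mean CR and ES scores (1-7) for one respondent.

    responses maps item number (1-10) to a rating from
    1 (strongly disagree) to 7 (strongly agree).
    """
    if set(responses) != set(range(1, 11)):
        raise ValueError("expected ratings for items 1-10")
    if any(not 1 <= r <= 7 for r in responses.values()):
        raise ValueError("ratings must lie between 1 and 7")
    cr = sum(responses[i] for i in CR_ITEMS) / len(CR_ITEMS)
    es = sum(responses[i] for i in ES_ITEMS) / len(ES_ITEMS)
    return {"CR": cr, "ES": es}


example = {1: 6, 2: 3, 3: 5, 4: 2, 5: 7, 6: 4, 7: 6, 8: 5, 9: 3, 10: 7}
print(score_erq(example))  # {'CR': 6.0, 'ES': 3.0}
```

Averaging rather than summing keeps both subscales on the same 1-7 metric even though they have different numbers of items.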

The Emotion Regulation Scale (ERS; Wang et al., 2007) is a 14-item self-report scale assessing two dimensions of emotion regulation: CR (7 items) and ES (7 items). Items are rated on a Likert scale from 1 (strongly disagree) to 7 (strongly agree), with higher scores on each dimension indicating a stronger tendency to use that strategy. Cronbach's alpha in this study was .76 for Suppression and .81 for Reappraisal.

The General Health Questionnaire (GHQ) is a self-administered questionnaire developed to assess general psychological health. This study used the 12-item version (GHQ-12; Goldberg et al., 1997), rated on a 4-point Likert scale (1 = not at all true of me, 4 = exactly true of me), with higher scores indicating poorer psychological health. To ensure accuracy, the scale was translated from English into Chinese using back-translation with a committee consensus approach. Cronbach's alpha in this study was .80.

The Rosenberg Self-Esteem Scale (RSES) is the most widely used measure of self-esteem. It is a unidimensional 10-item scale rated on a 4-point Likert scale from strongly disagree to strongly agree, with higher scores indicating more positive self-evaluation. This study used the Chinese version of the RSES (Wang et al., 1999). Cronbach's alpha in this study was .82.

The University of California, Los Angeles Loneliness Scale (ULS) was originally developed by Russell et al. (1978). It is the most frequently used measure of loneliness and shows good psychometric properties. This study used a short form, the ULS-8 (Hays & DiMatteo, 1987), which contains 8 items rated on a 4-point Likert scale (1 = strongly disagree, 4 = strongly agree). The scale was translated into Chinese using back-translation with a committee consensus approach. Higher scores indicate greater loneliness. Cronbach's alpha in this study was .81.

Procedure

Participants completed the questionnaires at school. To encourage accurate responses, the researchers emphasized that there were no right or wrong answers and that participants should simply choose the option that best matched their own state. Participants completed the RSES, ULS, GHQ, ERS, and ERQ independently and privately, in that order. There was no time limit, and the questionnaires took about 30 min to complete. Participants completed the questionnaires under the supervision of a class teacher, who then returned them to the researchers. Participation was voluntary, and participants were informed of their right to anonymity. Informed consent was obtained from all participants, and the study was approved by the Ethics Committee of Guizhou Normal University.

Statistical Methods

Some demographic data were missing. Because these missing values did not affect the analyses of item responses, cases with missing demographic data were retained in most analyses; however, they were excluded from the measurement invariance tests. The measurement invariance analyses therefore included 1365 LBC (42.85% male) and 1595 NLBC.

First, skewness and kurtosis indices were examined to assess normality; items with univariate skewness above 3 or kurtosis above 10 were to be removed (R. B. Kline, 2012). The LBC sample (n = 1365) was split in half using an odd-even method, and one of the two subsets was used for exploratory factor analysis (EFA). The Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test of sphericity were used to check whether the data were suitable for EFA; a KMO value close to 1 and a significant Bartlett's test indicate suitability (Carpenter, 2018). Factors were extracted with the principal component factor (PCF) method, and factors with eigenvalues of 1.0 or greater were retained according to Kaiser's criterion (Kline, 1994; Nunnally & Bernstein, 1994). Both varimax and promax rotations were performed to examine factor loadings, and each item was required to load at least 0.35 on its main factor (Namatame et al., 2020). Because the eigenvalue-greater-than-1.0 rule tends to retain too many factors, a minimum average partial (MAP) correlation test was also performed to confirm the number of factors to retain; Velicer (1976) recommended retaining the number of components at which the average squared partial correlation reaches its minimum. Confirmatory factor analyses (CFA) were then performed to test whether the two-factor structure of the ERQ could be replicated, using the robust maximum likelihood estimator (MLM) because it is robust to non-normal data. The Satorra-Bentler scaled chi-square statistic (S-Bχ2), root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker-Lewis index (TLI) were used to evaluate model fit. Following Hu and Bentler (1998), RMSEA values below .08 indicate moderate fit and values below .06 indicate good fit; CFI and TLI values above .95 are considered good and values above .90 acceptable.
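To make the MAP criterion concrete, the sketch below implements Velicer's (1976) squared-partial-correlation version in NumPy: principal components are partialled out of the correlation matrix one at a time, and the number of components with the smallest average squared off-diagonal partial correlation is retained. It is an illustration rather than the STATA routine used in the study; the function name `velicer_map` is hypothetical, and the correlation matrix is assumed to be positive definite, as it normally is for real item data.

```python
import numpy as np


def velicer_map(R: np.ndarray) -> tuple[int, list[float]]:
    """Velicer's minimum average partial (MAP) test (hypothetical helper).

    R: p x p positive-definite correlation matrix.
    Returns the number of components to retain and the average squared
    off-diagonal partial correlation for m = 0, 1, ..., p - 2.
    """
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]              # largest component first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    off_diag = ~np.eye(p, dtype=bool)

    avg_sq = [float(np.mean(R[off_diag] ** 2))]    # m = 0: raw correlations
    for m in range(1, p - 1):
        A = loadings[:, :m]
        residual = R - A @ A.T                     # partial out m components
        d = np.sqrt(np.diag(residual))
        partial = residual / np.outer(d, d)        # rescale to correlations
        avg_sq.append(float(np.mean(partial[off_diag] ** 2)))

    return int(np.argmin(avg_sq)), avg_sq


# Hypothetical population matrix: two factors (r = .10), five items each.
L = np.zeros((10, 2))
L[:5, 0] = [0.80, 0.75, 0.70, 0.65, 0.60]
L[5:, 1] = [0.70, 0.65, 0.60, 0.55, 0.50]
Phi = np.array([[1.0, 0.1], [0.1, 1.0]])
R = L @ Phi @ L.T
np.fill_diagonal(R, 1.0)
n_components, curve = velicer_map(R)
print(n_components, [round(v, 4) for v in curve])
```

Unlike the eigenvalue-greater-than-1 rule, the MAP curve stops decreasing once only unique (item-specific) variance is left to remove, which is why it tends to retain fewer factors.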

Second, measurement invariance across genders and between LBC and NLBC was tested using multigroup confirmatory factor analysis (MCFA). Four progressively more restrictive models were tested: model 1 (equal form), model 2 (equal factor loadings), model 3 (equal indicator intercepts), and model 4 (equal indicator error variances). Because ∆S-Bχ2 is sensitive to sample size, changes in other fit indices (∆CFI and ∆TLI) were used instead, with a change of less than .01 taken to support invariance (Putnick & Bornstein, 2016).
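The decision rule for the invariance sequence can be sketched as follows. The sketch assumes that "< .01" refers to the drop in CFI and TLI from each model to the preceding, less constrained one (so an improvement in fit never counts against invariance); the `Fit` record, the function name `invariance_steps`, and the fit values in the example are hypothetical and are not the values reported in Tables 4 and 5.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    name: str   # e.g. "configural", "metric", "scalar", "strict"
    cfi: float
    tli: float


def invariance_steps(models: list[Fit], cutoff: float = 0.01):
    """Compare each model with the preceding, less constrained model.

    Returns (model name, delta CFI, delta TLI, supported) for each step,
    where supported means CFI and TLI both dropped by less than `cutoff`.
    """
    steps = []
    for looser, stricter in zip(models, models[1:]):
        d_cfi = stricter.cfi - looser.cfi
        d_tli = stricter.tli - looser.tli
        supported = -d_cfi < cutoff and -d_tli < cutoff
        steps.append((stricter.name, round(d_cfi, 3), round(d_tli, 3), supported))
    return steps


# Hypothetical fit values, for illustration only.
sequence = [
    Fit("configural", 0.950, 0.934),
    Fit("metric", 0.949, 0.940),
    Fit("scalar", 0.945, 0.938),
    Fit("strict", 0.930, 0.931),
]
for step in invariance_steps(sequence):
    print(step)
```

In this made-up sequence the metric and scalar steps are supported, while the strict step is not because CFI drops by .015.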

Third, gender and left-behind differences were examined with t-tests. To provide reliability evidence, Cronbach's alpha, McDonald's omega (ω), and test-retest reliability (Pearson correlation) were calculated. Finally, convergent, discriminant, and criterion validity were tested using Pearson correlations between ERQ, ERS, GHQ, RSES, and ULS-8 scores. Descriptive statistics, Pearson correlations, EFA, reliability analyses, and t-tests were conducted using STATA/MP 13.0; CFA and MCFA were performed using Mplus version 8.3.
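The two internal-consistency coefficients named above have simple closed forms. The sketch below computes Cronbach's alpha from raw item scores and McDonald's omega from the standardized loadings and error variances of a one-factor (congeneric) model. The function names are hypothetical, and the software used in the study may differ in details such as how the loadings are estimated.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; `items` holds one list of respondent scores per item."""
    k = len(items)
    n = len(items[0])
    if k < 2 or any(len(item) != n for item in items):
        raise ValueError("need >= 2 items with the same number of respondents")
    totals = [sum(item[i] for item in items) for i in range(n)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


def mcdonald_omega(loadings: list[float], error_variances: list[float]) -> float:
    """McDonald's omega for a one-factor model: (sum of loadings)^2 / total variance."""
    common = sum(loadings) ** 2
    return common / (common + sum(error_variances))


# Two items, three respondents: alpha = 2 * (1 - 2/3)
print(round(cronbach_alpha([[1, 2, 3], [1, 3, 2]]), 3))  # 0.667
# Four items with standardized loadings of .70 (error variance .51 each)
print(round(mcdonald_omega([0.7] * 4, [0.51] * 4), 3))  # 0.794
```

Alpha assumes equal loadings across items (tau-equivalence), whereas omega allows them to differ, which is one reason studies often report both.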

Results

Factor Structure

First, we assessed normality (see Table 2). All items met the criteria for univariate normality (all skewness values < 3 and all kurtosis values < 10), although the data showed slight non-normality (all |skewness| < 1.25 and all |kurtosis| < 3.75) (Coenders et al., 1997; Flora & Curran, 2004). Before the EFA, the KMO measure and Bartlett's test of sphericity were used to check whether the ERQ items were suitable for factor analysis. The KMO value was 0.77, and Bartlett's test was significant, χ2(45) = 1746.59, p < .001 (χ2/df ≈ 38.81), indicating that the data met the prerequisites for EFA. The number of factors was determined using both Kaiser's eigenvalue criterion and the MAP test. Kaiser's criterion yielded two factors: the first defined by the cognitive reappraisal items and the second by the expressive suppression items. Under both varimax and promax rotation, each item loaded on its expected factor, and the factor structure was clear (see Table 2). The MAP test also suggested retaining two factors (see Table 3).
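The two suitability checks reported above can be computed directly from the item correlation matrix. In this sketch the overall KMO compares squared correlations with squared partial (anti-image) correlations, and Bartlett's statistic is −[(n − 1) − (2p + 5)/6]·ln|R| with p(p − 1)/2 degrees of freedom, which gives df = 45 for the 10 ERQ items; a p-value could then be obtained with `scipy.stats.chi2.sf(chi2, df)`. The function name `kmo_bartlett` and the simulated data are hypothetical, and the sketch does not reproduce the study's values.

```python
import numpy as np


def kmo_bartlett(X: np.ndarray) -> tuple[float, float, int]:
    """Return (overall KMO, Bartlett chi-square, df) for an n x p data matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    inv_R = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(scale, scale)  # partial (anti-image) correlations
    off = ~np.eye(p, dtype=bool)               # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    kmo = r2 / (r2 + q2)
    _, logdet = np.linalg.slogdet(R)           # ln|R|, numerically stable
    chi2 = -((n - 1) - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return float(kmo), float(chi2), df


# Simulated two-factor data (hypothetical): 500 respondents, 10 items.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
noise = rng.normal(size=(500, 10))
X = np.hstack([0.7 * factors[:, [0]] + 0.7 * noise[:, :5],
               0.7 * factors[:, [1]] + 0.7 * noise[:, 5:]])
kmo, chi2, df = kmo_bartlett(X)
print(f"KMO = {kmo:.2f}, chi2({df}) = {chi2:.1f}")
```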

Table 2 Results of the normality assessment, item-total correlations and factor loadings
Table 3 Kaiser eigenvalues and minimum average partial correlations

Based on the EFA results, we adopted the two-factor model. A CFA was conducted for the LBC sample using the MLM estimator; Fig. 1 shows the two-factor model of the ERQ. The fit for the LBC sample was acceptable (S-Bχ2(34) = 106.78, p < .001, CFI = .948, TLI = .932, RMSEA = .056 [.044, .068]), indicating that the model represented the data adequately and that the original two-factor structure was replicated in this study.
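As a rough consistency check and a summary of the cutoffs adopted in the Statistical Methods, the sketch below computes the RMSEA point estimate from a chi-square value, its degrees of freedom, and the sample size, and labels each index. Assuming the CFA sample was about half of the LBC sample (n ≈ 683, an assumption, since the text does not report it), the formula gives a value close to the reported .056; Mplus may apply slightly different conventions (for example N instead of N − 1), so exact agreement is not expected. The function names are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def label_fit(cfi: float, tli: float, rmsea_value: float) -> dict[str, str]:
    """Label fit indices with the cutoffs used in the text (Hu & Bentler)."""

    def incremental(value: float) -> str:
        # CFI/TLI: >= .95 good, >= .90 acceptable
        if value >= 0.95:
            return "good"
        return "acceptable" if value >= 0.90 else "poor"

    if rmsea_value < 0.06:
        rmsea_label = "good"
    elif rmsea_value < 0.08:
        rmsea_label = "moderate"
    else:
        rmsea_label = "poor"
    return {"CFI": incremental(cfi), "TLI": incremental(tli), "RMSEA": rmsea_label}


print(round(rmsea(106.778, 34, 683), 3))  # 0.056
print(label_fit(0.948, 0.932, 0.056))
# {'CFI': 'acceptable', 'TLI': 'acceptable', 'RMSEA': 'good'}
```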

Fig. 1 The confirmatory factor analysis model examined for the LBC sample. ES = expressive suppression, CR = cognitive reappraisal

Measurement Invariance

To determine whether the ERQ shows consistent measurement properties across genders and between LBC and NLBC, a multigroup confirmatory factor analysis (MCFA) was performed (see Tables 4, 5). The fit indices for the male, female, LBC, and NLBC data sets were all good, indicating that it was appropriate to test measurement invariance. First, for the gender groups within LBC, the configural invariance model (model 1) was supported. Then, three increasingly constrained models were tested (model 2: metric invariance; model 3: scalar invariance; model 4: strict invariance). Because all ∆CFI and ∆TLI values were < .01, the ERQ showed measurement invariance across genders in LBC. The same results were obtained for the LBC and NLBC groups, with all differences less than .01, supporting measurement invariance across these two groups.

Table 4 Fit indices for measurement invariance across genders
Table 5 Fit indices for measurement invariance between LBC and NLBC

Gender, LBC and NLBC Differences

Once measurement invariance is established, comparisons of observed scores are meaningful (Meade et al., 2005). To examine gender and left-behind differences, t-tests were performed (see Table 6). Consistent with other studies, there was no significant gender difference in CR among LBC (t = .89, ns; effect size d = .04). In contrast, males scored higher on ES (M = 4.26, SD = 1.33) than females (M = 4.06, SD = 1.45), t = 2.63, p < .01, effect size d = .14. As predicted, LBC and NLBC differed significantly in ES: LBC (M = 4.14, SD = 1.40) scored higher than NLBC (M = 4.03, SD = 1.44), t = 2.11, p < .05, effect size d = .07. However, CR did not differ significantly between LBC and NLBC (t = −1.96, ns; effect size d = −.06). Thus, LBC may use the ES strategy more than NLBC, although the effect size was very small.
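The effect sizes above can be recomputed from the reported summary statistics. The sketch uses the pooled-SD form of Cohen's d and Student's independent-samples t; the group sizes (577 boys and 782 girls among LBC) are an assumption derived from the reported percentages, and because the inputs are rounded, the results only approximate Table 6 (d ≈ .14 as reported; t ≈ 2.60 versus the reported 2.63). The function names are hypothetical.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference using the pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def student_t(m1: float, sd1: float, n1: int,
              m2: float, sd2: float, n2: int) -> float:
    """Independent-samples t statistic assuming equal variances."""
    se = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / se


# ES scores of male vs female LBC (means and SDs from the text; ns assumed)
male = (4.26, 1.33, 577)
female = (4.06, 1.45, 782)
print(round(cohens_d(*male, *female), 2))   # 0.14
print(round(student_t(*male, *female), 2))  # 2.6
```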

Table 6 A t-test of gender, LBC and NLBC differences

Reliability

To test the internal consistency of the ERQ in the LBC sample, Cronbach's alpha and McDonald's ω were calculated for the ES and CR subscales (see Table 7). Cronbach's α was .79 for ES (item-total correlations from .73 to .81, all p < .001) and .78 for CR (item-total correlations from .66 to .74, all p < .001). Both coefficients are higher than those reported in many other studies (Abler & Kessler, 2009; John & Gross, 2004; Matsumoto et al., 2008; Sala et al., 2012). McDonald's ω was .80 for ES and .78 for CR, indicating that the ERQ has good internal consistency in LBC. To test the stability of the ERQ, test-retest reliability was calculated as the Pearson correlation between the two administrations; the 3-month test-retest coefficients were .62 for ES and .68 for CR.

Table 7 Correlation coefficient matrix for the ERQ, ERS, GHQ and RSES among LBC

Convergent, Discriminant and Criterion Validity

Consistent with the original and other studies (Gross & John, 2003; Moore et al., 2008), the correlation between the ES and CR subscales was low (r = .08, p < .01), indicating that the two factors are largely independent. Correlations between the ERQ and the other scales are shown in Table 7. The ERQ-CR was strongly correlated with the ERS-CR (r = .64, p < .001), and the ERQ-ES was strongly correlated with the ERS-ES (r = .58, p < .001), indicating good convergent validity for both subscales. In contrast, the ERQ-ES showed a significant but low correlation with the ERS-CR (r = .07, p < .01), and the ERQ-CR showed a significant but relatively low correlation with the ERS-ES (r = .26, p < .001), indicating good discriminant validity for both subscales.

In addition, the ERQ-CR was significantly positively correlated with the RSES (r = .28, p < .001) and significantly negatively correlated with the GHQ (r = −.34, p < .001) and ULS-8 (r = −.16, p < .001). In contrast, the ERQ-ES was significantly negatively correlated with the RSES (r = −.22, p < .001) and significantly positively correlated with the GHQ (r = .26, p < .001) and ULS-8 (r = .32, p < .001). Together, these results indicate that the ERQ has satisfactory criterion validity.

Discussion

This study aimed to test the psychometric properties and measurement invariance of the ERQ in Chinese LBC. Although two studies have validated Chinese versions of the ERQ, one used a revised version (Li & Wu, 2018) and the other an age-specific version (Liu et al., 2017). Because the original scale (Gross & John, 2003) has been widely used in various countries and groups and shows good reliability and validity, we examined the performance of the original ERQ in Chinese LBC and investigated differences between LBC and NLBC in their use of ES and CR as emotion regulation strategies.

First, the exploratory factor analysis suggested retaining two factors, and the two-factor model achieved satisfactory fit, indicating that the original two-factor structure was replicated in the Chinese LBC sample. This result is similar to those of many other studies (Balzarotti et al., 2010; Gómez-Ortiz et al., 2016; Gullone & Taffe, 2012; John & Gross, 2004; Teixeira et al., 2014), suggesting that the ERQ has good structural validity across countries and groups. In addition, because the differences in fit indices across the four MCFA models all met the criterion, there was sufficient evidence of measurement invariance across genders and between LBC and NLBC. In other words, the ERQ items appear to function equivalently and carry the same meaning in Chinese LBC and NLBC samples and across genders within LBC. Regarding group differences in the two emotion regulation strategies, and consistent with other studies, males reported significantly higher levels of ES than females in the LBC sample. Furthermore, as expected, LBC reported significantly higher levels of ES than NLBC. This may be because LBC in rural China experience more loneliness and depression than NLBC (Wang et al., 2019; Wen & Lin, 2012), which leads them to bury their emotions. Because of long-term separation from their parents, LBC are more likely to feel lonely, isolated, or abandoned (Ai & Hu, 2014). These feelings, together with the lack of someone to talk to, may lead LBC with emotional problems to express themselves less and rely more on ES. This may explain the differences between LBC and NLBC observed in this study. In addition, each item of the ES and CR subscales showed a good item-total correlation, and both subscales showed good reliability coefficients; in other words, the ERQ has good internal consistency. The test-retest correlations suggested reasonable stability of the scale over time.
Notably, the stability results are consistent with another study that also found more variation over time in ES than in CR (Liu et al., 2017). However, they differ from other studies that found less variability in ES over time (Balzarotti et al., 2010; Cabello et al., 2013; Gómez-Ortiz et al., 2016; Gullone & Taffe, 2012). Liu et al. also included Chinese rural students in their sample. Our results suggest that LBC may use the ES strategy more, and one study showed that a longer duration of being left behind is significantly associated with emotional symptoms (Fan et al., 2010). It is therefore plausible that LBC's use of ES changes as the time they are separated from their parents increases. Finally, the ERQ showed adequate convergent and discriminant validity, and its criterion validity was also acceptable.

Limitations

First and foremost, all samples in this study came from a single province in China, which may limit the generalizability of these conclusions about the scale to other regions. In addition, we found relatively low stability of the ES strategy over time; although this is similar to another study that also included Chinese rural students, it differs from other studies that all found less variability in ES. More studies are therefore needed to verify these conclusions in other groups.

Conclusion

The Emotion Regulation Questionnaire is a sound two-dimensional measure for assessing the extent to which Chinese LBC use these two emotion regulation strategies.