Introduction

Adverse childhood experiences (ACEs) are potentially traumatic events, such as physical, emotional, or sexual abuse, physical or emotional neglect, and household dysfunction that occur in the first 18 years of life (Felitti et al., 1998). ACEs are prevalent across countries; 38.8% of adults across 21 countries had at least one childhood adversity (Kessler et al., 2010). ACEs are robust predictors of poor physical and mental health outcomes, risky health behaviors, limited educational attainment, reduced life opportunities, and premature death in adulthood (Brown et al., 2009; Hughes et al., 2017; Liu et al., 2013). In addition, ACEs are costly; total annual costs attributable to ACEs were estimated to be US$581 billion in Europe and $748 billion in North America (Bellis et al., 2019). ACEs represent an urgent public health crisis with wide-reaching health and societal impacts (Bhushan et al., 2020).

Measuring ACEs in Chinese Populations

Although most of the current ACE studies are conducted in Western countries, more ACE research has begun to emerge in non-Western contexts, such as Mainland China, the most populous country in the world. Since 2000, at least 42 studies measuring ACEs using Chinese samples have been published in Western journals (Chang et al., 2019; Jia et al., 2020; Xiao et al., 2008; Zhang et al., 2020). Findings from these studies show that exposure to ACEs was significantly associated with risky health behaviors, chronic illnesses, social anxiety symptoms, depression, suicide intentions, and post-traumatic stress disorder (PTSD) in Chinese populations (Chang et al., 2019; Jia et al., 2020; Meng et al., 2021; Xiao et al., 2008; Zhang et al., 2020). For example, higher ACE scores were associated with increased risk of drinking (adjusted odds ratio [AOR] = 1.09, 95% confidence intervals [CI]: 1.00–1.09), chronic disease (AOR = 1.17, 95% CI 1.06–1.28), depression (AOR = 1.37, 95% CI: 1.27–1.48), and posttraumatic stress disorder (AOR = 1.32, 95% CI: 1.23–1.42) in a sample of 1501 adults aged 18–59 years in Macheng, China (Chang et al., 2019). ACEs were negatively associated with mindfulness during the COVID-19 pandemic in a sample of 1871 college students in China (Huang et al., 2021). In addition, findings from a population-based, cross-sectional study in China suggested that compared with those without ACE exposure, middle-aged or older adults who experienced four or more ACEs had increased risks of chronic diseases such as dyslipidemia, chronic lung disease, asthma, liver disease, digestive disease, kidney disease, arthritis, psychiatric disease, memory-related disease and multimorbidity (Lin et al., 2021).

These studies also show that childhood adversities are highly prevalent in Chinese populations, although rates of exposure to at least one ACE have ranged widely from 31% (Wei and Yu, 2013) to 93.5% (Li et al., 2015). Discrepancies in reported ACE exposure rates could be a result of variations in study populations and ACE measures used across studies. For example, some studies (e.g., Ding et al., 2019) used the Centers for Disease Control and Prevention (CDC) 10-item ACE questionnaire (CDC, 2019) which includes three ACE categories: abuse, neglect, and household dysfunction. Other studies assessed additional ACEs such as community violence, poverty, and bullying using questionnaires such as the Chinese version of the Childhood Adversities Checklist (Li et al., 2015). The inconsistencies in measures on ACEs impede cross-study prevalence comparisons. Furthermore, most of the ACE measures used in Mainland China were developed in the USA and were not designed for international use, such as the CDC 10-item ACE questionnaire, the Childhood Trauma Questionnaire (CTQ), and the Revised Adverse Childhood Experience Questionnaire (ACEQ-R) (Chang et al., 2019; Jia et al., 2020; Xiao et al., 2008; Zhang et al., 2020). These ACE measures were developed based on the US cultural context and may not capture types and prevalence of ACEs on an international scale. Using a standardized, international ACE measure, such as the ACE International Questionnaire (ACE-IQ), allows comparison of ACE prevalence rates across different countries and populations.

The ACE International Questionnaire (ACE-IQ)

The ACE-IQ was developed by the World Health Organization (WHO) to establish a global ACEs surveillance framework to assess the global health burden of ACEs (Anda et al., 2010; WHO, 2016). The ACE-IQ was adapted from the ACE Questionnaire of the CDC-Kaiser ACE study (Felitti et al., 1998). Questions that assess collective and community violence and bullying were added to increase the international cultural applicability and to better reflect a wide array of adversities that may be more commonly encountered outside the USA (Anda et al., 2010; Kidman et al., 2019). The ACE-IQ is designed to be administered to adults and includes 29 items assessing exposures to 13 categories of childhood adversity in the family and community (WHO, 2016). Categories of childhood adversity include emotional and physical neglect; emotional, physical, and sexual abuse; alcohol and/or drug abuse in the household; living with a household member who was chronically depressed, mentally ill, institutionalized, or suicidal; living with a household member who was incarcerated; parental death, separation, or divorce; violence against household members; bullying; witnessed community violence; and exposure to war or collective violence (WHO, 2016).

The ACE-IQ is the only measure developed specifically for international use and it has been administered in several countries or regions, including Hong Kong (Ho et al., 2019, 2021), Malawi (Kidman et al., 2019), Tunisia (El Mhamdi et al., 2020), Mainland China (Chang et al., 2019), Saudi Arabia (Almuneef et al., 2014), South Korea (Kim, 2017), and Vietnam (Tran et al., 2015). (See Table 1 for comparisons of prevalence rates identified using ACE-IQ and its psychometric properties across regions and populations). Despite the variations in prevalence rates across studies conducted in different countries, the dose–response associations between ACEs and risky health behaviors (e.g., alcohol abusive behaviors) and poor health-related outcomes (e.g., depressive symptoms, diabetes, coronary diseases, and respiratory disease) are consistent across studies (Chang et al., 2019; El Mhamdi et al., 2020; Kim, 2017) and align with existing studies conducted in Western countries including the USA (Felitti et al., 1998; Hughes et al., 2017).

Table 1 Comparisons of ACE prevalence rates and ACE-IQ psychometric properties across countries and populations

The WHO proposed two scoring methods to calculate ACEs exposure. Both methods dichotomize each of the 13 ACE categories into “non-exposure (0, i.e., no ACE)” and “exposure (1, i.e., one or more ACE),” resulting in a total score ranging from 0 to 13. The first, the frequency scoring method, takes the level of exposure into consideration. For example, only one incident (e.g., being touched or fondled in a sexual way) is required to be counted as exposure to sexual abuse, whereas exposure to physical abuse requires incidents (e.g., being spanked, kicked, or punched) to occur many times (WHO, 2016). The second, the binary scoring method, uses a lower threshold for identifying an ACE exposure: any experience of the adversity denotes exposure. For example, being screamed or sworn at once counts as emotional abuse (WHO, 2016). ACE-IQ developers recommended that both scoring methods be used to calculate ACE-IQ scores in research and to facilitate comparisons across studies (WHO, 2016).

Although the ACE-IQ has previously been translated into Traditional Chinese (Ho et al., 2019, 2021), this version is not suitable for use in Mainland China, where Simplified Chinese is the official written language. Compared to Traditional Chinese characters, Simplified Chinese characters have fewer strokes and are easier to learn. To our knowledge, no studies have reported the translation process and psychometric properties of the ACE-IQ in Simplified Chinese.

Validating the SC-ACE-IQ in Chinese Health Science Students

Chinese health science students (e.g., nursing and medical students) have high rates of mental health problems such as depression, anxiety, and suicidal ideation (Zeng et al., 2019a, 2019b). The prevalence rates of depression, anxiety, and suicidal ideation were estimated to be 29%, 21%, and 11%, respectively, among medical students in China (Zeng et al., 2019a, 2019b). The associations between ACEs and poor mental health outcomes in adulthood have been well established (Hughes et al., 2017; Kessler et al., 2010). However, little is known about ACEs and their impact on mental health outcomes in Chinese health science students. The aims of this study were to (a) translate the ACE-IQ into the official language of Mainland China, Simplified Chinese, (b) assess the psychometric strength (i.e., content validity, criterion validity, and test–retest reliability) of the Simplified Chinese version of the ACE-IQ (SC-ACE-IQ) among Chinese health science students, and (c) compare rates of ACEs exposure calculated using the binary and frequency scoring methods proposed by the ACE-IQ developers.

Methods

Study Design and Sample

This descriptive, cross-sectional study involved three phases, described in detail below (Fig. 1). In Phase I, the ACE-IQ was translated from English to Simplified Chinese and then back-translated to English to confirm translation accuracy. In Phase II, an expert panel evaluated the content validity of the SC-ACE-IQ draft. In addition, cognitive interviews were conducted with a sample of Chinese young adults who were health science students from the target population to assess the SC-ACE-IQ draft’s clarity, sensitivity, relevance, and cultural appropriateness. In Phase III, test–retest reliability and validity of the final draft of the SC-ACE-IQ were examined with a sample of undergraduate and graduate students in Shanghai, China.

Fig. 1 The study process of translation and psychometric evaluation of ACE-IQ. Note. ACE-IQ Adverse Childhood Experiences-International Questionnaire, SC-ACE-IQ Simplified Chinese version of Adverse Childhood Experiences-International Questionnaire

In Phase III, a total of 566 eligible health science students completed the online surveys. Participant characteristics are presented in Table 2. Participants’ average age was 22 years old (SD = 2.83). Most participants were single (81.9%), female (70.1%), and undergraduate students (74.8%). This study was approved by the corresponding author’s University Institutional Review Board.

Table 2 ACE prevalence rates by characteristics of Chinese young adults, using two SC-ACE-IQ scoring methods

Phase I: Translation and Back-translation of the ACE-IQ

The ACE-IQ was translated from English to Simplified Chinese by a professional, independent translator and then back-translated independently by two bilingual (Chinese and English) study team members who are experienced in ACEs and mental health and familiar with Chinese culture. These two bilingual team members then compared the back translations with the original ACE-IQ and discussed conceptual, cultural, and linguistic discrepancies, which were resolved by consensus. Several items of the SC-ACE-IQ were revised for clarity and cultural or linguistic appropriateness. Through this process, the initial draft of the SC-ACE-IQ was developed.

Phase II: Content Validity Testing and Pre-testing the Translated ACE-IQ

Ten multidisciplinary experts from Mainland China rated items from the initial draft of the SC-ACE-IQ for content validity. Seven were experienced pediatric nurses or nurse managers who were familiar with childhood adversities in the Chinese population. Two were mental health and children’s health experts, and one was a survey methods expert. Experts rated the items on their relevance to childhood adversities and their appropriateness in Chinese culture and society using a 4-point Likert scale (4 = highly relevant to 1 = not relevant) (Polit & Beck, 2006; Polit et al., 2007). Experts were also asked to suggest item revisions to enhance clarity and cultural, conceptual, and linguistic appropriateness. We computed item content validity indices (I-CVIs) and the average scale content validity index (S-CVI) for the SC-ACE-IQ. The content validity index (CVI) is one of the most commonly used methods to evaluate content validity quantitatively, and the item-level CVI (I-CVI) and scale-level CVI (S-CVI) are its two types. The I-CVI is the proportion of experts giving an item a relevance rating of 3 or 4. The S-CVI is the average of the I-CVI scores for all items (Yusoff, 2019). The second draft of the SC-ACE-IQ was developed after content validity testing.
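The I-CVI and S-CVI definitions above can be illustrated with a minimal Python sketch. The function names, item labels, and ratings below are hypothetical and made up for illustration; they are not the study's data or the authors' analysis code.

```python
def item_cvi(ratings):
    """I-CVI: proportion of experts rating the item 3 or 4 on the 4-point scale."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def scale_cvi_average(ratings_by_item):
    """S-CVI (average method): mean of the I-CVI scores across all items."""
    i_cvis = [item_cvi(ratings) for ratings in ratings_by_item.values()]
    return sum(i_cvis) / len(i_cvis)


# Made-up relevance ratings from ten hypothetical experts (assumption, not study data).
example_ratings = {
    "item_a": [4, 4, 3, 4, 3, 4, 4, 3, 4, 4],  # all ten experts rate 3 or 4
    "item_b": [4, 2, 3, 1, 2, 4, 2, 1, 3, 2],  # four of ten experts rate 3 or 4
}

i_cvi_a = item_cvi(example_ratings["item_a"])
i_cvi_b = item_cvi(example_ratings["item_b"])
s_cvi = scale_cvi_average(example_ratings)
```

In this toy example, an item like item_b would fall below a 0.80 I-CVI cutoff and be flagged for review.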

Two rounds of cognitive interviewing were conducted by three bilingual authors (Chinese and English) to pre-test the second draft of the SC-ACE-IQ. For cognitive interviewing, eight to 15 respondents are considered sufficient to identify problems in the materials presented (Boeije and Willis, 2013; Willis, 2004). A total of 12 nursing students participated in the cognitive interviews. For the first round of cognitive interviewing, each of the three bilingual authors independently conducted individual interviews with four students through videoconferencing using a semi-structured interview guide. Revisions were made after the first round of cognitive interviewing. To ensure clarity and appropriateness of the final draft, the second round of cognitive interviewing was conducted with six students randomly selected from the 12 students who participated in the first round. The SC-ACE-IQ finalized in Phase II was the version tested in the next phase.

Phase III: Testing Reliability and Validity of the SC-ACE-IQ

Convenience sampling was used to recruit potential participants to test the validity and reliability of the finalized SC-ACE-IQ draft. The web-based SC-ACE-IQ survey was distributed to potential participants between June and July 2020. Health science students eligible for this phase of the study were (a) 18 to 38 years old, (b) enrolled in a health science major at a university in China, and (c) native Chinese. The survey weblink was distributed via student cohorts’ online groups on WeChat, the most widely used communication software in Mainland China.

The first page of the online survey described the study purpose and survey content and stated that participation was voluntary and anonymous. Participants provided implied consent by answering the survey questions. To avoid multiple entries from the same participant, the survey was designed to allow only one completion per electronic device. Participants were also asked to voluntarily provide their email and/or phone number if they agreed to participate in the follow-up for test–retest reliability evaluation.

Measures

The SC-ACE-IQ used in this study included 25 items assessing 12 ACE categories (see Table 3). The SC-ACE-IQ included multiple response options: categorical (i.e., Yes/No/Not sure/Refused); 5-point Likert scaling with response anchors “Always,” “Most of the time,” “Sometimes,” “Rarely,” “Never”; and 4-point Likert scaling with response anchors “Many times,” “A few times,” “Once,” “Never.” Exposure to each ACE category required an affirmative response to at least one of the items under that ACE category. The ACE categories that participants endorsed were summed to create a total ACE score ranging from 0 to 12. For comparison, the total ACE scores were computed using both the binary and the frequency scoring methods (see Table 3).
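The difference between the two scoring methods can be sketched in Python as follows. This is a simplified illustration, not the WHO scoring syntax: it covers only two categories scored on the 4-point frequency anchors, the thresholds shown are the two examples given in the text (one incident for sexual abuse, many times for physical abuse), and all names and responses are hypothetical.

```python
# Ordinal coding of the 4-point anchors (coding values are an assumption).
FREQUENCY_LEVELS = {"Never": 0, "Once": 1, "A few times": 2, "Many times": 3}

# Frequency-method thresholds for the two categories described in the text.
FREQUENCY_THRESHOLD = {
    "physical_abuse": "Many times",
    "sexual_abuse": "Once",
}


def category_exposed(responses, category, method):
    """Return 1 if at least one item response meets the method's threshold, else 0."""
    if method == "binary":
        # Binary method: any experience of the adversity counts as exposure.
        threshold = FREQUENCY_LEVELS["Once"]
    elif method == "frequency":
        threshold = FREQUENCY_LEVELS[FREQUENCY_THRESHOLD[category]]
    else:
        raise ValueError(f"unknown scoring method: {method}")
    return int(any(FREQUENCY_LEVELS[r] >= threshold for r in responses))


def total_ace_score(answers, method):
    """Sum the dichotomized category indicators into a total ACE score."""
    return sum(category_exposed(resp, cat, method) for cat, resp in answers.items())


# Made-up responses from one hypothetical participant.
participant = {
    "physical_abuse": ["A few times", "Never"],
    "sexual_abuse": ["Once", "Never", "Never", "Never"],
}

binary_score = total_ace_score(participant, "binary")
frequency_score = total_ace_score(participant, "frequency")
```

For this hypothetical participant, physical abuse reported "A few times" counts as exposure under the binary method but not under the frequency method, so the two total scores differ.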

Table 3 ACE prevalence rates among Chinese young adults by category and total score, using two SC-ACE-IQ scoring methods

To evaluate criterion validity, we examined whether the SC-ACE-IQ score was correlated with mental health outcomes in adulthood. The adverse impacts of ACEs on depressive and anxiety symptoms have been well documented (CDC, 2019; Hughes et al., 2017; Petruccelli et al., 2019). Thus, depressive and anxiety symptoms in adulthood were selected as the criteria. Participants completed the Chinese versions of the PHQ-9 (Kroenke et al., 2001) and the GAD-7 (Spitzer et al., 2006). The Chinese versions of the PHQ-9 and the GAD-7 have demonstrated good internal consistency and test–retest reliability in the general Chinese population (Spitzer et al., 2006; Tong et al., 2016; Wang et al., 2014; Zeng et al., 2013). We hypothesized that SC-ACE-IQ scores would be positively correlated with depressive and anxiety symptom scores as measured by the Chinese versions of the PHQ-9 and GAD-7.

Statistical Analysis

All analyses were performed using SPSS version 23.0 (SPSS Inc., Chicago, IL, USA). Descriptive statistics summarized participant characteristics, exposure to individual ACE categories, and the cumulative ACE scores. Test–retest reliability was examined using the intra-class correlation coefficient (ICC). The ICC estimate was calculated based on a mean-rating (k = 2), absolute-agreement, two-way mixed-effects model. ICC values between 0.75 and 0.90 indicate good reliability and values greater than 0.90 indicate excellent reliability (Koo & Li, 2016). To establish criterion validity, we examined the correlations between the SC-ACE-IQ score and the PHQ-9 and GAD-7 scores using bivariate correlation analysis. The level of statistical significance was set at 0.01 given the sample size.
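As an illustration of the ICC model named above, the following Python sketch computes a mean-rating, absolute-agreement ICC from two-way ANOVA mean squares, using the standard formula (MSR − MSE) / (MSR + (MSC − MSE) / n). It is not the SPSS procedure used in the study; the function name and the scores are hypothetical.

```python
def icc_absolute_agreement_mean(scores):
    """Mean-rating, absolute-agreement ICC for an n x k table of scores.

    scores: one row per participant, one column per measurement occasion.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (participants, occasions, residual error).
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-participants mean square
    msc = ss_cols / (k - 1)               # between-occasions mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (msr - mse) / (msr + (msc - mse) / n)


# Made-up test and retest total scores for four hypothetical participants.
test_retest = [[1, 2], [2, 2], [3, 4], [4, 4]]
icc_value = icc_absolute_agreement_mean(test_retest)

# Identical scores on both occasions give perfect agreement.
icc_perfect = icc_absolute_agreement_mean([[1, 1], [2, 2], [3, 3]])
```

Because the absolute-agreement form includes the between-occasions mean square in the denominator, a systematic shift in scores between the first and second administration lowers the estimate.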

Results

Content Validity and Pre-testing of the SC-ACE-IQ Draft

Experts rated the SC-ACE-IQ as highly relevant and culturally/linguistically appropriate. The S-CVI of the SC-ACE-IQ was 0.89. In total, six out of 43 items had I-CVI scores lower than 0.80, and all six items were in the demographic section of the survey. Three of these six items were dropped. For example, one item, “What is your [insert relevant ethnic group/racial group/cultural group/others] background?”, was dropped because it had a low I-CVI (0.4) and was deemed culturally irrelevant for Mainland China. Although the I-CVI scores for the other three items (i.e., “Sex,” “How old are you?,” and “If you are a mother or father what was your age when your first child was born?”) were 0.6–0.7, lower than the standard score of 0.8, these items were retained for descriptive purposes.

In the pre-testing phase, feedback from cognitive interviewing respondents identified areas for improvement in the survey. For example, for the item “Did a parent, guardian or other household member yell, scream or swear at you, insult or humiliate you?” some respondents were uncertain how to score the frequency or severity of these occurrences given the belief that strict discipline was consistent with Chinese cultural norms. To address this uncertainty, a comment on the decision rule, “Select your answer based on your subjective experience,” was added to this item. Linguistically unnatural wording was identified and revised based on respondents’ recommendations. For example, one respondent suggested we translate “a sexual way” as “with sexual connotations” in the question “Did someone touch or fondle you in a sexual way when you did not want them to?” Cognitive interview respondents suggested that questions about collective violence (e.g., wars, terrorism, and political or ethnic conflicts) were not applicable to their experiences growing up in Mainland China. Questions about collective violence were therefore removed from the SC-ACE-IQ based on their low cultural relevance in Mainland China. The removal of the collective violence category is consistent with studies that used the ACE-IQ in other Asian contexts, such as South Korea, Vietnam, and Singapore (Polit & Beck, 2006; Polit et al., 2007; Subramaniam et al., 2020). Following the first round of revisions based on the cognitive interviews, the finalized SC-ACE-IQ was developed. All six students who participated in the second round of cognitive interviews stated that the finalized SC-ACE-IQ was clear and appropriate, and that the format was easy to follow. Thus, no further revisions were made.

Reliability and Validity of the SC-ACE-IQ

Two weeks after completing the initial survey, 119 participants who had provided their contact information were invited to take the SC-ACE-IQ survey a second time. Fifty-six of the 119 participants (47%) completed the SC-ACE-IQ survey twice over a two-to-three-week period. According to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist for assessing the methodological quality of studies on measurement properties, a sample size of 50–59 for test–retest reliability indicates good quality (Terwee et al., 2012). The SC-ACE-IQ test–retest reliability over a two-to-three-week interval was ICC = 0.88. In terms of criterion validity, as hypothesized, ACE scores were positively associated with depressive symptoms when analyzed using both the binary scoring method (r = 0.26, p < 0.001) and the frequency scoring method (r = 0.29, p < 0.001). ACE scores were also positively associated with anxiety symptoms when analyzed using both the binary scoring method (r = 0.22, p < 0.001) and the frequency scoring method (r = 0.24, p < 0.001).

Comparison of Binary and Frequency Scoring Methods

Differences in ACE scores and categories calculated using the binary and frequency methods are shown in Table 3 and depicted in Figs. 2 and 3. Using the binary method, ACE scores ranged from 0 to 10 (M = 3.10; SD = 2.07); 88.5% of participants reported at least one ACE, and 42.6% experienced four or more ACEs. Community violence (58.3%), household member treated violently (56.9%), emotional abuse (52.3%), and physical abuse (45.1%) were the most common types of ACEs reported using this method.

Fig. 2 Comparison of ACE total scores among Chinese young adults using binary or frequency scoring methods

Fig. 3 Comparison of ACE categories among Chinese young adults using binary or frequency scoring methods

Using the frequency scoring method, ACE scores ranged from 0 to 10 (M = 1.42; SD = 1.64); 63.1% of participants reported at least one ACE, and 12.2% experienced four or more ACEs. Emotional neglect (31.6%), household member treated violently (26.1%), death of one or both parents, parental separation or divorce (16.8%), and sexual abuse (15.4%) were the most common types of ACEs reported using this scoring method.

Discussion

The purpose of this study was to translate the ACE-IQ into Simplified Chinese, examine the psychometric strength of the SC-ACE-IQ in a sample of Chinese health science students, and compare SC-ACE-IQ scores calculated using the binary and frequency scoring methods. The results of this study indicate that the SC-ACE-IQ is a valid and reliable measure of ACE exposures among health science students in our sample. Two-to-three-week test–retest reliability of the SC-ACE-IQ was high. This result is similar to that reported for the Traditional Chinese version of the ACE-IQ (ICC = 0.90) among college students in Hong Kong (Ho et al., 2019, 2021), suggesting that students’ recall of ACE exposures was stable. The SC-ACE-IQ scores were positively correlated with participants’ reports of depressive and anxiety symptoms, which supports the criterion validity of the SC-ACE-IQ. This finding is consistent with previous studies that demonstrated dose–response relationships between ACEs and risks of depression and anxiety (Hughes et al., 2017). For example, one study reported a positive correlation between the ACE-IQ score and the Beck Depression Inventory (BDI) score (r = 0.35, p < 0.001) (Kidman et al., 2019). It is important to note that, although statistically significant, the correlations between SC-ACE-IQ scores and depression and anxiety scores were both modest in our sample. This suggests that other variables, such as those measuring individual sources of strength and resilience, would also be important to measure, as these may buffer the effects of ACEs on health science students’ mental health (Yu et al., 2021).

To our knowledge, this is the first study comparing the two scoring methods for the ACE-IQ proposed by the WHO. We found that total ACE scores differed substantially depending on the scoring method used. In particular, the percentage of participants with four or more ACEs was nearly 3.5 times higher using the binary method than using the frequency method (42.6% vs. 12.2%). This discrepancy reflects the lower threshold used in the binary scoring method to identify exposure to ACEs. For example, being screamed or sworn at once counts as emotional abuse using the binary method, whereas being screamed or sworn at many times is required to count as emotional abuse using the frequency method (WHO, 2016). Computed using the frequency scoring method, 63.1% of the participants in our study reported exposure to at least one ACE and 12.2% reported exposure to four or more ACEs. This prevalence of ACE exposure is similar to that in one study that used the frequency scoring method to examine ACE exposure among 433 college students in Hong Kong (Ho et al., 2019, 2021). That study found that 74.36% reported exposure to at least one ACE and 18.7% to four or more ACEs. However, when computed using the binary scoring method, 88.5% of the participants in our study reported exposure to at least one ACE and 42.6% reported four or more ACEs. This prevalence of ACE exposure is higher than those reported in studies using the ACE-IQ to measure ACEs in Asian contexts. For example, one study that examined ACE exposure among 2099 college students in Vietnam found that 76% reported exposure to at least one ACE and 21% to four or more ACEs, without describing which scoring method was used to calculate ACE scores (Tran et al., 2015).

In addition, the two scoring methods produce considerable differences in ACE prevalence rates at the category level. For example, the prevalence of physical abuse differed substantially by scoring method, with 45.1% of respondents reporting physical abuse using the binary scoring method but only 9.7% using the frequency method. In contrast, emotional neglect was the most frequently endorsed ACE category (31.6%) using the frequency scoring method, whereas it was endorsed by only 2.1% of respondents using the binary method. Such differences in ACEs exposure have implications for setting priorities and allocating resources for ACEs research, clinical screening, and interventions. However, there is currently no clear guidance on how to select scoring methods for ACE research and screening.

Cultural context may be one of the considerations when selecting SC-ACE-IQ scoring methods for research or clinical screening. Research has shown that participants from Chinese culture tend to restrain their negative feelings, perceive their adversities as a private matter, and keep adversities secret to “save face” for the family (Ho et al., 2019). Thus, Chinese participants may under-report ACE exposures due to stigma and cultural norms surrounding adversities. Because the frequency scoring method yields more conservative ACE scores, using this method in Chinese populations may lead to an underestimation of ACE exposure.

Limitations

This study has several limitations. First, the cognitive interviews were conducted exclusively with female nursing students; men and students in other health science majors may have completed this phase differently or raised additional questions about item clarity. Second, this study used a sample of health science students for the psychometric evaluation, limiting the generalizability of the findings. These health science students were likely to have higher socioeconomic status than young adults who were not in college, so ACE prevalence rates may be even higher in the general Chinese young adult population. Future research should replicate the study using a nationally representative sample of young adults with diverse educational backgrounds in Mainland China. Third, the expert panel in the content validity testing phase lacked researchers in the field of child development and psychopathology. Lastly, the study was conducted during the COVID-19 pandemic, which may have affected the depressive and anxiety symptoms self-reported by young adults.

Implications

The considerable differences in ACE estimates produced by the binary versus frequency scoring methods have implications for research, clinical practice, and policy-making. These differences can create difficulties in comparing ACE exposure rates across studies. Because the two scoring methods use different thresholds for identifying ACEs exposure, the choice of method also has implications for clinical screening and intervention. When the screening threshold is set too high, clinicians may fail to identify individuals who have experienced ACEs and to provide timely intervention to address their potential unmet needs. Future research needs to focus more on understanding how and when to use these different scoring methods for research and diverse samples.

Conclusions

The present study filled important gaps in the literature by translating and validating the ACE-IQ for a sample of health science students in Mainland China. Our findings suggest that the SC-ACE-IQ is a stable and reliable measure and is associated with key indicators of mental health (symptoms of depression and anxiety). Our study also indicated that the choice of ACE-IQ scoring method can make a substantial difference in how ACE exposure rates are reported and understood. Future research needs to focus more on understanding how and when to use these different scoring methods for research and clinical practice.