Background

Child maltreatment is common and a risk factor for various developmental outcomes. Studies show that the consequences of child maltreatment are diverse [5] and may persist into adulthood [7, 10]. Besides mental disorders, such as posttraumatic stress disorder, depression and behavioral problems, consequences may include chronic somatic diseases [9] and may cause a high economic burden [8, 14]. Additionally, a range of meta-analyses have documented the impact of child maltreatment on neurobiological alterations [15, 23, 24, 29, 30]. Therefore, the reliable and economic assessment of child maltreatment is essential to various settings and research questions. To better put the results of specific populations or individuals into context, normative data is necessary.

A range of measures is available for the retrospective assessment of child maltreatment. One of the most widely used is the short version of the Childhood Trauma Questionnaire (CTQ) [4]. The short form of the CTQ consists of 28 items measuring five subtypes of maltreatment: emotional abuse, physical abuse, sexual abuse, emotional neglect and physical neglect. Participants are asked to rate abuse and neglect events on a 5-point Likert scale (1 = never true to 5 = very often true). The CTQ is applied in clinical and non-clinical settings and is generally considered a reliable and valid instrument for the retrospective assessment of maltreatment [12].

The CTQ has been further reduced to a five-item screening version, the Childhood Trauma Screener (CTS) [13], to allow more practical, cost-effective and accessible, and therefore less error-prone, use. Each of the five items assesses one subtype of maltreatment. The final five items of the CTS were derived based on their psychometric properties in a study that included a representative sample. One item from each of the five CTQ subscales was chosen based on discriminatory power, explained variance and feasibility to best represent the subscale [13]. The correlations between the items and the respective subscales ranged between r = 0.55 and r = 0.87. The internal consistency was acceptable (α = 0.76). The authors concluded that the CTS represents a reliable, very economical and straightforward screening tool for the retrospective assessment of child maltreatment [13]. Such brief screening instruments can economically assess covariates in studies or in stepped diagnostic approaches [12]. Therefore, Glaesmer and colleagues provided clinical cut-offs based on a representative general population study in Germany [12]. However, the CTS has only been evaluated in the context of the CTQ; the five items of the CTS have never been examined independently. They might show different psychometric properties when presented in isolation than when presented in the context of the CTQ. Therefore, the present study aims to evaluate and validate the CTS based on two representative samples and to provide normative data.

Methods

Study design and participants

Two representative population-based surveys were conducted in 2013 and 2018 in a three-stage approach by a research institute (USUMA) using identical procedures. The surveys were conducted in collaboration between different research groups focusing on health and well-being in the general population. As the two studies were conducted with identical procedures, their data were combined to maximize statistical power with a large sample size. In the first stage, systematic area sampling was conducted based on the municipal classification of the Federal Republic of Germany (ADM F2F Sampling Frame). Around 53,000 areas all over Germany were delimited electronically, each containing an average of around 700 private households. These areas were first stratified regionally by district into a total of around 1500 regional strata and then divided into 128 disjoint “networks”. Each network served as a sampling frame, containing 258 single sample points proportionate to the distribution of private households in Germany. In the second stage, private households were systematically selected with a random route procedure [17] at each sample point. Households of every third residence in a randomly selected street were invited to participate in the study. In the third stage, in multi-person households, a Kish selection grid was used to ensure random participation. To determine the target person, all members of the household who are 14 years and older are first entered into a scheme on the address list: all men who live in the household and are at least 14 years old are entered in descending order of age in boxes (e.g. 1 to 4), and all women are likewise entered in descending order of age in boxes (e.g. 5 to 8). The person whose number appears first in the sequence of random numbers is then interviewed; the order of the random numbers varies across the data collection protocols. In this way, the target person is selected completely independently of the interviewer and the contact person [20].
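The third-stage selection described above can be illustrated with a minimal sketch. It assumes the example box layout given in the text (boxes 1 to 4 for men, 5 to 8 for women); the function name `kish_select`, the dictionary fields and the random number sequence are hypothetical and are not taken from the survey protocol.

```python
def kish_select(household, random_sequence, min_age=14):
    """Pick the target person of a household from a protocol's random number sequence.

    Illustrative only: box numbering follows the example in the text
    (men in boxes 1-4, women in boxes 5-8, each in descending order of age).
    """
    eligible = [p for p in household if p["age"] >= min_age]
    men = sorted((p for p in eligible if p["sex"] == "m"),
                 key=lambda p: p["age"], reverse=True)
    women = sorted((p for p in eligible if p["sex"] == "f"),
                   key=lambda p: p["age"], reverse=True)

    boxes = {}
    for number, person in enumerate(men[:4], start=1):
        boxes[number] = person
    for number, person in enumerate(women[:4], start=5):
        boxes[number] = person

    # The first random number that matches an occupied box decides.
    for number in random_sequence:
        if number in boxes:
            return boxes[number]
    return None  # no eligible household member matched the sequence


household = [
    {"name": "A", "sex": "m", "age": 50},
    {"name": "B", "sex": "f", "age": 48},
    {"name": "C", "sex": "m", "age": 16},
    {"name": "D", "sex": "f", "age": 10},  # under 14, not eligible
]
target = kish_select(household, random_sequence=[7, 3, 5, 2, 1])
```

Because the sequence is fixed in the protocol before the visit, neither the interviewer nor the person answering the door influences who is selected.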

Participants had to be at least 14 years of age and have sufficient German language skills. The potential participants were informed that the study was about health and well-being. Informed consent was obtained from those who indicated willingness to take part. The overall response rate was 78.9%.

Anonymity in storing and analyzing the data was guaranteed. After collecting sociodemographic data through a face-to-face interview, the researcher handed the questionnaires to the participant along with an envelope to seal afterwards, and then left the room but stayed nearby in case help was needed. The completed questionnaires were linked to the respondents' demographic data, but did not contain their name, address or other identifying information. Both surveys were conducted in accordance with the Declaration of Helsinki. They fulfilled the ethical guidelines of the International Code of Marketing and Social Research Practice of the International Chamber of Commerce and the European Society of Opinion and Marketing Research. Both surveys obtained ethics approval from the ethics committee of the Medical Faculty of the University of Leipzig before being carried out.

Measures

Survey participants completed the Childhood Trauma Screener (CTS) [13]. The CTS consists of five items, which are: When I was growing up…

  1. I felt loved (R) (emotional neglect)

  2. There was someone to take me to the doctor when I needed it (R) (physical neglect)

  3. People in my family hit me so hard, it left me with bruises or marks (physical abuse)

  4. I felt that somebody in my family hated me (emotional abuse)

  5. Somebody molested me (sexual abuse)

Respondents rate the items on a five-point Likert scale (1 = never true to 5 = very often true); items marked (R) are reverse-coded. The items can be used independently, and a total score of all five items can be calculated, ranging from five to 25. Additionally, we investigated a subscale for neglect, consisting of the two items for emotional and physical neglect with scores ranging from two to ten, and an abuse subscale, including the three items for emotional, physical and sexual abuse with scores ranging from three to 15. This subdivision follows the Centers for Disease Control (CDC) definition of child maltreatment in the US [22].
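The scoring described above can be sketched as follows. This is a minimal illustration, assuming that reverse-coded (R) items are recoded as 6 minus the raw response so that higher values always indicate more maltreatment; the item keys and the function name `score_cts` are hypothetical.

```python
NEGLECT_ITEMS = ("emotional_neglect", "physical_neglect")            # both marked (R)
ABUSE_ITEMS = ("physical_abuse", "emotional_abuse", "sexual_abuse")
REVERSED_ITEMS = set(NEGLECT_ITEMS)


def score_cts(responses):
    """Return neglect, abuse and total scores from raw 1-5 responses."""
    recoded = {}
    for item in NEGLECT_ITEMS + ABUSE_ITEMS:
        raw = responses[item]
        if raw not in (1, 2, 3, 4, 5):
            raise ValueError(f"{item}: response must be an integer from 1 to 5")
        # Assumption: (R) items are recoded as 6 - raw.
        recoded[item] = 6 - raw if item in REVERSED_ITEMS else raw

    neglect = sum(recoded[i] for i in NEGLECT_ITEMS)   # range 2-10
    abuse = sum(recoded[i] for i in ABUSE_ITEMS)       # range 3-15
    return {"neglect": neglect, "abuse": abuse, "total": neglect + abuse}


# A respondent who always felt loved and cared for and reports no abuse:
lowest = score_cts({
    "emotional_neglect": 5, "physical_neglect": 5,
    "physical_abuse": 1, "emotional_abuse": 1, "sexual_abuse": 1,
})
```

With this recoding, the lowest possible profile yields a neglect score of 2, an abuse score of 3 and a total of 5, matching the ranges stated in the text.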

Sociodemographic characteristics such as age, gender, education, marital status, employment status, net household income, nationality, place of residence, and religious affiliation were collected in a face-to-face interview. To assess convergent validity, part of the sample also completed the German version of the Adverse Childhood Experiences (ACE) questionnaire (ACE-D) [31]. This questionnaire consists of 10 items assessing adverse childhood experiences, including emotional abuse, physical abuse, sexual abuse, emotional neglect, physical neglect, parental separation, domestic violence, substance abuse, and incarceration of a household member. Items are scored dichotomously according to whether or not participants had these experiences in childhood.

Statistical analyses

Item characteristics of the CTS items, including item means and item intercorrelations, were examined. For reliability, the internal consistency (Cronbach's α) of the CTS total scale and the abuse and neglect subscales was assessed. For factorial validity, the factor structure of the CTS was investigated using confirmatory factor analysis (CFA) [18]. To assess dimensionality, CFAs were used to examine a two-dimensional structure of the CTS representing two subscales and a one-dimensional structure representing the total CTS score. Factorial invariance was tested between two subsamples divided by gender. We used five criteria to assess how well the model fits the data [18]. Three of these criteria indicate the absolute model fit: the root mean square error of approximation (RMSEA), the 90% confidence interval for RMSEA, and the standardized root mean square residual (SRMR). The other two criteria represent measures of relative model fit: the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI). RMSEA < 0.05 represents a “close fit”, RMSEA between 0.05 and 0.08 represents a “reasonably close fit”, and RMSEA > 0.10 represents an “unacceptable model” [18]. SRMR of 0 represents a perfect fit, SRMR < 0.05 represents a good fit, and an SRMR between 0.05 and 0.10 represents an adequate fit [18]. CFI and TLI indicate how well a given model fits the data relative to a “null” model, which assumes that sampling error alone explains the covariation among the observed measures. Hu and Bentler [18] have suggested that measurement models should have a CFI and TLI of at least 0.95.
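The fit criteria listed above can be expressed as a small set of rules. The sketch below encodes only the thresholds stated in the text; the function names are hypothetical, and RMSEA values between 0.08 and 0.10 as well as SRMR values above 0.10 are left unlabelled because the text does not classify them.

```python
def classify_rmsea(rmsea):
    """Label an RMSEA value using the thresholds quoted in the text."""
    if rmsea < 0.05:
        return "close fit"
    if rmsea <= 0.08:
        return "reasonably close fit"
    if rmsea > 0.10:
        return "unacceptable model"
    return "not classified in the text"  # 0.08 < RMSEA <= 0.10


def classify_srmr(srmr):
    """Label an SRMR value using the thresholds quoted in the text."""
    if srmr == 0:
        return "perfect fit"
    if srmr < 0.05:
        return "good fit"
    if srmr <= 0.10:
        return "adequate fit"
    return "not classified in the text"  # SRMR > 0.10


def meets_relative_fit(cfi, tli, threshold=0.95):
    """True if both CFI and TLI reach the suggested minimum of 0.95."""
    return cfi >= threshold and tli >= threshold


example = {
    "rmsea": classify_rmsea(0.041),
    "relative_fit_ok": meets_relative_fit(cfi=0.986, tli=0.994),
}
```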

For convergent validity, we investigated inter-correlations of the items of the CTS with the ACE [31]. Because of the ordinal nature of the data and non-normality, Kendall’s Tau was calculated.
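For readers unfamiliar with the statistic, the following sketch computes Kendall's Tau for an ordinal CTS item and a dichotomous ACE item. It assumes the tie-corrected tau-b variant, which the text does not specify, and uses a simple pairwise count that is suitable for illustration rather than for large samples; the function name and the data are hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences of ordinal values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = ties_x_only = ties_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                # tied on both variables: ignored
            elif dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1

    denominator = sqrt((concordant + discordant + ties_x_only)
                       * (concordant + discordant + ties_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical data: CTS item (1-5) against a dichotomous ACE item (0/1).
cts_item = [1, 2, 3, 4, 5]
ace_item = [0, 0, 0, 1, 1]
tau = kendall_tau_b(cts_item, ace_item)
```

Because a dichotomous item produces many ties, tau-b stays below 1 in this example even though the two variables never disagree in direction.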

To obtain normative data for the CTS, age- and gender-specific percentiles were generated for each CTS item, the total score, and the subscales. Percentiles were used because they are independent of the distribution of scale scores. Percentiles indicate the subject’s rank compared to other subjects of the same age group and gender, using a hypothetical group of 100 subjects. The sample size was sufficient to be divided into gender-specific age groups of ten years each for better clarity. Statistical analyses were conducted using SPSS, Version 21, and Mplus, Version 7.3 [25]. Due to the large number of participants, the two subsamples differed significantly in the CTS items (emotional neglect χ² = 57.3, physical neglect χ² = 57.2, physical abuse χ² = 57.8, emotional abuse χ² = 34.5 and sexual abuse χ² = 60.2), but only with very small effect sizes (emotional neglect Cramér’s V = 0.11, physical neglect Cramér’s V = 0.11, physical abuse Cramér’s V = 0.12, emotional abuse Cramér’s V = 0.08 and sexual abuse Cramér’s V = 0.11).
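The relation between the χ² values and the effect sizes reported above can be made explicit with the usual formula for Cramér's V. The sketch assumes a table of two groups by five response categories and uses the final sample size of 5039 as n; the per-item n is not stated in the text, so small deviations from the reported values are possible.

```python
from math import sqrt


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    k = min(n_rows, n_cols) - 1
    if n <= 0 or k <= 0:
        raise ValueError("need n > 0 and at least a 2 x 2 table")
    return sqrt(chi2 / (n * k))


# Assumed table layout: 2 groups x 5 response categories, n = 5039.
v_emotional_neglect = cramers_v(57.3, n=5039, n_rows=2, n_cols=5)
v_emotional_abuse = cramers_v(34.5, n=5039, n_rows=2, n_cols=5)
```

With samples of this size, even a V of about 0.1 is accompanied by a large χ² value, which is why statistical significance alone says little about the size of the difference.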

Results

Sample characteristics

Sample characteristics of the total sample and the two subsamples of 2013 and 2018 are presented in Table 1.

Table 1 Demographic characteristics of total sample and subsamples from the general population

A total of 9453 valid addresses were identified (4360 in 2013 and 5093 in 2018). The main reasons for non-participation were that it was not possible to reach someone in the residence (after four attempts: 2013: 12.9%, 2018: 14.4%), that the person who answered the door refused to let anyone in the household participate in the study (2013: 13.6%; 2018: 16.5%), that it was not possible to contact the randomly selected household member (after four attempts: 2013: 1.9%, 2018: 2.6%) and that the selected member refused to participate in the study (2013: 12.4%; 2018: 15.8%).

The final sample comprised 5039 participants. Missing values were low. For example, only 14 respondents did not complete the CTS. Of the total sample, 54.3% were female. The mean age was 49.1 years (SD = 18.2). The sample characteristics closely match those of the German population in gender (54.3% female vs. 50.8%), employment status (unemployed: 5.3% vs. 5.7%), and educational level [26]. However, compared to the general population, subjects of non-German nationality were underrepresented in our study sample (4.1% vs. 11.1%).

Item characteristics, internal consistency and factorial validity

The item characteristics are presented in Table 2.

Table 2 Item characteristics of the CTS in the general population

The inter-correlations between the items ranged between 0.17 for physical neglect and sexual abuse and 0.57 for emotional abuse and physical abuse, indicating small to strong effect sizes [6]. The inter-correlations are presented in Table 3.

Table 3 Spearman Rho Intercorrelation between the CTS items in the general population

Considering the brevity of the two subscales, the CTS total scale (α = 0.68) and the CTS abuse subscale (α = 0.73) showed acceptable internal consistencies. However, the neglect subscale showed poor internal consistency (α = 0.50). An additional table shows this in more detail (see Additional file 1).
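As a reminder of how the coefficient behaves for very short scales, Cronbach's α can be computed from the item variances and the variance of the sum score. The sketch below uses the standard formula with sample variances; the function name and the toy data are hypothetical and unrelated to the survey data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(sum score))
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Two-item toy scales: perfectly consistent vs. uncorrelated items.
alpha_consistent = cronbach_alpha([[1, 1], [2, 2], [3, 3], [4, 4]])
alpha_uncorrelated = cronbach_alpha([[1, 2], [2, 1], [1, 1], [2, 2]])
```

For a two-item scale such as the neglect subscale, α depends entirely on how strongly the two items covary, so a weak correlation between them directly produces a low coefficient.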

To evaluate the dimensional structure of the CTS, confirmatory factor analyses (CFA) were conducted. To this end, a one-factor model including all five CTS items was estimated and compared with a two-factor model in which the two neglect items load onto a neglect factor and the three abuse items load onto an abuse factor. As shown in Fig. 1, factor loadings ranged from 0.34 to 0.82.

Fig. 1 Results of the confirmatory factor analyses for a one- and two-factor model

Compared with the one-factor model, the suggested two-factor model fits the data better, as indicated by robust fit indices CFI = 0.986, TLI = 0.994, RMSEA = 0.041, and the 90% confidence interval for RMSEA = 0.029–0.053 (model 1 in Table 4). A second step was to test if the model parameters would vary between men and women. Therefore, the total sample was split into males and females. A multigroup CFA was used to evaluate if factor loadings, residuals and model fit would differ between those four groups (Table 4). Chi-square values were used to examine structural invariance between gender groups. Results for the one-factor solution showed less favorable TLI and CFI indices (0.875 and 0.938, respectively). In line with the analyses of the total sample, the comparison for the gender subgroups also indicated a two-factor solution to fit the data best. Generally, the factor loading of the physical neglect item was relatively small.

Table 4 Multigroup confirmatory factor analysis (CFA) in four subsamples of females and males

Convergent validity

To assess convergent validity, inter-correlations (Kendall’s Tau, τ) between the CTS items and the respective items of ACE-D were calculated. The correlation of the emotional neglect items was τ = 0.401 (p < 0.001), for the physical neglect items was τ = 0.161 (p < 0.001), for the physical abuse items was τ = 0.629 (p < 0.001), for the emotional abuse items was τ = 0.489 (p < 0.001), for the sexual abuse items was τ = 0.619 (p < 0.001) and for the total scale of the CTS and the sum score of the five ACE items was τ = 0.406 (p < 0.001).

Normative data

Table 5 summarizes the normative data for the different age groups stratified by gender. Data for the CTS total score, the abuse subscale score and the neglect subscale score are presented. This data can be used to compare individual scores and scores from specific populations with normative data from the general population.
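Using such norms for an individual score amounts to a percentile-rank lookup within the matching gender and age group. The sketch below assumes the "at or below" definition of a percentile rank, which the text does not specify, and uses invented reference scores; it does not reproduce values from Table 5.

```python
def percentile_rank(reference_scores, score):
    """Percentage of the reference group scoring at or below `score`."""
    if not reference_scores:
        raise ValueError("reference group must not be empty")
    at_or_below = sum(1 for s in reference_scores if s <= score)
    return 100.0 * at_or_below / len(reference_scores)


# Hypothetical reference groups keyed by (gender, age band).
norms = {
    ("female", "30-39"): [5, 5, 5, 6, 7, 8, 10, 12, 5, 5],
}

rank = percentile_rank(norms[("female", "30-39")], score=6)
```

In this invented group, a CTS total score of 6 corresponds to a percentile rank of 60, meaning that 60 of 100 comparable persons score 6 or lower.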

Table 5 Normative data from the general population for CTS

Discussion

Generally, based on more than 5000 participants from the German general population, our results demonstrate that the CTS, an ultrashort version of the CTQ, is a valid screening instrument for retrospective assessment of child maltreatment in the general population. The results also indicate that the physical neglect item has inadequate psychometric properties. Therefore, the use of this particular item is questionable. While data on validation [13] and clinical cut-offs [12] were available, this is the first study to provide norm data and, more importantly, to examine the psychometric properties of the CTS in isolation from the CTQ. Previous studies have examined the psychometric properties of the CTS only in conjunction with the CTQ. It should be considered that these psychometric properties may change when presented separately and not in conjunction with another twenty items that also measure child maltreatment.

Due to the brevity of the CTS and the general interrelatedness of different types of maltreatment [16, 32], it was not possible to test whether each item represented a separate scale. However, the CFA revealed evidence of a two-factor structure of the questionnaire with excellent indicators of model fit. This factor structure is consistent with the Centers for Disease Control's (CDC) classification of child maltreatment, which has established consistent definitions of child maltreatment across professions [22]. In this classification, child maltreatment is split into acts of commission, including emotional maltreatment, physical maltreatment, and sexual abuse, and acts of omission, including failure to assist and failure to supervise. The CFA results are largely consistent with this classification, underscoring the construct validity of the instrument. The correlation between the two factors (see Fig. 1) is also evidence of the general correlation between the different types of maltreatment. Therefore, it can be assumed that the total score and the abuse score (emotional, physical and sexual abuse) and the neglect score (emotional and physical neglect) can be calculated and used in analyses. It should be noted, however, that the neglect scale has insufficient psychometric properties (Cronbach's alpha 0.5) and should therefore be used with caution. In contrast, the abuse subscale showed acceptable internal consistency (Cronbach’s alpha of 0.73) [28]. It is worth noting that such ultra-short measures do not have psychometric properties comparable to longer measures such as the CTQ. However, retrospective assessment of physical neglect appears to be problematic in general, as other studies have also criticized the reliability of the physical neglect subscale of the CTQ [11, 21].

The convergent validity of the CTS is generally supported by high intercorrelations of the CTS items with the corresponding items from the ACE. The sum score also showed a moderate correlation with the corresponding score extracted from the ACE. Therefore, the convergent validity of the CTS can be assumed. Once again, the physical neglect item showed only a low correlation, underscoring that this item may not be suitable for reliably and validly assessing physical neglect, possibly due to cultural and socioeconomic influences. A recent prevalence study of child maltreatment in the German general population based on the CTQ [32] reports prevalence rates for at least moderate severity. The authors report rates of 22.4% for emotional neglect, 13.3% for physical neglect, 6.6% for physical abuse, 6.5% for emotional abuse and 7.6% for sexual abuse. Setting a cut-off of at least 3 in the present study results in comparable rates of 18.6% for emotional neglect, 25.1% for physical neglect, 10.1% for physical abuse and 8.8% for emotional abuse; the exception is sexual abuse, with a rate of 3.8%. These results suggest that the single sexual abuse item may be less sensitive than an assessment of sexual abuse with more items. Overall, research suggests that measures with more items result in higher prevalence rates of child sexual abuse [27] and therefore may also capture this type of maltreatment more accurately. When assessing child maltreatment, extensive measures should be used whenever possible. However, circumstances such as economic considerations may necessitate the use of an ultrashort measure, in which case the use of the CTS would be justified. This may be particularly true when child maltreatment is not the primary outcome variable but is used as a control variable.

By reporting normative data, we provide the opportunity to contextualize individual outcomes and specific clinical samples. Given that age- and gender-specific comparative data were generated based on subgroups of 141–500 participants, sample sizes were sufficient to provide normative data for these subgroups. In general, scores of > 2 on the abuse items should be considered a warning signal. However, as mentioned above, the sexual abuse item may be less sensitive and therefore may result in lower prevalence rates. For this item in particular, values of more than one should be regarded as a warning signal. Due to its rather insufficient psychometric properties, the item on physical neglect should only be used with great caution.

Two of the strengths of this study are the large sample size and the representativeness of the study sample. However, this study also has some weaknesses. The data for this study were combined from two samples from 2013 and 2018 using identical methods and instruments. However, we found that the CTS items yielded slightly lower prevalence rates in 2018 than in 2013. These fluctuations are more likely due to normal statistical variation than to an actual change in prevalence rates or a change in social norms. Moreover, the order of presentation might also have contributed to these differences. Combining the two samples could therefore reconcile the differences in prevalence rates and lead to a more accurate estimate of CTS norms. Another potential limitation of the study is the response rate of 78.9%. In general, response rates are lower in general population studies than in clinical trials. Although the random route approach is a very established method, particularly due to its economic and practical qualities, it also has its limitations. Bauer [2, 3] showed in his calculations that it violates the assumption of equal probability and that this leads to distorted expected values for several variables. The strongest errors were found in variables related to the spatial location of households. This means that the method provides good indications of the occurrence of variables in the population, but there may be bias if there is a strong local occurrence. Furthermore, the study systematically excludes potential high-risk populations, such as individuals with inadequate German language skills and individuals currently living in institutions. Another limitation is the underrepresentation of other nationalities and refugees. In addition, validity tests for an instrument must demonstrate both convergent and discriminant validity.
This study for the general population did not include broader measures of child maltreatment or multi-informant measures, which are thought to result in the highest rates. Nevertheless, convergent validity with the German version of the ACE [31] was demonstrated, except for the item assessing physical neglect. Another possible limitation could be the items for emotional neglect and emotional abuse, which reflect the subject's emotional feelings rather than representing objective behavior. The CTS is not sufficient for a comprehensive assessment of child maltreatment. However, it may provide useful information in the context of research that includes child maltreatment as a control variable. Overall, longer instruments such as the CTQ should be preferred for retrospective assessment of child maltreatment. However, in certain circumstances, a shorter instrument may be necessary. The use of the CTS is recommended as an effective instrument in settings where resources are strictly limited. Future studies should focus primarily on a more reliable and valid assessment of physical neglect and a more sensitive assessment of sexual abuse, as prevalence is highly dependent on the assessment method, the definition used or the form of sexual abuse (e.g. hands-on, hands-off acts) [1, 9, 27].