The Childhood Trauma Questionnaire—Short Form (CTQ-SF) used with adolescents – methodological report from clinical and community samples

Purpose The Childhood Trauma Questionnaire-Short Form (CTQ-SF) is a widely used retrospective screening tool for childhood maltreatment in adults. Its properties are less known in adolescents. The objective was to investigate acceptability and psychometric properties when used in adolescents. Method A community sample of adolescents (n=1885) in four waves (from 13 or 14 to 17 years old) and a clinical sample (n=74, mean age 18), both from Sweden, were used to assess acceptability and different aspects of validity and reliability. Results The CTQ-SF was found to be well accepted. As expected, the community sample scored lower than the clinical sample on all maltreatment scales and showed stability over time. In the community sample, internal consistencies were substantial or excellent for all scales except Physical neglect, and in the clinical sample this was found for all scales. One-year test-retest consistencies of subscales were substantial or almost perfect, and for all scales, they increased from early to mid-adolescence. Directed inconsistencies on item level decreased from early to mid-adolescence. Convergent validity was shown in relation to scales on family climate, parental relations, and emotional health, also from early adolescence. Discriminant analyses showed more moderate discriminatory ability, although classification was almost seven times better than chance. Conclusions The CTQ is well accepted and can be trusted to provide consistent and valid self-reports on childhood maltreatment from the age of 14. Some caution is advised when used with younger adolescents, since the test-retest stability is then weaker, and the interpretation of the M/D scale is more ambiguous.


Introduction
Child maltreatment exemplifies a harmful relational environment that poses significant risks for maladaptation across biological, social, and psychological domains. In a review of research on the developmentally salient outcomes of child maltreatment during adolescence, Trickett et al. (2011) found extensive evidence for the impact of child maltreatment on adolescent development. For example, early maltreatment has been shown to have a negative impact on affect regulation, attachment relationship formation, and self-system development (Cicchetti & Rogosch, 2002; Trickett et al., 2011). This may lead to an impaired ability to become more self-directed and independent and to manage close relationships outside the immediate family (Cicchetti & Rogosch, 2002). Maltreated adolescents therefore tend to have more problems in peer relationships, romantic relationships, and academic functioning than their non-maltreated peers (Trickett et al., 2011). Delinquency and substance use have also been found to be more common among maltreated adolescents (Trickett et al., 2011). A remaining impact in adulthood is suggested by associations with, for example, epigenetics (Bendre, 2019), the serotonergic system (Berglund et al., 2013), and the immune defence system (Carlsson, 2016).
There are different ways of gathering data about child maltreatment. Methods typically involve retrospective self-reports from adults, reports from caregivers, observations of caregiver behaviour, and/or analyses of records from child welfare services and/or medical records. These different approaches yield very different prevalence rates (Shaffer et al., 2008). For example, given that only a small number of maltreatment cases are identified through child protective services (CPS) reports, relying exclusively on such reports risks underestimating the incidence of maltreatment in a population (Briere, 1992; Kendall-Tackett & Becker-Blease, 2004). Self-reports, however, can be influenced by subjective interpretations of questionnaire items and/or different degrees of willingness to report acts of child maltreatment (Weeks & Widom, 1998; Widom et al., 2004). Shaffer et al. (2008) found that the cases with the greatest number of incidents of maltreatment were likely to be identified by both retrospective self-report and official records. In a more recent study, Kalin et al. (2021) found that many of the most severe cases were in fact not detected by child welfare services. Hence, it is important to identify reliable self-report measures that can correctly evaluate to what degree adolescents have been exposed to child maltreatment.
In a critical review, Hardt and Rutter (2004) examined the evidence for the validity of retrospective reports by adults of their own adverse experiences in childhood and found a substantial rate of false negatives and measurement errors, but also that false positive reports were probably rare. They argued that comparison of contemporaneous and retrospective accounts obtained in epidemiological/longitudinal studies of non-clinical populations is the best method to address this issue of validity. A similar approach, using long-term repeated measures in longitudinal studies, was used by Fan et al. (2006) to identify incorrect responders in adolescent self-reports.
The psychometric properties of the Swedish version of the CTQ-SF have been explored in both clinical and non-clinical adult samples giving retrospective reports on their childhood (Gerdner & Allgulander, 2009). The non-clinical samples were university students, not epidemiologic community samples. Epidemiological norm data from non-clinical adolescent populations in Sweden are still lacking. The psychometric properties of the CTQ-SF when used in adolescent community samples are also less known, as is its acceptability for use among adolescents (Nilsson & Svedin, 2017).
Even in a severely traumatised clinical sample of addicted women with psychiatric comorbidity, the CTQ-SF was found to be acceptable and non-intrusive (Lundgren et al., 2002). Epidemiological studies on adolescents using other instruments report that 4-15% felt upset by questions on emotional problems, including questions on childhood trauma (Finkelhor et al., 2014; Hasking et al., 2015). However, we found no study reporting specifically on adolescent acceptability of the CTQ. Bernstein & Fink (1998) published norm data from an adolescent clinical population (n=398, aged 12-17), but not from an adolescent community population. Swedish norm data on adolescent populations are still lacking. Combining clinical and non-clinical adolescent samples in the same study makes it possible to study discriminant ability, i.e., the ability of the CTQ to discriminate between the clinical and non-clinical samples.
The test-retest consistency of the CTQ was studied by its creators in a clinical population (n=40), showing substantial test-retest reliability with retests conducted after 1.6-5.6 months (mean 3.6 months) and a high intraclass correlation for all subscales (R=0.79-0.86) (Bernstein & Fink, 1998). Test-retest was also examined in three independent studies in different populations (Cammack et al., 2016; Kim et al., 2013; Paivio, 2001). Cammack and colleagues studied test-retest during pregnancy and after completion of the pregnancy in a sample with a substantial proportion of women who reported exposure to maltreatment. They found test-retest reliability to be at least moderate, indicating consistent reporting. Kim and colleagues studied outpatient and inpatient schizophrenia patients and found high test-retest reliability. Paivio conducted test-retest of the CTQ before and after therapy of maltreatment victims. Although there is moderate to high consistency in all these studies, which involved various populations of adults, the nature of inconsistencies should also be analysed. Decline in recall accuracy and detail is one proposed factor and could increase with a greater time lapse between the events and the survey (e.g., Williams, 1994). Embarrassment or a wish to protect the perpetrator may result in conscious under-reports (Melchert & Parker, 1997). Positive reconstructing (minimisation) is another factor, resulting in under-reports of maltreatment being more frequent than over-reports (Della Femina et al., 1990; Fergusson et al., 2000).
Test-retest data concerning the CTQ-SF in adolescent populations are scarce but should contribute to the understanding of the ecological validity of the CTQ-SF. Adolescents report their experiences closer in time and could therefore have less memory bias. Lack of distance to the experiences at a very young age might also be a problem, if the most recent events are given more attention than the general situation. All the proposed factors involved in explaining inconsistencies could be influenced by age at the time of reporting. Specifically addressing the test-retest stability of self-report measures within the developmental period of adolescence could give important information about the nature of reporting bias in retrospective self-report measures of child maltreatment (Hardt and Rutter, 2004). It is therefore relevant to examine whether the CTQ works similarly when used in an adolescent population, and from what age it can best be used.
Another issue concerns the M/D scale in the CTQ, which is used to identify people for whom there is a risk of minimisation and denial. Unrealistic extreme scorings on the three M/D items are assumed to demonstrate such risks, and the clinician is encouraged in such cases to try to find additional data sources that can confirm the assessment (Bernstein & Fink, 1998). This way of interpreting M/D had support in the Swedish study (Gerdner & Allgulander, 2009), which showed that high M/D is consistent with the propensity to give socially desirable rather than correct answers. Further support for the position was given in a large collaborative international study on the M/D scale by MacDonald and others (2016). A related question is therefore whether the interpretation of M/D should be the same when used in an adolescent population.
The CTQ-SF is recommended to be used from the age of twelve (Bernstein & Fink, 1998), based on its use in the adolescent clinical population aged 12-17 (Bernstein et al., 1997). There is, however, a lack of studies evaluating its use in early adolescence based on analyses of its acceptability, consistency and functional equivalence of items and scales, including M/D. Prior studies on adolescents mainly include children aged 15 and older and mostly address high-risk populations (Aloba et al., 2020; Charak et al., 2017; Dovran et al., 2013; Grassi-Oliveira et al., 2014), and to our knowledge this is the first study on adolescents addressing test-retest.
In conclusion, there are extensive psychometric studies internationally regarding the use of the CTQ-SF in adults, but there are fewer studies concerning adolescents both in community and in clinical populations. No equivalent study on adolescents in a non-clinical community sample was found. It is important to test its acceptability as well as its validity and reliability for adolescents, publish normative data, and address the issue of how the Minimising/Denial scale works in adolescent populations.
The aims are to present acceptability and psychometric data on the CTQ-SF when it is used repeatedly in a community adolescent population. This includes means and distributions over the years, intercorrelations between scales and between waves, internal consistencies over the years, global (test-retest) consistencies on both scale and item levels, also testing directionality in the case of non-consistent items, convergent validity, and norm data in adolescents. Based on that, a further aim is to evaluate from which age the CTQ-SF can be recommended, and then if the M/D scale can be used in the same way. Additional aims are to present psychometric data in an adolescent clinical sample and the ability of the CTQ-SF to discriminate between the community and clinical samples.

The instrument
The Childhood Trauma Questionnaire-Short Form (CTQ-SF; Bernstein & Fink, 1998) consists of 28 items, of which 25 measure childhood maltreatment (total), including five subscales of five items each, i.e., Emotional Abuse (EA), Physical Abuse (PA), Sexual Abuse (SA), Emotional Neglect (EN) and Physical Neglect (PN). Three items are designed to measure Minimisation/Denial (M/D). All 28 items are constructed as statements beginning with the phrase 'When I was growing up…' The key wordings of the items that followed are shown in Table 4. All five abuse and neglect subscales are sums of the scorings from 'never true' (score 1) to 'very often true' (score 5), and after reversing seven items, all subscales can therefore vary between 5 and 25. The M/D scale is different, since only the highest positive scores (score 5) are counted, and it can therefore vary from 0 to 3. Ratings of statements, such as 'I had the perfect childhood' with the highest possible score, are unrealistic and therefore biased. Positive scores on M/D may therefore
The CTQ was used repeatedly in four waves, in part starting in Wave 2 (aged about 13 and 14 years), and in full starting in Wave 3 (14 and 15 years) and was then repeated in Wave 4 (one cohort, about 15 years), and Wave 5 (about 17 years). In Wave 2, only 13 of the 28 items were used (subscales EN, EA and the three items of M/D). This decision was taken for two reasons: one was that sexual and physical abuse was regarded as very sensitive at such a young age, and the other was to save space since the Wave 2 questionnaires included a comprehensive personality inventory. The first three waves (W2-4) had the same time frame, asking the respondents about their childhood situation 'before you were 12 years old'. In Wave 5, when they had reached the age of 17, the time frame was changed to 'before you were 15 years old'. The reason for that was the possibility of finding emerging problems with onset during adolescence. The responders were not informed that the questions were to be repeated in new data collections.

Clinical sample
The data collection for the clinical sample was conducted in 2016 at nine outpatient clinics located in south and central Sweden. The clinics are specialised units aimed at young people with substance use problems and are run in collaboration between social services and health care. All clinics offer various forms of family therapeutic treatment and manual-based treatment programs for alcohol and drug addiction, often with multimodal approaches. The average length of care is four to six months (Anderberg & Dahlberg, 2018).
At the beginning of treatment, the staff carry out a survey based on the structured interview method UngDOK (Dahlberg et al., 2017). For this study, a number of participants were asked if they also wanted to complete a selfassessment form about experiences of child maltreatment before the age of 12, i.e., the CTQ-SF. In these matters, a short screening questionnaire is preferable to an interview since it is less intrusive and gives more valid replies (Kim et al. 2008). A total of 74 young people (39% girls) chose to fill out the CTQ-SF. The average age was 18 years with a spread of 14-25 years. The study was approved by the Ethics Review Board in Gothenburg (Dnr. 2015/160-31).

Analytical design
The data obtained were used for several analyses and reports: 1. To determine the acceptability of the CTQ based on community respondents' comments.
be problematic, and it has been recommended in such cases to interpret clinical assessment on maltreatment scales with caution and possibly try to validate data through other sources (Bernstein & Fink, 1998;Gerdner & Allgulander, 2009;MacDonald et al., 2016). The same items can also be used as a summative scale in full length (3-15) as suggested to measure Idealising Upbringing (IU), and not necessarily a bias (Gerdner & Allgulander, 2009).
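The scoring rules described for the instrument (five-item subscale sums after reverse-coding, M/D as a count of maximum ratings, and IU as the full-length sum of the same three items) can be illustrated with a minimal sketch. The item-to-subscale key and the set of reversed items below are assumptions taken from the commonly published CTQ-SF scoring key, not from this text, which only states the number of items per scale and that seven items are reversed; the function name `score_ctq_sf` is hypothetical. The key should be checked against the manual before any real use.

```python
# ASSUMPTION: item numbers per subscale follow the commonly published
# CTQ-SF key; this article only gives the number of items per scale.
SUBSCALES = {
    "EA": [3, 8, 14, 18, 25],
    "PA": [9, 11, 12, 15, 17],
    "SA": [20, 21, 23, 24, 27],
    "EN": [5, 7, 13, 19, 28],
    "PN": [1, 2, 4, 6, 26],
}
# ASSUMPTION: the seven positively worded items that are reverse-coded.
REVERSED = {2, 5, 7, 13, 19, 26, 28}
# ASSUMPTION: the three Minimisation/Denial items.
MD_ITEMS = [10, 16, 22]


def score_ctq_sf(responses):
    """Score one respondent.

    responses: dict mapping item number (1-28) to a rating from 1 to 5.
    Returns subscale sums (5-25), CTQ total (25-125), M/D (0-3), IU (3-15).
    """
    if set(responses) != set(range(1, 29)):
        raise ValueError("expected ratings for items 1 to 28")
    if any(rating not in (1, 2, 3, 4, 5) for rating in responses.values()):
        raise ValueError("ratings must be integers from 1 to 5")

    def coded(item):
        # Reverse-coding maps 1..5 onto 5..1.
        raw = responses[item]
        return 6 - raw if item in REVERSED else raw

    scores = {name: sum(coded(i) for i in items)
              for name, items in SUBSCALES.items()}
    scores["CTQ_total"] = sum(scores[name] for name in SUBSCALES)
    # M/D counts only the maximum rating (5) on each of the three items.
    scores["MD"] = sum(1 for i in MD_ITEMS if responses[i] == 5)
    # IU is the plain sum of the same three items.
    scores["IU"] = sum(responses[i] for i in MD_ITEMS)
    return scores
```

A respondent who reports no maltreatment at all and gives the maximum rating on all three M/D items would score 5 on every subscale, 25 on CTQ total, 3 on M/D and 15 on IU.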

Community sample
Data from the LoRDIA programme were used. LoRDIA (Longitudinal Research on Development In Adolescence) followed adolescents of two year-cohorts in four small and medium-sized municipalities (10,000-38,000 inhabitants) in the south of Sweden. Two municipalities are industrial and two are commuting communities, linked to nearby cities. One of the commuter municipalities is located near a large city. The unemployment rate, annual income, educational level, and proportion of first-generation immigrants across the four municipalities were close to the national means (Statistics Sweden, 2019).
In 2013, all adolescents in grades 6 (cohort A; 12 years old) and 7 (cohort B; 13 years old) were invited to take part in the programme. Annual data collections were then conducted. Cohort A had data collections in grades 6, 7, 8 and 9. Cohort B had data collections in grades 7, 8 and 9. Both cohorts had an additional data collection when they were in the second year of high school, aged 17.
All parents had been informed by letter (to both parents if they lived separately) about the aims and scope of the programme, its longitudinal character, and their right to decline participation on behalf of their child. The letters had been translated into the home language (32 different languages other than Swedish). If the parents did not decline their child's participation, the child was invited and given the same information in content, although adapted in form to their age. They were then informed of their right to decide for themselves whether to take part, including their right to opt out. Of all 2150 invited by parent letter, 1885 students (88%) finally remained in the programme. Those who opted out (193 on the parent's decision, 73 due to their own decision) did not differ from the study population in terms of gender, immigrant status (studying Swedish as second language), school merits or absenteeism when compared to participants using school register data. Each data collection wave of the research programme was approved by the Ethics Review Board in Gothenburg (Nr. 362-13;2013-09-25;2014-05-20;2015-07-31;2017-07-21).

Acceptability
When we first introduced the CTQ in the study, we hesitated to use the full instrument, especially the items on physical and sexual abuse, with 13-year-olds. In Wave 2, the CTQ was therefore only used in part. However, when it was used in full in waves 3-5, we never received any critical remarks on these questions, according to protocols from the data collections. In addition, the questionnaires ended with questions concerning how it felt to fill in the questionnaire. In Wave 3 with 390 questions, 88% stated that they felt the questions overall were important, and 98% stated that they had replied honestly to all. They were also given the possibility to comment on specific questions. Of all 1321 respondents, 200 chose to make some comment. Only three (0.2% of responders) commented on the CTQ, and of these, two were critical of
2. For both the clinical and community samples (all waves), the distributions of childhood problems are presented. 3. Intercorrelations between scales and between waves in the community sample were tested using Pearson R. Same-scale correlations were also tested separately for those with M/D=0 and those with M/D>0. 4. The internal inter-item consistencies of all scales (Cronbach's alpha) are presented for both samples and for all waves in the community sample. 5. Long-term consistency in one-year test-retests was examined in the community sample on both item and scale levels by comparing waves 2 and 3, as well as waves 3 and 4. Since all items were ordinal, the method to test agreement was the symmetric Gamma (γ). Gamma estimates systematic agreement when corrected for random agreement, varying from 1 in the case of perfect agreement to 0, which is not more than random agreement (Goodman & Kruskal, 1954). Negative values occur when agreement is less than random. Gamma is the equivalent of Cohen's Kappa applied to ordinal scales and can be interpreted in the same way, i.e., as follows: <0.00 'poor', 0.00-0.20 'slight', 0.21-0.40 'fair', 0.41-0.60 'moderate', 0.61-0.80 'substantial', and 0.81-1.00 'almost perfect' (Landis & Koch, 1977; Gerdner & Wickström, 2015). 6. Inconsistencies were further analysed for possible directionality in over- versus under-reporting, i.e., whether adolescents tended to change their evaluation of childhood in more positive or in more negative ways, using the Wilcoxon signed-rank test (Gibbons, 1993). 7. Convergent validity was examined in the community sample in three waves, by same-wave correlations to scales chosen to reflect family climate, parental relations, and the child's emotional health. For the clinical sample, convergent validity was examined in relation to emotional health. 8. Discriminant validity, i.e., the CTQ's ability to discriminate between the samples, was tested. 9. Norm data: The percentiles of the subscales are presented for early and mid-adolescence (waves 3 and 5) in the community sample and for late adolescence in the clinical sample.
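As an illustration of the test-retest statistic in step 5, the following is a minimal sketch of the standard Goodman-Kruskal gamma, computed as (concordant pairs minus discordant pairs) divided by (concordant plus discordant pairs), together with the verbal labels quoted above. It assumes that the "symmetric Gamma" of the text corresponds to this standard definition; the function names are hypothetical, and the study does not state which software was used.

```python
from itertools import combinations


def goodman_kruskal_gamma(first, second):
    """Gamma for two paired lists of ordinal ratings (e.g. wave 3 and wave 4)."""
    if len(first) != len(second):
        raise ValueError("both waves must contain the same respondents")
    concordant = discordant = 0
    # Compare every pair of respondents; tied pairs are ignored.
    for (a1, b1), (a2, b2) in combinations(zip(first, second), 2):
        product = (a1 - a2) * (b1 - b2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when all pairs are tied")
    return (concordant - discordant) / (concordant + discordant)


def agreement_label(value):
    """Verbal label following the Landis & Koch bands cited in the text."""
    if value < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if value <= upper:
            return label
    return "almost perfect"
```

The pairwise loop is quadratic in the number of respondents, which is still manageable for samples of the size reported here (about a thousand respondents per wave).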

Measures for testing convergent validity
Convergent validity is tested against relevant scales from waves 2, 3 and 5 (W2, W3 and W5) of the community sample and from the clinical sample, presented with the respective internal consistency (Cronbach's alpha, α).
Measures of family climate: Family cohesion and Family conflict are two scales from Bloom (1985), each consisting
waves, mean differences may partly reflect selection. The clinical sample scored higher than the community sample (compared to Wave 5, being most similar in age) on all scales except the M/D scale, on which it scored lower.

Intercorrelations between scales in two waves of the community sample
In Table 2, intercorrelations between CTQ scales were studied both within and between the two waves that had data on all subscales and examined in both age cohorts, i.e., Wave 3 at the age of 14-15 years and Wave 5 at the age of about 17 years. In addition, same-scale correlations were also investigated for groups with and without any positive scores on M/D.
Most maltreatment scales were highly positively correlated (Rs >0.40) in both waves, except for more moderate (although significant) correlations for EN/SA (Rs [W3;W5]=0.17 and 0.15), PN/SA (Rs=0.28 and 0.26) and EN/PA (Rs=0.24 and 0.32). M/D and IU were as expected negatively correlated to maltreatment scales with stronger
questions on sexual abuse, and one stated that the questions were too personal and abstained from answering. In waves 4 and 5, ratings were almost identical, but the only comment on the CTQ in Wave 4 was that the respondent would also have liked to have questions on parental physical abuse of siblings, and in Wave 5 none commented on the CTQ. We conclude that the CTQ in general was well accepted.
Table 1 presents outcomes on global childhood maltreatment (CTQ total) and the different subscales, separately for boys and girls in four waves over the years from the community sample, as well as for boys and girls in the clinical sample.

Outcomes in community samples (four waves) and clinical sample
In the community population, the differences in scale means over the waves were small for both boys and girls when measured on group level, although boys scored somewhat higher on EN and EA in Wave 2 than in later waves. Since individual participation differed between
EA (α:s=0.70-0.75; although only close to substantial in Wave 3, α = 0.69). The three items forming M/D were examined in full length, as the scale named IU, and then showed acceptable internal consistency in three of four waves (α:s=0.60-0.69), but poor in Wave 2 (α = 0.44). The most problematic scale was PN with moderate internal consistencies in all three waves (α:s=0.33-0.43). In the clinical sample, however, internal consistencies were even better, substantial for
correlations for the full-length IU scale than for M/D. Here, the strongest correlations concerned EN, CTQ total, and EA, while SA and PA were uncorrelated or poorly correlated to IU and M/D. Same-scale correlations over time (W3/W5) were strong (Rs>0.40) for M/D, IU, EA, CTQ total, and EN, but more moderate on SA, PN and PA.
The possible impact of Minimisation/Denial on these diagonal correlations was studied by repeating the W3/W5 correlations in separate analyses for those with and without indications of M/D. Excluding all cases with any M/D score increased W3-W5 correlations for the total CTQ and three of the five subscales (PN, EA, SA), with the opposite pattern of decreased correlations for the group that had some positive M/D scores (CTQ total, EN, PN, EA, SA). However, PA showed changes in the opposite direction, with increased correlation for those with M/D scores and decreased correlation for those without. It should be noted that the n of those with positive M/D scores was close to twice the n of those without.

Internal consistencies
Internal (inter-item) consistencies of all full-length scales (not M/D) are presented in Table 3.
In all studied waves of the community sample, we found substantial to excellent consistencies for CTQ total (α:s=0.80-0.89), and for four of its subscales, i.e., EN (α:s=0.83-0.88), PA (all α:s=0.79), SA (α:s=0.88-0.94), and
Table 2 Intercorrelations (Pearson R) between the CTQ and its subscales in Wave 3 (above diagonal; n = 1302) and Wave 5 (below diagonal; n = 934) as well as same scales between the two waves (diagonal; n = 733); the latter also separately for groups with and without M/
We found that of the 13 W2/W3 test-retest consistencies on item level, eight were substantial, four moderate and one only fair, while of the 28 W3/W4 consistencies, 11 were almost perfect, 14 substantial and three were moderate. On scale level (means of item γ), consistencies were substantial for PN, EN and EA and almost perfect for PA and SA. For IU, consistency started as fair (W2/W3) and increased to substantial (W3/W4). All test-retest consistencies increased from W2/W3 to W3/W4.
PN and PA (α:s=0.75 and 0.78), and excellent for all other scales (α:s=0.90-0.96).
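For readers unfamiliar with the internal consistency coefficient reported here, the following is a minimal sketch of the standard Cronbach's alpha formula, α = k/(k−1) · (1 − Σ item variances / variance of the sum score), using only the Python standard library. The function name and the input layout (one list of item scores per respondent, already reverse-coded) are illustrative assumptions, not the authors' procedure.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a scale.

    rows: one list per respondent, each holding the k item scores of the
    scale (reverse-coded where needed).
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every respondent needs the same k >= 2 items")
    # Sample variance of each item (column) and of the sum score.
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("alpha is undefined when sum scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

With strongly skewed items and little variance in the sum score, as can happen for rarely endorsed items in a community sample, alpha tends to be low, which is one possible reading of the weak PN values reported above.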

Test-retest consistencies
In Table 4, we analysed one-year test-retest consistencies between pairs of waves among the three waves with identical timeframes. The analyses were conducted on item level by systematic, i.e., symmetric, correlation (γ), shown in the first part of the table. At the bottom of the table, the mean item consistencies for all scales are given.
Table 4 One-year test-retest consistencies of items and scales concerning CTQ subscales, tested with symmetric correlation, i.e., Gamma (γ)

Convergent validity
Self-rated exposure to abuse and neglect and idealisation of childhood should be related to judgments about the family climate, about parental relations as well as about individual emotional health (both positive health and emotional problems) for data collected in the same wave. These relations are shown as correlations in Table 5. Expected positive correlations as well as expected negative correlations are interpreted as convergent validity.
Family climate: Family cohesion and family conflict were examined in Wave 2 and three to four years later in Wave 5 of the community sample. On both occasions, cohesion was strongly negatively related to EN and EA, and positively related to IU, while conflict was correlated to the same subscales in opposite directions, as could be expected since they concerned positive vs. negative aspects of family climate. All these correlations were strikingly similar after three to four years. In Wave 5, moderate correlations in the same directions were also found concerning PN, PA, and SA. Thus, relations to family climate factors were consistent over the years and correlations were in the expected directions and varied between subscales in expected ways. Being witness to domestic violence was included in waves 3 and 5, and with similar moderately strong relations to all scales, with a somewhat stronger correlation to physical abuse, also related to violent behaviour in the family.
Still, some item inconsistencies occurred. In the first test-retest (W2/W3), inconsistencies showed significant directionality for seven out of 13 items; five of which (items no.  EN]) showed significant directionality, all resulting in more positive evaluations for these items after one year.
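The directionality analysis can be illustrated with a minimal sketch of the Wilcoxon signed-rank test for one item rated in two waves. The sketch uses the large-sample normal approximation with a tie correction and drops unchanged ratings; whether the study used an exact or an asymptotic p-value is not stated, so this is an assumption, and the function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Signed-rank test on paired ordinal ratings (normal approximation)."""
    # Keep only respondents whose rating changed between the two waves.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("no respondent changed rating")
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * n
    tie_term = 0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        # Tied absolute differences share the average of their ranks.
        average_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        size = end - start + 1
        tie_term += size ** 3 - size
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return {"w_plus": w_plus, "w_minus": w_minus, "z": z, "p": p_two_sided}
```

A negative z means that ratings decreased more than they increased at retest; for a negatively worded maltreatment item, that corresponds to a more positive evaluation of childhood. With very few changed ratings the normal approximation is rough and an exact test would be preferable.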

Percentiles and categories of severity
The percentiles can be used for comparisons to interpret assessment of individuals in clinical work. These are presented here in Table 6, separately for boys and girls, for two different adolescent age groups of the community sample, i.e., 14+ and 17 years, and for the clinical sample aged 14-25 years, with a mean age of 18. Based on ROC curve estimation of sensitivity and specificity for each subscale against criteria measures from Evaluation of Life-time Stressors, Fink (1998, 2011) proposed cut-offs to create categories of severity:
Parental relations: Perceived maternal and paternal support were examined only in Wave 5, with correlations in the expected pattern as described above for family climate, except for non-relation to SA. Parental substance misuse was only included in Wave 5 and found to be moderately correlated to all subscales in expected directions, with a somewhat stronger correlation to PN, which could be expected since that subscale included item 4, i.e., parents being too drunk to take care of the family.
Emotional health: Psychological health was included in all waves and showed a very similar pattern over the years, although with somewhat weaker correlations in late adolescence. For all years, the strongest correlations were negative to EN and EA, and positive to IU. Correlations to PN, PA and SA were more moderate. The Mental Health Continuum (MHC-SF) showed a similar consistent pattern in waves 3 and 5 of the community sample, with the strongest negative correlations to EN and EA, and a positive correlation to IU. In the clinical sample, these correlations tended to be in the same direction (although IU could not be studied), but with a stronger negative correlation to PA. The correlations with PN and SA were weaker here too, and with the lower number of subjects in this sample, they did not reach statistical significance. Emotional problems and the impact of problems are two subscales in the Strengths and Difficulties Questionnaire. They were included in community sample waves 2 and 3 with very similar, moderately strong correlations to EN, EA and IU, and weak correlations (or none) to the other scales.

Discriminant ability
All 25 items of the CTQ problem subscales were used in a stepwise discriminant analysis between the clinical population (n=74) and the LoRDIA-population (n[W5]=901). Wave 5 of the LoRDIA population was used to create more
this would indicate that adolescents provide more credible and reliable answers from the age of 14 than before.
Previous test-retest studies on the CTQ-SF have been conducted on adult populations (e.g., Paivio & Cramer, 2004) showing acceptable consistency. Our findings, however, contribute to a growing body of evidence on test-retest consistency in three respects. First, our test-retest data have a longer follow-up time than most, with approximately one year between administrations. This is exceeded only by Shannon et al. (2016), who administered the CTQ twice with 18 months in between. Second, our test-retest studies are on adolescents, which to our knowledge has not been studied before. Third, starting the analyses on item level, using systematic correlation, and combining with analyses of inconsistencies have not previously been done.
A closer look at the directionality of inconsistencies can give us more understanding. Non-directed inconsistencies indicate that the problem concerns lack of precision (i.e., a problem of reliability), while directed inconsistencies indicate systematic bias (i.e., a problem of validity). Although about 10% of items in the W3/W4 test-retest showed significant directionality, this is a sharp decrease from W2/W3 where more than half of tested items showed directed inconsistencies. The retrospective test therefore seems more valid when conducted at the age of 14-15 than at earlier ages.
Convergent validity was tried in relation to scales on family climate, parental relations, and emotional health. All correlations were in the expected directions, and they varied in expected ways between subscales. The stronger correlation between physical abuse and being witness to domestic violence can serve as one example. This is not surprising since previous research found that physical abuse is common among children witnessing domestic violence (Broberg et al., 2011). Other examples are the expected stronger relation between physical neglect and parental substance misuse, and the expected findings concerning emotional neglect and abuse having a strong relation to all scales on emotional health. The findings thereby provide support for convergent validity. They also showed consistent patterns over time, also from the earliest age.
The discriminant ability was, perhaps, not as good as expected. The CTQ was able to correctly classify almost all respondents in the community sample. Although only half of the respondents in the clinical sample were correctly classified, this was almost seven times better than by-chance. Therefore, we cannot conclude that it discredits the validity of the CTQ.
Means and norm data from the clinical sample and the community sample showed that the clinical sample scored higher on all scales. This is expected since child maltreatment has been robustly associated with problematic alcohol/ substance use in adolescence (e.g., Alvarez-Alonso et al.,

Discussion
This study contributes to a growing body of evidence showing the CTQ to be an acceptable and reliable instrument to be used in both clinical screening and research on child maltreatment. In addition, our study provides new knowledge on the validity and reliability of the use of the CTQ in adolescents.
The CTQ was found to be well accepted among respondents in early and late adolescence from a clinically naïve community sample. Previously, it was found to be accepted in an adult and severely traumatised group with many clinical experiences (Lundgren et al., 2002). Thus, the acceptability of the CTQ has been shown in very different groups.
The means of the maltreatment scales (CTQ total and five subscales) in the community sample showed small differences over time, and as expected, the clinical sample scored higher than the community sample on all scales except for the M/D. Maltreatment scales were highly or moderately positively correlated within waves, while same-scale correlations over 2-3 years (W3/W5) were strong for most scales and moderate on others.
In the community sample, internal consistencies were substantial to excellent for the CTQ total and four of its subscales and from early to late adolescence. The exception was PN, with only moderate internal consistency. The additional IU scale showed acceptable internal consistency in three of four waves. In the clinical sample, internal consistencies were even better and substantial also for PN. The latter finding was not expected since most other studies report lower consistency for PN (e.g., Gerdner & Allgulander, 2009;Karos et al., 2014;Paivio & Cramer, 2004;Sölva et al., 2020).
Test-retest consistencies on item level varied from fair to substantial in W2/W3 and from moderate to almost perfect in W3/W4. Mean one-year test-retest consistencies of scales were substantial or almost perfect, and for all scales, they increased from early to mid-adolescence. As mentioned above, Hardt and Rutter (2004) claim that comparisons of contemporaneous and retrospective accounts obtained in epidemiological/longitudinal studies of non-clinical populations are among the best methods to assess biases in reporting, i.e., invalidity. The same approach was used by Fan et al. (2006) who recommends repeated longitudinal measures to assess validity of behaviours that are unlikely to be externally validated. After one year, it would in general be impossible to remember replies to specific questions, not knowing that they would be repeated in coming data collections. We argue therefore that substantial stability in long-term testretest adds support to the validity of the instrument. Furthermore, since test-retest stability increased over time on item as well as scale level despite the longer recollection time, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/. 2016; Hagborg et al., 2020). Norm data are presented for clinical guidance with options to choose between reference groups.
The findings show that the CTQ-SF is acceptable to adolescents in a community population, and that it is valid and reliable for use in adolescent populations. Our findings are in line with previous studies showing that adolescents generally report meaningful and positive experiences of answering questions about child maltreatment (e.g., Chu et al., 2008; Hasking et al., 2015).

Implications concerning early adolescents
The long-term stability in measurement and the convergent validity led us to the conclusion that the adolescents provided valid responses. The findings concerning discriminant ability do not discredit that. However, some caution is advised when the CTQ is applied to early adolescents aged 12-13 years, since the test-retest stability is then weaker.
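To make the comparison with chance concrete, the following minimal sketch recomputes the ratio from the two sample sizes used in the discriminant analysis (n=74 clinical, n=901 community, Wave 5). It assumes a proportional-chance baseline, i.e., each group's share of the pooled sample, and reads "only half" as a hit rate of 0.50. Both are illustrative assumptions rather than the exact values behind the reported figure, and the function name is hypothetical.

```python
# Sample sizes reported for the discriminant analysis.
n_clinical = 74
n_community = 901


def chance_ratio(hit_rate: float, n_group: int, n_total: int) -> float:
    """Ratio of an observed hit rate to the proportional-chance rate for one group."""
    chance_rate = n_group / n_total  # the group's share of the pooled sample
    return hit_rate / chance_rate


# ASSUMPTION: "only half" of the clinical sample is read as a hit rate of 0.50.
assumed_hit_rate = 0.50
n_total = n_clinical + n_community
chance_rate = n_clinical / n_total
ratio = chance_ratio(assumed_hit_rate, n_clinical, n_total)

print(f"chance rate = {chance_rate:.3f}, ratio = {ratio:.1f}")
# prints: chance rate = 0.076, ratio = 6.6
```

Under these assumptions, a respondent picked at random would belong to the clinical group about 7.6% of the time, so a hit rate of about one half is roughly 6.6 times the chance rate, which is in line with "almost seven times".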
Lastly, we can evaluate the M/D scale when used in adolescents. The interpretation of the scale as a signal of possible response bias is supported by the fact that same-scale correlations (Table 2) increased when those with positive M/D scores were excluded. In addition, there were no test-retest item inconsistencies when they were removed. Most adolescents, however, had some positive M/D scores (Table 2), and removing them altogether would harm the inference of the study even more. A further look at the inconsistency analyses (Table 5) shows that the scale most affected was the M/D scale, with 2 out of 3 items having significantly directed inconsistencies. This finding stresses the need for some caution when interpreting the M/D scale in early adolescence, since it is the scale with the most unstable measurement.
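The distinction between directed and non-directed inconsistencies can be illustrated with a small sketch. It applies an exact two-sided sign test to paired responses on one item from two waves: if changes go upwards and downwards about equally often, the inconsistency is non-directed, whereas a clear surplus in one direction suggests systematic bias. The sign test is used here only as a stand-in and is not necessarily the test applied in the study; the function name and the response data are hypothetical.

```python
from math import comb


def directed_inconsistency(wave_a, wave_b):
    """Exact two-sided sign test on paired item responses from two waves.

    Returns (ups, downs, p_value); unchanged pairs are ignored.
    """
    ups = sum(b > a for a, b in zip(wave_a, wave_b))
    downs = sum(b < a for a, b in zip(wave_a, wave_b))
    n = ups + downs
    if n == 0:
        return ups, downs, 1.0  # no inconsistencies at all
    k = min(ups, downs)
    # Probability of k or fewer changes in the rarer direction when p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return ups, downs, min(1.0, 2 * tail)


# HYPOTHETICAL responses (1-5 scale) from the same ten respondents.
earlier = [1, 1, 2, 3, 1, 2, 1, 1, 4, 2]
later_directed = [1, 2, 3, 3, 2, 3, 2, 1, 5, 3]  # every change is upwards
later_balanced = [2, 1, 1, 3, 2, 1, 1, 2, 3, 2]  # changes in both directions

print(directed_inconsistency(earlier, later_directed))  # (7, 0, 0.015625)
print(directed_inconsistency(earlier, later_balanced))  # (3, 3, 1.0)
```

In the first hypothetical case all seven changes point the same way, which the test flags as directed. In the second, three changes in each direction give no evidence of directionality, so the inconsistency would be read as a matter of precision.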
In conclusion, the CTQ is acceptable to adolescents and can be trusted to give consistent and valid self-reports on childhood maltreatment that occurred before the age of 12. However, some caution is advised when the CTQ is used in early adolescence up to the age of 14 years, and when interpreting the M/D scale at such a young age.