Introduction

According to the World Health Organization (2022), approximately 1 in 5 women and 1 in 13 men experience some form of sexual abuse before the age of 18. Physical abuse is also common, and it is estimated that between 20 and 60% of children worldwide suffer from physical abuse (Moody et al., 2018; World Health Organization, 2006). In Mexican adolescents, 3.8% of women and 1.2% of men report having suffered child sexual abuse (Valdez-Santiago et al., 2020). Data on physical abuse are scarcer, but some estimates indicate that 6% of Mexican children suffer severe physical punishment (i.e., beatings or blows with objects) every month (UNICEF, 2019). A third type of abuse, emotional abuse, has been less studied, but some international data suggest that it may even be on the rise (Witt et al., 2018). All three forms of maltreatment can have significant mental health implications, including increased risk of depression, suicidal behavior, anxiety, and substance misuse (Angelakis et al., 2019; Gallo et al., 2018; Gardner et al., 2019; Hailes et al., 2019; Hughes et al., 2017; McKay et al., 2021; Seff & Stark, 2019). Ending child abuse and exploitation is part of the Sustainable Development Goals (SDGs); indeed, Target 16.2 is specifically aimed at eliminating all forms of violence against children (United Nations, 2015).

It has long been acknowledged that child abuse is linked to psychopathology in adulthood (Briere, 1992). Childhood adversity is associated with negative thought patterns, changes in brain regions such as the amygdala and prefrontal cortex, and problematic behaviors; these processes may in turn be connected to sensitization to stress, which increases the likelihood of developing mental disorders (Sheffler et al., 2020). The connection between child maltreatment and depression and anxiety is well supported, although the evidence regarding alcohol use is somewhat mixed (De Waal et al., 2022; Jaffee, 2017). Possible explanations for the link between abuse and psychopathology include heightened vigilance towards threat, difficulties in recognizing and processing emotions, and reduced responsiveness to rewards (Jaffee, 2017). Additionally, gender differences have been reported in the occurrence of child abuse. For sexual abuse, a higher incidence has consistently been observed among women (Moody et al., 2018; Solís-García et al., 2019; Valdez-Santiago et al., 2020; Vallejos & Cesoni, 2020). By contrast, findings on physical abuse have been mixed, with certain studies showing a higher prevalence in males (Akmatov, 2011; Salem et al., 2020; Solís-García et al., 2019). Similarly, the evidence on emotional abuse is inconclusive, as some studies have reported a higher prevalence among women (Moody et al., 2018; Vallejos & Cesoni, 2020), while others have found it to be higher in men (Akmatov, 2011).

Given the importance of child abuse, there is a need to develop instruments to measure this phenomenon (Mathews et al., 2020; Saini et al., 2019). One of the main debates in this field is whether to use prospective or retrospective measures of maltreatment (Widom, 2019). Prospective measures involve collecting data on abuse as it occurs, whereas retrospective measures rely on individuals’ recollections of abuse that occurred in the past. Baldwin et al. (2019) found that, in general, there is low agreement between prospective and retrospective measures of child maltreatment, highlighting the need for caution when using either approach. On the other hand, other authors have found that both types of measures similarly predict self-reported mental health outcomes (Gardner et al., 2019; Reuben et al., 2016). All in all, retrospective measures are complementary (and no less valid) tools that reveal a larger part of the iceberg of violence (Baldwin et al., 2019; Reuben et al., 2016); indeed, they may be more sensitive than prospective measures, which tend to underreport (Mathews et al., 2020). Three of the most widely used retrospective measures of child maltreatment worldwide are the Adverse Childhood Experiences (ACE) questionnaire (Felitti et al., 1998), the Childhood Trauma Questionnaire-Short Form (CTQ-SF; Bernstein et al., 2003), and the Conflict Tactics Scale: Parent to Child (CTSPC; Straus et al., 1998). In Mexico, Esparza-Del Villar et al. (2020) developed a scale to measure child abuse and neglect in adults from Northern Mexico. The authors initially created items to represent four aspects of child maltreatment (physical abuse, sexual abuse, psychological abuse, and neglect) and included 52 items in their initial analysis. After a series of psychometric evaluations, they reduced the number of items to 29, of which 15 reflected experiences of abuse. 
This measure represents an important contribution to the field, as it helps to shed light on the prevalence of child maltreatment in Northern Mexico.

Despite the existence of the above instruments, the retrospective measurement of child abuse still has room for improvement. For example, the ACE questionnaire measures each type of abuse with a single item answered in a binary (no/yes) format. This approach has an important limitation, as it does not capture more specific aspects of abuse, such as its frequency (Lacey & Minnis, 2020). Although the CTQ-SF does consider frequency, this instrument also has limitations. A systematic review found that its emotional and physical abuse subscales tended to show problems of internal consistency (Georgieva et al., 2021), which undermines the reliability of the instrument. Moreover, as Meinck et al. (2022) note, the CTQ-SF is not in the public domain. This is a limitation in contexts where resources are limited and where open science is increasingly promoted (Beidas et al., 2015). Child abuse or maltreatment assessment tools such as the CTSPC have typically been tested in the North American context (Straus et al., 1998), only incipiently in Latin America (e.g., Reichenheim & Leite-Moraes, 2006), and have yet to be adapted specifically to the Mexican population. Finally, the test created by Esparza-Del Villar et al. (2020) presents a factor structure of child abuse (sexual abuse, mild physical and verbal abuse, and severe physical abuse) relevant to young university students residing in Juarez, Mexico, a city typically characterized by higher levels of social violence. This factor structure is not consistent with the international tripartite classification of child abuse (i.e., sexual, physical, and emotional abuse; World Health Organization, 2006). As can be seen, in this instrument physical abuse is distributed across two dimensions, one of which also contains verbal/emotional abuse items. This makes it impossible to clearly distinguish which type of abuse those items measure. 
These shortcomings call for a robust, conceptually clear, and freely usable tool to assess experiences of child maltreatment in the Mexican population. The present study is methodologically justified by the lack of a measure that meets these needs; specifically, one that assesses the three types of abuse (sexual, physical, and emotional) with good psychometric properties. Moreover, although child maltreatment is a global phenomenon, the specific ways in which it occurs are shaped by the different systems (economic, cultural, and familial) in which an individual grows up (Herrenkohl et al., 2018). Indeed, what is meant by maltreatment may vary between regions and countries (Dubowitz & Oates, 2018): what is considered maltreatment in one culture might not be seen as maltreatment in another (Korbin, 2022). Maltreatment exists across all cultures, but the way it manifests can differ (for instance, the use of “la chancla”, a flip-flop used to hit children, in Latin American families; Vidal, 2014). It is therefore essential to adopt a culturally sensitive approach to properly assess maltreatment experiences (Fontes, 2008). Hence, there is a need for instruments adapted to different cultural contexts and broad enough to include at-risk groups such as sexual minorities.

In Mexico, data on violence against children are scarce, especially concerning physical and emotional abuse (UNICEF, 2019). In part, this is due to the lack of measurement instruments adapted to the local context. In this sense, the contribution of Esparza-Del Villar et al. (2020) was relevant in creating a specific scale for the Mexican population. However, it is necessary to examine whether the scale also works well in other regions of the country (since the original study focused on young people in northern Mexico living in particular conditions of violence); moreover, it is important to consider additional items of emotional abuse, since this dimension was underrepresented in the original version. As mentioned above, both retrospective measures (reported by adults) and prospective measures (reported by children and adolescents) are crucial and serve as valuable complements to each other (Baldwin et al., 2019; Reuben et al., 2016). Additionally, retrospective measures have been found to be highly predictive of self-reported mental health issues, indicating that the memory of maltreatment is associated with psychopathology beyond the mere experience of maltreatment itself (Coleman & Baldwin, 2023). Hence, it is imperative to have a strong, unbiased, and culturally sensitive assessment tool to measure maltreatment experiences in Mexican adults. Therefore, the present study had the following aims: (1) to reformulate Esparza-Del Villar et al.’s (2020) scale by adding emotional abuse items until the expected tripartite structure was obtained; (2) to estimate the internal consistency reliability of each subscale; (3) to examine measurement invariance according to sex; (4) to analyze item-level functioning through item response theory methods; and (5) to obtain evidence of validity related to the association with other variables. We expected significant correlations between the three types of abuse and depression, anxiety, and suicidal ideation. 
Also, we expected higher levels of sexual abuse in women as opposed to men. No other hypotheses were explicitly stated.

Method

Participants

We worked with two non-probability samples of people who voluntarily answered a questionnaire disseminated through social networks. Following classical recommendations (Everitt, 1975; Kline, 2016), the study protocol proposed a minimum sample size of 200 to 300 participants.

Sample 1

It consisted of 405 people. The inclusion criteria were (a) being 18 years of age or older and (b) having lived in Mexico for the last five years. The mean age was 23.36 years (SD = 4.54), indicating that participants were mostly young adults. The majority (72.8%; n = 295) were women (according to their birth certificate). Some 64.2% (n = 260) identified as female, 27.9% (n = 113) as male, and 6.2% (n = 25) as non-binary (including gender-fluid); one person identified as “other” gender and six people preferred not to indicate their gender; 39.3% of the sample (n = 159) identified as part of the LGBTQI+ community. The states with the highest representation were the State of Mexico (15.3%; n = 62) and Mexico City (13.3%; n = 54). Finally, 36.3% (n = 147) of participants reported binge drinking in the past month. This sample was randomly divided into two halves using the sample() function in R: one for exploratory analysis (Sample 1a; n = 202) and one for confirmatory analysis (Sample 1b; n = 203). As detailed in the Data Analysis section, Sample 1a was used to conduct exploratory analyses and develop a factor model, which was then tested on Samples 1b and 2.
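The random split can be illustrated with a minimal Python sketch (the study itself used R's sample() function). The seed below is hypothetical, since none is reported, and participant IDs are simply the integers 1 to 405; with an odd-sized sample the first half receives one fewer case, reproducing the 202/203 division.

```python
import random


def split_half(ids, seed=None):
    """Randomly split participant IDs into exploratory and confirmatory halves.

    The first half gets len(ids) // 2 cases and the second half the rest,
    so 405 participants are divided into 202 and 203.
    """
    rng = random.Random(seed)               # hypothetical seed; none is reported
    shuffled = rng.sample(ids, k=len(ids))  # random permutation, analogous to R's sample()
    cut = len(ids) // 2
    return shuffled[:cut], shuffled[cut:]


sample_1a, sample_1b = split_half(list(range(1, 406)), seed=2022)
print(len(sample_1a), len(sample_1b))  # 202 203
```

Every participant lands in exactly one half, so the exploratory and confirmatory analyses never share cases.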

Sample 2

It consisted of another 405 people who met the same inclusion criteria as Sample 1. The majority were women (77.8%; n = 315), and the mean age was 23.02 years (SD = 4.62). Some 68.1% (n = 276) identified with the female gender, 23.5% (n = 95) with the male gender, and 5.9% (n = 24) defined themselves as non-binary; two people identified with “other” gender and eight people preferred not to state their gender. Also, 40.0% of the sample (n = 162) identified as LGBTQI+. The most represented states were the State of Mexico (13.1%; n = 53), Mexico City (8.4%; n = 34), and Aguascalientes (7.9%; n = 32). Most participants (56.1%; n = 227) had a college education. Finally, 17.5% (n = 71) indicated that, during the last year, they had had suicidal thoughts “frequently” or “always or almost always”. This sample was used for the final confirmatory factor analysis and for examining the associations with anxiety, depression, and suicidal ideation.

Measures

Child Abuse Scale for Adults (EAIA, Spanish initials)

It was constructed from the instrument proposed by Esparza-Del Villar et al. (2020) to measure experiences of abuse and neglect. For the purposes of the present study, only child abuse was measured. In the Procedure section, the development of the items is explained in detail. Initially, there were 24 items that were answered on a 5-choice Likert scale (1 = never, 2 = rarely, 3 = sometimes, 4 = often, 5 = always or almost always). The final version consists of 14 items distributed in three dimensions: sexual abuse (5 items: items 1–5), physical abuse (4 items: items 6–9) and emotional abuse (5 items: items 10–14). The items of each dimension are summed (or, alternatively, averaged), with higher scores indicating more frequent experiences of abuse. The remainder of this study will detail the psychometric properties of this instrument.
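A minimal Python sketch of this scoring rule follows, assuming complete responses coded 1 to 5 and the item order of the final 14-item version; the function and dictionary names are hypothetical, and missing-data handling is not addressed. Sum scores range from 5 to 25 for sexual and emotional abuse and from 4 to 20 for physical abuse.

```python
# Item-to-subscale mapping of the final 14-item EAIA.
SUBSCALES = {
    "sexual": range(1, 6),       # items 1-5
    "physical": range(6, 10),    # items 6-9
    "emotional": range(10, 15),  # items 10-14
}


def score_eaia(responses, method="sum"):
    """Score one respondent on the three EAIA subscales.

    responses: dict mapping item number (1-14) to a Likert value from
    1 (never) to 5 (always or almost always).
    method: "sum" or "mean"; higher scores mean more frequent abuse.
    """
    if method not in ("sum", "mean"):
        raise ValueError("method must be 'sum' or 'mean'")
    scores = {}
    for name, items in SUBSCALES.items():
        values = [responses[i] for i in items]  # KeyError if an item is missing
        if any(not 1 <= v <= 5 for v in values):
            raise ValueError(f"response outside 1-5 in the {name} subscale")
        total = sum(values)
        scores[name] = total if method == "sum" else total / len(values)
    return scores


never = {i: 1 for i in range(1, 15)}
print(score_eaia(never))  # {'sexual': 5, 'physical': 4, 'emotional': 5}
```

Averaging instead of summing puts all three subscales on the same 1 to 5 metric, which eases comparison across subscales with different numbers of items.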

Adverse Childhood Experiences Questionnaire (ACE; Felitti et al., 1998)

It consists of 10 dichotomous response items (no/yes) that ask about adverse experiences in childhood and adolescence. For the present study, only the first three questions were considered, which correspond to experiences of emotional, physical and sexual abuse. Each of these was considered separately as an indicator of the absence or presence of the corresponding type of abuse. The version validated for the Mexican population by Nevárez-Mendoza and Ochoa-Meza (2022) was used. In that study, two dimensions were found: family dysfunction (α = 0.45) and abuse (α = 0.70). The internal consistency was found to be unsatisfactory, likely due to the ACE questionnaire assessing a range of diverse adverse experiences that might not be related to a common set of factors. Therefore, in the current study, each item of the ACE questionnaire was individually analyzed, following a standard practice for this test (Lacey & Minnis, 2020).

Patient Health Questionnaire (PHQ-9; Kroenke et al., 2001)

This is one of the measures of depression with the strongest psychometric properties (Persons et al., 2018). It consists of 9 items answered on a four-choice Likert scale (0 = not at all, 3 = nearly every day). Participants indicate how often, during the past two weeks, they experienced each of the symptoms listed. This instrument has been used before in the Mexican population and showed good psychometric performance (Arrieta et al., 2017). In the present study (375 participants from Sample 1), reliability was excellent (α = 0.91, categorical ω = 0.92).

Patient Health Questionnaire (PHQ-2; Kroenke et al., 2003)

The PHQ-2 is a short version of the PHQ-9 and consists of its first two items: “Little interest or pleasure in doing things” and “Feeling down, depressed or hopeless”. The response format is a four-choice Likert scale (0 = not at all, 3 = nearly every day). The PHQ-2 has been applied before in the Mexican population with good results (Arrieta et al., 2017). In the present study, it was applied to Sample 2 (n = 405) and showed acceptable reliability (α = 0.74, categorical ω = 0.75).

Generalized Anxiety Disorder Scale (GAD-2; Kroenke et al., 2007)

This is a brief measure of generalized anxiety symptomatology. It consists of the following two items: “Feeling nervous, anxious or on edge” and “Not being able to stop or control worrying”. The response options are the same as those of the PHQ-2 and, in fact, it is often used in conjunction with this test (Kroenke et al., 2009). The GAD-2 has been used successfully in large-scale surveys in Mexico (Gaitán-Rossi et al., 2021). In the present study, it was applied to Sample 2 (n = 405) and showed good internal consistency reliability (α = 0.83, categorical ω = 0.82).

Binge Drinking

According to the international definition of binge drinking (National Institute on Alcohol Abuse and Alcoholism, 2022), this variable was measured with the following question: “Considering all types of alcoholic beverages during the past 30 days, did you ever have ___ or more drinks in two hours or less?”. The blank was filled in with “five” for men and “four” for women.

Single Item of Suicidal Ideation

An ad hoc item was constructed to measure the frequency of suicidal ideation. This consisted of the following question: “How often have you thought about ending your life during the past 12 months?”. The following response options were provided: 1 = never, 2 = rarely, 3 = sometimes, 4 = frequently, and 5 = always or almost always.

Procedure

Development of the EAIA

First, permission was requested and obtained for the use and potential modification of the original instrument (Esparza-Del Villar et al., 2020). Only the first 15 items of this instrument were used, since they were the ones that measured experiences of abuse (the other 14 items measured different experiences of child neglect). These abuse items were reviewed in research seminars attended by doctoral students in psychology and mental health. They were also evaluated by a committee of PhDs with extensive research experience in psychology. Based on the feedback obtained, two decisions were made: (a) to slightly modify some items to make them more understandable to speakers throughout Mexico (considering that the original version was developed specifically in the northern part of the country); specifically, items 7 and 13 were changed (see Online Resource 1 for details); and (b) to create new items to measure emotional abuse, since this important aspect of the construct was measured by only two items in the original version. For this last point, the operationalization of psychological maltreatment by Hart et al. (2018) was used as a starting point. Items were constructed for spurning (4 items), exploiting/corrupting (1 item), terrorizing (3 items), and isolating (1 item) (Online Resource 2). Emotional unresponsiveness and mental health, medical, and educational neglect were not considered, as they correspond to neglect rather than abuse. The resulting 24 items (15 original and 9 new) were tested in interviews with five people, of both sexes, from different states of the country. Based on their feedback, the phrasing of some items was improved and the response options were changed to clearly indicate a Likert frequency scale.

Data Collection

Data were collected at two points in time (one for Sample 1 and one for Sample 2). The first took place between August 27 and September 1, 2022. A SurveyMonkey form was developed that included the EAIA (the initial 24 items), the ACE questionnaire, and the PHQ-9; the latter two were presented in random order. The second round of data collection took place between September 10 and September 16. On this occasion, a Google form was used that included the following instruments (presented in a fixed order): single item of suicidal ideation (included in sociodemographic data), GAD-2, PHQ-2, and EAIA. At this second time point, only the final EAIA items, obtained from the analyses conducted in Sample 1, were administered. On both occasions, the survey was shared through social networks (e.g., Facebook and Instagram personal accounts).

Data Analysis

Preliminary Analyses

The mean, standard deviation, skewness and kurtosis of the original pool of items in Sample 1 were examined. The response percentages for each of the Likert options were also calculated. Those items that presented a possible floor or ceiling effect (90% or more of responses in an extreme option) were discarded. This decision was made based on two reasons: (a) the importance for public health, since an event with very low prevalence could be of limited relevance for the general population; and (b) psychometric performance, since a variable with these characteristics can lead to unstable factorial solutions and spurious dimensions (Bandalos & Finney, 2019).
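The screening rule can be sketched in Python as follows. The 0.90 threshold and the use of the extreme options come from the text; skewness and excess kurtosis here are simple moment-based (population) estimates, so they may differ slightly from the bias-corrected values that some software reports. The function name is hypothetical.

```python
import math
from collections import Counter
from statistics import fmean, pstdev


def describe_item(responses, options=(1, 2, 3, 4, 5), threshold=0.90):
    """Descriptives for one Likert item plus a floor/ceiling flag.

    The item is flagged for removal when the lowest (never) or highest
    (always or almost always) option holds at least `threshold` of the answers.
    Skewness and excess kurtosis are simple moment-based estimates.
    """
    n = len(responses)
    counts = Counter(responses)
    pct = {opt: counts.get(opt, 0) / n for opt in options}
    m = fmean(responses)
    sd = pstdev(responses)
    if sd > 0:
        skew = sum((x - m) ** 3 for x in responses) / (n * sd ** 3)
        kurt = sum((x - m) ** 4 for x in responses) / (n * sd ** 4) - 3
    else:  # every answer identical: shape statistics are undefined
        skew = kurt = math.nan
    drop = max(pct[options[0]], pct[options[-1]]) >= threshold
    return {"mean": m, "sd": sd, "skewness": skew,
            "kurtosis": kurt, "pct": pct, "drop": drop}


rare_item = [1] * 95 + [2] * 5
print(describe_item(rare_item)["drop"])  # True: 95% of answers are "never"
```

An item like `rare_item` also shows the large positive skewness and kurtosis that typically accompany a floor effect.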

Exploratory Factor Analysis (EFA)

This and the following analyses were conducted treating the items as numerical (continuous) variables, which is reasonable when there are at least five response options (Rhemtulla et al., 2012). Pearson correlations and the unweighted least squares estimator were used. As a preliminary check of the suitability of the data for EFA, we examined whether the KMO index reached at least 0.70 (Lloret-Segura et al., 2014). The number of dimensions was determined by (a) the theoretical criterion, (b) a parallel analysis, and (c) the reduction of the Bayesian information criterion; all three criteria supported retaining three factors. The Promin oblique rotation method was applied to the factorial solution. Items with factor loadings < 0.50 on their corresponding factor or loadings ≥ 0.32 on any other factor (Costello & Osborne, 2005) were sequentially eliminated. Likewise, the expected residual correlations method (EREC; Ferrando et al., 2022) was used to detect possible cases of correlated residuals, which could substantially affect the interpretability and reliability of the test (Dominguez-Lara, 2019). All these analyses were performed on Sample 1a (n = 202) with FACTOR (Ferrando & Lorenzo-Seva, 2017), version 12.01.02.
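The item-retention rule can be expressed as a short Python check. It assumes the rotated loadings have already been obtained (e.g., from FACTOR) and does not reproduce the Promin rotation or the refitting that follows each elimination; comparing absolute loadings is also an assumption. The function name is hypothetical and the loadings are illustrative, not study results.

```python
def items_to_drop(loadings, assigned, primary_min=0.50, cross_max=0.32):
    """Flag items that violate the retention rule for a rotated EFA solution.

    loadings: dict item -> list of rotated loadings (one per factor).
    assigned: dict item -> index of the factor the item should load on.
    An item is flagged if its loading on its own factor is below primary_min,
    or if its loading on any other factor is at least cross_max
    (absolute values are compared, which is an assumption).
    """
    flagged = []
    for item, row in loadings.items():
        own = abs(row[assigned[item]])
        cross = [abs(x) for j, x in enumerate(row) if j != assigned[item]]
        if own < primary_min or any(c >= cross_max for c in cross):
            flagged.append(item)
    return flagged


# Illustrative loadings on three factors (not study results).
example = {"i1": [0.78, 0.05, 0.02], "i2": [0.45, 0.10, 0.08], "i3": [0.61, 0.36, 0.00]}
print(items_to_drop(example, {"i1": 0, "i2": 0, "i3": 0}))  # ['i2', 'i3']
```

Because elimination was sequential, in practice one would remove the most problematic flagged item, refit the EFA, and apply the check again, since loadings shift after each removal.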

Exploratory Graph Analysis (EGA)

As a complement to EFA, an exploratory graph analysis was also performed to examine the number of dimensions. This technique was developed within the framework of network psychometrics (Golino & Epskamp, 2017) and has shown similar or better accuracy than methods traditionally used to determine the number of factors (Cosemans et al., 2022; Golino & Epskamp, 2017). For the present study, a Gaussian graphical model with GLASSO regularization was used. The tuning parameter (λ) was chosen to minimize the extended Bayesian information criterion (EBIC). The EBIC hyperparameter (γ) was initially set at 0.50 and, if necessary, lowered to 0.25 and then to 0, so that every node in the model was connected to at least one other node. To determine the dimensions, the Walktrap algorithm was used. The stability of the network was examined with bootstrapping, following the guidelines of Christensen and Golino (2021) and using 2500 simulated samples. EGA was applied to Sample 1a using only the items from the final EFA model. In addition to providing further support for the number of dimensions obtained with traditional methods, EGA has a substantive advantage: childhood abuse is understood as a network of interrelated experiences rather than as a set of unobservable latent variables (Breuer et al., 2020; Fonseca-Pedrero, 2018). The statistical package EGAnet 1.2.3, implemented in R 4.0.3, was used.
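For reference, the criterion minimized over λ is usually written as follows for a Gaussian graphical model, where $\hat{K}_\lambda$ is the estimated precision matrix, $S$ the sample covariance matrix (a correlation matrix in EGA), $|E_\lambda|$ the number of nonzero edges, $n$ the sample size (202 in Sample 1a), and $p$ the number of nodes (the 14 retained items). The log-likelihood is given up to an additive constant, and this is the standard textbook formulation rather than a transcription of EGAnet's internal code:

```latex
\begin{aligned}
\ell(\hat{K}_\lambda) &= \tfrac{n}{2}\left[\log\det\hat{K}_\lambda - \operatorname{tr}\!\left(S\hat{K}_\lambda\right)\right] \\
\mathrm{EBIC}_\gamma(\lambda) &= -2\,\ell(\hat{K}_\lambda) + |E_\lambda|\log n + 4\,\gamma\,|E_\lambda|\log p
\end{aligned}
```

With γ = 0 the penalty reduces to the ordinary BIC; larger γ penalizes edges more heavily and yields sparser networks. This is why lowering γ from 0.50 toward 0 makes it less likely that an item ends up as an isolated node.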

Confirmatory Factor Analysis (CFA)

CFA was performed using Pearson correlations and a robust maximum likelihood estimator (MLR; Yuan & Bentler, 2000). Model fit was assessed with a set of approximate fit indices: the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root-mean-square error of approximation (RMSEA), and the standardized root-mean-square residual (SRMR). The model was considered to fit well if it approached the following values: CFI > 0.95, TLI > 0.95, RMSEA < 0.06, and SRMR < 0.08 (Hu & Bentler, 1999). CFA was performed separately on Sample 1b (n = 203) and Sample 2 (n = 405). In both cases, reliability was estimated through alpha (α) and omega (ω) coefficients. These analyses were carried out with the lavaan (version 0.6-11) and semTools (version 0.5-3) packages, implemented in R.
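The fit indices and reliability coefficients can be sketched in Python using their textbook formulas. This is an illustration under assumptions: lavaan reports scaled and robust versions of these indices under MLR, so values will not match exactly; the baseline (independence) model chi-square needed for CFI and TLI is not reported in the text; and the omega function assumes a single factor with standardized loadings and uncorrelated residuals. As a check, the RMSEA formula applied to Sample 1b's reported χ2(74) = 128.44 with n = 203 gives about 0.060, in line with the value reported in the Results.

```python
import math
from statistics import pvariance


def rmsea(chi2, df, n):
    """Root-mean-square error of approximation (uses n - 1; some software uses n)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index relative to the baseline (independence) model."""
    misfit = max(chi2 - df, 0.0)
    denom = max(chi2_base - df_base, chi2 - df, 0.0)
    return 1.0 if denom == 0 else 1.0 - misfit / denom


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index; unlike CFI it is not bounded between 0 and 1."""
    base_ratio = chi2_base / df_base
    return (base_ratio - chi2 / df) / (base_ratio - 1.0)


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of item-score lists (same respondents)."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


def omega_from_loadings(loadings):
    """McDonald's omega for one factor with standardized loadings and uncorrelated residuals."""
    common = sum(loadings) ** 2
    unique = sum(1 - lam ** 2 for lam in loadings)
    return common / (common + unique)


print(round(rmsea(128.44, 74, 203), 3))  # 0.06
```

Omega weights items by their loadings, whereas alpha implicitly assumes equal loadings, which is why the two coefficients can diverge when loadings differ across items.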

Factorial Invariance by Sex

For this analysis, the data from all study samples (1a, 1b, and 2) were pooled. This was done to meet the recommendation of having at least 200 participants per group to perform an invariance analysis (Dimitrov, 2010). Starting from a baseline (configural) model, models with increasingly strict equality constraints were tested sequentially: a metric invariance model (equal factor loadings) and a scalar invariance model (equal intercepts; Han et al., 2019; Spector et al., 2015). Invariance (or lack thereof) was assessed through two criteria: (a) the chi-square difference test and (b) the change in CFI (ΔCFI), where a worsening of this index greater than 0.01 would imply lack of invariance (Cheung & Rensvold, 2002). Although the latter criterion is pragmatic and less conservative than the former, its use has been questioned by some methodologists (Putnick & Bornstein, 2016). If invariance was not met, modification indices (MI) and expected parameter changes (EPC) were examined jointly before deciding on a model respecification (Saris et al., 1987; Whittaker, 2012). This revision of the model led to a test of partial invariance, under which comparison between groups is still possible (Byrne, 2012). The last step was to test the invariance of latent means; if this was not met, we examined in which dimensions the difference between sexes occurred and how large this difference was. This is similar to the comparison of observed means usually conducted with procedures such as t-tests; however, comparing latent means has the advantage of controlling for measurement error (Müller & Schäfer, 2017). In the present study, we examined the significance of the latent mean difference in the scalar-invariant model and calculated Cohen’s d based on this value and on the latent factors’ variances. 
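The ΔCFI decision rule and the latent effect size can be sketched in Python. The pooled-variance formula in latent_cohens_d is an assumption, since the text does not specify how the two groups' latent variances were combined, and both function names are hypothetical. The chi-square criterion is omitted because, under MLR, nested models must be compared with a scaled difference test (which lavaan's lavTestLRT() provides), and that is not reproduced here.

```python
import math


def invariance_by_delta_cfi(cfi_less_constrained, cfi_more_constrained, cutoff=0.01):
    """Cheung and Rensvold's rule: invariance holds unless CFI worsens by more than cutoff."""
    # Rounding guards against floating-point noise such as 0.96 - 0.95 = 0.010000000000000009.
    drop = round(cfi_less_constrained - cfi_more_constrained, 10)
    return drop <= cutoff


def latent_cohens_d(mean_diff, var_1, var_2, n_1, n_2):
    """Latent mean difference divided by a pooled latent SD (pooling rule is an assumption)."""
    pooled_var = ((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2)
    return mean_diff / math.sqrt(pooled_var)
```

In a multi-group model the latent mean of one group is typically fixed at zero, so mean_diff would be the other group's estimated latent mean on the same factor.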
It should be noted that, because the groups were markedly unbalanced (n = 200 men and n = 610 women), the analysis of invariance was conducted following the procedure described by Yoon and Lai (2018); for the present study, 1000 replications were used. These analyses were carried out with the packages lavaan and semTools, implemented in R.
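The logic of this subsampling procedure can be sketched as follows, with test_fn a hypothetical placeholder for fitting the nested invariance models to one balanced subsample. Summarizing the replications as the proportion that supports invariance is an assumption made for illustration; Yoon and Lai (2018) describe their own way of aggregating results across subsamples.

```python
import random
from statistics import fmean


def subsample_invariance(larger_group, smaller_group, test_fn, replications=1000, seed=None):
    """Balanced-subsampling check of measurement invariance.

    In each replication, a random subsample of the larger group with the same size
    as the smaller group is drawn, and test_fn(subsample, smaller_group) is called.
    test_fn is a hypothetical placeholder that should fit the nested invariance
    models and return True when invariance is supported.
    Returns the proportion of replications that supported invariance.
    """
    rng = random.Random(seed)
    k = len(smaller_group)
    outcomes = []
    for _ in range(replications):
        subsample = rng.sample(larger_group, k)
        outcomes.append(bool(test_fn(subsample, smaller_group)))
    return fmean(outcomes)
```

With 610 women and 200 men, each replication compares 200 randomly drawn women with the 200 men, so no single comparison is dominated by the larger group.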

Graded Response Model

In the combined total sample (n = 810), three graded response models (GRM; Samejima, 2016) were fitted, one for each dimension of the EAIA. The GRM is a two-parameter item response theory model that estimates one discrimination parameter (a) and k - 1 difficulty (threshold) parameters (b) per item, where k is the number of response options. With these estimates, item information curves were also plotted to examine at which levels of the latent variable (θ) each item provided the most information, that is, measured most precisely (Furr, 2018). These analyses were performed with the mirt package (version 1.33.2) implemented in R.
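Under the GRM, the probability of answering in category j or above is a logistic function of a(θ - b), using the corresponding threshold; category probabilities are differences between adjacent cumulative probabilities, and item information sums, over categories, the squared derivative of each category probability divided by that probability. A minimal Python sketch follows, assuming ordered thresholds; mirt estimates these models in its own parameterization, and the function names here are hypothetical.

```python
import math


def _cumulative(theta, a, thresholds):
    """Cumulative probabilities P(X >= j) for j = 1, ..., k + 1; the first is 1, the last 0."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Category probabilities of one item under Samejima's graded response model.

    a: discrimination; thresholds: ordered b_1 < ... < b_(k-1).
    Returns k probabilities, one per response category.
    """
    cum = _cumulative(theta, a, thresholds)
    return [cum[j] - cum[j + 1] for j in range(len(thresholds) + 1)]


def grm_item_information(theta, a, thresholds):
    """Fisher information of one GRM item at theta: sum of P_j'(theta)^2 / P_j(theta)."""
    cum = _cumulative(theta, a, thresholds)
    slopes = [a * p * (1.0 - p) for p in cum]  # derivative of each logistic; 0 at the padding
    info = 0.0
    for j in range(len(thresholds) + 1):
        p_j = cum[j] - cum[j + 1]
        if p_j > 0:
            info += (slopes[j] - slopes[j + 1]) ** 2 / p_j
    return info
```

With five response options each item has four thresholds; evaluating grm_item_information over a grid of θ values traces the item information curve, which peaks near the thresholds and rises with a.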

Associative Validity Evidence

First, the standardized mean difference (Cohen’s d) was estimated to examine the association between the three types of abuse from the EAIA and each of the three dichotomous abuse items from the ACE questionnaire. This same calculation was performed to analyze the association between the three types of abuse and binge drinking. In addition, Pearson’s correlation coefficients were used to examine the association between the score on each dimension of the EAIA and a set of other measures: the PHQ-9, the PHQ-2, the GAD-2, and the single item on suicidal ideation. These analyses were also conducted in R.

Ethical Considerations

All participants read an informed consent form and agreed to continue with the assessment. This document explained the possible discomfort derived from the study (specifically, emotional discomfort due to remembering traumatic situations). All assessments were anonymous, and no information was recorded that would allow the identification of individuals. At the end of both forms, a list of free psychological services to which interested individuals could turn was included. This study was part of a larger project, which was approved by the Ethics Committee of the Masters and Doctoral Program in Psychology of the Universidad Nacional Autónoma de México (EP/PMDPSIC/0268/2022).

Results

Item-Level Descriptive Analyses

Table 1 presents the descriptive statistics of the initial pool of EAIA items. As can be seen, items 9, 11, 12, 13 and 20 showed marked floor effects (> 90% of responses in the never option). This is also reflected in their remarkably high skewness and kurtosis values. Therefore, these items were dropped in this first phase.

Table 1 Item-level descriptive statistics of the initial pool of items

Exploratory Factor Analysis & Exploratory Graph Analysis

With the remaining items, an EFA was performed on Sample 1a after verifying that the KMO index was 0.91 (i.e., > 0.70). Based on the first model, items 17, 18, 22, and 23 were eliminated because they did not have loadings ≥ 0.50 on any factor or because they had significant cross-loadings (≥ 0.32). In the second EFA model, no cross-loadings or low loadings were observed, but the EREC method detected a possible case of correlated errors between items 14 and 15. We retained item 15 and dropped item 14 because item 15 had shorter and simpler phrasing. The final model (Model 3) showed a clear structure and no possible correlated errors (Table 2).

Table 2 Exploratory and confirmatory factor analyses of the EAIA

The 14 items resulting from the EFA were modeled with an EGA. As shown in Fig. 1, the algorithm clearly detected three dimensions (sexual, physical and emotional abuse), which corresponded to the EFA findings. The bootstrapping results confirmed the stability of the findings. First, the number of dimensions extracted had a median of 3, as well as a 95% CI of [2.94, 3.06]; in fact, 3 dimensions were extracted in 99.9% of the bootstrap samples. Second, at the item level, it was observed that all items were assigned to their corresponding dimension in almost all bootstrap samples; the lowest value corresponded to item 10, which was assigned to its dimension (physical abuse) 97% of the time.

Fig. 1 Exploratory graph analysis of the EAIA (n = 202)

Confirmatory Factor Analysis & Internal Consistency Reliability

The CFA conducted on Sample 1b (n = 203) showed adequate fit, χ2(74) = 128.44, p < 0.001, CFI = 0.97, TLI = 0.96, RMSEA = 0.06, SRMR = 0.06. Internal consistency reliability was adequate for all three dimensions: sexual (α = 0.91; ω = 0.92), physical (α = 0.79; ω = 0.81), and emotional abuse (α = 0.94; ω = 0.94). In Sample 2, these results were replicated. The fit was adequate, χ2(74) = 190.12, p < 0.001, CFI = 0.96, TLI = 0.96, RMSEA = 0.06, SRMR = 0.04; and reliability was good for all three factors: sexual (α = 0.90; ω = 0.91), physical (α = 0.87; ω = 0.88), and emotional (α = 0.93; ω = 0.93) abuse. Factor loadings and interfactor correlations for both CFAs can be found in the last columns of Table 2. For the observed composite scores (i.e., the sum of each subscale’s items), means and standard deviations were as follows: M = 8.15, SD = 4.03 (sexual abuse), M = 7.37, SD = 3.28 (physical abuse), and M = 14.15, SD = 5.75 (emotional abuse).

Measurement Invariance

When examining factorial invariance (Table 3), it was observed that the restriction of equality of factor loadings did not worsen model fit, so metric invariance was met. On the other hand, the intercept equality constraint did affect the fit according to both criteria (Δχ2 and ΔCFI), so scalar invariance was not met. After examining the MI and EPCs, it was decided to allow the intercept of item 1 (“Someone touched me sexually”) to vary between groups; this allowed partial scalar invariance to be met according to the ΔCFI criterion, but not according to the Δχ2. Given this, MI and EPC were re-examined and it was decided to test a new model in which the intercept of item 7 (“I was hit at home with objects such as belts, flip-flops or boards”) was also allowed to vary freely. This new modification allowed partial scalar invariance to be met according to both criteria. Finally, when examining the invariance of latent means, significant differences were found between both sexes according to the Δχ2 (although not according to the ΔCFI). When the partial scalar model was analyzed in detail, it was found that the difference came from the latent variable of emotional abuse, which indicated a lower mean in men (z = -4.01, p < 0.001). The magnitude of this difference was between small and medium (d = 0.33).

Table 3 Invariance of the EAIA according to sex

Graded Response Model

The GRM applied to the sexual abuse subscale showed that the most discriminating item was item 2 (“Someone made me touch him/her sexually”). In addition, the most “difficult” item (i.e., the one requiring the highest level of sexual abuse for affirmative responses) was item 3 (“Someone made me have a sexual act (for example, sexual intercourse or oral sex)”) (Table 4). In general, the items of this subscale were most informative at high levels of the construct, especially around 2 SD above the mean (Online Resource 3). As for the physical abuse subscale, the “easiest” (i.e., least severe) item was item 7 (“I was hit at home with objects such as belts, flip-flops or boards”), while the “hardest” (i.e., most severe) item was item 9 (“My parents hit me hard on my head”) (Table 4). This subscale was most informative at above-average levels, but item 7 also allowed somewhat milder levels of physical abuse to be measured (Online Resource 3). Finally, unlike the two previous subscales, the emotional abuse subscale was most informative at levels close to the mean; its most informative item was item 12 (“I was made to feel at home that I did everything wrong”) (Online Resource 3).
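To make discrimination, difficulty and information concrete, the sketch below implements the category probabilities and item information function of Samejima's graded response model in Python. Under the GRM, the probability of responding in category k or higher is a logistic function of a(θ - b_k), with discrimination a and ordered thresholds b_k. The item parameters used here are hypothetical and are not the estimates in Table 4.

```python
import math


def boundary_curves(theta, a, thresholds):
    """P*(X >= k | theta) for each ordered threshold b_k (2PL-type boundary curves)."""
    return [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds]


def category_probs(theta, a, thresholds):
    """Probability of each response category 0..m under the GRM."""
    p_star = [1.0] + boundary_curves(theta, a, thresholds) + [0.0]
    return [p_star[k] - p_star[k + 1] for k in range(len(p_star) - 1)]


def item_information(theta, a, thresholds):
    """Samejima's item information: sum over categories of (P*'_k - P*'_{k+1})^2 / P_k."""
    p_star = [1.0] + boundary_curves(theta, a, thresholds) + [0.0]
    slopes = [a * p * (1 - p) for p in p_star]  # derivative of each boundary curve
    info = 0.0
    for k in range(len(p_star) - 1):
        p_k = p_star[k] - p_star[k + 1]
        if p_k > 0:  # skip categories whose probability underflows to zero
            info += (slopes[k] - slopes[k + 1]) ** 2 / p_k
    return info


# Hypothetical "severe" item: high thresholds, so information peaks above the mean.
a, b = 2.0, [1.5, 2.0, 2.5]
grid = [x / 10 for x in range(-30, 31)]
peak = max(grid, key=lambda t: item_information(t, a, b))
print(f"information peaks near theta = {peak:.1f}")
```

Summing item information across a subscale gives the test information, and 1/√I(θ) approximates the standard error of θ; this is why a subscale whose items all have high thresholds measures precisely only at high trait levels.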

Table 4 Unidimensional graded response models applied to the EAIA’s subscales

Associative Validity

The association between each EAIA subscale and the three abuse items of the ACE questionnaire was examined. As shown in Table 5, the sexual abuse subscale was more strongly associated with its corresponding ACE item than with the other two items. The same pattern was observed for the physical and emotional abuse subscales, although less markedly, suggesting that these two types of abuse tend to co-occur (Table 5).

Table 5 Association between the EAIA’s subscales and related variables

Table 5 also shows the relationship between the three EAIA subscales and binge drinking; in all three cases, the association was non-significant and of negligible magnitude. For the remaining psychopathological variables, significant associations were observed in all cases except between physical abuse and anxiety (Table 5). In general, emotional abuse showed the highest correlations. Finally, depression showed stronger correlations when measured with the PHQ-9 than with its brief version, the PHQ-2.
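The text does not state which coefficient or significance test underlies Table 5. As a hedged illustration of how "significant" and "negligible" associations can be distinguished, the sketch below computes a Pearson correlation and tests it against zero with the Fisher z approximation, returning an approximate 95% confidence interval. With ordinal sum scores, a rank-based or polychoric coefficient could be a reasonable alternative; the sample size and correlations in the example are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_test(r, n):
    """Approximate two-sided p-value and 95% CI for H0: rho = 0 via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    p = math.erfc(abs(z / se) / math.sqrt(2))  # equals 2 * (1 - Phi(|z| / se))
    ci = (math.tanh(z - 1.96 * se), math.tanh(z + 1.96 * se))
    return p, ci


# Hypothetical correlations in a sample of 300 (not the values in Table 5).
for r in (0.05, 0.30):
    p, (lo, hi) = fisher_z_test(r, 300)
    print(f"r = {r:.2f}: p = {p:.4f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```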

Discussion

This study reports the development of a retrospective scale of child abuse for heterosexual and LGBTQ Mexican adults. Consistent with the international literature on maltreatment, three dimensions were identified: physical, emotional, and sexual abuse (World Health Organization, 2006), which correlated with each other as expected from the existing literature (Matsumoto et al., 2021). Moreover, the three subscales were partially equivalent between men and women, which supports their use for comparisons between these two groups. The sexual and physical abuse subscales performed better among people who had experienced high levels of maltreatment, whereas the emotional abuse subscale performed better at levels close to the mean. Finally, significant associations were found with other measures of abuse and with a set of psychopathological variables (except binge drinking).

When each experience was examined separately, alarming percentages were found in the study sample. For example, 28% reported having experienced inappropriate touching in childhood at least “sometimes”, and 13% reported having been victims of rape. This partially coincides with the international literature, where the prevalence of child sexual abuse is estimated at around 20% for women and 10% for men (Moody et al., 2018; World Health Organization, 2022). In terms of physical abuse, the most frequent experience was being hit with belts, flip-flops or boards (49% reported this at least “sometimes”). Worryingly, 23% reported being beaten and left with marks or scars; this is in line with the global prevalence of physical abuse, estimated at between 20% and 60% (Moody et al., 2018; World Health Organization, 2006). In general, the physical violence estimates in this study are in line with the prevalence of corporal punishment reported in the landmark International Dating Violence Study (Straus, 2010). However, particular acts such as hitting a child with objects (e.g., belts, flip-flops) are remarkably frequent even by the standards of specific corporal punishment measures (Fauchier & Straus, 2007) and require further investigation in Mexico. Experiences of emotional abuse were even more prevalent; the most frequently reported were being teased at home (57%) and being made to feel that they did everything wrong (55%). Given the limited information on this type of abuse, despite its importance for mental health (similar or superior to that of other types of maltreatment; Gardner et al., 2019; McKay et al., 2021; Seff & Stark, 2019), these data can serve as a starting point for a more detailed exploration of the prevalence of emotional abuse in the Mexican population.
It should be noted that, despite the similarity of our data to those of representative population-level epidemiological studies, our sample was non-probabilistic, and estimates can be expected to differ in a nationally representative sample.
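Percentages such as “at least ‘sometimes’” dichotomize an ordinal frequency item. The sketch below shows that computation together with a Wilson 95% confidence interval for the resulting proportion. The integer coding of the response options (here 2 = “sometimes”) and the example responses are hypothetical. Such an interval reflects only sampling error under simple random sampling, so it does not correct for the non-probabilistic recruitment of this sample.

```python
import math


def share_at_least(responses, threshold):
    """Count respondents whose coded response is >= threshold."""
    hits = sum(1 for r in responses if r >= threshold)
    return hits, len(responses)


def wilson_ci(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = hits / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Hypothetical coding: 0 = never, 1 = rarely, 2 = sometimes, 3 = often, 4 = always.
SOMETIMES = 2
responses = [0, 0, 1, 2, 0, 3, 0, 1, 4, 0]
hits, n = share_at_least(responses, SOMETIMES)
low, high = wilson_ci(hits, n)
print(f"{hits / n:.0%} at least 'sometimes' (95% CI {low:.0%} to {high:.0%})")
```

The Wilson interval is used here instead of the simple p ± 1.96·SE interval because it stays within [0, 1] for rare experiences.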

A strength of the present research is the concordance between the scale and the international classification of child abuse (Mathews et al., 2020; World Health Organization, 2006). This overcomes a limitation of the original version of the instrument, in which emotional abuse was combined with some physical abuse items, preventing a clear delimitation between them (Esparza-Del Villar et al., 2020). In addition, the response options were reworded to clearly indicate a frequency scale, as in most instruments of this type (Meinck et al., 2022). The good internal consistency of the three subscales is also noteworthy (α between 0.79 and 0.94 across samples, with most values near or above 0.90). This contrasts with the CTQ-SF, one of the most widely used instruments worldwide (Meinck et al., 2022; Saini et al., 2019), which has shown internal consistency problems in its emotional and physical abuse scales (Georgieva et al., 2021). Moreover, unlike the EAIA, the CTQ-SF is not freely available, which limits its use in low-resource settings (Beidas et al., 2015; Meinck et al., 2022). Another strength of the present study is the inclusion of a substantial proportion of participants who identified as LGBTQ. This is an understudied population, as most measures of child abuse, and of family violence more generally, tend to focus on heterosexual women and men. Further attention to differences in experiences of child abuse by sexual orientation could shed more light on how women and men cope with such experiences and on the mental health implications of these experiences.

In general, the EAIA showed (partial) measurement invariance between men and women, supporting valid comparisons between the two groups (Dimitrov, 2010; Spector et al., 2015). However, there were slight differences in the intercepts of item 1 and, to a lesser extent, item 7. When these two parameters were freed, the intercept of item 1 (“Someone touched me sexually”) was higher for women than for men, whereas the opposite was true for item 7 (“I was hit at home with objects such as belts, flip-flops or boards”). This may reflect substantive differences between the sexes; that is, at the same overall level of exposure to each type of abuse, women may tend to report more inappropriate touching, and men more occasions of being hit with the objects described. Alternatively, these results may be due to different response styles between groups (Boer et al., 2018; Han et al., 2019; Spector et al., 2015). Further studies should examine whether the non-invariance of these two items replicates and, if so, explore its possible causes.

Another notable aspect of the results concerned sex differences in the latent means of the three dimensions of abuse. Women reported higher levels of emotional abuse than men, which is partially consistent with some existing literature (Moody et al., 2018; Vallejos & Cesoni, 2020) but contrasts with other findings (Akmatov, 2011). For physical abuse, no significant differences were found; this adds to the mixed results in the literature, where higher scores have sometimes been found in men (Akmatov, 2011; Salem et al., 2020; Solís-García et al., 2019). Finally, and unexpectedly, no significant differences were found in sexual abuse, which contrasts with the higher prevalence of this type of abuse observed in women (Moody et al., 2018; Solís-García et al., 2019; Valdez-Santiago et al., 2020; Vallejos & Cesoni, 2020). One possibility is that, because the sample was not representative, the men who chose to complete the questionnaire were motivated to do so by high levels of adverse childhood experiences, which would over-represent male victims of sexual abuse. Another plausible explanation lies in the composition of the sample by sexual orientation: almost 40% of participants, both male and female, identified as members of the LGBTQ community, so prevalence rates of sexual violence are unlikely to mirror those typically reported in studies of heterosexual samples. It should also be noted that studies using the CTQ-SF have sometimes found higher sexual abuse scores in men than in women (Aloba et al., 2020; He et al., 2019). This could be related to differences between self-report pencil-and-paper measures and, for example, interviews such as those conducted in national surveys (Moody et al., 2018).
Finally, it is also important to remember that absence of evidence does not imply evidence of absence (Altman & Bland, 1995), so non-significant results should not be interpreted as evidence that both groups have equal means.

This study incorporated techniques that are less common in this field, such as exploratory graph analysis (Golino & Epskamp, 2017) and the graded response model (Samejima, 2016). The latter allowed us to examine item functioning at different levels of the respective constructs. Specifically, the sexual abuse subscale measured this variable with greater precision at higher levels; that is, for people with minimal experience of this type of abuse, the scale may not be very informative. A similar pattern was observed for the physical abuse subscale, although the item on being hit with belts, flip-flops or boards also allowed less severe experiences of abuse to be measured. In contrast, the emotional abuse subscale was most informative at levels close to the mean. This is important because emotional abuse is the most prevalent type of abuse, so measures of it need good reliability across the range found in the general population.

The three subscales of the EAIA showed associations of expected magnitude with the abuse items of the ACE questionnaire (Felitti et al., 1998), as well as with a set of psychopathological variables: depression, anxiety and suicidal ideation. These findings align with Gardner et al.'s (2019) meta-analysis, which reported small to moderate correlations between various forms of abuse and depression. That meta-analysis also found that emotional abuse had the strongest link with psychopathology, a pattern we replicated with even larger coefficients. This is consistent with the literature and supports interpreting the EAIA as a measure of childhood abuse (Angelakis et al., 2019; Gallo et al., 2018; Gardner et al., 2019; Seff & Stark, 2019). A notable exception was binge drinking, which showed negligible and non-significant associations with all three types of abuse. At first glance, this result seems surprising given the existing literature; for example, Hailes et al. (2019) found that the relationship between child sexual abuse and substance misuse was supported by evidence of the highest quality. Similarly, problematic alcohol use is one of the variables most strongly associated with adverse childhood experiences (Hughes et al., 2017). However, a closer look at the literature suggests a possible explanation: the association between childhood abuse and alcohol use may only emerge at high levels of alcohol use (e.g., dependence). This is suggested by the data of Wang et al. (2020), in which child abuse showed a clear association with alcohol dependence but not with alcohol abuse. Similarly, another study examining the relationship between child abuse and binge drinking found no significant association (Chen et al., 2017).

Limitations

The present study has some limitations that deserve mention. First, all measures were self-reported and may therefore be affected by recall bias. This is particularly relevant for the constructs measured here, which involve recalling events that are temporally distant from the present (Baldwin et al., 2019). Second, our scale was limited to child abuse and did not include items on the other major aspect of maltreatment: neglect. Future research could combine our reformulated abuse measure with a specific neglect measure to examine child maltreatment more thoroughly, including its non-violent forms. Third, the type of sampling used limits the generalizability of the results; both study samples consisted overwhelmingly of women and young adults. Further studies should examine whether these results replicate in samples with different sociodemographic characteristics. Finally, and related to the above, the limited number of men in the sample prevented an in-depth examination of possible gender differences. Although men and women differed in their experiences of emotional abuse, no significant differences were found for sexual abuse, which contrasts with what the literature would lead one to expect. This may be due to idiosyncratic characteristics of the sample, but may also be a product of low statistical power. Further studies should seek to include a greater number of men.
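The concern about low statistical power can be made concrete with a rough calculation. The sketch below approximates the power of a two-sided comparison of two group means for a standardized difference d, using a normal approximation with unequal group sizes. It is not the latent-mean test used in the study, and the group sizes and effect sizes in the example are hypothetical rather than the study's actual counts; d = 0.33 is included only because it matches the magnitude of the emotional abuse difference reported in the Results.

```python
import math
from statistics import NormalDist


def approx_power(d, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample test for effect size d."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n1 * n2 / (n1 + n2))  # expected z statistic under the alternative
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


# Hypothetical samples with few men: power as more men are recruited.
for n_men in (50, 100, 200):
    print(n_men, [round(approx_power(d, 400, n_men), 2) for d in (0.2, 0.33, 0.5)])
```

With a small group of men, only fairly large differences are detected reliably, which is one reason a non-significant sex difference should not be read as evidence of equal means.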

Conclusion

A new retrospective measure of child abuse was developed for the Mexican population. It measures all three types of abuse (sexual, physical, and emotional) with high reliability and similar quality in both sexes. Prevalence estimates from our data suggest high levels of abuse, especially emotional abuse. Questions remain open regarding the potentially different functioning of some items in women and men. The pattern of sex differences in abuse experiences was also noteworthy: women reported more emotional abuse than men, but, unexpectedly, no difference emerged for sexual abuse. An important limitation of our study was the use of convenience samples of limited representativeness, which future studies should seek to overcome. In conclusion, the EAIA can be used by both clinicians and researchers to obtain prevalence estimates and to examine potential correlates of abuse. Such a measure is important for survivors of abuse, as it allows for expression and potential dialogue on an issue that affects everyone, both individually and as a society: violence against children. Moreover, clinicians caring for individuals in emotional distress can enhance their psychotherapeutic approach by identifying past instances of abuse that need treatment.