Theoretical and methodological development in the social sciences often occurs in isolation from applied research, which tends to produce instruments and measurements that do not fit the theory and, consequently, do not provide adequate responses (Leplow, 2017). In the case of intimate partner violence against women (IPVAW), the multicausal explanatory models (e.g., Heise, 2011; Ranganathan et al., 2021) incorporate the category ‘gender’ as a transversal explanatory factor of this type of violence, while highlighting the need to consider other factors in its emergence, including poverty, disability and/or ethnicity in women’s lives, and pointing out the need for an intersectional approach (Meyer et al., 2022; Walby et al., 2017). In particular, the impossibility of understanding IPVAW without consideration of gender has been widely documented by feminist social researchers in various disciplines (DeKeseredy, 2016, 2021; Delgado, 2020; Ferrer-Pérez & Bosch-Fiol, 2019; Johnson, 2011; Zapata-Calvente et al., 2019). Nevertheless, some researchers and instrument developers who do not take this perspective into account generate measurements and make inferences that, instead of producing greater understanding, perpetuate a bias in the understanding of this phenomenon (e.g., Straus, 2011). Consequently, there is a need to develop measuring instruments more closely aligned with the substantive research questions and, simultaneously, with psychometric quality standards. To this end, this conceptual work aims to help reduce the disconnect between applied research and good measurement practices, specifically in the field of IPVAW.
It should be noted that this work does not purport to carry out a critical analysis of the instruments currently in use, but rather to reflect on the psychometric and methodological criteria that should guide research of IPVAW, at all times in accordance with the scientific measurement standards.

The specific objective of this analytical article is to apply the approach to validity established by current standards to research in the field of IPVAW. More precisely, the novel aspect of this work is a critical analysis based on validity evidence, including evidence based on testing consequences (Messick, 1998; American Educational Research Association et al. [AERA et al.], 2014). Through a feminist conceptual framework, this work will argue the extent to which research on IPVAW that excludes gender (gender-blind research) fails to meet these standards, and will propose several strategies to develop more adequate measurements as a basis for fairer and more comprehensive explanatory models in line with the guidelines of AERA et al.

Understanding Intimate Partner Violence from a Gender-based Standpoint

Feminist studies tend to approach gender as an expression of a power struggle; that is, an understanding that in a patriarchal society, gender relations are structured asymmetrically and unequally in terms of power, status, access to personal and social resources, and access to recognition (a symbolic power), such that men perform roles and assume socially dominant positions while women assume subordinate positions (Millet, 1969/1995; Oliva, 2020; Valcarcel, 2009). This analysis encompasses the differences and inequalities derived not only from biology but also from early experiences of differential socialization, and stresses the social construction of gender and the representation of gender relations within a given culture and historical period, with the objective of designing actions (and policies) that change living conditions and power relations (Gahagan et al., 2015; Gamba, 2009; Puleo, 2008; Rodriguez-Magda, 2020).

Within a patriarchal context characterized by asymmetrical gender relationships and described inequalities, violence against women and girls (VAW) constitutes the maximum expression of this inequality, as explicitly noted in the majority of international declarations on the subject (e.g., Committee on the Elimination of Discrimination against Women [CEDAW], n.d./2022; Council of Europe, 2011; United Nations [UN], 1994). It should be noted that United Nations Women (UN Women, n.d./2022) uses the term gender-based violence (GBV) to refer to “harmful acts directed at an individual or a group of individuals based on their gender that is rooted in gender inequality, the abuse of power and harmful norms”, pointing out that gender-based structural differences of power disproportionately position women and girls at risk of multiple forms of GBV, while men and boys who do not adhere to the traditional mandate of the male gender may also be subjected to the same risk (Carlton et al., 2016). In fact, CEDAW (n.d./2022) points out that it could be even more pertinent to use the term “gender-based violence against women” (GBVAW).

The weight of gender-based constraints and the structural situation of inequality they cause is also reflected in the multicausal models explaining these types of violence. Thus, these same models consider that VAW can only be explained from the intervention of a set of specific factors, in the general context of power inequalities between men and women, at an individual, group, national and world level (UN, 2006), which is to say that they understand that gender and gender relations play a key role in the violence carried out by men against women (American Psychological Association [APA], 2009; Harway, 2002) and they link these violent actions to the existence of a profoundly unequal, genderized and dichotomized society (Bosch & Ferrer, 2002; Delgado, 2013).

In short, VAW is currently recognized by different international organizations and by numerous nations as a serious social and health problem of pandemic proportions and a violation of human rights, whose ultimate cause is gender inequality (García-Moreno et al., 2005; Heise, 2011; Jewkes et al., 2015; UN, 2006). Different forms of VAW can occur in every context of a woman’s life, including the family environment. In fact, the UN (2006) has previously pointed out that women suffer VAW in the family at different moments of their life (from before they are born until old age). Among these forms, intimate partner violence against women (IPVAW) is the most widespread form of VAW globally (DeVries et al., 2013; European Union Agency for Fundamental Rights [FRA], 2014; García-Moreno et al., 2006; Sardinha et al., 2022; Stockl et al., 2013; World Health Organization [WHO], 2013, 2021). This violence includes a range of sexually, psychologically, emotionally, economically and physically threatening or coercive acts used against adult or adolescent women by a current or former male intimate partner, without her consent (UN, 2006; WHO, 2012), and implies a pattern of behavior that is used to gain or maintain power and control over an intimate partner (UN Women, n.d./2022).

It is important to underline that IPVAW has been referred to by different terms, including spousal abuse or domestic violence. For example, the Council of Europe Convention on preventing and combating violence against women and domestic violence (Council of Europe, 2011), in Article 3, uses the term domestic violence (although its preamble acknowledges “that domestic violence affects women disproportionately”); and UN Women (n.d./2022) considers the terms to be synonymous, referring to “domestic violence, also called domestic abuse or intimate partner violence”. In contrast, the UN (2006), WHO (2012), and the European Institute for Gender Equality (EIGE, 2022) prefer the term intimate partner violence (IPV), although they specify that: “The overwhelming global burden of IPV is borne by women” (WHO, 2012, p. 1); “Although women can be violent in relationships with men, often in self-defence, and violence sometimes occurs in same-sex partnerships, the most common perpetrators of violence against women are male intimate partners or ex-partners” (WHO, 2012, p. 1); or “[IPV] constitutes a form of violence which affects women disproportionately and which is therefore distinctly gendered” (EIGE, 2022, para. 1). Precisely for this reason, and although IPV is the most common term, the last decade has seen frequent use of the term IPVAW (e.g., Delgado, 2020; Ferrer-Pérez & Bosch-Fiol, 2019; Gracia et al., 2015; Martín-Fernández et al., 2018; Rodríguez & Khalil, 2017), which is, in our view, much more precise and appropriate as it makes specific mention of one of its fundamental characteristics: the fact that this violence is perpetrated primarily against women.

(Mis)understanding Intimate Partner Violence from a Gender-blind Standpoint

Despite evidence that IPVAW is a gender-based violence and the conceptual and explanatory relevance of this fact, previously presented, a significant amount of scientific literature on the topic continues to exclude the gender perspective in both research design and the interpretation of results (Ferrer-Pérez & Bosch-Fiol, 2019). Such malpractice in research is known as “gender blindness”, that is, a “research [that] does not take gender into account and assumes that the research is gender neutral or that potential differences between men and women are not relevant” (Korsvik & Rustad, 2018, p. 10), whether as a result of training, considering that this category is not related to the topic of study, or for other reasons, including a resistance to accepting this analytical perspective (Biglia & Vergés, 2016; Caprile, 2012; García-Calvente et al., 2010). This gender-blind approach may impact all aspects of the research, as we have previously analyzed in detail in several papers (Ferrer & Bosch, 2005; Ferrer-Pérez & Bosch-Fiol, 2019).

As an example, one of the most frequently employed theoretical models for the analysis of IPVAW is the so-called family violence or family conflict perspective, a model formulated by Straus and his associates (Straus et al., 1980) over 40 years ago to explain IPV, and which continues to be of relevance in certain scientific environments (e.g., Laskey et al., 2019; Straus, 2011) despite extensive evidence demonstrating, as previously noted, a very different reality. Specifically, this perspective considers IPV to be a reciprocal, symmetrical or crossover violence, a mutual combat in which there is no difference between the amount of violence exercised by men and by women. This perspective does, however, acknowledge that women (due to their particular circumstances as physically weaker, or as targets of violence during pregnancy) suffer the consequences of this violence to a greater extent, for which they require a greater level of attention (Holt et al., 2008).

Other cases directly analyze IPV, pointing out that it is endured more by women than men, although they specify that “when it comes to perpetration of IPV, men and women tend to show equivalent rates, yet women are more likely to experience physical injury and to use IPV in self-defense” (Chester & DeWall, 2018, p. 55), and propose that this type of violence must be analyzed from the perspective of metatheories of aggression, such as the General Aggression Model (Allen et al., 2018) or the I3 Model (Finkel & Hall, 2018).

Gender blindness in these approaches collides directly with more recent recommendations regarding the need to incorporate a gender perspective in research. For example, the SAGER guidelines (Sex and Gender Equity in Research) warn that “the lack of interest in sex and gender differences can not only be harmful, but also result in the loss of opportunities for innovation” (Heidari et al., 2019, p. 204); the European Commission (2020) points out that “Integrating sex and gender analysis into research and innovation (R & I) adds value to research and is therefore crucial to secure Europe’s leadership in science and technology, and to support its inclusive growth” (p. 7); and the Spanish State Research Agency (Agencia Estatal de Investigación [AEI], 2020) indicates that in all cases when a research project may directly or indirectly affect human beings, it is necessary to avoid gender bias because “a science based on gender stereotypes or on masculine patterns and interests, generalized as if they were universal for the population as a whole, is bad science and misses opportunities” (AEI, 2020, p. 1).

In addition to these issues related to the quality of scientific production, it is important to remember that the non-inclusion of the gender perspective in the studies carried out has or may have negative effects, notably at an ethical level (Vázquez, 2014), as pointed out by the Task Force of Psychology and Gender Equality under the General Council of Official Psychology Associations (Grupo de Trabajo de Psicología e Igualdad de Género del Consejo General de Colegios Oficiales de Psicólogos, 2016).

Returning to the particular case of IPVAW, starting from a supposedly neutral or gender-blind point of view directly contradicts the very essence of IPVAW as a form of GBV, as made evident when defining VAW and IPVAW. Despite this, it is not uncommon, as we have seen, for these forms of violence to be approached from the perspective of theoretical (and, consequently, conceptual) models that do not take this reality into account (Delgado, 2020; Ferrer-Pérez & Bosch-Fiol, 2019). This gender blindness affects two fundamental aspects of research standards. First, the exclusion of a variable as relevant as gender from the analysis constitutes the most serious methodological error regarding model specification (Hair et al., 2010). Second, it constitutes a threat to validity due to the social consequences of using these instruments (AERA et al., 2014). These two issues lead us to the central focus of this article, which we develop below: the analysis of construct validity as a basis for developing more appropriate measures, and its use to support more appropriate explanatory models of IPVAW.

Consequences of Gender-Blind Research on IPVAW: A Matter of Validity

As we have argued, gender blindness in research on IPVAW would be scientifically unsustainable from a theoretical point of view. Moreover, beyond the previously reviewed theoretical and conceptual aspects, a fundamental issue in research on IPVAW in particular, and in the social sciences in general, is that theories are completely dependent on the instruments with which human behavior is observed and categorized, while simultaneously conditioning that which can be observed. This in turn gives rise to the central issue of measurement validity. Using the Standards for Educational and Psychological Testing (AERA et al., 2014) as a reference, validity, understood as a unitary concept, “refers to the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (p. 11). To this effect, the measures employed in empirical research must provide evidence of construct validity so that the inferences based on them can be considered sufficiently valid, which implies not just the accumulation of empirical evidence but also theoretical congruence with the construct to be measured (Messick, 1989). Validity as such cannot be reduced to a mere statistical concept linked to a measurement tool; rather, it applies to the very root of what is intended to be measured, and how and for what reason it is meant to be measured.

From the perspective of current standards, validity is a process, not a state, which can be understood as the cumulative construction of a sufficiently sound validity argument (Cronbach, 1988; Kane, 1992, 2001, 2002). This process “begins with an explicit statement of the proposed interpretation of test scores, along with a rationale for the relevance of the interpretation to the proposed use” (AERA et al., 2014, p. 11), which necessarily includes the explicit and reasoned definition of the construct it is meant to measure. On this basis, the validity argument is built by integrating various types of evidence in a process that, by its inherent nature, has no end. This process comprises five known sources of supporting evidence of validity, according to the purpose for which the instrument will be used; in the area of greatest concern to us, it refers to the following requirements: 1) the content, i.e., the items, represents all of the relevant aspects of IPVAW; 2) the internal structure of empirical measures can reproduce the theoretical dimensions of IPVAW; 3) the relationship with other variables theoretically related to the construct can be empirically corroborated; 4) the responses to the items in the comparison groups are based on the same response processes, e.g., ensuring that they are not minimized by any given group; and 5) the use of the instrument has no negative consequences for any group in particular. This last source of validity evidence, which has been included in the Standards for over 20 years, assumes a social and political dimension incorporated into the validation process, based on a basic principle of justice and equality regarding all persons potentially affected (Messick, 1975, 1998; Padilla et al., 2006, 2007).
In short, this involves both an ethical and a technical judgement, comprising an analysis of potential negative effects for any given group as well as a critical analysis of the assumptions underlying the interpretation of the data and the subsequent decisions (Kane, 2002). Within the scope of IPVAW, these consequential aspects of validity would be particularly relevant given the potential risks that the inadequate use of the instruments could have for a group in a subordinate social position (women, in this case), as discussed in the following section.

According to the argument-based approach to validity, the analysis of validity evidence based on consequences would begin with the identification of unanticipated effects of using the test for specific persons or groups (Standard 1.25), effects that, in and of themselves, do not constitute proof of invalidity (Padilla et al., 2006), but become relevant in the validity analysis when they “can be traced to a source of invalidity such as construct underrepresentation or construct-irrelevant components” (AERA et al., 2014, p. 21). In other words, they would be warning signs forcing a review of the validity argument as the underlying structure upon which the inferences that gave rise to those consequences rest, redirecting the focus onto the weakest elements of the argument (Kane, 2001; Padilla et al., 2007) in order to discern whether they are due to a real and objective difference or, conversely, to an incorrect measure of the construct. Put simply: are we measuring more or less than what should be measured?

In the first case (“measure more”) we are facing what the Standards refer to as construct-irrelevance; which is to say, “the degree to which test scores are affected by processes that are extraneous to the test’s intended purpose” (AERA et al., 2014, p. 12). In this regard, research on IPV has traditionally employed surveys and other self-report procedures that could be affected by various sources of systematic variance external to the construct. One such source, albeit not the only one, could be the phenomenon of social desirability (e.g., Navarro-González et al., 2021; Sugarman & Hotaling, 1997), which could be heightened by the nature of the questions usually employed within this field of research. Such an effect should be avoided or, at the very least, controlled to the extent possible. However, the specific case of IPV-IPVAW gives rise to the problem of gender blindness, given that the tools used to measure social desirability provide “neutral” scores without taking into account that the responses of women and men are based on a different perception of what it means to use violence (e.g., Ackerman, 2018; Hamberger, 2005; Hamberger & Larsen, 2015; Hlavka, 2014), in accordance with internalized mandates of femininity and masculinity. This would be another type of construct-irrelevant effect that interacts with social desirability itself. It would be incorrect, therefore, to make comparisons without taking these effects into account when interpreting the scores; contaminated scores will inevitably lead to inappropriate inferences.

In the second case (“measure less”) we are facing what the Standards refer to as construct underrepresentation; which is to say, “the degree to which a test fails to capture important aspects of the construct” (AERA et al., 2014, p. 12). In this regard, the IPVAW construct is a complex one that integrates observable behaviors as well as various aspects that are more difficult to operationalize but fundamental for the comprehension and interpretation of the measures. Examples include reasons for violence (instrumental or control-instigated vs. reactive or defensive), the relative positions within the power dynamics (domination vs. subordination), or the internalized meanings resulting from a gender-based differential socialization (legitimization or justification vs. penalization or rejection of the use of violence). Since in practice it is complicated to incorporate all of these relevant aspects in the measurement tools, it would be helpful, at the very least, to make explicit the extent to which the relevant aspects excluded from the measurement limit the inferences drawn from the scores obtained, and also to refrain from generalizations that reproduce gender blindness in relevant aspects of the construct. Otherwise, an incomplete operational definition would necessarily lead to biased inferences.

Each of the threats to validity presented here (measuring more or measuring less than what is intended) corresponds to a technical dimension of the validity analysis required by the Standards. Added to this is a second dimension of analysis related not to the instrument itself, but to the consequences of using the instruments for their intended purpose. In short, this involves an ethical judgement based on the basic principle of fairness (Messick, 1975; Padilla et al., 2006; AERA et al., 2014); it ultimately refers to the appropriateness and justice of the decisions made based on the interpretation of the scores, within a predetermined theoretical and conceptual framework.

Within the scope of scientific research, validity and fairness are, consequently, two intrinsically linked imperatives (Standard 3.0) in an evaluative process comprising theoretical, technical, ethical, social and political aspects (Gómez-Benito et al., 2010; Padilla et al., 2007). In this sense, all research related to IPVAW should necessarily conform to the current measurement standards, given that the measuring tools are the cornerstone from which theoretical models emerge and decisions, with far reaching consequences for the persons involved, are made.

Tracing the Consequences of Gender-Blind Research Back to Their Roots

As previously noted, according to the perspective of the Standards (AERA et al., 2014) it would be necessary to pay special attention to the negative consequences derived from the use of a test, given that they could be the result of factors of invalidity from their point of origin; that is, in the definition and operationalization of the construct that is meant to be measured (DeKeseredy, 2021; Malbon et al., 2018; Wijsen et al., 2022). Some of the more serious negative consequences might include the risk of underestimating IPVAW, especially among those who tend to use or to justify it (DeKeseredy & Schwartz, 2011), or upholding myths that minimize the importance of IPVAW, blame the victims or exonerate the perpetrator (Bosch-Fiol & Ferrer-Pérez, 2012; Peters, 2008). Taken to their extreme, such myths would even involve denying IPVAW itself, treating it as a political construct linked to a specific ideological position (DeKeseredy, 2016, 2021).

As an example, let us consider one of the most widely used measuring tools within the scope of IPV-IPVAW, the Conflict Tactics Scales (CTS; Straus, 1979) and its revised version CTS2 (Straus et al., 1996), which remain firmly established as a standard of reference for views based on mutual combat, despite numerous studies that continue to question the validity of the inferences based on these instruments (e.g., Ackerman, 2018; DeKeseredy & Schwartz, 2011; Delgado, 2020; Lehrner & Allen, 2014; Malbon et al., 2018; Wareham et al., 2022). A common error when using these scales is to interpret the direct scores from the self-reports, contaminated by gender, as an indicator of objective violence; this is compounded by the fact that the measuring tool leaves out the most relevant aspects of the construct, as previously noted. This confusion in the use of these scales results in violence being displayed as a reciprocal or bidirectional phenomenon, further contributing to the myth that women and men engage in IPV equally and, moreover, that women are greater perpetrators with respect to psychological violence or certain forms of physical violence. This mythology surrounding IPV could have serious implications in matters of prevention and intervention (e.g., Fleming & Franklin, 2021), not to mention that the negation of gender-based violence is rooted in arguments put forward by certain academic, social and political sectors “who are intent on eliminating major legislative efforts to curb woman abuse” (DeKeseredy, 2021, p. 624).

According to the Standards, the negative consequences derived from the use of these scales for a certain group (women) would require the revision of relevant aspects regarding the appropriateness of the measures that might be affecting the inferences and, ultimately, fairness as an issue that is “central to the validity and comparability of the interpretation of test scores” (AERA et al., 2014, p. 63). In keeping with Kane (2001), the first step should be to address the weaker parts of the validity argument in these and other scales that are meant to measure IPV-IPVAW. In this regard, many criticisms and controversies have been directed specifically at the CTS since the publication of its first version over 40 years ago (Jones et al., 2017; Malbon et al., 2018). Some criticisms were addressed and, to a certain extent, resolved in the revised version of the scales (CTS2). Nevertheless, this second version continues to suffer from the fundamental problem that directly threatens construct validity: it continues to position IPV within a theoretical framework based on interpersonal conflict and the symmetry of the relationship by systematically disregarding gender and inequality of power as a higher-order factor (e.g., Straus, 2010). It is the construct itself, which is meant to be measured, that would not be adequately represented in the measuring tool, as it excludes from the measurement such fundamental aspects as the difference between control-instigated violence and other acts carried out in self-defense or reactively (Johnson, 2011). One consequence of this lack of validity is that the relevant aspects that make IPVAW a very specific violence, different from other types of violence, are rendered invisible under the mantle of an apparent political and scientific neutrality (DeKeseredy, 2021; DeKeseredy & Schwartz, 1998; Ferrer & Bosch, 2005).

As a result, the CTS2 cannot be considered an adequate instrument for measuring IPVAW on its own, since it neglects the most fundamental aspects of IPVAW (DeKeseredy & Schwartz, 1998; Ferrer & Bosch, 2005; Ferrer-Pérez & Bosch-Fiol, 2019; Jones et al., 2017). Instead, it is an inventory that measures the frequency of certain acts “in a vacuum”, with no context, antecedents or consequences; acts whose interpretation would be very different as a function of the motives, the meanings attributed, or the position of power occupied by the intervening parties. This narrow definition of violence (DeKeseredy & Schwartz, 2011), implicit in this type of instrument, constitutes a clear example of construct underrepresentation, which threatens the validity of any resulting inference in terms of violence perpetration or victimization, in accordance with current measurement standards (AERA et al., 2014). Indeed, reviews explicitly and systematically exploring possible gender differences by incorporating contextual and motivational aspects in their analysis (e.g., Hamberger & Larsen, 2015; Hamberger, 2005) report differences between men and women: greater reactive and self-defense behavior among women, and greater proactive behavior and greater probability of being motivated by control among men (Johnson, 2011). Likewise, the balance of consequences is clearly more negative for women, not only in terms of emotions, but also in terms of the probability and seriousness of physical injuries.

In addition to this content-related threat to validity we could add, in line once again with the Standards, a second critical aspect relative to construct-irrelevant components, which affects the scores. This involves the question of the differential meaning that men and women can attribute to the same conduct when it is decontextualized in its presentation. In this case, beyond the fact that the scores can be biased as a result of different uncontrolled and uninformed response processes (Standard 1.12), the aspects relative to fairness in the measures are not fulfilled (Standards 3.6 and 3.17). The gender differences in the way violence is symbolized have been supported by recent studies in which the CTS was administered together with ad hoc questions about perception and contextual details. These studies found differences in the way that men and women experience violence. For example, Lehrner and Allen (2014) claim that “the CTS has the potential to miscategorize some women as violent, as well as to overestimate the frequency and severity of IPV” (p. 484); and Ackerman (2018) found that “males over-reported victimizations [by female partners] at a much higher rate than did females (…) while females over-reported perpetrations [against male partners] at a higher rate” (p. 211). Finally, a recent exploratory study that applied a measurement invariance analysis across sex has found evidence of gender bias in some items from the CTS2 scale related to perpetration of violence (Wareham et al., 2022).
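To make the idea of item-level bias across groups more concrete, the following is a minimal sketch of one simple screening statistic for differential item functioning (DIF), the Mantel-Haenszel common odds ratio. It is not the invariance analysis applied by Wareham et al. (2022), which is not reproduced here; it is a simpler, swapped-in technique shown only for illustration. The function names, the group labels "ref" and "focal", the score strata and all counts are hypothetical and invented for the example.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Mantel-Haenszel common odds ratio for one dichotomous item.

    records: iterable of (group, stratum, endorsed), where group is "ref" or
    "focal", stratum is a matching level (e.g., a band of total scores) and
    endorsed is 0 or 1. Values close to 1 suggest that, among respondents
    matched on the total score, both groups endorse the item at similar rates.
    """
    # One 2x2 table per stratum:
    # [ref endorsed, ref not endorsed, focal endorsed, focal not endorsed]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, endorsed in records:
        idx = (0 if group == "ref" else 2) + (0 if endorsed else 1)
        tables[stratum][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("odds ratio undefined for these data")
    return num / den


def expand(group, stratum, endorsed_count, not_endorsed_count):
    """Helper (hypothetical): turn cell counts into individual records."""
    return ([(group, stratum, 1)] * endorsed_count
            + [(group, stratum, 0)] * not_endorsed_count)


# Invented counts: within each stratum the two groups endorse the item
# at different rates, so the pooled odds ratio departs from 1.
biased_item = (
    expand("ref", "low", 10, 30) + expand("focal", "low", 5, 35)
    + expand("ref", "high", 30, 10) + expand("focal", "high", 20, 20)
)
# Invented counts: identical endorsement rates in both groups.
unbiased_item = (
    expand("ref", "low", 10, 30) + expand("focal", "low", 10, 30)
    + expand("ref", "high", 30, 10) + expand("focal", "high", 30, 10)
)

alpha_biased = mantel_haenszel_odds_ratio(biased_item)
alpha_unbiased = mantel_haenszel_odds_ratio(unbiased_item)
print(round(alpha_biased, 3), round(alpha_unbiased, 3))
```

A statistic of this kind only flags items that behave differently across matched groups; deciding whether the difference reflects bias or a substantive difference still requires the theoretical analysis of meanings and context discussed above.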

In summary, the validity of these gender-blind measures is open to serious question, and with it comes the risk that decisions or judgements with serious social consequences will be based on them. In light of this evidence, one must consider whether it is acceptable to disregard the question of their validity when studying IPV-IPVAW. It is a question that affects not only technical matters but also, and most definitely, ethical matters. The complexity of the variables that converge in intimate partner violence further reinforces the need to incorporate an interdisciplinary perspective that would permit delving deeper into the multicausal and multidimensional nature of this type of violence (e.g., Heise & Kotsadam, 2015; Humbert et al., 2021; Ranganathan et al., 2021; Sardinha et al., 2022; Zapata-Calvente et al., 2019).

Some Ideas to Overcome Gender Blindness in IPVAW Measures and Models

As a conclusion, we return to our initial questions and develop our reflection on how to reinforce the validity-fairness binomial in research related to IPVAW.

How to Help Develop More Appropriate Measures for the Construct?

In our opinion, when the nomological net of the construct (Messick, 1998; Shepard, 1997) includes aspects that are not directly observable, the investigation would benefit from a multi-method approach with complementary measures leading to more precise and complete estimates of the construct. The act of incorporating measurements of internalized, gender-based attitudes and beliefs, differentiated between men and women, as well as other factors among the many involved (Heise, 2011), could benefit from methodological advances in terms of more robust procedures for gathering data influenced by factors external to the construct, which would allow for calibrating or qualifying the findings with traditional self-report measures (e.g., Ferrer-Pérez et al., 2020; Gracia et al., 2015). Nevertheless, for the reasons set out, the core of construct validity would be the inclusion of the category gender as a transversal explanatory factor of this type of violence. Such an analytical category would not be susceptible to the operationalization of a singular variable; and much less could it be reduced to a score comparison by gender, which remains a common practice in current research. It refers, in contrast, to the hierarchical power and positions in relationships, in this case between men and women, and directly underscores the differential use of IPV. But this use does not refer exclusively to frequency and intensity; rather, and above all, to its meanings and effects, as well as the role it fulfills. The category of gender, in the end, constitutes an axis connecting all of the variables related to the differential behavior between men and women (Ferrer-Pérez & Bosch-Fiol, 2019; Delgado, 2020).

In this sense, the critical analysis of a measuring instrument to confirm the extent to which the category of gender is incorporated could include, among others, the following questions: 1) Is the construct being measured by the instrument clearly defined within an adequate theoretical framework, or is the gender perspective ignored in the definition? 2) Are all the relevant aspects of the construct specified as defined, or are relevant dimensions from the gender perspective being ignored? 3) Is each aspect of the construct adequately represented, with a number of items that is sufficient and proportional to its relevance? As an example, the Index of Spouse Abuse (ISA) (Hudson & McIntosh, 1981) includes indices weighted as a function of item relevance for construct measurement (representativeness criterion) and, in addition to physical, sexual and emotional abuse, includes other aspects such as social isolation and economic control (exhaustiveness criterion). In turn, this instrument features different versions adapted to different ethnic groups, such as African American women (Campbell et al., 1994; Cook et al., 2003), thus taking into account different manifestations of violence against women as a function of other variables that interact with gender.

Other examples of measurements incorporating the gender perspective in research on IPVAW include the questionnaires used to carry out the WHO multi-country study (García-Moreno et al., 2006) and the FRA survey (2014), as well as the suggestions put forth by Ellsberg and Heise (2005) for formulating these questions. In fact, the WHO questionnaire was widely used in several countries before good evidence of its construct validity and reliability had been adequately demonstrated in various studies (Badenes-Sastre et al., 2023; Nybergh et al., 2013; Schraiber et al., 2010).

On the other hand, with respect to the sources of construct-irrelevant variance, one aspect that could serve to improve the appropriateness of the measures would be, in our opinion, to broaden the range of parties involved in the construction of the validity argument (Padilla et al., 2007). In this regard, the use of mixed methodologies allowing the participants to qualitatively characterize and define the meaning of the scores obtained from the scales or other quantitative instruments would represent a promising advance, from both a methodological and a substantive point of view (e.g., Lehrner & Allen, 2014).

Certainly, what may be possible in an intervention setting is not always possible in a research context (Jones et al., 2017). Case-specific evaluations that require intervention permit an exhaustive evaluation that is not always possible in social research. In contexts of social research, it is necessary to obtain information from large samples in which the participants collaborate, responding to the instruments voluntarily. The time required to complete the questionnaires should not, therefore, exceed established limits. Nevertheless, it would be advisable to include short questions that provide an adequate contextualization of the behaviors under study, such as questions that can evaluate the positions of power within the relationship (e.g., “To what extent do you decide how to spend the family money?” or “How many times has your partner made important decisions for both of you without your consent?”), the motives (e.g., “Did your partner ever try to get your obedience by means of any of the mentioned behaviors?” or “Did your partner ever use any of the mentioned behaviors to intimidate you?”) and consequences of the violent behavior (e.g., “Did any of the mentioned behaviors affect your work or studies?” or “Did any of the mentioned behaviors make you feel scared or frightened?”), and the proactive or reactive nature of such behavior (e.g., “How many times do you estimate that in behaving that way you tried to physically protect yourself from your partner’s assault?” or “To what extent do you estimate that you used any of the mentioned behaviors without having been previously attacked or seriously threatened by your partner?”) (see, for example, DeKeseredy & Schwartz, 1998; Ellsberg & Heise, 2005; Johnson, 2006; Yakubovich et al., 2019).

How to Help Substantiate Explicative Models with Empirical Evidence More Closely Aligned to Measurement Standards?

One of the basic principles of methodology is not to exclude relevant aspects from the explanation of a phenomenon (Hair et al., 2010), even though in conducting research it is not infrequent to exclude relevant variables due to the difficulty of operationalizing them. While in these cases it is common practice to point out the corresponding limitations, the problem arises when inferences are made as if these limitations did not exist. To draw conclusions in this way about a construct whose measurements have excluded relevant variables is to breach the principle of validity and, therefore, to create and maintain biased explicative models (Delgado, 2020). Uncritical reductionism, in the case of IPVAW, would ultimately contribute to maintaining this serious social problem. This is not to suggest that there are not different forms of IPV, as well as different complementary approaches to address them. Following the logic of the Standards, this would mean that, whatever the object and the approach, a reasoned definition must be given to the construct, establishing its logical connection with the tools meant to measure it, detailing the use and scope that will be given to the scores, and adjusting the level of inference by taking into account the eventual consequences of the judgements and decisions made with respect to the scores.

The editorial policies of scientific publication could contribute greatly to this end, in the same way as the proposals put forward in the SAGER guidelines for the equitable incorporation of sex and gender in research (Heidari et al., 2019). Greater insistence on validity evidence and transparency in the publication of research results, and on the evaluation of their compliance with the measurement Standards, would, without a doubt, promote a change that would contribute to a more rigorous advance in knowledge and a more profound debate among the varied results.

We will conclude by returning to the root of the problem (i.e., the very definition of the construct), in accordance with the needs set forth by Lehrner and Allen (2014): “clarification of the definition and meaning of IPV is essential for improved assessment (…). Different conceptualizations yield different measures and prevalence estimates, with different validity claims” (p. 487). This clarification of the construct should be explicitly stated in the publication of results, allowing for an analysis to determine whether its representation in the measuring tools is adequate and to what extent the interpretations are adjusted to what is meant to be measured (Ferrer-Pérez & Bosch-Fiol, 2019). Likewise, from the perspective of consequential validity, this rational-theoretical analysis would benefit from the incorporation of the underlying value judgements and the conceptual and ideological framework in which the study is positioned (Padilla et al., 2006, 2007). Beyond the practical difficulty of applying these considerations, the first great obstacle could also appear in this terrain of values and ideology: the myth of “scientific neutrality”. While it remains in force in certain academic circles, this understanding of science as a value-free activity has been overtaken by new epistemological and methodological trends (e.g., Cronbach, 1988; DeKeseredy, 2016; Harding, 2004; Messick, 1975, 1989, 1998; Shepard, 1997; Wijsen et al., 2022). Notable among these trends is the critique of the assumptions that lead to identifying “neutrality” with “blindness” in gender-related topics, where numerous philosophers of science have demonstrated that an androcentric (and, as such, non-neutral) vision underpins this identification (e.g., Harding, 1986).
Thus, and in line with current standards, it is not a question of choosing between value-free or value-laden research, but rather of critically analyzing which values entail which consequences for the most aggrieved group, in this case women, and of responding accordingly. In our opinion, taking into account precisely the consequential aspect in the analysis of validity, it would be difficult to justify ignoring or negating the relevance of gender as an analytical category and the multidisciplinary contributions of the feminist or gender perspective to the study of IPVAW (e.g., DeKeseredy, 2016, 2021; Korsvik & Rustad, 2018; Malbon et al., 2018). Accordingly, we advocate the transversal incorporation of the gender perspective, from defining the construct to interpreting the data, including the research design and the use of the measurement tools, in order to avoid, control, or at least detect possible sources of bias (Ferrer & Bosch, 2005).

Finally, we offer a brief reflection on the scope and limitations of this work, in which we have focused on arguing for the need to include the gender perspective in research related to IPVAW in light of a new scope in this field, namely the measurement Standards and, more specifically, the focus on consequential validity. Every delimitation implies a limitation. Delimiting our object of study as IPVAW, understood as violence against women in heterosexual relationships, also necessarily implies a limitation in the scope of the study. Violence is not exclusive to heterosexual relationships, even though the universal figures for IPVAW have compelled international organizations to consider it a pandemic social problem and a priority. Additionally, the analysis addressed in this work has focused specifically on the category gender as a universal phenomenon of oppression against women, regardless of the diversity in cultural and social manifestations. Other variables that interact with gender, such as ethnicity or social class, add a very relevant percentage of variance to the explanation of the phenomenon. Research that is conducted from these other perspectives and that includes an intersectional perspective will contribute towards gaining a full understanding of this violence in which, as we have argued, gender cannot and should not be ignored. However, beyond these limitations, we can summarize the main idea provided in this article by recalling an editorial published in the journal Nature (2020, p. 196): “accounting for sex and gender makes science better”.