1 Introduction

Perceptions regarding domestic violence (DV) can play an important role in driving legislation to support and protect victims, also by increasing its reporting. If DV is believed to be uncommon, reporting is also expected to be lower when victims do experience DV, as violence is perceived to be a private matter rather than a social problem (Aizer and Dal Bó 2009).Footnote 1 The proper recording of DV episodes, which can be considered consequential to societal awareness of the problem, faces several challenges related to either objective factors (e.g., lack of economic independence and job skills among women, presence of children) or subjective factors (e.g., personal or religious beliefs, underestimation of the severity of the episode, lack of knowledge of the services available to help victims), as mentioned by Bates et al. (2019). Using a unique survey of more than 4500 female respondents in Italy in July 2020, we investigate to what extent subjective factors, as proxied by gender stereotypes, affect perceptions of DV.

We define DV as any type of violence (psychological or physical) against women. In our framework, it mainly involves partners, although the respondents were free to think of other domestic situations in terms of both the relationship with the abuser (e.g., relatives) and the characteristics of the victim (e.g., adults). We focus on four dimensions of DV perceptions: Diffusion, defined as the perception of the prevalence of different categories of violence (overall, physical, and psychological) in the respondent’s province of residence; Severity, defined as the perceived severity of violent behavior and measured with the rank assigned by the respondent to 14 hypothetical behaviors, grouped into five macro categories of social, economic, psychological, physical, and sexual violence; Causes, capturing potential alibis or ex post rationalizations that the respondent chooses in the face of hypothetical DV behavior, referring either to situations deemed transitory, such as economic crisis or COVID-19-related justifications, or to personality features of the abuser, such as childhood trauma or substance abuse; and finally, Appropriate advice, capturing the advice a respondent would give to a hypothetical DV victim, including reporting DV to the police, using services (e.g., helplines, shelters, antiviolence centers), dealing directly with the partner, or doing nothing.

Italy is an ideal case study since there is regional heterogeneity in both gender stereotypes and DV underreporting. According to a 2014 National Institute of Statistics (ISTAT) survey on DV, almost 30% of victims do not disclose their experience to anyone, 12% report the event to the police, and only 4% report it to an antiviolence center.Footnote 2 Consistent with what is also reported in Aizer and Dal Bó (2009), when asked about their reasons for not reporting DV, respondents cite personal beliefs as key factors; these beliefs include the belief that the respondent can solve the issue by herself (40%), the belief that DV is not a serious issue (31%), or the fear of not being believed or being shamed for the violence (7%). The large role played by personal beliefs calls for a deeper understanding of their roots, such as gender stereotypes. As reported by ISTAT, gender stereotypes are widely spread throughout the country; they tend to increase among older and less educated individuals, and there exists a north-south gradient, with southern regions reporting, on average, the highest levels of gender stereotypes (Istat 2019).

Although we do not claim to provide evidence of a causal relationship between stereotypes and perceptions of DV, we are able to construct an individual-level stereotype index and to isolate the link between such individual measures and the average level of stereotypes and cultural attitudes in the area of residence of the respondents, proxied by area fixed effects. In doing so, we can control for time-invariant local historical determinants of gender stereotypes that could otherwise bias our results (Alesina et al. 2013).

According to our results, stereotypes play a minor role in explaining variations in the perceived diffusion of DV, with the exception of psychological violence, a more discretionary category, yet they are strongly correlated with systematically ranking social and economic violence as more severe than physical and sexual violence. By the same token, respondents holding stronger stereotypes are more likely to justify DV as resulting from temporary situations, such as the stress associated with an economic crisis, and less likely to link violent behaviors to long-term problems of the abuser. When questioned about the type of advice that they would give a hypothetical DV victim, respondents holding stronger stereotypes are less likely to recommend formally reporting the event or using available support services and more likely to recommend addressing the problem by talking to the partner. The correlation between the availability of help services across the country and our measure of stereotypes is positive, although not significant, such that the type of advice cannot be driven by services’ actual availability.Footnote 3 Comparing our results to the literature is difficult, as we do not focus on the frequency of reporting of DV episodes but rather on perceptions of severity and appropriate behavior. However, we provide additional context on how to interpret our results in light of the existing evidence, in addition to some simulation exercises to better appreciate the impact of respondents being on the lower or upper part of the stereotype distribution on the outcomes of interest.Footnote 4

Although it is difficult to completely rule out the role of unobservable factors that contemporaneously affect individual stereotypes and perceptions of DV, we address this potential limitation in different ways through a rich set of robustness checks. Among these, we control for regional fixed effects, netting out the time-invariant cultural environment and also notorious criminal cases that took place in the respondent’s area of residence. Then, following the seminal work of Fernandez and Fogli (2009), by including region-of-birth fixed effects, we check whether the effect of individual stereotypes could be absorbed by the average stereotypes of the respondents’ birth region. We also control for media exposure. The media propose role models that directly affect both stereotypes and perceptions (e.g., La Ferrara et al. 2012; La Ferrara 2016), and at the same time, they are a relevant source of information on the diffusion of DV through the news they share daily. We use weekly data on exposure to television from the end of February 2020 to the end of June 2020 at the regional level by age group. We focus on this period because it overlaps with the period of stay-at-home orders issued due to the COVID-19 pandemic. Since the stay-at-home orders were particularly strict in Italy (Caselli et al. 2022), people were naturally more exposed to information via television than from other sources, although they could access other sources of information for which we do not have proper controls. Finally, in addition to adding further controls such as the time spent completing the survey, we run both the test in Altonji et al. (2005) and the one suggested by Oster (2019). Overall, our baseline estimates remain robust.

In the additional robustness checks, we use alternative measures of individual stereotypes. Some of the questions we used to construct our preferred measure could be subject to social desirability bias. Since this implies, in our context, that respondents would tend to be less outspoken (underreport) about the gender stereotypes they truly hold, it is important to be aware of this problem to provide a correct interpretation of our results. We thus generate alternative individual measures on the basis of other questions that are less subject, in our view, to social desirability bias. While the baseline results are confirmed, these alternative measures, which do not completely rule out a social desirability effect, support the idea that our preferred measure captures a lower bound of the true link between individual stereotypes and DV perceptions. The use of principal component analysis to generate our stereotype indexes does not change the outcomes.

As a final step, we examine the mechanisms that explain why, in relation to perceptions of DV severity, respondents with stronger stereotypes classify social and economic violence as more severe than physical and sexual violence. We look at acceptable behaviors between a couple (the acceptability of an occasional slap in a relationship) and at potential mentions of victims’ behavior in justifying violence (whether a rape victim can be blamed if they were drunk at the time). Stronger stereotypes are associated with greater perceived acceptability of a slap in a relationship and a higher level of blame placed on a drunk rape victim. The results, robust across specifications, appear to point to a victim-blaming mindset when stereotypes are stronger.

Overall, our findings point to the relevance of stereotypes as a driver of perceptions of the acceptability of violent behaviors. Our results imply that providing role models, sharing information, and changing cultural references, for instance, through targeted media campaigns, might improve awareness of the problem of DV and increase its reporting rate. However, our contribution does not address either the link between stereotypes and crimes against women or the direct link between stereotypes and the level of reporting of DV. Rather, we argue that perceiving a behavior as deviating from what is acceptable in a romantic relationship is the first step in dealing with DV, both formally and informally.

The paper is organized as follows. Section 2 lists our expectations regarding the drivers of DV perceptions, and Section 3 provides information on our survey. Section 4 provides detailed accounts of the outcomes of interest and the measure of stereotypes, together with their descriptive statistics. Section 5 introduces the empirical strategy and discusses the results and robustness checks. In Section 6, we address the victim-blaming mindset to explain some of our results. Section 7 concludes the paper.

2 Expectations on DV perceptions and survey design

Knowledge of the drivers of DV perceptions is indeed of primary importance for defining appropriate policies to facilitate broader awareness at the societal level: we could expect that a higher awareness, among other things, would be correlated with a lower incidence of the most severe cases, either because victims might look for help at the first sign of an abusive situation and when they do they are also taken seriously or because abusers might more often look for help. The individual perception of DV might be affected by multiple factors, and in what follows, we focus on 3 groups of factors: personal experience, stereotypes, and observable (individual or contextual) characteristics.

2.1 Personal experience: the victim and the abuser

Since it has been shown that the acceptability of DV is positively correlated with knowing a perpetrator but negatively correlated with knowing a victim (Gracia and Herrero 2006), controlling for a respondent’s personal experiences with respect to DV is of utmost importance. However, defining personal experiences presents several challenges. In principle, personal experience could refer to either the respondent herself or her own network (i.e., friends or relatives) and to either the past or the present. For instance, the sociology literature identifies experiences in the family of origin during childhood as one of the most relevant elements affecting DV perceptions, contributing to the intergenerational transmission of violence, the so-called theory of social learning (Mihalic and Elliot 1997; Boyd and Richerson 1995, 2005). The logic is that since learning occurs via observation, exposure to DV in childhood is reflected in a lower perceived gravity of DV behaviors at older ages (Kernsmith 2006). However, even personal experience later in life contributes to shaping individual perceptions (Kernsmith 2006; Copp et al. 2019).

To capture the dimension of personal experience, in our survey, we proceed in two ways. In the first part of the survey, we were straightforward in asking questions that define the nature of DV according to gender (i.e., questions 15–16, which specifically refer to violence against women) and that required assessing the phenomenon at the territorial level (province of residence).Footnote 5 Then, we proxy personal experiences of DV by the variable Know a victim. The variable is a dummy equal to 1 if the respondent declares that she knows a victim of DV. The question was intentionally asked indirectly so that respondents did not have to identify themselves as DV victims even if they had previously been victims or were victims at the time of the survey. The rationale behind this choice was to reduce concerns about underreporting that emerge when similar questions are asked directly. DV could be underreported even in survey data, as there is a large emotional cost associated with it (Overstreet and Quinn 2013).Footnote 6 Consistent with this approach, for the rest of the survey, we did not give any specific benchmark to the respondents in terms of, for instance, the identity of the potential victim they would give their advice to.Footnote 7

By the same token, although referring to the partner in some questions (i.e., 54, 55, 62, 63), we did not provide specific attributes of the potential abuser. The identity of the potential abuser might be relevant under two dimensions: their gender and their relationship with the victim.Footnote 8 As reported by the National Institute of Statistics (ISTAT) on the basis of official data from the Dipartimento per le Pari Opportunità, more than 90% of victims who call the helpline for gender violence (1522) are women (with the share rising from 93% in 2013 to 98% in 2020).Footnote 9 Descriptive evidence based on ISTAT data on crimes according to the gender of the offender and the victim shows that, in 2016, 97% of the offenders investigated for rape, 93.5% for gang rape, and 81.9% for stalking were male. If we look at the data released by the Ministry of the Interior (in charge of public safety and law enforcement in Italy) based on the 2019 reported crimes on rape and stalking, the number of reported rapes in which a female was a victim was 13 times that of crimes with a male victim, while in the case of stalking, the ratio was 68 to 1. Because murders and rapes receive much media attention and they tend to involve female victims, it is quite likely that respondents would imagine their advice going to another woman, even when the self-reference mechanism is not in place. Overall, we do not have strong enough evidence to state that the type of advice, for instance, would differ based on the gender of the abuser. It would be reasonable to think that a respondent with higher levels of stereotypes might be much less inclined to accept that a male could be a victim of DV perpetrated by a woman, exactly because of those stereotypes.

Finally, there are no public and systematic data on the relation between the abuser and the victim involved in DV. However, based on 2018 aggregated data on the distribution of murder victims according to the relationship between the victim and killer and to the gender of the victim, it appears that while the majority of murders involving male victims were committed by unrelated killers, the overwhelming majority of murders involving female victims were committed by partners/ex-partners, or relatives, with the majority committed by the partner/ex-partner (Parliamentary-Committee 2020). If this is true for the entire population of murders, we could expect an even stronger incidence of these relationships in DV.

We might expect that motivation for a lower/higher perception of DV would depend on the type of relationship between the abuser and the victim, but we do not have strong expectations on the direction of the perception (less or more) based only on this relationship.

To leave the door open to different types of relationships, in questions on hypothetical advice, we mention the partner, and we also mention personal family matters.

2.2 Stereotypes

By implicitly defining which actions are considered appropriate by society, stereotypes may play a significant role in individual perceptions of DV (Mulla et al. 2019), both because they affect what is acceptable and because they can affect the frequency of certain behaviors. For instance, according to a bargaining model approach, when traditional gender norms are stronger, women are less empowered, which means that they have fewer credible outside options to stop violence (Aizer 2010; Tauchen et al. 1991; Farmer and Tiefenthaler 1997). Hence, stronger stereotypes lead to both more violence and more victim blaming (e.g., Anderberg et al. 2016; Tur-Prats 2019b). However, according to male backlash theory (MacMillan and Gartner 1999), women’s empowerment may have the opposite effect by challenging stereotypes, causing men to feel that their breadwinner role is threatened (Dobash and Dobash 1979; Bloch and Rao 2002; Zhang and Breunig 2023). Hence, a lower degree of emancipation would lead to less violence and lower perception (Agüero 2019).

However, ex ante, the direction of the correlation between individual stereotypes and individual perceptions of DV is not obvious. In discussing interactions between women and men, Glick and Fiske (1996) classified two types of sexism: benevolent sexism, according to which women are fragile creatures who need men’s protection, and hostile sexism, based on an antagonistic attitude classifying women as manipulators who seek control through seduction. Glick et al. (2000) provided evidence that in certain countries, hostile sexism is stronger among men than among women, while women are more accepting of benevolent sexism, especially in contexts in which there is more gender inequality. The more prevalent benevolent sexism is, the stronger women’s belief that society is fair to them. According to the logic of benevolent sexism, because men should take care of women, men are to blame for any type of violent male behavior, since violence against women should not be accepted. At the same time, stronger stereotypes (benevolent sexism) might favor a victim-blaming attitude when facing episodes of gender violence. Women who become victims of violence might be perceived as deserving it since society is believed to be fair to them and men are supposed to protect them.

González and Rodríguez-Planas (2020) recently found a strong and robust correlation between stereotypes and DV incidence and intensity, relying on the assumption that stereotypes among first- and second-generation immigrants can be effectively proxied by measures of gender equality in their countries of origin.Footnote 10 The expectation is that stronger gender stereotypes might induce individuals to have lower perceptions of the severity of certain behaviors or, conditional on perceiving the behavior as inappropriate, to resort to patience and commitment to try to overcome the problem. The role of stereotypes in the reporting of violence has been the object of several studies, with ambiguous findings.

To address the link between stereotypes and the perception of DV, we use a female-only sample, as in Arenas-Arroyo et al. (2021). The approach is similar to that of Alesina et al. (2021), who investigated the perception of racial gaps in US society and decided to oversample African American respondents. While we acknowledge that the male perspective is very important for defining the societal perception of DV, focusing only on females presents several advantages, mainly due to the framing of questions. As stated in Section 2.1, self-reference questions are easier to interpret when restricting the analysis to the gender most likely to be a victim of DV. At the societal level, the stereotype levels of women and men are highly correlated (see Section 4.1), so we might expect a similar effect on the perception of DV, while a difference, if any, could have emerged regarding the effect on potential advice.

2.3 Individual and neighborhood characteristics

The last two groups of causes affecting perceptions of DV are individual and neighborhood characteristics. Among the former, education, employment status, marital status, and the presence of children are considered factors that influence the acceptability of violence but also knowledge of the options available to victims. Similarly, neighborhood characteristics influence individual learning processes: authorities in areas with higher poverty rates have a lower ability to maintain basic social control, and thus, residents in these areas are more likely to justify violent acts as a way to resolve conflicts (e.g., Button 2008). Although, on average, neighborhood characteristics are generally found to play a minor role relative to that of individual characteristics, they still matter in our context, and we proxy them with unemployment and the teen-pregnancy rate at the province-of-residence level.

3 Our survey

Our survey was conducted by a survey company (Demetra), which administered the questionnaire and guaranteed that the sample of female respondents was representative of women aged 20–65 in the northern regions and the macroareas of central and southern Italy. We focus on the working age bracket so that we can additionally control for employment status when estimating the effects of stereotypes.

The survey includes 7 sections covering the socioeconomic profile of the respondent (e.g., marital status, education level), the respondent’s and her partner’s (if any) working status, her beliefs regarding gender roles, her working conditions during the stay-at-home orders, her perceptions of violence, her advice to hypothetical DV victims, issues related to couple relationships and the stay-at-home orders, and her perceived severity of different DV behaviors.Footnote 11 Throughout the different parts of the survey, we assessed the changes due to the first wave of the COVID-19 pandemic (e.g., change in occupational status, financial distress). The first wave of the pandemic started at the end of February 2020 and ended by the beginning of June 2020, and the survey was administered from July 15 to July 31, 2020. During an overall period of ten weeks from the official start of the pandemic, very strict measures were introduced to contain the emergency, such as quarantining municipal clusters, imposing strict restrictions on people’s movements, closing schools and shops, and halting industrial activities.Footnote 12 This contextual situation was obviously correlated with higher levels of distress and more uncertainty about the future among households, which might have affected individual perceptions of DV. Northern regions were particularly affected by the first wave.

The questionnaire was administered online through email invitations, and respondents were invited to participate via computer-assisted web interviews (CAWIs). On average, it took 20 min to complete the 66-question survey. Each respondent received an economic incentive to answer the questionnaire: 0.1 euro per minute, so for a 20-minute interview, they received approximately 2 euro. All details on the sampling strategy and response rates are explained in Online Appendix B. The final analysis was performed with a sample of 4574 women. Following the spirit of Alesina et al. (2021), who investigate perceptions of racial gaps in the US, underrepresenting the southern part of the US where the racial gaps are stronger than elsewhere, we were concerned with ensuring that our sample was particularly representative of the northern part of the country, since such regions are traditionally considered more emancipated in terms of gender norms, with higher female involvement in the labor market. If anything, we slightly oversampled the population of some regions in the north. In Table B1, we show the distribution of the female Italian population in the seven regions in the north and the macroareas of the center and the south, which is consistent with the distribution of the same population in our survey. The average age appears balanced, except that for the subsample in the south, the survey population is slightly younger than the actual population. This is consistent with the method used to sample (CAWI). The daily use of the internet is more common in younger cohorts, and since in the south, the level of daily use of the internet is lower than that in the rest of the country, although levels of internet access are equal, older cohorts are slightly less represented. However, if anything, this element only waters down our results, since younger cohorts hold fewer stereotypes than older cohorts (see Table B2).

We then check the representativeness of our sample using another survey run by ISTAT along a subsample of control variables that, in principle, play an important role in shaping both stereotypes and the equilibrium in adult relationships, such as employment status, level of education and religiosity, marital status, number of children, and the size of the residence (i.e., an indirect proxy for wealth). In Table A1, we also provide the main correlation coefficients for the controls used in the empirical analysis. As shown there, our measure of stereotypes is positively correlated with having children, being married, defining oneself as a religious person and living in larger accommodations, while it is negatively correlated with a higher level of education and being younger. In Appendix B, we provide further explanations, and in Table B3, we report the distribution of the selected variables in our sample and in the ISTAT samples. Our respondents are slightly more likely to be employed and slightly less likely to be religious, although religiosity is defined differently in the ISTAT questions.

4 Stereotypes, outcomes, and descriptive statistics

4.1 Stereotypes

We provide a new measure of stereotypes at the individual level based on the extent to which respondents agreed or strongly agreed with 8 statements, as reported in Table A2 and in Appendix B2, question 30. The statements either imply clearly stereotypical assessments (for example, that men are better in leadership roles), borrowed from the World Values Survey (WVS), or simply capture the burden of some situations experienced by women, such as the distress faced by women who earn more than their male partner. We construct an index based on the z score of the responses to the statements; for each respondent, the z score is equal to the average of all the standardized replies. This index has a zero average by construction and a standard deviation of 0.532.
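The construction of the index can be illustrated with a minimal sketch. It assumes that replies are coded numerically (here a hypothetical 1 to 4 agreement scale), that each statement is standardized across respondents with the population standard deviation, and that three toy statements stand in for the eight used in the survey; none of these coding choices are confirmed by the text.

```python
from statistics import mean, pstdev

def stereotype_index(responses):
    """Average of standardized replies, one value per respondent.

    responses: list of per-respondent lists, one numeric reply per statement.
    The numeric coding and the use of the population SD are assumptions.
    """
    n_items = len(responses[0])
    # Collect each statement's replies across respondents.
    columns = [[r[j] for r in responses] for j in range(n_items)]
    mus = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    # Standardize each reply, then average across statements per respondent.
    return [
        mean((r[j] - mus[j]) / sds[j] for j in range(n_items))
        for r in responses
    ]

# Toy data: 4 hypothetical respondents, 3 hypothetical statements (1-4 scale).
toy_responses = [
    [1, 2, 4],
    [2, 2, 3],
    [4, 3, 1],
    [3, 1, 2],
]
index = stereotype_index(toy_responses)
```

Because every standardized statement has mean zero across respondents, the resulting index also averages to zero, which matches the property stated above; its standard deviation depends on how strongly the statements co-move.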

The advantage of our index of stereotypes over others available only at the area-of-residence level (e.g., the regional level) is that it allows more variation within and between geographical areas. We externally validate our index with two measures of stereotypes. The first is based on a 2018 survey on gender violence and stereotypes conducted by ISTAT.Footnote 13 ISTAT released aggregated regional data; we exploit the regional variation in the share of respondents who agreed or strongly agreed with three statements, as explained in Table A2. As we do for our main index, we create two z score indexes based on the standardized regional shares, one including the share of female respondents only (ISTAT females) and one including both female and male respondents (ISTAT all). As a second measure of stereotypes, we use regional-level data from the 2018 Labor Force Survey, run by ISTAT, and construct a ratio of the employment rate of women relative to that of men, with both rates measured for the age group 54 and older. Women generally have higher employment rates at the beginning of their working careers but tend to leave the labor force after having children. Higher values for this ratio represent higher levels of gender equality in labor market participation, which we use as a proxy for lower stereotypes.

We aggregate our measure of individual-level stereotypes at the regional level to increase the comparability of the different indexes. The correlations among them are strong and in the expected direction: as shown in Table A3, there is a positive correlation with the indexes created by ISTAT and a negative correlation with the employment ratio.Footnote 14 By decomposing the correlations by macroareas (north vs. center and south), we can observe that the correlations between the stereotypes in our sample and the two indexes from the ISTAT surveys are even stronger, especially in the north.Footnote 15
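The aggregation and comparison step can be sketched as follows. The region labels, index values, and external measure below are purely illustrative placeholders, not survey data, and the simple unweighted regional mean is an assumption (the text does not say whether sampling weights are applied).

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

def regional_means(records):
    """Average the individual index within each region (unweighted, by assumption)."""
    by_region = defaultdict(list)
    for region, value in records:
        by_region[region].append(value)
    return {region: mean(values) for region, values in by_region.items()}

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Illustrative placeholders: (region, individual stereotype index).
records = [("A", -0.4), ("A", -0.2), ("B", 0.1), ("B", 0.3), ("C", 0.5), ("C", 0.7)]
# Hypothetical external regional measure, e.g. a female-to-male employment ratio.
employment_ratio = {"A": 0.9, "B": 0.8, "C": 0.6}

means = regional_means(records)
regions = sorted(means)
r = pearson([means[k] for k in regions], [employment_ratio[k] for k in regions])
```

In this toy setup the regional index and the employment ratio move in opposite directions, so `r` is negative, which is the sign pattern the text reports for the employment ratio.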

Finally, we use the different available measures of stereotypes to check whether they are negatively correlated with the presence of antiviolence services. This is an important check since both a low perception of DV and a potentially low propensity to advise the use of formal help channels might be a function of a low presence of services. The results of a simple regression in Table A5 show that the correlation between the number of antiviolence centers and the stereotype indexes is positive rather than negative. However, as shown in Table A6, respondents with stronger stereotypes are less likely to be aware of the existence of support services, even though these services are not less likely to be present in their area of residence. In the survey, we include questions on the knowledge of the helpline number, which was highly advertised during the stay-at-home periods. We then estimate the impact of stereotypes on an outcome equal to 1 when the respondent is aware of the existence of any of the following three services: helplines, antiviolence centers, and shelters. The results, net of regional fixed effects, show that women with above-average levels of stereotypes are 4 percentage points less likely to be aware of any type of antiviolence service (about 4% of the baseline average awareness of 86%).

4.2 Outcomes

We define the variable on the perceived diffusion of DV in the respondent’s province of residence with three dummies: Overall Violence, Physical violence, and Psychological violence. Each dummy is equal to 1 when the respondent reports that she considers DV (overall, physical, or psychological) to be widespread in her province of residence and is equal to 0 otherwise. The questions specifically refer to violence against women.

The measure of the perceived severity of violent behaviors is the result of an exercise that respondents had to perform: ranking a list of 14 given behaviors from the least (1) to the most (14) severe.Footnote 16 Given the design of the question (i.e., respondents were asked to rank behaviors by degree of severity), we expect that choosing a lower rank for a specific behavior mechanically implies imposing a higher rank on another behavior. Overall, the listed behaviors reflect 5 dimensions of DV: Social (i.e., control over social media, control over friends), Economic (i.e., control over work, control over expenses), Psychological (i.e., quarrels, threats, insults, humiliation, recrimination, spite), Physical (i.e., throwing objects, battering), and Sexual (i.e., sex requests, sexual violence). At the individual level i, the score assigned to each dimension (\(Dimension_{i}\)) is defined as the average individual rank (\(rank_{ni}\)) assigned to each of the N relevant behaviors associated with that dimension, as in Eq. 1.

$$\begin{aligned} Dimension_{i}=\frac{1}{N} \sum _{n=1}^{N} rank_{ni} \end{aligned} \tag{1}$$

By construction, a respondent’s rank assigned to each behavior, \(rank_{ni}\), is a discrete measure that ranges from 1 to 14. For instance, if respondent A ranks throwing objects as 6 and battering as 10, the score assigned to physical violence (\(Physical_{i}\)), which by definition combines the two, is equal to 8 (the average of the individual ranks, \(rank_{ni}\)). Figure A1 graphically presents the individual- and provincial-level ranks for each dimension, while Fig. A2 reports the specific rank of each of the 14 behaviors. There is a marked difference between the scores at the individual level and the scores at the provincial level. Individual-level scores exhibit only marginal differences in the different dimensions of violence, while the provincial-level scores exhibit a clear expected gradient in the perceived severity of violent behaviors, with the social and economic dimensions placed at the bottom of the distribution and psychological, physical, and sexual violence at the top. These graphs also show a large dispersion of the perceptions of severity at the individual level, which is averaged out at the provincial level.
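Eq. 1 can be sketched in a few lines. The behavior labels follow the grouping listed above; the full ranking below is a hypothetical respondent constructed only so that throwing objects is ranked 6 and battering 10, as in the example in the text.

```python
from statistics import mean

# Behaviors grouped into the five dimensions described in the text.
DIMENSIONS = {
    "Social": ["control over social media", "control over friends"],
    "Economic": ["control over work", "control over expenses"],
    "Psychological": ["quarrels", "threats", "insults",
                      "humiliation", "recrimination", "spite"],
    "Physical": ["throwing objects", "battering"],
    "Sexual": ["sex requests", "sexual violence"],
}

def dimension_scores(ranks):
    """Eq. 1: average the ranks of the N behaviors belonging to each dimension."""
    return {
        dim: mean(ranks[behavior] for behavior in behaviors)
        for dim, behaviors in DIMENSIONS.items()
    }

# Hypothetical respondent: a full ranking from 1 (least) to 14 (most severe).
ranks_a = {
    "control over social media": 1, "control over friends": 2,
    "control over work": 3, "control over expenses": 4,
    "quarrels": 5, "throwing objects": 6, "threats": 7,
    "insults": 8, "humiliation": 9, "battering": 10,
    "recrimination": 11, "spite": 12,
    "sex requests": 13, "sexual violence": 14,
}
scores_a = dimension_scores(ranks_a)
```

For this respondent, `scores_a["Physical"]` is the average of 6 and 10, that is 8, as in the worked example. Since the 14 ranks are a permutation of 1 to 14, raising the score of one dimension necessarily lowers the score of at least one other.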

Causes of violent behaviors are defined as potential justifications or rationales used to explain hypothetical DV episodes. With respect to the 14 previously listed violent behaviors, respondents are asked to choose the most likely explanations among a preset list of potential causes. The causes are grouped into three categories: COVID-19-related factors (i.e., limitations on social life, inadequate living space, inadequate access to IT devices), economic distress not necessarily due to the COVID-19 crisis (i.e., stress due to work uncertainty, financial problems within the relationship), and factors associated with less temporary situations and mostly related to the personality of the abuser (i.e., issues balancing family and work, addiction to or abuse of toxic substances, a need to feel superior to one’s partner, and negative experiences as a child). The underlying purpose of asking this question is to differentiate between justifications associated with temporary circumstances and those associated with long-term problems that should not be considered situation-specific. To facilitate interpretation, both of the outcome variables on the rankings of behaviors and justification scores are defined as dummies that take a value of 1 when the index is above the sample median of each distribution.
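The median-based dichotomization described above can be sketched as follows. The function name and the example scores are hypothetical, and treating "above the median" as a strict inequality is an assumption, since the text does not say how ties at the median are handled.

```python
from statistics import median


def above_median_dummy(values):
    """Return 1 for observations strictly above the sample median, 0 otherwise."""
    cutoff = median(values)
    return [1 if v > cutoff else 0 for v in values]


# Hypothetical Physical dimension scores for six respondents.
physical_scores = [8.0, 5.5, 11.0, 9.5, 7.0, 12.5]
physical_high = above_median_dummy(physical_scores)
print(physical_high)
```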

Finally, we analyze the respondent’s perceptions of what constitutes appropriate advice for a DV victim.Footnote 17 The responses are grouped into four categories: Reporting (reporting the incident to the police), Use of Services (referring the victim to an antiviolence center, calling the 1522 helpline, or seeking other services), Partner (talking with or leaving the partner) and No reaction (offering no advice, offering no interference, or dismissing the concern). Importantly, we are not evaluating the effectiveness of alternative types of advice but comparing the drivers of the respondents’ preferences for each of the suggested options. We consider hypothetical advice a potential behavior based on the perception of DV.

Detailed outcome definitions and descriptive statistics are available in Table A7, while in Table 1, we compare the mean outcomes for individuals with weak levels of stereotypes (Weak Stereotypes) and strong levels of stereotypes (Strong Stereotypes), and we calculate the p-value of the difference between the former and the latter.Footnote 18 The direction of the difference is positive for the measures of perception of diffusion, which means that respondents with weak levels of stereotypes perceive the level of violence to be higher, especially psychological violence. By the same token, respondents with weak levels of stereotypes perceive the severity of physical and sexual violence to be higher (positive difference; see column 3), and they tend to see psychological problems of the abuser as explanations for violent behaviors (Alibi abuser). Finally, respondents with weak levels of stereotypes recommend more formal tools to deal with violence (i.e., Reporting and Use of Services).

Table 1 Outcomes averages by level of stereotypes

4.3 Controls

Table A8 reports a detailed definition of each variable used in our analysis and presents the descriptive statistics of the sample. Consistent with the expectations set in Section 2, controls are grouped into 4 categories: personal experience, socioeconomic controls, COVID-19-related controls, and neighborhood characteristics.

Average statistics show that 38.8% of respondents declared that they know a victim, a percentage in line with the estimated incidence of domestic violence based on other surveys (FRA 2015). When asked about knowing a victim experiencing DV during the health emergency, 10% of the sample answered affirmatively. Respondents are mainly Italian nationals; 47% are married, 60% have at least one child, and 61% self-identify as religious. They are almost uniformly distributed across 4 age groups, with 31% younger than 35 and 19% older than 55. A total of 37% live in homes of more than 100 sq. m. (which captures both the level of wealth and the severity of constraints imposed during the strict stay-at-home orders). A total of 78% of respondents have at least a high school degree, and 35% declared that they were not employed in February 2020 (before the first wave of the COVID-19 pandemic). A total of 76% of the respondents considered it highly important to comply with the stay-at-home orders. Almost 30% had struggled with serious financial problems due to the pandemic, 34% feared that they would lose their job because of the pandemic, and 17% experienced distress due to cohabitation because of the restrictions.

Finally, since considering DV a private matter (e.g., refusing the involvement of other people in a relationship) is among the reasons a victim might not report it to the authorities, we also use a question about respondents’ networks. We asked about the type of network that respondents rely on when they have to share positive (e.g., promotions or good news from their personal life) or negative (e.g., personal or work-related struggles) events in their lives. Based on their answers, we identify three types of networks: a family network composed of the partner and the family of origin, a formal network that includes professional figures such as general practitioners (GPs) or therapists, and an informal network that includes friends and coworkers. Note that respondents could choose up to 3 options for each type of event, including the option “no one.” Therefore, the indexes are not mutually exclusive. Also in this case, to aid in the interpretation of the results, the network indexes are defined as dummies: each dummy is equal to 1 if the respondent has an above-median level of the specific network support.

5 Empirical analysis

5.1 Empirical strategy

As described in Section 2, we aim to analyze how individual stereotypes (\(S_{i}\)) correlate with DV perceptions (\(P_{i}\)), controlling for other potential confounders such as personal experience (\(E_{i}\)), individual characteristics (\(L_{i}\)), and neighborhood characteristics (\(N_{p}\)). We estimate Eq. 2 using a linear probability model. The additional inclusion of area-level fixed effects (\(\tau _{a}\)) allows us to control for broader time-invariant, spatial-specific characteristics (e.g., cultural context, trust in institutions, provision of services). As geographic units, we alternatively consider the 5 Italian macroareas (i.e., northeast, northwest, center, south, and islands), which group together several regions according to ISTAT’s official definition, or the 21 regions (i.e., 19 regions and 2 autonomous provinces). Standard errors are clustered at the corresponding area level.Footnote 19 Conditional on the fixed effects (\(\tau _{a}\)), the estimation compares individuals within homogeneous geographic areas, leveraging differences across individuals.

$$\begin{aligned} P_{i}=\alpha +\beta _{1}S_{i}+\beta _{2}E_{i}+\beta _{3}L_{i}+\beta _{4}N_{p}+\tau _{a}+\epsilon _{i} \end{aligned} \tag{2}$$
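To make the role of the fixed effects and of the clustering concrete, the following is a minimal Python sketch of a stripped-down version of Eq. 2. It is not the estimation code used for the paper: the function `lpm_within` and the data are hypothetical, the controls \(E_{i}\), \(L_{i}\), and \(N_{p}\) are omitted, and the cluster-robust standard error carries no finite-sample correction.

```python
from collections import defaultdict
from math import sqrt


def lpm_within(y, s, area):
    """Slope of a binary outcome y on s with area fixed effects, plus a
    cluster-robust standard error (clusters = areas). Controls are omitted."""
    # Area totals used for the within (fixed effects) transformation.
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for yi, si, a in zip(y, s, area):
        totals[a][0] += yi
        totals[a][1] += si
        totals[a][2] += 1
    y_dm = [yi - totals[a][0] / totals[a][2] for yi, a in zip(y, area)]
    s_dm = [si - totals[a][1] / totals[a][2] for si, a in zip(s, area)]

    sxx = sum(v * v for v in s_dm)
    beta = sum(sv * yv for sv, yv in zip(s_dm, y_dm)) / sxx

    # Sandwich variance: sum the scores s_dm * residual within each area.
    score = defaultdict(float)
    for sv, yv, a in zip(s_dm, y_dm, area):
        score[a] += sv * (yv - beta * sv)
    se = sqrt(sum(v * v for v in score.values())) / sxx
    return beta, se


# Hypothetical data: a binary perception outcome, a stereotype index, two areas.
y = [1, 1, 0, 0, 1, 0, 0]
s = [-1.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
area = ["north", "north", "north", "north", "south", "south", "south"]
beta, se = lpm_within(y, s, area)
print(beta, se)
```

Demeaning within areas is numerically equivalent to including one dummy per area, so the slope is identified only by differences across individuals living in the same area.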

We sharpen our empirical analysis through robustness checks (Section 5.3).

5.2 Results

The baseline results are presented in Table 2. In Panel A, we report the results using macroarea fixed effects, while in Panel B, we use a robust specification incorporating regional fixed effects. The results are robust across panels, meaning that the effect of stereotypes does not depend on the differences in the average level of gender stereotypes across regions but is driven by within-region variations in individual beliefs about stereotypes. This result represents an improvement on the literature, which has thus far considered only aggregate measures of stereotypes, without exploiting individual-level variation within regions. Since Panel B provides the robust version of the baseline model, our preferred specification is the one including region-of-residence fixed effects.

Table 2 Baseline estimation

Stereotypes have a minor association with Diffusion, significantly affecting the perceived diffusion only of psychological violence (\(-\)6.5% at the mean of the outcome when stereotypes increase by 1 standard deviation). The effect goes in the same direction for perceived diffusion overall, even though it is not significantly different from zero. Such a pattern seems to indicate that subscribing to strong stereotypes reduces awareness regarding the diffusion of DV, particularly psychological violence, the definition of which can be highly subjective.

Stereotypes are strongly correlated with Severity, reducing the perceived severity of physical and sexual violence (\(-\)4.4% and \(-\)6.5% when stereotypes increase by 1 SD) and increasing that of social and economic violence (+5.6% and +2.6%). Although these effects are partially mechanical, the results systematically highlight that individuals with above-average levels of stereotypes assign a lower level of severity to physical and sexual violence and a higher level of severity to social and economic violence than individuals with lower levels of stereotypes. This means that stronger stereotypes are correlated with a higher perceived acceptability of physical and sexual violence.

Similarly, regarding Causes, stronger stereotypes are negatively correlated with attribution of the causes of DV to longstanding characteristics of the abuser (e.g., childhood trauma) and positively correlated with citing context-specific causes. Respondents with stronger levels of stereotypes are more likely to justify violent behaviors during transitory shocks, such as the COVID-19 emergency. A 1 SD increase in the stereotype index accounts for a +6.2%, +5.1%, and \(-\)8.2% change in perceptions of DV as being caused by COVID-19-, economic crisis- and abuser personality-related factors, respectively.

As far as Appropriate advice is concerned, stronger stereotypes are correlated with less frequently recommending formal reporting: a 1 SD increase in the holding of stereotypical norms decreases the likelihood of advising a victim to report DV to the police by 10.1% and to use support services by 1.8%. The results in the last two columns of the table show that respondents holding more stereotypical beliefs recommend behaviors that exacerbate the underreporting of DV: a 1 SD increase in stereotypes increases the probability that the respondent advises the victim to try to resolve DV with the partner by 7.5% and the probability of advising the victim not to react by 35.2%.

To further appreciate the magnitude of the estimated effects, we calculate the predicted outcome levels along the distribution of the index of stereotypes in our sample (at the \(25^{th}\), \(50^{th}\) (median), and \(75^{th}\) percentiles), based on the estimated coefficient of stereotypes in Panel B of Table 2. Column (4) reports the difference between the predicted outcome level at the \(75^{th}\) percentile of stereotypes and the outcome level at the \(25^{th}\) percentile. As individual stereotypes become stronger (from the bottom to the top), for instance, we record a 7.8% decrease in the perception of overall violence, a 9.6% decrease in the probability of perceiving sexual violence as severe, and a 12% decrease in the probability of considering the abuser’s personal characteristics as a cause of their violent behaviors. The largest impact is on the recommendation not to react, which increases by 38%. All results are available in Table A9.
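In a linear model, the gap between the predicted outcomes at two points of the stereotype distribution, holding the other covariates fixed, is the coefficient multiplied by the distance between those points. The sketch below illustrates this arithmetic with hypothetical numbers; the function name, the choice of the outcome mean as the base for the relative change, and the "inclusive" quantile rule are assumptions, not details taken from the paper.

```python
from statistics import quantiles


def interquartile_effect(beta, index_values, outcome_mean):
    """Change in the predicted outcome when the index moves from its 25th to its
    75th percentile, in levels and relative to the outcome mean."""
    q25, _q50, q75 = quantiles(index_values, n=4, method="inclusive")
    change = beta * (q75 - q25)
    return change, change / outcome_mean


# Hypothetical standardized stereotype index, coefficient, and outcome mean.
index_values = [-1.5, -0.5, 0.0, 0.5, 1.5]
change, relative_change = interquartile_effect(-0.04, index_values, 0.5)
print(change, relative_change)
```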

Our estimates are smaller than those found in the literature, yet comparability with the literature is not straightforward since, unlike many studies on DV that address the trends in reported episodes (Leslie and Wilson 2020; Bullinger et al. 2021), we focus on the determinants of population perception. In this respect, we use different outcomes as well as a different way to proxy stereotypes. We measure stereotypes at the individual and not at the aggregate level: aggregated data are more likely to carry aggregation biases. In fact, the underlying assumption when using aggregate data is that the estimated relation between variables is homogeneous across all individuals for whom the aggregate index is used. As a consequence, the estimated impact of the changes in the explanatory variable on the dependent variable is much larger in the aggregate regression than in the less aggregated regression (Garrett 2003). Additionally, the outcomes found in previous literature are not always comparable since they refer to episode frequencies and, in some cases, mainly focus on intimate partner violence (IPV), specifically referring to physical and sexual violence. For example, González and Rodríguez-Planas (2020) find that a 1-SD increase in gender inequality increases the incidence of violence, intensity of violence, and number of cases of violence by 28%, 43%, and 33, respectively. Tur-Prats (2019a) focuses on a specific country, as in our case, and measures norms using the contemporaneous family structure (stem vs nuclear) in Spain, claiming that women living in stem families experience a lower incidence of IPV by as much as 57%.Footnote 20 Alesina et al. (2021) find similar country-level results not only on the incidence of IPV but also on the attitudes toward acceptance of IPV, proxying for stereotypes with information on the ancestral ethnicity of respondents, including the mode of production, marriage status, and residential patterns. They find that respondents with less gender-equal backgrounds are 77% more likely to justify violent acts by the husband. Finally, recent work by Guarnieri and Tur-Prats (2023) finds a positive relationship between gender norms and sexual violence during civil conflicts. Also in this case, gender norms are measured at the aggregated level of the armed actors. They find that a 1-SD increase in the index of male dominance increases violence by 0.29 SD.

Our results are consistent with previous evidence based on Italian data on the use of help-seeking services, such as antiviolence helplines for women victims of domestic abuse. For example, Colagrossi et al. (2022) study the effect of a television campaign promoting the domestic violence helpline during the first wave of COVID-19 in Italy on the actual use of the service. They find an overall increase in calls of 30 to 40% after the campaign in areas with a 1-SD higher exposure to the campaign. However, this effect is much lower in areas with higher gender stereotypes, where the number of calls increased by only 3 to 10%.

Table 3 Standard error corrections

5.3 Robustness checks

As discussed in Section 5.2, robustness checks are presented only for our preferred specification. We group the tests in five sets: 1) to address potential problems due to multiple hypothesis testing, 2) to address potential bias stemming from the definition of our outcomes, 3) to address potential unobservable factors, 4) to address potential bias from our definition of stereotypes, and 5) to address potential problems due to the slight imbalance in certain observable dimensions.

5.3.1 Alternative specifications

In a first set of robustness tests, we address potential problems due to multiple hypothesis testing and to the clustering of the standard errors. When performing simultaneous inference with a large number of outcomes, as in our case, there might be a problem of overrejecting the null hypothesis. This might undermine our results since significant coefficients may emerge simply by chance, even if there are no strong effects (Anderson 2008). Hence, we adjust the p-values to limit the proportion of false rejections of the null hypothesis across all individual outcomes (Allcott et al. 2020) following the approach in Simes (1986), as implemented in Erten and Keskin (2018). Our main results are still robust, as shown in Panel A of Table 3.
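A common way to control the proportion of false rejections is the step-up adjustment built on the Simes inequality (the Benjamini-Hochberg procedure). The sketch below shows that adjustment under the assumption that it is close to what the cited implementation does; the exact variant used for Table 3 may differ, and the function name and the p-values are hypothetical.

```python
def simes_bh_adjust(pvalues):
    """Step-up adjusted p-values: p_(k) * m / k for the k-th smallest p-value,
    made monotone by taking running minima from the largest p-value down."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical unadjusted p-values for four outcomes.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = simes_bh_adjust(raw)
print(adjusted)
```

An outcome is then declared significant at level q when its adjusted p-value does not exceed q.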

Since in our preferred specification we cluster errors at the regional level, we might have biases generated by the use of a low number of clusters. To address this problem, we use wild cluster bootstrap standard error estimation (Cameron et al. 2008). In Panel B of Table 3, we show that our baseline results are robust to the use of a wild cluster bootstrap approach (Roodman et al. 2019). Our results are also robust to the use of simpler heteroskedasticity-robust standard errors, as shown by the results in Panel C of Table 3.
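The logic of the wild cluster bootstrap can be sketched as follows for a single regressor. This is an illustrative simplification and not the procedure behind Table 3: it assumes that the outcome and the regressor have already been demeaned within each cluster (so that the fixed effects are absorbed), imposes the null of a zero slope, draws Rademacher weights, and omits the other controls. All names and data are hypothetical.

```python
import random
from collections import defaultdict
from math import sqrt


def cluster_t(y, x, cluster):
    """Slope and cluster-robust t statistic for a regression of y on x (no intercept)."""
    sxx = sum(v * v for v in x)
    beta = sum(a * b for a, b in zip(x, y)) / sxx
    score = defaultdict(float)
    for xi, yi, g in zip(x, y, cluster):
        score[g] += xi * (yi - beta * xi)
    se = sqrt(sum(v * v for v in score.values())) / sxx
    return beta, beta / se


def wild_cluster_bootstrap_p(y, x, cluster, reps=999, seed=0):
    """Bootstrap p-value for H0: slope = 0, with one Rademacher weight per cluster."""
    rng = random.Random(seed)
    _, t_obs = cluster_t(y, x, cluster)
    groups = sorted(set(cluster))
    exceed = 0
    for _ in range(reps):
        sign = {g: rng.choice((-1.0, 1.0)) for g in groups}
        # Under H0 the restricted fit is zero, so the restricted residuals equal y.
        y_star = [sign[g] * yi for yi, g in zip(y, cluster)]
        _, t_star = cluster_t(y_star, x, cluster)
        if abs(t_star) >= abs(t_obs):
            exceed += 1
    return exceed / reps


# Hypothetical demeaned data: 6 clusters with 3 observations each.
x = [-1.0, 0.0, 1.0] * 6
y = [0.5, 0.0, -0.5, 0.3, -0.1, -0.2, 0.4, 0.2, -0.6,
     -0.1, 0.0, 0.1, 0.2, 0.1, -0.3, 0.6, -0.2, -0.4]
cluster = [g for g in range(6) for _ in range(3)]
p_value = wild_cluster_bootstrap_p(y, x, cluster)
print(p_value)
```

With few clusters the number of distinct sign patterns is small (64 here), which bounds how small the bootstrap p-value can be; this is one reason why alternative weight distributions are sometimes preferred in practice.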

5.3.2 Alternative outcomes

As explained in Section 4.2, we asked respondents to rank 14 different types of violence, and we calculated the individual-level average of the elements in each group. However, we did not spell out the value of the position per se. For this reason, we also run our baseline regression separately for each element of each group of behavior for a total of 14 regressions, and given the high number of outcomes, we report both unadjusted and adjusted standard errors for multiple hypothesis testing using a Bonferroni procedure (Erten and Keskin 2018). The results shown in Table 4 are consistent with our baseline estimation.
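For reference, the Bonferroni correction is usually expressed on p-values: each one is multiplied by the number of tests and capped at 1. The short sketch below shows this textbook form with hypothetical p-values; how the adjustment is reported in Table 4 may differ.

```python
def bonferroni_adjust(pvalues):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical p-values from three of the single-item regressions.
single_item_p = [0.001, 0.02, 0.5]
single_item_adjusted = bonferroni_adjust(single_item_p)
print(single_item_adjusted)
```

With 14 single-item regressions, an unadjusted p-value would need to be below roughly 0.0036 to remain significant at the 5% level after this correction.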

Table 4 Alternative outcome definitions: ranking, single items

Additionally, we use a principal component analysis (PCA) with the same dimensions to create an alternative measure of the 5 groups of violent behaviors. Hence, we generate a PCA index for Social violence using the positions of control over social media and control over friends. Similarly, we create an index for Economic violence from the positions of control over work and control over expenses; for Psychological violence, from the positions of quarrels, threats, insults, humiliation, recrimination, and spite; for Physical violence, from the positions of throwing objects and battering; and for Sexual violence, from the positions of sex requests and sexual violence. We then use these indexes as outcomes in the regression analysis, as shown in Table 5, and the results do not substantially differ from our baseline.

Table 5 Alternative outcome definitions: ranking computed using the principal component analysis
Table 6 Alternative controls for unobservables

5.3.3 Alternative controls for unobservables

The presence of other factors simultaneously affecting both stereotypes and DV perceptions (i.e., omitted variable bias) may indeed undermine the robustness of our results. To address the role of unobservable confounding factors, we run several checks. First, we add the average stereotypes index in the birth region of the respondent to the baseline specification, building on the approach proposed by Fernandez and Fogli (2009). According to their work, the beliefs and preferences in the country of origin have significant effects on the realization of individual outcomes. We define a synthetic measure of birth-region stereotypes by assigning the average regional stereotypes index to individuals based on their region of birth. Since we include region-of-residence fixed effects, the effect is identified by movers (i.e., 19% of the sample) whose birth region differs from their region of residence. The results in Panel A of Table 6 show that adding this control does not affect the estimate of the effect of stereotypes.

A potential confounding factor affecting both perceptions of DV and the holding of stereotypes is media.Footnote 21 While we do not have information on the main source of information used by respondents, we recover weekly data on the exposure (time) to television from the end of February to the beginning of June 2020, covering the entire period of the first wave of the pandemic. During that time, because of strict stay-at-home orders, people tended to be exposed to television information more than usual, with some territorial and age variations. Data on the average time viewing (ATV) were provided by an Italian company that collected this information (AUDITEL) for both 2019 and 2020. The data refer to two age groups, women aged 25–44 and 45–64. The reason for including these data in our model was to test two specifications. In Panel B of Table 6, we show the results for the first specification, where we control for the absolute number of minutes per week per region. In Panel C, we control for the rate of growth in ATV from 2019 to 2020 for the same months. Even if the impact of media is not limited to that of television, our baseline results are robust to this specific definition of access to information.

In Panel D, we control for an alternative definition of personal experience with DV, since, in the baseline results, we control for the variable Know a victim, based on whether the respondent knew a victim of domestic violence before the first wave of the COVID-19 pandemic. The idea is to check whether the main effect is driven by experience in the recent past, which might affect both beliefs regarding gender stereotypes and perceptions of violence. The results substantially confirm our baseline estimates.

Finally, to cope with potential measurement errors driven by the individual characteristics of the respondent (Elliott and Valliant 2017) due to the use of self-reported measures, we propose an alternative specification where we also control for Length of interview. This is an individual continuous measure reporting the number of minutes elapsed from the start of the survey to its submission, which serves as a proxy for the respondent’s level of attention at the time of the interview. The results available in Panel E of Table 6 confirm that individual characteristics that can simultaneously affect multiple self-reported answers, such as the level of attention, do not drive the results.Footnote 22

5.3.4 Alternative stereotype indexes

We test for possible interpretation challenges related to our index. For instance, it might be that the type of questions on which the index relies affects the interpretation of the results. The concern behind this check is related to so-called social desirability bias. This bias arises if respondents tend to try to comply with or satisfy the expectations of whoever designed the survey. In our context, this would mean that respondents, when facing some clearly gender-stereotypical questions, such as whether men are better than women at performing a specific task, would tend to state that men and women are equally able to perform the task even if their true belief differs. If the bias is strong, it would mean that the estimates of the baseline specification are identified by respondents who — notwithstanding social desirability bias — are outspoken in affirming that men are better than women at performing specific tasks. As a consequence, the identified effect would represent the upper part of the distribution of stereotypical beliefs among women. The answers corresponding to those women who subscribe to stereotypes but are ashamed to be straightforward about it would not explain the results we found.

Table 7 Alternative measures of stereotypes

Hence, to account for the potential role of social desirability bias, we measure individual stereotypical beliefs with questions potentially less sensitive to this bias. The new indexes are again expressed as z scores, as already discussed in Section 4.1, and the results are shown in Table 7. The first index (Panel A) relies on respondents’ identification of gender-specific job positions (question 31). Respondents were asked to state whether each of 8 listed professions (from schoolteacher to public transport inspector) was more suitable for a man, a woman, or someone of either gender. The index on this question measures how frequently a respondent chose a specific gender instead of the gender-neutral “either.” Panel B shows the results based on an index computed on answers to general statements about the balance between private and working life (question 32). Again, respondents faced 8 hypothetical situations (from “a woman works more than her partner” to “a boy does not have a driving license”) and expressed how acceptable (from very to not at all) they considered the situations to be.

The baseline estimates are confirmed. In some cases, these alternative indexes have a higher correlation (i.e., larger magnitude) with the outcomes of interest. For instance, respondents with stronger stereotypes have a much lower perception of the diffusion of all types of violence (overall, physical, and psychological violence). Since the main results are not dramatically affected, we choose to stick with our initial index; given that we have no way to completely rule out social desirability bias, this index at least provides us with a clearer result once we keep in mind that it represents a lower bound of the true correlation between stereotypes and perceptions of DV.

Additionally, since our measure of stereotypes is in fact created using the average of standardized answers to 8 items of one question, it might be that the answers to these items are strongly correlated because they either capture similar dimensions of stereotypes or simply suffer from a common method bias. Hence, we create an alternative stereotype index using a (polychoric) principal component analysis. The polychoric PCA is based on polychoric correlations, which estimate the correlation between the unobserved continuous variables assumed to underlie each pair of ordinal answers; unlike standard PCA, it can therefore account for a larger proportion of variance. In fact, the polychoric PCA allows us to maintain the ordinal nature of the Likert scale on which respondents answered those questions. The results using this PCA transformation of the stereotype index confirm our baseline, as shown in Table 7, Panel C.

Finally, in Panel D, we also show the results for a different version of our index using a dummy variable rather than the continuous index. It distinguishes between weak and strong stereotypes based on the mean of the stereotype index distribution (mean=0). While simplifying the interpretation of the estimates, it also shows that the results are not substantially affected.

Table 8 Weighted regression

5.3.5 External validity

As discussed in Section 3, our sample is representative along certain dimensions and slightly less representative along other dimensions positively correlated with stereotypes. Hence, as a final check, we use an entropy balance procedure to match our sample to other sample statistics by region, creating weights that achieve this balance with respect to relevant dimensions of the population, and we run a weighted version of our baseline. We focus on regions since they define our territorial unit of interest, although the sample was not defined based on them, at least for the center and the south. The results are available in Table 8, both when we weight using only the age distribution (Panel A) and when we weight using the education, flat size, and number of children distributions within regions (Panel B).
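The idea behind entropy balancing can be illustrated with a single moment condition. The weights that stay closest to uniform (in the entropy sense) while matching a target mean take the form \(w_{i} \propto \exp (\lambda x_{i})\), and \(\lambda\) can be found numerically. The sketch below is a simplified assumption-laden example and not the procedure behind Table 8, which balances several distributions within regions; the function name, the bisection bounds, and the data are hypothetical.

```python
from math import exp


def entropy_balance_weights(x, target, lo=-20.0, hi=20.0, tol=1e-12):
    """Weights proportional to exp(lam * x_i) whose weighted mean of x equals target.

    The target must lie strictly between min(x) and max(x)."""
    def weighted_mean(lam):
        w = [exp(lam * xi) for xi in x]
        return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

    # The weighted mean increases with lam, so bisection finds the root.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if weighted_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    w = [exp(lam * xi) for xi in x]
    total = sum(w)
    return [wi / total for wi in w]


# Hypothetical sample: 1 of 5 respondents is older than 55 (share 0.2), while the
# reference population share is assumed to be 0.4.
older_than_55 = [1, 0, 0, 0, 0]
weights = entropy_balance_weights(older_than_55, target=0.4)
print(weights)
```

The resulting weights can then be passed to a weighted version of the baseline regression.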

6 Acceptability of violence

We go into greater depth on the beliefs expressed on the severity of violent behavior, with respondents holding stronger stereotypes ranking economic and social violence as more severe than physical violence. We proceed by looking at what is considered acceptable within a couple relationship and into the potential role of victims of violent behavior. We examine two questions: the first concerns the acceptability of an occasional slap in a relationship, while the second concerns blaming a rape victim for her own victimization if she was drunk at the time of the rape. These questions aim to check how likely respondents holding stronger stereotypes are to rationalize men’s misbehavior as resulting from women’s deviation from behavioral norms, consistent with a benevolent sexist approach. Having stronger stereotypes is associated with a higher perceived acceptability of a slap in a relationship and more blame attribution to a rape victim if she was drunk. The results are robust across specifications, as shown in Table 9.

Table 9 Outcome: acceptability

7 Conclusion

An accurate understanding of the seriousness and potential consequences of certain types of behaviors is the first step toward gaining awareness of and dealing with DV. If DV is believed to be a rare event, reporting is likely to be lower and the matter more likely to be relegated to the private realm. By the same token, the costs in terms of physical and mental health, the costs for working and career outcomes, and the costs for the cognitive and emotional development of involved children are also going to be relegated to the private realm. Changing how DV is perceived and how it affects public opinion could empower victims to deal with it.

Our analysis focuses on gender stereotypes among the main drivers of DV perception while controlling for a large set of factors that are expected to affect the perception of DV (e.g., socioeconomic factors). Our findings show that stereotypes are not necessarily just the product of the environment in which they are embedded, since we obtain significant estimates for stereotypes even when we absorb the territorial effects using respondents’ region-of-residence fixed effects. This means that, in principle, one might live in a very emancipated context and yet hold strong stereotypes, and vice versa. We think that this is the main takeaway of our analysis, which provides a lower bound estimate of the true effects. Although it is common knowledge that cultural norms do not change overnight, slight changes are possible. For instance, stereotypes should be the target of ad hoc campaigns. As shown by La Ferrara et al. (2012) and La Ferrara (2016), the media can influence and shape stereotypes, particularly gender roles, by holding up new role models. Providing more information on the availability of services to victims, such as helplines, can be equally useful. These campaigns could be easily implemented, and they could generate the benefit of increasing awareness of a problem that is all too common.

As personal backgrounds and experiences matter in defining both norms and perceptions of DV, future research should be devoted to measuring and recording personal experiences in depth, such as asking questions about task allocations in the family of origin and the working status of the female figures. In our survey, we addressed personal experiences, working on the framing of questions so as to allow for a self-reference approach. The use of a panel could further help in this respect.

Our results do not include male perceptions of DV since we rely on a survey of only females. Although the stereotypes of males and females are highly correlated at the territorial level, it would be interesting to analyze how a male respondent would perceive the severity of certain behaviors and what action he would suggest in the face of DV situations. Finally, we do not identify either the gender identity of the victim or the type of relationship between the victim and the abuser, which would be worthwhile to investigate in future work on the topic.