Introduction

The adoption of CEDAW was a remarkable achievement in the history of the women’s movement. Its ultimate aim was to catalyse social transformation that transcends cursory legislative reform (Facio & Morgan, 2009). Article 3 of CEDAW promotes this social transformation, calling for state parties to ‘take all appropriate measures’ to achieve gender equality. In practice this has included, but has not been limited to, gender-blind strategies, awareness raising, litigation, international advocacy, art and social media activism, and gender mainstreaming (see Table 1 for definition).

Table 1 Definition of concepts

The Global Gender Gap Index 2022 benchmarks 146 countries on the evolution of gender-based gaps in economic participation and opportunity, educational attainment, health and survival, and political empowerment (World Economic Forum, 2022). Although the Index measures gender parity (defined in Table 1) rather than substantive equality, it is a useful tool for analysing progression and regression. With scores depicting the distance to parity on a scale of zero to one hundred, the 2022 Report found the average distance completed to parity was 68 per cent. With the present trajectory, it will take 132 years to close the gender gap and 151 years to achieve equal economic participation and opportunity (World Economic Forum, 2022). Moreover, these estimates are predicted to worsen as the world faces crises in politics, economics, health, food, and the environment. Now more than ever we must assess our successes and failures in attempting to reduce gender inequality and discrimination.

The aim of this systematic review was to identify and synthesise evidence of the effectiveness of social justice interventions that sought to reduce gender inequality, gender bias, or discrimination against women and girls. Because recent systematic reviews have examined the effectiveness of interventions targeting violence against women and sexuality (e.g. Karakurt et al., 2019; Bourey et al., 2015; Yakubovich et al., 2018) we did not include these types of interventions. We were unable, however, to identify systematic reviews examining other interventions targeting gender equality. Therefore, this review focused on interventions that sought to achieve gender equality in any political, social, cultural or economic context, except violence against women and sexuality.

Theoretical Framework

The truism ‘context matters’ is pertinent to this systematic review. According to contextual social psychology, effects brought about at a microlevel are modified by the mesolevel and macrolevel, and vice versa (Pettigrew, 2021). In this review, microlevel variables include individual characteristics, including biology, beliefs, behaviours, values, and emotions, such as empathy and resentment. Mesolevel contextual factors include interpersonal interactions in family, work, and school etc. (e.g. gender segregation), and macrolevel context includes broader social and cultural norms, including religion and politics. Social norms in this context are “rules of action shared by people in a given society or group; they define what is considered normal and acceptable behaviour for the members of that group” (Cislaghi & Heise, 2020, p. 409). In this sense, social norms exist within the mind, while gender norms exist outside it, and both are produced and reproduced through social interaction. In contextual social psychology, beliefs are embedded in institutions that affect our relational behaviours. While there are psychological causes of macrophenomena (Pettigrew, 2021), these phenomena (such as patriarchy) also influence individual affect. For example, affirmative action laws (macro) should increase contact between genders (meso), which in turn should reduce individual prejudice (micro). While this is a top down example, it also works from the bottom up, whereby micro behaviours can affect macrophenomena. In this context, prejudice against women and girls is a “multilevel syndrome” (Pettigrew, 2021, p. 74).

“Systems thinking” also recognises the intersection between problems and processes from local to global levels (Arnold & Wade, 2015). Systems thinking treats a problem as a complex interplay of a multitude of constantly evolving factors (Banerjee & Lowalekar, 2021). According to systems thinking, gender equality will be realised when interventions at the micro, meso and macrolevel are configured holistically, rather than individualistically. Interventions at any level need to consider and accommodate the processes and factors that may support or hinder the effectiveness of the intervention in yielding population benefits. The different contextual levels that impact on gender inequality may be successfully tackled by feminist movements, but integrating the interventions pluralistically rather than monistically remains elusive, as feminist movements appear to continue to work in silos. By undertaking strategies across different contexts, however, we are more likely to achieve substantive equality. To do so, we need to address this complexity at the three contextual levels (micro, meso, macro) in order to predict, modify and eliminate discrimination against women and girls. These theoretical frameworks are used throughout this review to aid the synthesis of the evidence and identification of implications for practice.

Method

Review Design

The Sample, Phenomena of Interest, Design, Evaluation, Research type (SPIDER) tool was used to design the review (Cooke et al., 2012). SPIDER is appropriate for systematic reviews of quantitative, qualitative, and multi-methods research. We use the term multi-method rather than mixed-method because, although mixed-method studies use multiple methods of data collection and analysis, not all multi-method studies follow “mixed methods” procedures: they do not always provide an integrated synthesis of findings across the methods used (Creswell, 2009). The search terms are documented in Supplementary Tables 1 and 2. The review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Page et al., 2021). Rapid review methods were used for citation screening and data extraction (Plüddemann et al., 2018). Papers were eligible according to the criteria defined below.

The sample could include people of any age, race, or gender in local, global, or transboundary intervention contexts. The phenomena of interest included any social justice, cognitive or behaviour-change interventions that sought to reduce gender inequality, gender bias, or discrimination against women, with any mode of delivery and duration. Interventions could be any type of program (e.g. behaviour change), policy (e.g. gender mainstreaming), process (e.g. awareness raising) or experimental condition that aimed to influence gender-focused outcomes. An intervention was categorised as achieving its aim (e.g., having a beneficial effect on gender equality or reducing discrimination), partially achieving its aim, not achieving its aim according to the assessment in the paper (i.e. if the analyses in the respective paper found that the intervention did not work), or having a harmful effect (i.e. resulting in increased discrimination or inequality).

The intervention being investigated could have been administered by any party, including expert advocates, government or non-government organisations (NGOs), social justice enterprises, or academic researchers. The research design did not need to include a comparator or control group, but must have incorporated a between-groups or pre-post comparison, or retrospective assessment of the impact, feasibility or acceptability of the intervention or program. The primary outcome for evaluation was any measure of actual or perceived level of, or change in, gender (in)equality, gender bias, or discrimination against women or girls. Secondary outcomes were the perceived level of inclusion, solidarity, awareness, empowerment, or equity. The research methods could include qualitative, quantitative, and mixed- or multi-methods. Eligible papers were published in peer-reviewed journals in English from 1990 to 2022. Whilst CEDAW was adopted in 1979, this timeframe was selected to ensure contemporaneity. A protocol for the review was developed a priori, but not registered.

Search Strategy and Eligibility Screening

As this was a review of research across multiple disciplines, three databases were searched: Scopus, ProQuest, and PsycINFO. Reference lists and recommendations by experts were also reviewed. Search terms were adapted to each database. After screening the first search results it was evident that the terms were not broad enough, so a second search including additional terms was undertaken (see Supplementary Tables 1 and 2 for terms of both search strategies). All search results were uploaded to Covidence for eligibility screening and duplicate removal by the first reviewer. Using Abstrackr, a second author screened a minimum of 10 percent of citations, consistent with rapid review methods (Plüddemann et al., 2018), or until < 50 percent of citations were predicted to be relevant. Abstrackr is a machine-learning program that generates predictions of the likely relevance of records based on judgements made by the reviewer (Wallace et al., 2012), and has been found to have excellent sensitivity and to generate significant workload savings (Giummarra et al., 2020). After titles and abstracts were screened, full-text articles were assessed against the eligibility criteria, noting reasons for exclusion. Both reviewers met to discuss any conflicts; if consensus could not be reached, a third author was consulted. The authors included experts in gender equality who provided significant input into the search strategy, identification of relevant literature, and synthesis.

Quality Assessment

The quality of research was assessed by the first author using a standard method (Kmet et al., 2004) with the added criterion of whether papers reported approval by a formally constituted human research ethics committee. Supplementary Tables 3–5 specify the quality criteria. Overall quality was classified as poor (studies meeting < 0.50 criteria), adequate (0.50–0.69), good (0.70–0.80), or strong (> 0.80) consistent with previous studies (Parsons et al., 2017).
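The banding described above can be sketched as a small function. This is an illustrative sketch only, not the authors' scoring procedure: the function name `classify_quality` is hypothetical, and the handling of scores that fall between 0.69 and 0.70 (treated here as adequate) is an assumption, since the stated bands leave that gap open.

```python
def classify_quality(score: float) -> str:
    """Map a summary quality score (0.0 to 1.0) to the bands stated in the text.

    Assumption: scores strictly between 0.69 and 0.70 count as "adequate".
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score < 0.50:
        return "poor"
    if score < 0.70:
        return "adequate"
    if score <= 0.80:
        return "good"
    return "strong"


# Scores quoted elsewhere in the review, used here only as examples.
for s in (0.73, 0.79, 0.92, 0.93):
    print(s, classify_quality(s))
```

Under these assumptions, the scores of 0.73 and 0.79 quoted later in the review fall in the good band, and 0.92 and 0.93 fall in the strong band, which matches how those studies are described.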

Data Extraction and Synthesis

Data were extracted in three categories: the authors and publication year of the paper; research aims, theoretical approach, methods, sample size, eligibility criteria, and sample characteristics; and the intervention, aim, type, sector, geographic region, description, duration, targeted outcomes, effects, and short- and long-term impacts. Figures to summarise the proportion of studies from different geographic regions were generated using www.sankeymatic.com/build/. Ten percent of the full-text articles were randomly selected, stratified by research method, for independent data extraction by a second author, consistent with rapid review methods (Plüddemann et al., 2018). The data extracted by both reviewers were cross-checked for accuracy and completeness. Sources of heterogeneity were noted, particularly variation in study samples, settings, contexts and intervention designs or aims. Given the heterogeneity of the interventions and the research, meta-analysis and meta-synthesis were not appropriate. Therefore, the findings were thematically synthesised according to intervention sector (e.g. education, employment etc.) and context (i.e., micro, meso and macro levels).
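The random selection of ten percent of full-text articles, stratified by research method, could be implemented along the following lines. This is a minimal sketch under stated assumptions: the paper identifiers are hypothetical placeholders, rounding up within each stratum and drawing at least one paper per stratum are assumptions, and the review does not report which tool or seed was used.

```python
import random
from collections import defaultdict


def stratified_sample(papers, percent=10, seed=0):
    """Draw `percent` of papers at random within each research-method stratum.

    `papers` is a list of (paper_id, method) tuples. The sample size per
    stratum is rounded up (an assumption), using integer arithmetic to avoid
    floating-point rounding surprises.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for paper_id, method in papers:
        strata[method].append(paper_id)
    sample = {}
    for method, ids in strata.items():
        k = max(1, (len(ids) * percent + 99) // 100)  # ceiling of percent share
        sample[method] = sorted(rng.sample(ids, k))
    return sample


# Hypothetical identifiers; the stratum sizes follow the counts in the Results.
papers = (
    [(f"QL{i:02d}", "qualitative") for i in range(36)]
    + [(f"QN{i:02d}", "quantitative") for i in range(23)]
    + [(f"MM{i:02d}", "multi-method") for i in range(19)]
)
selected = stratified_sample(papers)
print({method: len(ids) for method, ids in selected.items()})
```

With strata of 36, 23 and 19 papers, rounding up yields four, three and two papers respectively, so nine papers in total would be double-extracted under these assumptions.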

Results

A total of 7,832 records were screened for eligibility, with the last search conducted on 18 July 2022 (Fig. 1). Seventy-eight papers, each reporting a single intervention and using qualitative (n = 36), multi-method (n = 19), or quantitative (n = 23) methods, met the inclusion criteria. The characteristics of qualitative, quantitative, and multi-methods studies are summarised in Supplementary Tables 6, 7, and 8, respectively. The intervention effects for each study are summarised in Supplementary Tables 9 and 10.

Fig. 1 Preferred Reporting Items for Systematic review and Meta-Analysis Protocol (PRISMA) Flow Diagram

Five interventions were at the microlevel, 37 were at the mesolevel, and 17 were at the macrolevel. The final 19 interventions straddled micro-meso, meso-macro, or micro–macro. No intervention covered all three levels or took a systems thinking approach.

Quality Assessment

The overall quality of each paper is detailed in Supplementary Tables 6–8, and ratings for each quality domain are in Supplementary Tables 3–5. Studies using quantitative methods (range 0.58–1.00; median = 0.92, Q1 = 0.82, Q3 = 1.00) had significantly higher quality than qualitative (range 0.41–0.91; median = 0.73, Q1 = 0.67, Q3 = 0.79; χ2(1) = 13.71, p < 0.001) and multi-method studies (range 0.48–0.94; median = 0.76, Q1 = 0.63, Q3 = 0.82; χ2(1) = 21.96, p < 0.001). There was no difference in the quality of qualitative and multi-methods studies (p = 0.97).

All quantitative studies articulated the research question and reported the results adequately. Randomisation and blinding were used in most studies. While estimates of variance and controlling for confounding were not consistently reported, 18 studies using quantitative methods were considered to be strong quality, and seven had a perfect score.

In reports of qualitative studies, the study design, context, and conclusion were generally addressed well. However, only six studies used verification processes (see Table 1 for definition). No qualitative study received a perfect score; 20 studies were considered to be good quality.

For multi-method studies, the objective, context, data collection, analysis, and conclusion were generally reported well. Blinding was not applicable, and estimates of variance and control of confounding were generally not reported. No multi-method study received a perfect score, although the quality of six of the multi-method papers was assessed as good.

Corresponding authors were contacted to confirm ethics approval; authors of two papers confirmed that the study did not receive ethics approval, and authors of 16 studies did not respond or did not confirm whether they had ethics approval. The omission of evidence of ethical approval is concerning and should be addressed in all future research with humans. The 18 studies that either did not receive ethics approval or for which approval could not be confirmed were all published in highly ranked journals. Furthermore, most papers did not make clear which agency or organisation conducted the intervention or undertook the study (e.g. government agency, NGO, academic researchers), making it difficult to assess reflexivity and the prospect of future implementation.

Included Interventions

Intervention Sectors

Interventions were implemented and evaluated in various sectors: education (26 interventions); politics (10); employment (8); information, communications, and technology (6); legal (5); economics (6); health (3); sustainable development and land rights (3); sport (3); and women’s and girls’ rights (2). Interventions in the areas of conflict and of water, sanitation, and hygiene were reported in one paper each.

Intervention Settings

Interventions were distributed fairly evenly between the Global South (35 papers) and the Global North (39 papers). Interventions were evaluated in Africa (15), Europe (12), North America (19), Asia (10), Latin America (6), the Middle East and North Africa (4), the United Kingdom (6), and the Pacific (4). Just under half of the Global South interventions were conducted in rural settings (16/35), whereas Global North interventions tended to be urban (22/39) (Fig. 2).

Fig. 2 Settings for interventions in Global North and South Countries

Research Participant Characteristics

Twenty-seven interventions included both women and men as participants, 30 included only women, and one intervention included only men. Thirteen studies did not report the gender of the sample, and in seven studies the gender of the sample or population was not applicable (e.g. the intervention took a broad population approach irrespective of gender, such as a new law that applied to the whole population in order to improve gender equality, or a collective political party that sought to influence gender issues in parliament). Thirty papers did not report other participant demographic characteristics. Where sample characteristics were reported, participants were 10–80 years of age, with education level ranging from none to post-graduate.

Study Characteristics

All papers but one (Devasia, 1998) were published after 2005. Most papers reported data gathered across years, with twelve interventions taking place over hours or weeks. The timeframe did not appear to be associated with whether or not the intervention had a significant beneficial effect on the aims of the intervention. For example, McGregor and Davies’ (2019) two-year study of the effects of a pay equity campaign achieved its aim (legislation was enacted), but Hayhurst’s (2014) girls’ entrepreneurship study that ran for several years had harmful effects (girls’ income was taken by men). Similarly, the board game intervention of Zawadzki et al. (2012), which takes 60–90 min, achieved its aims, but the conditional cash transfer study of Krishnan et al. (2014), conducted over a month, had no effect on social change.

In the qualitative and multi-method studies, theoretical frameworks were rarely reported. The few papers that did report theoretical frameworks used feminist standpoint theory, post-structuralist feminist theory, or social constructivist theory. Qualitative data collection methods were diverse: interviews (41 studies), focus groups (19), document analysis (18), observations (15), case studies (2), and visual techniques (e.g. PhotoVoice) (2). Quantitative and multi-method studies predominantly used surveys and questionnaires (22), with one study each using the following tools: the Gender Equitable Men’s Scale (Gottert et al., 2016), the Knowledge of Gender Equity Scale, the Empathy Questionnaire (Spreng et al., 2009), the Feminist Identity Scale (Rickard, 1989), and the Gender Related System Justification scale (Jost & Kay, 2003).

Few interventions aimed to achieve gender equality per se. Rather, they aimed to achieve components of gender equality (see Table 1 for definition), which ranged from gender neutrality through to striving towards a feminist revolution. Overall aims included greater awareness, inclusion, empowerment, parity, equity, and substantive equality (Supplementary Tables 6–8, column 3). The evaluation of whether interventions achieved their aims was usually assessed through surveying participants. The most common aim was to enhance “empowerment” (n = 18), which was generally not clearly defined. The interventions had various levels of effectiveness, with 37 studies having a significant beneficial effect on the aim of the intervention (i.e., they achieved their aims); 31 having a partial beneficial impact on the aim of the intervention; four studies having no beneficial or harmful impact on the aim of the intervention; and six studies having a harmful effect on the aim of the intervention (e.g., the intervention led to increased discrimination, inequality, or abuse). Examples of harmful effects include the ‘Girl Effect’ program in Uganda which resulted in participants being abused or robbed of the money they had earned (Hayhurst, 2014), and a girls’ resiliency program in the USA that resulted in increased abuse from male peers (Brinkman et al., 2011).
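The effectiveness counts reported above can be expressed as shares of the included studies. The following sketch is an arithmetic check only: the function name `outcome_shares` and the category labels are illustrative choices, and the percentages are derived from the reported counts rather than taken from the original analysis.

```python
def outcome_shares(counts):
    """Return the total and each outcome's share of it, in percent (1 d.p.)."""
    total = sum(counts.values())
    shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
    return total, shares


# Counts as reported in the text; labels are shorthand for the four categories.
counts = {
    "achieved aim": 37,
    "partially achieved aim": 31,
    "no effect": 4,
    "harmful effect": 6,
}
total, shares = outcome_shares(counts)
print(total, shares)
```

The four categories sum to 78, the number of included papers, so just under half of the studies (about 47 percent) achieved their aims and roughly 8 percent had a harmful effect.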

Intervention Design and Effectiveness by Sector

Education and Training Interventions

Evaluations of education and training interventions were reported in 18 papers (6 qualitative, 6 quantitative, 6 multi-methods). Education interventions covered a range of contexts (3 micro-meso, 11 meso, 3 meso-macro, 1 macro). Most interventions (14) used awareness-raising workshops targeting individual change, and reported only partially achieving the aim of the interventions. Five workshops were assessed in randomised controlled trials. Two qualitative studies targeted increasing girls’ enrolment in formal education in Morocco (Eger et al., 2018) and India (Jain & Singh, 2017), both of which achieved the aims of the interventions. One qualitative study in the Democratic Republic of Congo targeted behaviour change in men only (Pierotti et al., 2018), which had a partial beneficial effect because men increased their willingness to contribute to household chores but maintained control over the broader gender system. This intervention was an eight-week-long mesolevel men’s discussion group focused on “undoing gender” through social interaction (e.g. promoting a more equal division of labour in the household, improving intra-household relationship quality, and questioning existing gender norms).

Gender parity in schools did not signal an end to, or transformation of, gender inequities in the schools or communities studied (Ralfe, 2009). To bring about education policy reform, Palmén et al. (2020) found that top-down institutional commitment to gender equality was essential to create change. However, bottom-up strategies were also needed as teachers had to foster cooperative learning that encouraged working together and valuing different abilities across genders (Sánchez-Hernández et al., 2018). Sufficient resources, in addition to monitoring and evaluation of education initiatives, were found to be a key to intervention success (Palmén et al., 2020). Ultimately, social norms did not change beyond the school environment (Chisamya et al., 2012; Jain & Singh, 2017).

While interventions in traditional education contexts only partially achieved their aims, experiential learning was found to be a powerful process to deliver knowledge about gender equity in a nonthreatening way (Zawadzki et al., 2012a). The study by Zawadzki et al. was a mesolevel intervention that used a board game to teach participants the cumulative effect of subtle, nonconscious bias, to discuss how bias hinders women’s promotion in the workplace, and to find solutions for reducing that bias. They found that the delivery of information was less effective when new knowledge did not promote self-efficacy or when it led participants to resist perceived attempts to influence their beliefs or behaviours. Furthermore, they established that learning about gender inequity was not sufficient for knowledge retention. Rather, participants had to link the knowledge to their own experiences and be empowered to feel that they could act on that knowledge.

Awareness-raising interventions in education and training generally only partially achieved the aims of the interventions, and did not necessarily translate into behaviour change (Ralfe, 2009). In the strong quality (0.93) quantitative mesolevel study by Moss-Racusin et al. (2018), the Video Interventions for Diversity in STEM (VIDS) intervention was found to achieve significantly greater awareness of bias in participants compared to the non-intervention control condition; however, effects on behaviour were not assessed. This intervention presented participants with short videos about findings from gender bias research in one of three conditions. One condition illustrated findings using narratives (compelling stories), the second presented the same results using expert interviews (straightforward facts), and a hybrid condition included both narrative and expert interview videos.

A lack of awareness, knowledge, or understanding of women’s human rights was found to be a key barrier to the achievement of gender equality in education-based interventions (Murphy-Graham, 2009). Gervais (2010) reported that awareness-raising can have direct effects on participants by giving them confidence to speak up against violations of their rights, although they noted that this might anger violators. Similarly, education was found in some cases to enable women to negotiate power-sharing with their husbands, while other women were verbally abused and threatened because their husbands disapproved of the education program (Murphy-Graham, 2009). Similar to the study by Pierotti et al. (2018), Murphy-Graham (2009) sought to “undo gender” by encouraging students to rethink gender relations in their everyday lives (mesolevel). Including men together with women in education programs enabled women to gauge men’s reactions to social change in a safe environment (Cislaghi et al., 2019). Potential harmful effects of interventions are further summarised under the ‘The problem of hostile affect’ header below.

STEM Education

Among the education interventions was a subset of Science, Technology, Engineering and Maths (STEM) education interventions. These specifically targeted secondary school girls as a pathway to tertiary STEM education, and were reported in eight papers (1 qualitative, 3 quantitative, 4 multi-methods). The interventions included science clubs, outreach programs, after-school sessions, residential camps and immersion days. Archer et al. (2014), however, took a multipronged approach. Their intervention included school excursions, visits from STEM Ambassadors and a researcher-in-residence, a STEM ‘speed networking’ event, and participation in a series of teacher-led sessions for girls aged 13–14 years. Despite this significant investment, the intervention did not significantly change students’ aspirations of studying science, although it did appear to have a beneficial effect on broadening students’ understanding of the range of science jobs.

All STEM education interventions were aimed at the mesolevel and were located in the urban Global North. The long-term impacts (e.g. increased enrolment of women into tertiary STEM education) were inconsistent among studies. Gorbacheva et al. (2014) found that secondary same-sex education had no influence on this objective. By contrast, Hughes et al. (2013) found that having role models was more critical than sex segregation. Finally, Lackey et al. (2007), Lang et al. (2015) and Watermeyer (2012) all established that a network of support (e.g. family, school, industry) made a positive difference to girls’ equality in STEM education.

Employment Interventions

Eight interventions focused on women’s employment: 4 qualitative, 2 quantitative, 2 multi-methods studies. They covered a range of contexts (1 micro/meso, 5 meso, 2 meso/macro). Three interventions addressed women’s promotion (Eriksson‐Zetterquist & Styhre, 2008; Grada et al., 2015; Smith et al., 2015). Two interventions evaluated microenterprise; one produced harmful effects (Hayhurst, 2014), and the other only partially achieved its aim (Strier, 2010). Hayhurst (2014) evaluated an intervention auspiced by the Nike Foundation and concluded that it had an unfair and deleterious effect by placing the burden of social change on girls. In this intervention, focusing on the mesolevel, girls were taught to be entrepreneurs to enable them to escape abuse, buy land, grow food, and work. In practice, this economic empowerment strategy led to increased abuse by men who wanted to take the girls’ money to pay their own taxes and fines. This study was good quality (0.73). Participants in the study by Strier (2010) thought that microenterprise promised self-realisation and escape from the slavery of the labour market, but they found it to be a false promise, characterising the informal sector as both a disappointment and a fraud. Overall, employment interventions led to unreliable and inconsistent outcomes.

Economic Interventions

Six interventions targeting economic empowerment (1 qualitative, 2 quantitative, 3 multi-methods studies) addressed various contexts (1 micro, 1 micro/macro, 2 meso/macro, 2 macro interventions). Overall, the interventions partially achieved their aims. For microfinance interventions, women benefited less than men because they were given smaller loans for less lucrative businesses (Haase, 2012). Krishnan et al. (2014) conducted a good quality (0.79) multi-method study of a micro–macro level intervention that provided conditional cash transfers in India, and found minimal positive effects from the implementation of this scheme to address social behaviours related to valuing girls. In this study, parents had to register the birth of their daughter in order to receive financial benefit, but this did not transform the social mindset that daughters are a burden. In another study, the size and frequency of cash transfers directly influenced outcomes: large but infrequent payments enabled investment that could facilitate economic transformation (Morton, 2019). Lump-sum payments also challenged stereotypes about what women could invest in, and could transform the gender asset gap. Institution of a social protection floor (e.g. welfare benefits) enhanced women’s power and control over household decision-making in financial matters and household spending in South Africa (Patel et al., 2013). While a social protection floor had benefits for women’s empowerment at the microlevel, it did not transform unequal and unjust gendered social relations of power at the macrolevel.

Legal Interventions

Five studies (3 qualitative, 2 quantitative) in two contexts (1 meso/macro, 4 macro) evaluated legal interventions. In Zartaloudis’s (2015) qualitative macrolevel study of an employment strategy in Greece and Portugal, legislation was found to have an important but not transformative effect on gender equality in employment. Three other studies found that changes in law must be accompanied by incentives and penalties in order to be effective (Kim & Kang, 2016; Palmén et al., 2020; Singh & Peng, 2010). While the decline in levels of discrimination was at first sharp after anti-discrimination legislation was enacted, implementation plateaued over time, calling into question the long-term sustainable effects of law reform without adequate enforcement mechanisms. In the macrolevel study by Singh and Peng (2010), the Ontario Pay Equity Act was effective because it was proactive in pursuing pay equity, rather than being complaint-based.

Legal opportunity and litigation were strategic choices in campaign strategies in one study, playing an important role in effecting change to prevent discriminatory pay for work typically performed by women (McGregor & Davies, 2019). The strong quality (0.92) macrolevel study by Mueller et al. (2019) increased access to legal services in order to improve legal knowledge in rural Tanzania. It found that, despite increased access to legal services, women still had moderate to low knowledge of marital laws, and only 2.7 percent of women would refer someone to a paralegal for problems with a widow’s assets, divorce, or marital disputes. Mueller et al. (2019) concluded that an increased investment in access to justice needed to be made through informal channels (mesolevel change) in addition to the macrolevel law reform.

Political Interventions

Ten papers (4 qualitative, 3 quantitative, 3 multi-methods studies) that covered a variety of contexts (1 micro/meso, 2 meso, 2 meso/macro, 5 macro) reported assessments of political interventions. Electing women to council increased other women’s access to councillors because women had greater heterosocial networks (i.e., comprising women and men), but did not affect men’s access to councillors (Benstead, 2019; Levy & Sakaiya, 2020). However, increasing the number of women in public office did not necessarily improve equality (McLean & Maalsen, 2017). For example, an evaluation of gendered outcomes of Hon. Julia Gillard’s tenure as Prime Minister of Australia found increased gender-based denigration and vilification of her leadership (McLean & Maalsen, 2017).

A qualitative macro study using interviews and ethnography to explore the impact of political gender quotas in Mali (Johnson, 2019) found that savings groups, together with political gender quotas, were important for catalysing the first steps towards social and political transformation. In Mali, gender quota laws required political parties to field a minimum of 30 percent women candidates, and to include a woman within the first three places on a party’s candidate list. In this context, savings and credit associations developed women’s self-efficacy and increased their confidence to become political candidates (Johnson, 2019).

An example of discursive change based on political activism was reported in Cowell-Meyers’ (2017) multi-method study examining the impact of a new feminist political party in Sweden. Near consensus by political parties that gender equality needed to be tackled through government intervention was achieved through the efforts of the small women’s rights party. However, another multi-method mesolevel study examining the effects of Transnational Advocacy Networks (TANs) in Europe found that they either ignored or subverted gender mainstreaming language (S. Lang, 2009). Gender mainstreaming policy interventions were found to have only partially achieved their aims, but were successful when law and policy detailed specific roles and responsibilities for action (Kim & Kang, 2016). Policymakers in two other studies were found to avoid the responsibility of implementation not because they opposed gender mainstreaming itself, but because they objected to being forced into it (Hwang & Wu, 2019; Kim & Kang, 2016). Therefore, the attitude of bureaucrats (microlevel) was considered to be an important factor in implementing gender equality initiatives at the macrolevel.

The strong (perfect quality score) quantitative study by Saguy and Szekeres (2018) reported on the effect on gender-based attitudes (microlevel) following exposure to the 2017 Women’s March across the US and worldwide in response to Donald Trump’s inauguration. The research found that large-scale collective action had a polarising effect on those exposed to it. Over time, men who identified more closely with their own gender increased the degree to which they justified gender inequality after exposure to the protests, suggesting a backlash reaction (mesolevel). People who were found to be positively affected by collective action were already in favour of the protesters’ cause. The backlash found for high-identifying men was explained by reactance theory (Brehm, 1966) whereby people become motivationally aroused by a threat to or elimination of a behavioral freedom (Brehm, 1989).

Barriers to Achieving Gender Equality: The Problem of Hostile Affect

No study accounted for men’s and boys’ emotions (microlevel change) as part of the aim and design of the intervention, but their significance became apparent in the results of several studies. Men and boys reported feeling hostility, resentment, fear and jealousy when social norms were challenged. Attempts at addressing gender inequality were found to threaten men’s sense of entitlement, and it was theorised that boys expected to be the centre of attention (Brinkman et al., 2011). In the meso study by MacPhail et al. (2019) that evaluated a men’s participation program in South Africa, participants reported equality as a zero-sum game that meant respecting women equated to disrespecting men. In that intervention, activities included intensive small group workshops, informal community dialogue through home visits, mural painting to stimulate discussions of key messages, informal theatre, soccer tournaments, and film screenings. In another study, women’s oppression was maintained by men because they feared losing control of ‘their’ women (Devasia, 1998). In several studies, men shared their fear of being perceived as weak or feminine in front of their peers or community (Bigler et al., 2019; McCarthy & Moon, 2018; Murphy-Graham, 2009; Pierotti et al., 2018; Singhal & Rattine-Flaherty, 2006). Male participants in the study by Pierotti et al. (2018) believed that allowing women to be leaders in households would disintegrate society. They believed that upholding men’s lack of accountability and position as ‘boss’ was important to maintaining the fabric of society.

In contrast, Cislaghi (2018) found that men in Senegal did not resist increased political participation of women. Similarly, a radio program in Afghanistan that addressed gender equality was found not to offend men’s cultural or religious beliefs, and ultimately succeeded in changing attitudes and behaviours towards women and girls (Sengupta et al., 2007). The outcome included changes in the community, such as giving permission to women to leave their home alone, to vote, to go to school, and to reject child marriage. While participants expressed increased empowerment (micro), they also acknowledged that they may have their rights, but can never make decisions pertaining to their rights (Sengupta et al., 2007). For example, women may have the right to vote (macro), but they cannot go to vote or decide who to vote for without male guardianship (meso). In that study, 15 hours of civic education material was promoted by radio, focusing on peace, democracy, and women’s rights. At the community level, interviews and focus groups with participants revealed that there was no resistance to listening to the radio program from men or families. However, the Sengupta et al. study was not longitudinal and had a relatively small sample of 115 people (72.2% women), and the women in the study may not have been in a position that allowed them to admonish the men in their community.

It was found in one study that resistance and backlash can be ameliorated by including men and boys in the development and delivery of interventions (Sengupta et al., 2007). Behaviour change in men required an increase in empathy to achieve the aim of gender equality (Becker & Swim, 2011). Hadjipavlou (2006) and Vachhani and Pullen (2019) found that empathy was a viable alternative feminist strategy. In their qualitative study, Hwang and Wu (2019) in Taiwan found that trust-building between civil servants and advocates reduced resistance and hostility. Activists in this intervention used four strategies: (1) Giving praise and encouragement instead of criticism and blame; (2) Engaging civil servants on a personal level to create bonding; (3) Appeasing fears about being blamed by offering assistance; (4) Attempting to invoke their identification with the values of gender mainstreaming through informal educational efforts, all of which are mesolevel strategies.

Promoting Social Change to Reduce Gender Inequality

There was a wide array of types of change in different aspects of gender equality, with interventions varying in their success across settings and contexts. Table 2 summarises the types of change (e.g., legal, financial, behaviour, social) and the context (i.e., micro, meso, macro) that were identified, and whether interventions’ aims were fully achieved, partially achieved, not achieved, or had a harmful effect. Physical change, such as increased physical presence of women through inclusion or solidarity (meso), was the most consistently achieved beneficial outcome. Interventions targeting macrolevel social change, however, predominantly failed to achieve their aims or had harmful effects, reflecting how hard it is to realise social change, especially from a single, usually localised, intervention. Quotas could perhaps achieve their aim, although this finding was derived mostly from one good quality study (Johnson, 2019). The largest group of interventions were those implemented in education-based contexts, but these generally only partially achieved their aims, and focused mostly on physical changes (e.g., inclusion, solidarity). Most gender mainstreaming interventions did not achieve their aims.

Table 2 Number of Studies (Median Study Quality Score) in each intervention type and the nature of effects

Altogether, the findings confirm that social transformation is not automatic, easy, or necessarily sustainable (Murphy-Graham, 2009). Furthermore, economic transformation is constrained if it is not supported by concurrent social transformation (Haase, 2012). One researcher, reporting a good quality meso-macro multi-method educational study in rural Bangladesh, claimed to have achieved social transformation (Sperandio, 2011). The appointment of women into roles that are traditionally occupied by men (in this case, teaching) led to widespread acceptance and normalisation of women in other non-traditional roles in a conservative village. Because the researcher did not interview or survey members of the community in which the intervention was evaluated, it is not clear whether broader social change was achieved.

It was found in several studies that dialogue was key to creating change in gender norms (Hwang & Wu, 2019; MacPhail et al., 2019; McGregor & Davies, 2019; Murphy-Graham, 2009; Sánchez-Hernández et al., 2018). However, Matich et al.’s (2019) qualitative study of the #freethenipple campaign and Boling’s (2020) study of the #ShePersisted campaign found that small steps bring about only small changes. For instance, in the #freethenipple campaign, women took control of how they were represented (microlevel) in order to challenge patriarchal gender norms (macrolevel). The authors noted that, despite good intentions, a hashtag cannot erase stereotyping. Pierotti et al. (2018) also found that small changes (micro) in quotidian tasks (e.g., participation in household chores) did not lead to substantive social change (macrolevel change). That is, while changes in tasks occurred with relative ease, social transformation through the cumulative effect of small steps towards egalitarianism did not occur.

In comparison, the qualitative study by McCarthy and Moon (2018) examined a women’s program in Ghana and found that changing everyday practices did matter, but becoming cognisant of the need for revolution led people to become overwhelmed and immune to change efforts. The researchers found that a key challenge in achieving social transformation was the need to bring about changes in daily interactions. For instance, one participant stated that if a person is not empowered at home, no matter how much money you give them, they are going to need more (McCarthy & Moon, 2018).

All genders need to participate to achieve a re-socialisation (Brinkman et al., 2011). Sengupta et al. (2007) concluded that their radio program would have alienated men if it had targeted only women. By including all genders, potential resistance to change can be neutralised (Devasia, 1998). In summary, social transformation is possible, but transformation is not likely to be universal or successful across all contexts (Sánchez-Hernández et al., 2018), particularly from any single monistic intervention. Holistic responses that take account of systems thinking may create the change needed.

Discussion

Overall, despite concerted effort, it seems that in the past thirty years we have not uncovered the keys to social change in order to enhance gender equality and non-discrimination against girls and women. Perhaps the reviewed interventions did not achieve macrolevel change because they did not simultaneously and explicitly address meso and micro change. Whilst CEDAW seeks the ‘elimination of all forms of discrimination’, achievement of that aim is far from complete, although it is not surprising that no single intervention could catalyse social change that achieves CEDAW’s objective. This review demonstrates that it will take time and a variety of endeavours to achieve gender equality.

To summarise the substantive lessons from this systematic review, we offer the following distillation of the findings to date. This distillation includes definitive statements that should be viewed only in the context of this review and may not generalise across all efforts towards gender equality in all societies.

What is Ineffective in Promoting Gender Equality

Microlevel

1. Small changes do not lead to big changes. Small concessions are granted to maintain peace, while big changes are often denied to maintain power.

2. Men and boys can feel the micro effects of fear, hostility, resentment, and jealousy when meso-macro gendered social norms are challenged.

3. Increased confidence, agency, empowerment, or individual leadership (micro) is not sufficient to promote the structural changes required to increase gender equality (macro).

4. A lack of change in mindsets (micro) and poor enforcement can mean that laws (macro) are not realised or have little effect at the community level (meso).

Mesolevel

5. The overall focus on women ignores the real problem, and the need to engage with all members of society.

6. Education and awareness-raising may establish the right to education but do not necessarily create gender equality.

7. Raising awareness alone does not translate into behaviour change (meso to micro).

8. Transnational advocacy networks are not effective.

9. Protests in western democracies can have a polarising and backlash effect.

Macrolevel

10. Gender mainstreaming efforts generally fail to achieve positive outcomes.

11. Economic transformation does not automatically lead to social transformation.

What is Effective in Promoting Gender Equality

Microlevel

1. Eliciting positive affect in interventions garners positive outcomes.

2. Empathy is a viable feminist strategy, although evidence is limited.

Mesolevel

3. All genders need to participate in re-socialisation of gender norms.

4. Dialogue is a key to success.

5. A large number of women must behave differently for new behaviours to be accepted (micro to meso).

6. Experiential learning is a powerful way to embed knowledge about gender equity in a nonthreatening, lasting way.

7. Investment in access to justice must include informal channels of the justice system.

Macrolevel

8. Social transformation can be achieved in households through daily interactions (meso to macro).

9. Enabling environments (macro) are more effective than individual empowerment (micro), but should include top-down and bottom-up approaches.

10. Quotas are effective.

11. Laws must be proactive as well as reactive or complaint based.

These lists have also been adapted into the contextual levels of analysis developed by Pettigrew (2021), shown in Fig. 3. These distillations challenge our thinking about how to achieve gender equality and therefore require greater discussion amongst feminist activists, advocates, and the general population for ecological validation. The key findings of this review have implications for policy and practice because they call into question the type of change sought by feminist movements, the type of intervention used to achieve that change, and whether that intervention is likely to be effective in practice. Overall, this review gives pause for thought. We hope it will inform future decisions about how to achieve gender equality.

Fig. 3 Contextual levels of analysis for this review, adapted from Pettigrew (2021)

Strengths and Limitations

Our broad inclusion criteria identified relevant interventions across a range of political, economic, social and cultural contexts, published over a thirty-year period. Consistent with the recommendations by Garritty et al. (2021) we used rapid review methods; this may have led to the omission of some eligible studies. However, the use of a machine learning approach by reviewer two to rapidly screen a sample of the records predicted to be most relevant helped to limit the omission of relevant studies. Moreover, our restriction of literature to 1990 onwards may have omitted some studies conducted since the adoption of CEDAW in 1979. Given that only one study was published from 1990–2000, however, it is unlikely that this restricted timeframe had a significant impact on the review. Excluding papers not published in English is a limitation, and may have led to the omission of studies in some settings. We urge those who have non-peer-reviewed evaluations to submit them to peer-reviewed journals for future inclusion in reviews like the present one. The results of the large number of studies included in the review are difficult to generalise given the heterogeneous study methods, intervention designs, populations, and settings. Because of a lack of reflexivity in most qualitative and multi-method studies, it is impossible to discern (for example) whether research undertaken in the Global South was conducted by Global North researchers. Moreover, there was no evidence of the ethical conduct of 16 studies and two studies did not have ethics approval. Together, these limitations may indicate potential problems with informed consent and implicit racial or other biases, although none were explicitly identifiable. There was insufficient evidence to assess whether and how culture played a part in attempts to achieve gender equality. Furthermore, while 86 percent of interventions predominantly or partially achieved their aims, this may inflate the effectiveness of such interventions because of reporting biases that favour publication of positive results (Sengupta et al., 2007; Sperandio, 2011).

Conclusion

This review has taken stock of successes and failures in seeking to promote gender equality. The findings reveal that undue reliance has been placed on the presumed efficacy of awareness raising, and that the race to achieve gender parity has not yet catalysed the desired social transformation. Entrepreneur programs can be exploitative, and legal actions have had limited effects, potentially failing because of men’s feelings about change. This review has shown that men can be fearful, resentful, jealous, and angry towards acts that disrupt the status quo. Until we adequately address these emotions and biases, the change that women (and potentially all genders) want, and the equality we all need, will not be realised. Social context and systems thinking have shown us the importance of holism when tackling systemic discrimination. In this context, to be fully human is to be emotionally fulfilled. Ergo, human rights will be realised when there is dignity, humanity and positive emotionality among genders. Only then is the promise of CEDAW likely to be fulfilled.