Evaluating gender equality effects in research and innovation systems

Although the topic of “women in research and innovation” has been on the agenda for decades and numerous measures have been implemented at both national and supranational levels to improve gender equality in research and innovation systems, it is still unclear which measures are most effective and under which conditions. Even less research has been carried out on the effects of a better representation of women on research and innovation. This paper applies an innovative evaluation framework, which combines complexity and theory of change approaches and aims at exploring the link between interventions and their subsequent effects, to two case studies. We discuss two major German flagship programmes aiming at increasing the participation of female researchers in the science system, the “Women Professorship Programme” and the “Pact for Research and Innovation”. Through the two programmes, we tested and validated the evaluation framework and its indicators. As part of the validation process, a theory of change has been developed for each of the programmes. The theory-based evaluation approach helped not only to identify gender equality impacts but also broader effects on research and innovation that might have otherwise remained undetected. We studied the effects of the two programmes on the number of women in leadership positions and analysed whether an increase in the proportion of women leaders influences publication patterns. Although linear linkages are challenging to establish due to the complexity of the process, the findings suggest that the flagship programmes have contributed not only to higher shares of women researchers but also to improved female publication and citation rates. There are clear benefits for Germany in terms of scientific results from the increased proportion of female researchers in research and innovation.


Introduction
With the rise of evidence-based policy making (e.g. Nutley et al. 2002; Solesbury 2001; Sanderson 2002), expectations have grown regarding the use of scientific evidence in decision-making in the policy realm. Scholars show that assessments of evidence-based policy-making have been increasing alongside the growing interest in impact assessment (Reale et al. 2014). At the same time, establishing causal relationships between policy interventions and observed effects poses a theoretical challenge and entails empirical and methodological problems, as linear relations between interventions and impacts are difficult to identify (Halpern 2014; Reale et al. 2017). As regards the effects of gender equality policies, scholars call attention to the lack of studies and evidence, and to the simplification of approaches in the impact assessment of policies (Timmers et al. 2010; Kalpazidou Schmidt and Cacace 2017).
Despite many efforts undertaken in the past, there is no comprehensive and rigorous analytical framework to consider the numerous variables in gender equality interventions. Various projects funded by the European Commission such as PRAGES (Practising Gender Equality in Science), GARCIA (Gendering the Academy and Research: combating Career Instability and Asymmetries), GEDII (Gender Diversity Impact: improving research and innovation through gender diversity), GENERA (Gender Equality Network in the European Research Area), GenPORT (An internet portal for sharing knowledge and inspiring collaborative action on gender and science), GenSET (Increasing Capacity for Implementing Gender Action Plans in Science), STAGES (Structural Transformation to Achieve Gender Equality in Science) and GENOVATE (Transforming Organisational Culture for Gender Equality in Research and Innovation) have explored the gender equality (GE) dimension with different foci, encompassing varying levels of applications, from basic research to developing various supporting mechanisms. While these previous projects and subsequent studies have illustrated a number of evaluation approaches and concepts and provided examples of measuring different kinds of impacts, a clear understanding of the linkages between gender equality-related policy initiatives and interventions (inputs) and results (outcomes and impacts) is still not available. There is hence limited knowledge about how effective GE interventions have been and little is known about the effects of these interventions in research and innovation. In order to address these challenges, an evaluation framework has been developed within an H2020 project, aiming at mapping the mechanisms that mediate the relationship between gender equality inputs and the expected effects not only in terms of gender equality itself, but also on research and innovation (R&I).
The evaluation framework provides the theory and tools to analyse how gender equality-related interventions contribute to the achievement of set objectives on gender equality and how those achievements affect the desired outcomes of research and innovation.

Theoretical framework
Evaluating policy interventions is a complex process as no linear link between interventions and effects can be easily established (Cartwright and Hardie 2012; Dahler-Larsen 2012). Our theoretical framework has its point of departure in complexity theory and adopts a theory of change perspective in studying the link between GE interventions and effects in research and innovation (Kalpazidou Schmidt and Graversen 2020).
According to the notion of complexity, gender equality interventions are embedded in the complex systems that they form part of and involve multiple variables that interact in non-linear ways to produce effects. These systems respond to the policy interventions, adapt and produce new conditions (Halpern 2014). Evaluating complex interventions thus poses great challenges, as new conditions are constantly emerging (Rogers 2008). Reale et al. (2017) assert that it is problematic to speak about attribution, as it is difficult to determine to what extent the interventions are the exclusive or most important cause of the measured effects. Therefore, in complex systems, impact cannot be directly attributed to a particular intervention but has to be conceptualised by means of evaluative approaches that trace the contributions of interventions to the achievement of impact (Kalpazidou Schmidt and Cacace 2017, 2019).
One way to address the complex challenges discussed above is the theory-based impact evaluation approach (Kalpazidou Schmidt and Graversen 2020). Theories of change may be used as models of how change is expected to occur or how change has come about (Mayne and Johnson 2015). A theory of change is similar to the logic model, often used in development interventions, which conveys a scheme, program, or project in a brief, visual format (McLaughlin and Jordan 2004; Knowlton and Phillips 2012), but it explicitly includes a reflection on assumptions. Mayne and Johnson (2015) assert that a theory of change develops a framework to highlight how an intervention will perform. Thus, a tested and verified theory of change can be the point of departure for assessing the contribution of the intervention or programme to the measured effects. We build on this approach in line with a growing strand of research that discusses how the theory of change can contribute to the evaluation and understanding of policy interventions (Funnell and Rogers 2011; Rogers 2008).
According to Vogel (2012, 3), a theory of change approach is "an outcomes-based approach which applies critical thinking to the design, implementation and evaluation of initiatives and programmes intended to support change in their contexts". In theory-based impact evaluation, causality is defined as a problem of contribution, not attribution. "Why and how" questions are typically asked instead of the "how things would have been without" question that counterfactual approaches pose (Döring and Bortz 2016). The goal is to answer the "why it works" question by identifying the theory of change ("how things should logically work to produce the desired change") behind a policy intervention or program and assessing its success by comparing theory with actual implementation in a certain context. Articulating assumptions about the links between interventions and their effects, in order to make explicit the mechanisms producing change, is key to the development of a theory of change (Van Belle et al. 2010).
Theory of change and gender equality approaches can enrich each other. The added value of combining gender sensitive perspectives and theory of change is that both address change and seek to articulate how change occurs, adhering to the non-linearity and contribution paradigm. A theory of change is characterized by reflexivity, where key assumptions linked to a programme are made explicit with the aim to be verified or challenged through empirical testing. Gender transformative work encompasses questioning assumptions behind gender roles as well as relations. It can also help to unpick gendered assumptions that may form the basis of the design of various interventions (HIVOS 2014). There are often many assumptions about gender relations that essentialise both women and men. In the field of R&I, these include assumed gender differences in publication behaviours, networking patterns, research topics, leadership styles etc. Making these assumptions explicit means that interventions need to be justified on the basis of evidence. Reflexivity is also central to gender sensitive approaches, as is context. The literature calls for gender perspectives and reflexive approaches that could help understand how to evaluate policies from a gender perspective (Bustelo 2017).
According to the adopted evaluation framework, gender equality interventions are embedded in a complex context in which a large number of variables interact with each other and thus ultimately determine the effectiveness and impact of a programme (Kalpazidou Schmidt and Graversen 2020). The actual results of GE policies depend both on policy effectiveness and on other contextual variables. Contextual factors are organizational structures and cultures, as well as national and regional structures, capabilities and policies. The application of gender sensitive and theory-based impact evaluation approaches allows us to take these different levels of influence on policy effectiveness (mechanisms and context) systematically into account.
This theoretical perspective, based on theory of change and complexity, has important implications for gender equality policy and practice. It implies that the effects of interventions are largely expected in terms of contributions to change, improved conditions to foster change, as well as an increased probability that change can happen. Therefore, to better understand the effects of the programmes discussed below, we need to move away from deterministic models expecting the programmes to lead to change over a relatively short period of time (and in a linear logic). Instead, we need to adopt a reflexive and probabilistic approach demonstrating contributions to change under different contextual conditions and over longer periods of time based on clearly formulated assumptions about the impact of interventions in specific contexts, developing theories of change for each of the cases we study (Kalpazidou Schmidt and Graversen 2020).

Methodology
We carried out comprehensive desk research drawing on already developed and applied indicators in gender equality interventions and R&I research (RIO Observatory, OECD STI Scoreboard etc.), as well as on recent studies on RRI indicators (Ravn et al. 2015a, b; European Commission 2015a) to develop the evaluation framework and create the preliminary list of indicators. First, we identified the most relevant indicators according to a systematic literature review. Second, based on the review and "smart practice" examples implemented in different organisations and contexts, we clustered the indicators into different categories, dimensions and sub-dimensions, according to an evaluation logic model based on the theory of change. The indicators have also been differentiated in terms of input, output, outcome and impact. Finally, for each of these aspects, the indicators have been categorised at micro/individual or team level, meso/organisational level and macro/policy or country level (Kalpazidou Schmidt et al. 2018).
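The two-way classification described above (stage of the logic model by level of analysis) can be pictured as a small data structure. The following Python sketch is purely illustrative: the class names, field names and the three example indicators are hypothetical and are not taken from the framework's actual indicator list.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Position of an indicator in the logic model."""
    INPUT = "input"
    OUTPUT = "output"
    OUTCOME = "outcome"
    IMPACT = "impact"


class Level(Enum):
    """Level of analysis at which an indicator is measured."""
    MICRO = "individual or team"
    MESO = "organisation"
    MACRO = "policy or country"


@dataclass(frozen=True)
class Indicator:
    name: str
    dimension: str  # e.g. "gender equality" or "research and innovation"
    stage: Stage
    level: Level


def group_by_stage_and_level(indicators):
    """Arrange indicators in a (stage, level) grid."""
    grid = defaultdict(list)
    for indicator in indicators:
        grid[(indicator.stage, indicator.level)].append(indicator.name)
    return dict(grid)


# Hypothetical examples, chosen only to illustrate the grid.
EXAMPLES = [
    Indicator("funding tied to gender equality targets",
              "gender equality", Stage.INPUT, Level.MACRO),
    Indicator("share of women in leadership positions",
              "gender equality", Stage.OUTCOME, Level.MESO),
    Indicator("citations per publication",
              "research and innovation", Stage.IMPACT, Level.MICRO),
]

grid = group_by_stage_and_level(EXAMPLES)
for (stage, level), names in grid.items():
    print(f"{stage.value} / {level.value}: {names}")
```

Keeping stage and level as separate attributes means the same indicator list can be read either along the logic model or by level of analysis without duplicating entries.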
The evaluation framework and the indicators are based on an extensive literature review and the collection and review of "smart practices" implemented in Europe and beyond. The identification of "smart practices" was based on an assessment of practices that are relevant, effective and efficient in the context in which they operate, in terms of their quality of both evaluation and measurement (Kalpazidou Schmidt et al. 2018). "Smart practice" examples evaluated measures that differed in scope and length: some constituted large national programmes with a long-term perspective, while others were of a more limited character. The selection of "smart practices" was based on the following criteria: (1) the quality of the implemented measures, and (2) the impact of the measures. The quality of the measures was assessed based on the parameters of relevance, effectiveness, efficiency, and sustainability of the interventions, while the impact of the measures was assessed in relation to its subjective/objective dimension. Subjective impact refers to the satisfaction of beneficiaries and the ability of the intervention to promote consensus among the key stakeholders, which is a precondition for impact. Objective impact involves the effects of the intervention on the organization but, in line with the contribution (not attribution) approach of complexity theory, it also includes a "probabilistic stance", i.e. "the creation of conditions pointing to the activation of further change processes" (Kalpazidou Schmidt and Cacace 2017, 106).
In addition, we used the existing evidence on the beneficial impacts of gender equality on different areas, integrating the respective indicators into the developed evaluation framework, in order to illustrate the wider areas where effects can be recorded (Table 1).
The links between gender diversity and the selected types of benefits mentioned above can be traced back to greater diversity of thinking, differences in values and norms, the activation of underused human capital as well as different collaboration styles. Power asymmetries are important as well. In the field of science, such asymmetries often mean that women tend to publish in new and emerging fields of science, not yet dominated by male colleagues. Those new fields are often interdisciplinary in nature and typically associated with higher shares of citations (Okamura 2019; Wang et al. 2015). From a business perspective, the presence of diversity balances biases, which thereby contributes to the generation of alternative perspectives and experiences for exploring new problems. Thus, diverse teams are much more likely to consider and implement alternative approaches and have innovative ideas. Through their diverse viewpoints and ideas, diverse teams develop ideas and solutions that are more creative and more inclusive, which often leads to results and innovative products faster (Cosley et al. 2015). Furthermore, diversity in the workforce makes it possible to better adapt to different customer groups or markets. By giving the customer groups interlocutors who "speak the same language", the company is also able to address and win new customers. The presence of female directors signals to stakeholders that the company is committed to the advancement of women, which is interpreted as a socially responsible action (Bilimoria and Wheeler 2000). Finally, female employees can bring in their particular competencies to cope with necessary changes induced by eco-innovations (shift in firms' organizational goals, practices and routines due to their complexity and systemic character) through emphasizing teamwork and cohesion (Horbach and Jacob 2017).
The link between environmental impacts and gender diversity is derived from the observation that pro-environmental behavior is associated with femininity (Brough et al. 2016). According to recent research, women have more environmentally focused values (Civitas 2020). Risk perception is a further key mediating concept: Difference in environmental concern is due to differentially perceived vulnerability to risk in terms of health and safety as well as social and economic threats (Xiao and McCright 2012). Women are more likely to have a higher recognition of health issues and more highly developed risk perceptions, often acting on their internalised health and environment orientation (Schultz and Stiess 2009). However, it is worth mentioning that collaboration within diverse teams is challenging as well and the performance of such diverse teams depends on various factors like gendered competency expectations, hierarchy in teams (Rommes 2014), level of team deference, scale of empathy, acknowledgement of gender issues in the team and perception of rebalancing power, to mention just a few.
In accordance with the theory of change approach, we revisited, validated and refined the initial configurations regarding the two cases. Thus, theories of change were developed for these interventions in order to provide examples of the mechanisms behind the specific programmes and a framework for understanding the multi-faceted character of the interventions (Kalpazidou Schmidt and Graversen 2020).
As already mentioned, the evaluation framework developed based on complexity and theory of change approach was followed up by the case studies that have been conducted to validate and further improve the evaluation framework. Case studies as a method have been used extensively in evaluation research. Yin (1994, 13) defines a case study inquiry as one that "Investigates a contemporary phenomenon within its real-life context, especially when the boundaries between phenomenon and context are not clearly evident." Therefore, the case study method lends itself to research where contextual factors are highly pertinent to the phenomenon of study (ibid). The multiple case study work shed light on those factors and mechanisms that shape and influence the effects of gender equality interventions on research and innovation outputs. It also set out to explain how the national science system influences the intervention in terms of the main contextual elements as well as the main agendas, strategies, and policies that shape the intervention.
Using the evaluation framework, the case studies examined whether two of the major German flagship programmes to increase the participation of female researchers in the German science system, the "Women Professorship Programme" and the "Pact for Research and Innovation", have actually increased the number of women, especially in leadership positions. In a second step, based on literature and desk research as well as bibliometric analysis using Scopus, we analysed whether such an increase influences the publication patterns of authors with German affiliation.
In the following section, we focus on two cases of the German flagship programmes to increase the participation of female researchers in the science system, the "Women Professorship Programme" and the "Pact for Research and Innovation", and discuss their effects and possible benefits in terms of more women researchers and publications. For each of the selected programmes, we present the Theory of Change (ToC) and describe not only the programme objectives, inputs and throughputs, but also the target group, the central actors, as well as promoting and hindering contextual factors at policy, organisational and team level. Each ToC also presents the expected or already occurred effects in the form of short-term outputs, medium-term outcomes and longer-term impacts. Based on the impact assumptions also explicitly made in the ToC, which go beyond the actual GE effects, the ToC also contains references to R&I impacts.

The case studies
The two cases were selected because they constitute the two most important examples of efforts in the German science system to overcome its rather poor performance in terms of gender equality. In Germany the central policy context factors include freedom of science, research and teaching as a fundamental right pursuant to Article 5 of the Basic Law. Accordingly, universities as well as RPOs have comparatively high autonomy. In line with this structure, actors in the science system have committed to gender equality goals, and there are positive incentives, but there are only a few legally binding measures. This can be assumed to be one of the reasons why gender equality in academia is improving only slowly. Women in academia have lower positions on average compared to men; they are also more often working in precarious positions than men. Compared to the European average, approximately twice as many female researchers in Germany have precarious working contracts (European Commission 2015b, 104). Gender inequality in R&I in Germany is connected to the working time and attendance culture. The working time culture in academia means that even researchers in full-time positions regularly work overtime (Eurostat 2016). Male researchers work more overtime than female researchers (Eurostat 2016), which may lead to further career advantages for men. The high workload and attendance culture that characterises scientific careers is less attractive for women than for men (Niessen et al. 2010) and thus may constitute a potential barrier to gender equality in science.
The first of our cases, the "Pact for Research and Innovation", is a flagship project in Germany for several reasons: its size (3% growth rate of the institutional funding for each of the RPOs, during phase 2 even 5%), its long duration and its functional mechanism (linking overall policy objectives, including GE, with institutional funding). The intervention is rather unique as it connects institutional funding with overall policy objectives like improved gender equality (GE). The second case, the "Women's Professorship Programme", was selected for the following reasons: its size (150 million euros for each of the first two phases, 200 million euros for the third phase), its duration (since 2008) and the smart combination of goals (not only promoting GE and GE in leadership positions, but also structural change at German Higher Education Institutions). Finally, this case shows a rather innovative functional mechanism that relies on competition and sets incentives to develop convincing gender equality plans.
The "Pact for Research and Innovation" started in 2006. Meanwhile, the start of a fourth phase has been decided, extending the program until 2030. The current "Pact for Research and Innovation" (2016-2020) has two predecessors. The first phase lasted from 2005 to 2010, the second phase from 2011 to 2015 and the third phase runs from 2016 to 2020. The overriding goal of the concerted action by the federal government and the states is to strengthen the competitiveness of the German research system. It addresses the German Research Foundation (DFG) as the most important source of third-party funds in Germany and the publicly funded non-university research institutions Fraunhofer-Gesellschaft (FhG), Max Planck Society (MPG), Helmholtz Association (HGF) and Leibniz Association (WGL). The Pact obliges the research organisations to comply with several negotiated targets. The organisations themselves are responsible for the progress towards these targets and must document this in an annual monitoring report. In return for their compliance, the organisations' budgets receive an annual boost of currently 3%. Furthermore, the government guarantees them sufficient autonomy and flexibility in budgeting, human resources and construction, public procurement and participation rights. In the context of the "Pact for Research and Innovation", the research organisations also set targets for the share of women at different hierarchical levels, applying the logic of the cascade principle (BMBF 2016; GWK 2016). Figure 1 shows the ToC for the "Pact for Research and Innovation", as derived from the theory-of-change approach. Besides the effect on improved shares of women in research teams and decision-making positions, we expect different types of research outputs, such as a better quality of research, operationalized as number of citations.
The "Women's Professorship Programme" is a national initiative that addresses higher education institutions (HEIs) in Germany. Not only universities but also universities for applied sciences ("Fachhochschulen") and art and/or music colleges are eligible for funding. The eligibility criteria are different for HEIs applying for funding for the first time, and those that have already participated in one of the precursor stages. If HEIs apply for funding a second or even third time, they have to describe the success and/or failure of previous gender equality measures and the lessons learned for the future. They also have to indicate the evaluation approaches planned for continuous monitoring. Furthermore, they have to describe how they intend to anchor their GE interventions in a sustainable way (BMBF 2018, Bekanntmachung). The programme grants funding to universities for initial appointments of women to tenured professorships at the rank of a full professor (W2 and W3 positions). Submitting a promising and tailored gender equality plan (and in later stages, providing evidence for its successful implementation) is the prerequisite to receive funding (BMBF 2018). The programme offers primarily financial resources: Each HEI with an approved gender equality plan (GEP) can receive funding for up to three professorships for a duration of 5 years maximum. The maximum sum for each professorship is 150,000 euros per year during the first two phases and 165,000 euros in the third phase. In round three, a maximum of 10 HEIs with top scores in the appraisal can receive further funding for a fourth professorship. The Women Professorship Programme ("Professorinnenprogramm") was launched in March 2008 when the funding guidelines were announced. In June 2012, the Joint Science Conference (Gemeinsame Wissenschaftskonferenz, GWK) decided to continue the programme ("Professorinnenprogramm II"). 
The third round ("Professorinnenprogramm III") was launched in February 2018 (BMBF 2018, Bekanntmachung). Figure 2 illustrates the ToC for the second German case. The main expected output in the area of gender equality is the increased probability for women to reach a top position and in the area of research an increase in the number of citations.
In the case studies, we focused on the development of the numbers and shares of female researchers in leadership positions at the largest German research performing organisations, on the one hand, and at the universities, on the other. As mentioned in the ToCs, one of the main objectives of the "Women Professorship Programme" is to enhance the presence of women at all levels and to raise the numbers of women at the top of universities (see Fig. 2). The "Pact for Research and Innovation" (see Fig. 1) requires an increased representation of women in the science system, especially in leadership positions, as well. Furthermore, we looked at the publications and excellence rates of female researchers compared to their male colleagues. This was done because we expected, as listed in the assumptions displayed in the ToC figures, effects on the publication output if the research teams become more diverse. Figure 3 shows the share of women in leading positions in the four big German research performing organisations. Due to an organisation-specific definition of three "management levels", a comparison of the total figures between the Fraunhofer-Gesellschaft (FhG), the Helmholtz Association of German Research Centres (HGF), the Max Planck Society (MPG) and the Leibniz Association (WGL) is only possible to a limited extent. However, the share of women in management positions in science rose from a total of 2.0% in 2005 to 17.8% in 2017. What is also noticeable is that the increase occurred after the establishment of the "Pact for Research and Innovation" in 2005 (see also Bührer and Frietsch 2020). Figure 4 shows the number of full-time professors at German universities between 2004 and 2017. In 2017, 36,126 male full-time professors and 11,442 female full-time professors were employed at German universities. Here, too, it can be pointed out that the number of women professors has risen more dramatically than the total number of professorships.
A particularly dynamic development can also be observed here that coincides with the launch of the women professorship programme in 2008.
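As a small worked example, the share of women among full-time professors in 2017 can be derived directly from the two head counts reported above. The sketch below uses only those two figures; the function and variable names are illustrative.

```python
# Full-time professors at German universities in 2017, as reported in the text.
MALE_PROFESSORS_2017 = 36_126
FEMALE_PROFESSORS_2017 = 11_442


def share(part, whole):
    """Return part/whole as a fraction; reject an empty total."""
    if whole <= 0:
        raise ValueError("total must be positive")
    return part / whole


total_2017 = MALE_PROFESSORS_2017 + FEMALE_PROFESSORS_2017
women_share_2017 = share(FEMALE_PROFESSORS_2017, total_2017)
print(f"Women among full-time professors, 2017: {women_share_2017:.1%}")
```

With 47,568 full-time professors in total, this corresponds to roughly 24% women.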
According to the above, there is a significant increase of women in research (see Figs. 3, 4). The number of women (co)authors of scientific publications also grew during the same period (see Fig. 5). Furthermore, a look at the most frequent quality indicators (also to be understood as longer-term impact), such as citations and excellence, reveals that the rates are roughly the same for male and female authors (see Bührer and Frietsch 2020). This finding is widely confirmed in the academic literature as well: women typically publish less frequently overall, but with higher quality as measured by citations (Campbell et al. 2013; Tower et al. 2011; Powell et al. 2009). These results enable us to show that more women in the science system not only bring about a "gain in justice", but also a concrete scientific benefit. The excellence rates are measured as the shares of publications of an institution that belong to the top 10% of the most highly cited articles in their particular fields. Our analysis shows that women's excellence rates are higher than those of men on average, for both the university and the non-university sector, as illustrated in Table 2 below (adapted from Bührer and Frietsch 2020).
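The definition of the excellence rate given above can be made concrete with a short computation. The following Python sketch is a minimal illustration on invented toy data, not the bibliometric procedure used for the Scopus analysis. The record layout (`field`, `group`, `citations`), the treatment of ties (every publication at or above the cut-off counts as top 10%) and the grouping by author gender instead of by institution are assumptions made for the example.

```python
from collections import defaultdict


def top_decile_thresholds(publications):
    """Per field, return the citation count of the last paper in the top 10%."""
    by_field = defaultdict(list)
    for pub in publications:
        by_field[pub["field"]].append(pub["citations"])
    thresholds = {}
    for field, counts in by_field.items():
        counts.sort(reverse=True)
        # Size of the top 10%, rounded up, with at least one paper per field.
        k = max(1, (len(counts) + 9) // 10)
        thresholds[field] = counts[k - 1]
    return thresholds


def excellence_rate(publications, group, thresholds):
    """Share of a group's publications that reach their field's threshold."""
    own = [p for p in publications if p["group"] == group]
    if not own:
        return 0.0
    hits = sum(1 for p in own if p["citations"] >= thresholds[p["field"]])
    return hits / len(own)


# Invented toy data: ten publications in a single hypothetical field "A".
PUBS = (
    [{"field": "A", "group": "women", "citations": c} for c in (50, 8, 6, 2)]
    + [{"field": "A", "group": "men", "citations": c} for c in (9, 7, 5, 4, 3, 1)]
)

thresholds = top_decile_thresholds(PUBS)
rate_women = excellence_rate(PUBS, "women", thresholds)
rate_men = excellence_rate(PUBS, "men", thresholds)
print(f"women: {rate_women:.2f}, men: {rate_men:.2f}")
```

Because the thresholds are computed per field, a publication is only compared with others from its own field, which is what makes excellence rates comparable across disciplines with different citation habits.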

Discussion and conclusions
Our evaluation approach reflects many of the limitations that are characteristic of the evaluation field. First, a key issue is that there are many factors intervening in the social space, which means that impact is indirect and long-term (Martin 2011; Reale et al. 2014). Thus, as pointed out in the discussion of our theoretical point of departure, it has not been our intention to establish a direct linear link between the flagship programmes and the results in terms of publications and excellence rates, but to show the contributory nature of these kinds of programmes in achieving gender equality effects.
Second, despite the fact that we compare our results to the results of other studies to validate them, we are aware that quantitative methods alone are not sufficient to study complex phenomena such as the flagship programmes and their effects. Adhering to the gender-sensitive and complexity line of research, we acknowledge that further research is required to complement traditional quantitative measures with a gender-sensitive and reflexive design that requires a combination of quantitative and qualitative methodologies (Bustelo 2017; Espinosa 2013).
Based on a thorough analysis of the relevant knowledge in gender equality, evaluation as well as science and innovation research and the structured analysis of smart practice examples, we developed an evaluation framework with indicators for the assessment of GE interventions. Through the case studies of the flagship programmes, we tested and validated our evaluation framework and indicators. As part of the validation process, a theory of change has been developed for each of the programmes. The theories of change developed for the cases analysed herewith have shown that going a step further when investigating the effects of GE measures and looking at research outputs in terms of research and innovation, is a very innovative and promising approach. Concretely, the theory-based evaluation approach helped us not only to identify GE impact but also broader effects on research and innovation that might have remained undetected otherwise (compared for example to GESIS (2017) that focused on GE effects only).
There are clear gaps in the literature as regards the link between gender sensitive approaches and evaluation, and evaluations from a gender perspective are limited (Espinosa 2013). By combining a gender perspective with a theory of change approach, we position change in context and offer an in-depth reflection on assumptions of change. This approach has the potential to effectively address some of the challenges that evaluation of gender equality programmes has faced. It may support organisations and programme owners to develop evidence-based strategies. It may also support better monitoring to generate more elaborated data, moving away from only quantitative data towards evaluating gender disparities and advances articulated in theories of change.
To sum up, in accordance with our complexity, contribution and theory-based approach, we could identify broader impacts beyond pure gender equality effects. The cases presented above show that, not least due to the two large national GE interventions, the role of women academics in the German publication landscape has changed significantly over the past 15 years and there has been a clear increase in the number of (co-)publications by female authors. Furthermore, although the overall number of women has also increased significantly since the introduction of the flagship promotional programmes of the "Women Professorship Programme" and "Pact for Research and Innovation", it has not risen to the same extent as women's participation in scientific publications. Thus, we can show that GE programmes may have broad positive effects on science as well (cf. Bührer and Frietsch 2020).
In line with our evaluation framework and approach, we have aimed at investigating the linkage between GE programmes and research and innovation in terms of contribution to change and in a probabilistic way, i.e. whether the programmes have created the conditions so that change can take place, and not in terms of attribution. We thus have good reasons to claim that the programmes have contributed not only to the higher shares of women within the research performing organisations but, as a long-term impact, also to improved female publication patterns and citation rates.