Introduction: Do Governance Instruments Have Effects?

After almost 50 years of empirically investigating the introduction of new governance instruments, we are still unable to establish whether and how they affect the conduct and content of research. We assume that changes in governance affect epistemic change—understood here as change in the practices of producing new knowledge and in the outcomes of these practices—but cannot prove it according to the methodological standards of social science.

The field of science and higher education policy studies has grown strongly since the 1970s, not least due to significant changes in the governance of science and higher education in many OECD countries since then (Whitley 2010; Reale and Seeber 2013; Capano and Pritoni 2020). The introduction and sometimes rapid succession of policy reforms and new governance instruments called for description, comparison and assessment. Areas of interest include

  • changes in research funding, with an emphasis on the emergence of research councils as new important actors in the science system (Braun 1993, 1998; Rip 1994; Braun and Guston 2003; Nedeva 2013),

  • the realisation of political interests through particular funding programmes, with an emphasis on support for emerging fields (Molyneux-Hodgson and Meyer 2009; Bensaude-Vincent 2016) and on excellence funding (Laudel and Gläser 2014; Langfeldt et al. 2015; Möller et al. 2016), and

  • higher education reforms, with an emphasis on changing authority relations in the higher education sector and performance-based funding of higher education institutions (Schimank 2005; Whitley and Gläser 2007; Paradeise et al. 2009; Meier and Schimank 2010; Musselin 2014; Thomas et al. 2020).

One strand within this research has been devoted to the identification of epistemic effects of changes in the governance of science. This is not surprising given that the empirical objects of science policy studies – governance instruments – are designed to influence the conduct and content of science. Many science policy scholars have also been involved in advising science policy on the effectiveness of its instruments. Studying governance instruments remains incomplete without considering their intended and unintended effects.

However, several reviews have reported that convincing evidence of any causal links remains scarce (de Rijcke et al. 2016; Gläser and Laudel 2016; Thomas et al. 2020). Moreover, the possibility of establishing causality with our current approaches has been called into question (Gläser and Laudel 2016; Schneider et al. 2016; Aagaard and Schneider 2017; Gläser 2017; Thomas et al. 2020). Thomas et al. (2020: 282-283) found that the literature on “performance-based research evaluation arrangements” is often descriptive, cannot establish effects on specific kinds of research, primarily focuses on micro-level changes, and does not apply comparative frameworks. They propose a new research agenda, whose key elements are the introduction of comparative analytical frameworks, complementing “efficiency concerns” (“whether arrangements have achieved what they set out to achieve”) with “effectiveness concerns” (“are the ‘right’ things being done in the science system?”), the causal attribution of effects, and the inclusion of effects on the structure of global knowledge communities and bodies of knowledge (ibid: 283; see also Gläser and Laudel 2016: 156).

In this paper, I address the problem of causally attributing changes in the conduct and content of research to changes in governance. I demonstrate that studies that claim to have found effects of governance do not meet the methodological standards of causal analysis. However, as important as improving our methodologies is, it is unlikely to be sufficient for establishing causality due to the complexity of the causal processes involved. While the understanding of causality underlying the search for such effects is rarely explicated, the studies’ designs reflect attempts to solve a mono-causal problem, which is at odds with the multi-causal nature of social phenomena. This is why I suggest that we need to change our approach to establishing causality by abandoning the search for effects of governance and moving to the search for causes of changes in research content.

To develop this argument, I identify current dominant approaches to establishing causality and discuss their methodological shortcomings and principal limitations (2). Based on this analysis, I discuss reasons why tracing influences of governance instruments through the complex science and higher education system to the researchers whose behaviour they are assumed to change is unlikely ever to be successful (3). I then successively narrow my argument. First, I suggest that for establishing causality with qualitative methods, we need to reverse our analytical strategy by tracing causal processes backwards from observable epistemic change and determining the causal role of governance in these processes (4). Second, I turn to one particular kind of change and ask how we can identify epistemic change (5). Finally, I draw some conclusions about consequences of the new approach for science policy studies (6).

Have We Demonstrated that Governance Causes Change in the Conduct and Content of Research?

In this section, I identify two dominant approaches that are used in science studies for causally ascribing changes in the conduct and content of research to governance instruments. Instead of duplicating the reviews listed in the introduction, I distinguish studies according to their empirical approach to collecting evidence supporting causal claims and discuss typical or particularly influential examples. I begin with bibliometric studies that try to causally attribute changes in publication behaviour or publication performance to a particular change in governance by considering it as a ‘treatment’ which is followed by ‘effects’. A second type of study attempts to establish causality by asking participants about changes in their research. Although a growing number of studies employ a ‘mixed methods’ approach, most studies use only one approach for establishing causality. At the end of this section, I consider the ‘causality narrative’, which is unfounded in the light of these methodological problems.

“Change after Treatment”—Bibliometric Studies of the Impact of Governance

Among the studies of effects of governance, one research tradition uses bibliometric methods to establish epistemic change on the country level, university level, or on the collective level of grant recipients. Bibliometric methods have three major advantages. They are unobtrusive, i.e., they do not require interactions with the units of analysis. They support the analysis of macro-level epistemic changes and of changes in researchers’ publication practices in national science systems. They can also be used to study research performance if it is validly reflected in publications or citations, a premise which is increasingly challenged.

Given this potential, bibliometric methods have so far been applied with a surprisingly narrow focus on publication behaviour, possibly because this is what most governance instruments under study target. Most studies include some aspect of research quality, e.g. by differentiating publications according to strata of journals or by including citation-based measures. To the extent to which these differences are assumed to reflect research performance, bibliometric studies address not only behavioural change but also epistemic change.

Bibliometric studies of effects of national research evaluation systems or grant funding apply before-after comparisons in which the introduction of a new governance measure is considered as treatment. Three prominent examples include:

1) A study by Butler (2002; 2003a; b; 2004) according to which the publication component of Australian formula-based funding of universities, which rewards publications regardless of their impact, caused a disproportionate increase in the number of publications in low-impact journals. Although the texts of Butler’s publications treat the causal claim as a “hypothesis” for which “support” is found, the titles of the publications “Explaining Australia’s increased share of ISI publications—the effects of a funding formula based on publication counts” (Butler 2003a) or “What Happens when Funding is Linked to Publication Counts?” (Butler 2004) promise to answer a causal question, as do some statements in the abstracts. Butler supplemented her argument by showing different publication dynamics in two control groups (the hospital sector and the government sector, Butler 2003a, b, 2004) and by presenting anecdotal evidence about links between different responses of universities to the introduction of the funding formula and publication dynamics (Butler 2003a). She also considered and dismissed some possible alternative causes of the publication dynamics (Butler 2003a: 149). Her study was received as demonstrating causality (Weingart 2005; Hicks 2009; Good et al. 2015; de Rijcke et al. 2016). It motivated a similar study of effects of the Norwegian funding system (Schneider et al. 2016), whose authors argued that the number of publications in low-impact journals did not rise in Norway because the Norwegian system takes the quality of publications into account. Although Schneider et al. were aware of the difficulties involved in establishing causality (ibid: 245), they nevertheless asked a causal question (ibid: 246) and answered it (ibid: 255). A year later, van den Besselaar et al. (2017) attempted to refute Butler’s argument by presenting data that showed “increased research quality” (ibid: 905), which they also causally attributed to the Australian funding formula. Several discussants of that paper pointed out that the available evidence does not enable causal claims either way (Aagaard and Schneider 2017; Gläser 2017; Hicks 2017). In a continuation of this discussion, Schneider et al. (2017) stated in a comment on their article from 2016 that they did not intend to make a causal claim.

2) The observation by Jimenez-Contreras et al. (2003) of an acceleration in the rate of Spanish publications indexed in the Web of Science from the end of the 1980s onwards, “which eventually became exponential” despite a levelling out of investment in science (ibid: 130-131). The authors reviewed possible explanations for this change and argued:

Because the factors reviewed above are not able in themselves to explain the change in the growth rate (although clearly they are of some relevance), the determining factor in this recent increase in the publication of Spanish research in international journals appears to be the introduction, in 1989, of mechanisms of evaluation of publicly-sponsored research activity. (ibid: 134, my emphasis)

Their study has been received as proving the effect of the governance scheme (Weingart 2005; Hicks 2012). Later, their conclusions were challenged by Osuna et al. (2011), who systematically considered possible alternative causes and compared participants in the sexenio to a control group. The authors argued that the causal ascription of changes in the number of international publications to the sexenio could not be upheld:

It seems clear enough from our analysis that simplistic approaches such as before and after measures, often used by politicians to legitimise their ‘narratives’, are not sufficient from a research point of view. (ibid: 589)

3) The analysis of a statistical association between institutional and individual incentive schemes, on the one hand, and an increase in submissions to and publications in the journal Science, on the other hand (Franzoni et al. 2011). The authors consider only a few alternative explanations (the variation of research inputs, the extent of international collaborations, and each country’s share in positions on the editorial board of Science), which are controlled for in their model. They write about association rather than causation throughout the paper but abandon this careful stance in the conclusions, where we read:

Incentives increased competition (with the US) from countries with latent capacity by altering the amount and apparent quality of the work that is submitted for scientific review and eventually published. (ibid: 703, my emphasis).

In the supporting online material, we also find a mix of causal and non-causal language. On page 4, we read “We also analyzed the impact of the incentive policies on the number of published papers (Table S7)” (my emphasis). On page 7, the authors state as one limitation that they “cannot test for causality”, and that there might be other explanations.

Bibliometric methods are also applied in the study of effects of funding schemes. Most empirical studies attempt to answer the question of whether a funding scheme achieves its intended aims and what side effects it has. For a discussion of these studies, I utilise the investigation of the “effects” of grant funding by the Danish Council for Independent Research (Bloch 2020), which includes a detailed review of previous research (ibid: 457-460). According to this review, most studies compare grantees’ publication and citation performance to that of non-grantees, with the control group usually constructed from unsuccessful applicants. To exclude alternative causes, some studies apply matching techniques that eliminate ex-ante performance differences and other differences between the two groups. Other studies use regression discontinuity designs, which compare grantees to rejected applicants with assessment scores close to the threshold of funding. Results are inconsistent, with some studies finding significant correlations between funding and performance indicators, others finding significant correlations for publications but not citations, and yet others finding significant correlations for some fields but not others.
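To make the regression discontinuity logic concrete, the following minimal sketch (my own illustration, not taken from Bloch 2020 or any of the studies reviewed there) simulates applicants with review scores, a sharp funding threshold, and later publication counts, and estimates the effect of funding at the cut-off with a local linear comparison. All numbers, variable names and the bandwidth are invented for illustration.

```python
import numpy as np

# Hypothetical data: simulated review scores, a sharp funding threshold, and later
# publication counts. None of these numbers come from the studies cited above.
rng = np.random.default_rng(0)
n = 2000
score = rng.uniform(0.0, 10.0, n)            # peer review score of each applicant
threshold = 6.0
funded = (score >= threshold).astype(float)  # grants awarded above the threshold
# In this simulation, output depends on ability (proxied by the score) plus a
# small true effect of funding; real data would not reveal these ingredients.
pubs = 2.0 + 0.5 * score + 0.8 * funded + rng.normal(0.0, 1.5, n)

# Sharp regression discontinuity: keep applicants close to the threshold and fit
# a local linear model with separate slopes on either side. The coefficient on
# `funded` estimates the effect of funding at the cut-off.
bandwidth = 1.0
near = np.abs(score - threshold) <= bandwidth
x = score[near] - threshold
X = np.column_stack([np.ones(x.size), funded[near], x, x * funded[near]])
beta, *_ = np.linalg.lstsq(X, pubs[near], rcond=None)
print(f"estimated effect of funding at the threshold: {beta[1]:.2f}")
```

The design compares applicants who are nearly identical in assessed quality, which is why it is attractive for excluding ex-ante performance differences; it says nothing, however, about the conditions under which grantees and rejected applicants subsequently work.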

These bibliometric studies of evaluation systems and funding programmes illustrate the methodological problems involved in establishing effects of a ‘treatment’. Three minimum conditions for establishing causality (Aagaard and Schneider 2017) are only partially met by the studies’ designs.Footnote 1 A first condition is precedence: the cause must precede the effect. This can be easily established only for funding programmes. In the case of national evaluation systems, the ‘treatment’ is a process whose position and boundaries in time cannot be established because incentives can be anticipated by the actors involved (Butler 2017; Hicks 2017) but also “trickle down” (Aagaard 2015) at different speeds in different parts of the science system.Footnote 2

The second condition is that correlation between treatment and effect must be established. This happened only in the study by Franzoni et al. (2011) and in some of the studies of funding programmes (with other such studies trying to but not finding correlations). Correlation is difficult to establish in studies of evaluation systems due to the impossibility of determining the time of the ‘treatment’ (Hicks 2017) and for other reasons listed by Aagaard and Schneider (2017: 924).

The third condition is non-spuriousness. Establishing non-spuriousness would require the systematic exclusion of possible alternative causes of the observed phenomena, which the studies did only to a very limited extent and rarely with a convincing design. If studies of evaluation systems excluded possible alternative causes at all, these were introduced ad hoc. Some studies use untreated control groups, constructed from researchers who were not subject to an evaluation system or, in the case of funding programmes, from various populations of non-grantees. Studies of funding programmes attempt to exclude alternative explanations by applying advanced matching techniques that minimise differences between grantees and non-grantees. However, this matching cannot exclude all alternative explanations because the conditions under which grantees and control group members work are not investigated.
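For illustration, the before-after logic with an untreated control group can be sketched as a simple difference-in-differences comparison. The sketch below is hypothetical: the publication counts, the sector labels and the year of the ‘treatment’ are invented, and, as the discussion above makes clear, such a comparison addresses only some alternative explanations and cannot by itself establish non-spuriousness.

```python
import numpy as np

# Hypothetical data: annual publication counts for a sector subject to a funding
# formula ('treated') and an untreated control sector. All numbers are invented.
years = np.arange(1990, 2000)
treated = np.array([100, 104, 107, 110, 118, 131, 145, 160, 172, 185], dtype=float)
control = np.array([80, 82, 85, 87, 89, 91, 93, 95, 97, 99], dtype=float)
policy_year = 1994  # invented year of the governance 'treatment'

post = years >= policy_year  # True for years from the introduction onwards

def mean_annual_growth(series: np.ndarray, period: np.ndarray) -> float:
    """Mean year-on-year growth within the years selected by `period`."""
    growth = np.diff(series)           # growth into years[1], ..., years[-1]
    return float(growth[period[1:]].mean())

# Difference-in-differences: change in growth of the treated sector minus change
# in growth of the control sector. The control group rules out some, but not all,
# alternative explanations of the observed change.
did = (mean_annual_growth(treated, post) - mean_annual_growth(treated, ~post)) - \
      (mean_annual_growth(control, post) - mean_annual_growth(control, ~post))
print(f"difference-in-differences in annual publication growth: {did:.1f}")
```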

Given these problems, it cannot be concluded that these studies were successful in identifying effects of governance instruments, i.e. changes in the conduct or content of research that can be at least partially ascribed to the causal influence of these instruments. This is not to say that governance instruments do not influence researchers or research. We just cannot conclude from existing studies that they do or how they do so. A few studies have at least plausibility on their side. Studies of the Czech evaluation system reported a strong increase in the number of published books that meet minimum criteria of the evaluation system, with universities and departments acting as publishers of their academics’ books (Broz and Stöckelová 2018), and a steep rise in the proportion of proceedings papers in the social sciences (Vanecek and Pecha 2020). In China, the cash-per-publication reward policy introduced strong direct incentives for researchers to publish more (Quan et al. 2017). These studies did not exclude alternative explanations of the observed changes in publication behaviour but have some plausibility because they link specific properties of evaluation systems to particularly drastic or unexpected changes in publication behaviour, which has the advantage that alternative explanations are difficult to imagine.

“How do you Think your Research has Changed?”—Survey-Based and Interview-Based Studies

The second main approach to establishing causation of epistemic change by governance instruments is based on asking researchers how their research has changed under the influence of these instruments. This approach utilises surveys (Harley and Lee 1997; Hammarfelt and de Rijcke 2015), interviews (Gläser et al. 2010; Leišytė et al. 2010; Linkova 2014; Cañibano et al. 2018; Neff 2018), or both methods in combination (McNay 1998; Good et al. 2015; Mouritzen and Opstrup 2020). In a few cases, focus group discussions (McNay 1998; Linkova 2014) or observations (Linkova 2014) are included. Ethnographic studies are exceedingly rare and do not claim to establish causality (e.g. Lucas 2006).

In addition to evaluation-based funding schemes for universities, funding schemes for competitive grant funding have been studied with interviews and questionnaires. For example, Morris (2000) used interviews with researchers, administrators, and stakeholders from funding agencies to study the impact of grant funding on the work of biologists. She did not focus on a particular funding scheme. Hellström et al. (2018) also used interviews in their study of the Swedish funding programme for centres of excellence.

The aim of many of these studies was to identify effects of governance instruments. This interest is sometimes cast as studying “academics’ responses” to governance instruments (Linkova 2014; Leišytė et al. 2010). In other cases, an interest in “effects” or “the impact” of the governance instrument under investigation is clearly stated (Harley and Lee 1997; McNay 1998; Cañibano et al. 2018; Hellström et al. 2018; Mouritzen and Opstrup 2020). Hammarfelt and de Rijcke (2015) do not frame a causal question but leave the reader to wonder why one would investigate changes in publication practices of humanities scholars after the introduction of a performance evaluation system if not for finding out whether it made a difference.

Using researchers as sources of information about the impact of governance on their work has three distinct advantages compared to the unobtrusive bibliometric methods discussed in the previous section. Questioning researchers in their role as ‘obligatory passage points’ for influences on research content (Gläser 2019: 423) makes it possible to acquire first-hand knowledge about epistemic change, to consider overlaying influences, and to capture a wider range of possible changes than bibliometric indicators. Indeed, the studies listed above investigate not only changes in publication behaviour but also other behavioural change and a range of changes of the interviewees’ research including epistemic diversity and orientation towards the mainstream, the basic/applied character, interdisciplinarity, riskiness and time horizons of research.

At the same time, this approach faces methodological challenges when it comes to establishing causality. The first challenge is to avoid a methodological trap in the design of questions for interviews and surveys. The opportunity to ask researchers about changes in their research is sometimes used to also ask them what changes were caused by the governance instrument of interest. This practice of investigators ‘passing on’ their research question to participants instead of operationalising it is likely widespread but difficult to identify due to the woefully incomplete reporting on qualitative data collection.

Unfortunately, studies that adhere to higher standards of reporting their methodology need to serve as examples of the insufficient operationalisation of causal questions. In their study of Center of Excellence (CoE) funding, Hellström et al. (2018: 75) reported having asked interviewees the following question:

How has the Linnaeus CoE funding affected your research in terms of (a) organization (how research projects are run and related, teams etc.) and (b) the way that you pursue knowledge in your field?

Similar questions have been asked by Pinheiro et al. (2019) in their study of the impact of governance changes on performance in Nordic universities and by Leišytė (2007: 382) in her study of the Dutch and English evaluation systems.Footnote 3 Questions of this kind are not neutral and exercise pressure on the participant to communicate effects (see e.g. Cairns-Lee et al. 2021). They also violate the principle of openness underlying qualitative research because the outcome of interest frames the approach to data collection. The validity of information obtained with them must therefore be doubted, which is why Mouritzen and Opstrup (2020) avoided this line of questioning (ibid: 108).

Answers to such leading questions are unsuitable for causal analysis. A question that casts the governance instrument as a cause and asks interviewees to name its effects presumes causation—that the instrument has effects—and can only reconstruct the causality study participants believe to be at work. When participants describe changes in their behaviour or in the content of their research as effects of governance, they provide us with their holistic subjective theory of causes, assumed causal process, and effects. Probing questions that ask interviewees to explain their reasoning just force them to elaborate and thereby strengthen their theory by mobilising auxiliary hypotheses or constructing examples ad hoc. Instead of enabling a social-scientific causal analysis, research that passes on its research question can only collect participants’ subjective theories about that research question.

A correct operationalisation of the causal question would translate it into several interview questions that ask about conditions, actions, and outcomes of actions separately. For example, an interview schedule can be divided into a part that reconstructs decisions on research content and all the reasons why these decisions were made, a part in which necessary conditions for conducting this particular research are explored, and a part in which perceptions of national and university evaluation schemes are explored. This separation and ordering prevent the discussion of evaluations from framing the reconstruction of decisions.

A second methodological challenge is the interpretation of self-reported behavioural change. Three examples illustrate the problem:

  • Cañibano and Corona (2018) compared the statements of five historians who unambiguously reported having changed their publication behaviour in response to an evaluation system with those historians’ publication histories and found no evidence of the self-reported behavioural change.

  • Hammarfelt and de Rijcke (2015: 69) observe that the shares of articles and monographs in the publication channels of the faculty of arts they investigated remained constant, yet on the same page quote a participant saying the opposite.

  • In their very careful mixed-methods study, Mouritzen and Opstrup (2020: 123) find that overall, “the introduction of the [evaluation system] does not seem to have resulted in major changes” but note that “…the many qualitative statements above almost exclusively emphasize unintended dysfunctional consequences of the [Bibliometric Research Indicator] for the production of scientific knowledge” (ibid: 124).

It is very likely that the interviewees in these studies believed their descriptions of their own or their colleagues’ responses to governance interventions to be correct. Nevertheless, their statements contradicted independently collected data about their behaviour.

The third challenge is the consideration of alternative explanations of observed change. Similar to the bibliometric studies discussed in the previous section, the strong focus of survey-based and interview-based studies on one governance instrument limits their openness to other partial causes or alternative explanations (Harley and Lee 1997; McNay 1998; Cañibano et al. 2018; Neff 2018). Studies that explore participants’ situations more fully find that the governance instrument in question is not the cause of behavioural change. For example, interview-based studies of the Australian, British and Dutch evaluation systems found that the necessity of obtaining external grant funding exercised a much stronger influence than research evaluations (Gläser et al. 2010; Leišytė et al. 2010).

These three methodological challenges can be addressed by increasing methodological rigour. A fourth challenge, which applies only to studies based on interviews, appears to be more obstinate. While micro-level change of research content can be identified and causally attributed to governance changes with qualitative approaches (Gläser et al. 2010; Hellström et al. 2018; Whitley et al. 2018), these micro-level observations do not currently enable conclusions about macro-level change. Such conclusions depend on the identification of mechanisms that aggregate micro-level epistemic change and of influences of overlapping processes of knowledge production. Neither task has yet been addressed by science studies.Footnote 4

Like the authors of studies discussed in the previous section, authors of survey-based and interview-based studies are ambiguous with regard to the question of having established causality. For example, Hammarfelt and de Rijcke make a causal argument in the title of their paper “Accountability in context: effects of research evaluation systems on publication practices, disciplinary norms, and individual working routines in the faculty of Arts at Uppsala University” (Hammarfelt and de Rijcke 2015: 63) and in its abstract (ibid.) but later in the paper state: “We cannot make the causal claim that the implementation of evaluation models at the national and local level is solely or even mainly responsible for these changes” (ibid: 74). Neff (2018) claims causality with his title (“Publication incentives undermine the utility of science: Ecological research in Mexico”) but reports only his respondents’ opinions about effects.

Other studies move from the intention to establish causality to the accurate reporting of causality reported by participants. Cañibano et al. claim to have found causal relationships but ultimately present them as claims of their study participants, e.g. “According to our interviewees, the evaluation system encourages theoretical stagnation and repetitiveness” (Cañibano et al. 2018: 787; see Neff 2018 for the same approach). Such statements can be validly derived from the empirical evidence but move the target from investigating effects of governance to reporting what study participants think about it.

Taking Stock: Methodological Challenges and Causality Narratives

The empirical studies discussed in the previous sections consider changes in research performance, in researcher behaviour and in the content of research as effects of governance instruments. Most of them claim in one way or another to have established causality, at least by occasionally using suggestive causal language. None of them can be considered to have successfully established causality.

The authors of these studies appear to be aware of that problem and include disclaimers to that effect concerning causality. Since causal language is nevertheless used repeatedly in many studies, readers are confronted with a curious mix of disclaimers saying causality could not be established and claims to have shown effects of governance instruments.Footnote 5 This resembles a practice termed “spin” in the biomedical literature, which is defined as “reporting practices that distort the interpretation of results and mislead readers so that results are viewed in a more favourable light” (Chiu et al. 2017: 11). Spin is not necessarily applied intentionally. However, regardless of the reasons for publications applying ‘causality spin’ in the discussion of effects of governance, they feed a meta-narrative about governance instruments causing change in the conduct and content of research, e.g.

Studies that focused on effects of funding and evaluation systems on scientific output have indeed demonstrated goal displacement. Butler … analyzed the introduction of performance metrics in Australian research funding allocation. Her study revealed a sharp rise in ISI-ranked publications in all university fields (but not in other branches of research where this type of funding allocation is not present) when funding becomes linked with publications … Butler earlier demonstrated how this strategy, while leading to a rise of relative share of Australian publications, has also contributed to a decline of scientific impact (measured in citations) during the same period (Butler 2003a, b). … Similar effects of the use of bibliometrics on the amount of publications have been found in Spain … Denmark, Flanders, and Norway …. (de Rijcke et al. 2016: 162-163).

***

Numerous studies have analyzed changes in publication patterns in relation to the criteria that were set in subsequent national research assessment exercises in the UK, finding convincing evidence for a link between publication behavior and the conditions of assessment [...]. In these and other studies the effects (i.e. a rise in the number of publications) were seen to occur in systems where funding and scores on the metrics were directly linked. (Müller and de Rijcke 2017: 159)

At the current state of our knowledge this narrative must be considered a myth. We need to be more careful.

The Nature of the Causal Problem

While most of the methodological problems discussed in the preceding section might be solved by more faithfully applying relevant methodological rules, this is unlikely to be sufficient. The studies appear to have set themselves an impossible task of causal analysis. The search for effects of a particular governance instrument and the disregard of alternative explanations of these effects frames a study as monocausal, which is clearly at odds with the complex structure of causal processes (Mackie 1965; Franzese 2007). I briefly recount an argument from the discussion about the counter-claim by van den Besselaar et al. (2017) to Butler’s claim about effects of the publication component in Australia’s evaluation-based funding system (see above, section “Change after treatment”—bibliometric studies of the impact of governance) in order to demonstrate that any possible influences of governance instruments are likely to get lost in translation, superposition and synthesis (Gläser 2017). The underlying problem is that the search for causation must be conducted in a vertically differentiated multi-level system (Fig. 1).Footnote 6 In these systems, the behaviour of lower-level elements is influenced by their embeddedness in higher-level structures and at the same time generates higher-level processes (Mayntz 2009: 91).

Fig. 1 The causal web between governance instruments and macro-level epistemic change (simplified and generalised version of Figure 2 from Gläser 2017: 930; solid arrows represent the commonly assumed causal path, dashed arrows represent major additional influences in the causal process)

The logic underlying current approaches to establishing effects of governance instruments starts from a governance instrument of interest and attempts to trace its influence to micro-level or macro-level change. Such a governance instrument may address researchers or research groups directly (as happens with funding programmes), or indirectly by communicating to universities the expectation that they make their researchers do more, better, or different research (as happens with national evaluation systems and other policies). In both cases, actors that are not addressed by the governance instrument still observe it and thus may be influenced by it.

The influence of the governance instrument is overlaid by other influences from a variety of sources including other national or trans-national research governance instruments, other societal actors (e.g. commercial interests or civil society actors) and national and international scientific communities (which makes the conditions influencing meso-level actors and researchers field-specific). These influences contribute to shaping the situation of meso-level actors such as employment organisations or funders of research as well as the situation of a researcher.

Meso-level actors and their sub-units (see Mouritzen and Opstrup 2020 on the influence of university departments) thus face a situation shaped by overlapping influences from several actors, which are exercised through a variety of channels. They are organisations in an “evaluative landscape” (Brandtner 2017). They respond to this situation by influencing their researchers, thereby translating influences according to their own situations and interests. The influence exercised by meso-level actors is unlikely to be consistent. In most cases, a variety of expectations will be communicated and will be backed by different means for exercising influence. Together with influences from macro-level actors, these influences from meso-level actors shape the situations of researchers. It is important to note that this is not a superposition of equal influences. To maintain their identity as members of their scientific community, researchers need to produce contributions that meet the community’s standards of relevance and methodological conduct. National governance instruments are always overlaid by strong influences from scientific communities (Gläser 2019; Tirado et al. 2023).

Researchers respond to their situation by making epistemic choices about the content of their research. As a result, their research might change in accordance with the purposes of the governance instrument in question and/or in other ways. This micro-level epistemic change is overlaid by what other researchers in different situations do. The governance instrument in question might exercise an influence on these different situations, too. Researchers in other countries may experience a similar influence from a national governance instrument or different influences from different governance instruments.

The micro-level epistemic change is aggregated by mechanisms which are still scarcely understood. They can be assumed to be field-specific, which again complicates the causal analysis. The local and the communal levels of knowledge production are linked through processes like collaboration, peer review, controversies, and organisational as well as intellectual mobility of researchers. In each of these processes, community members influence their community’s knowledge production according to their social position and status in their community. The aggregation of micro-level epistemic change is in fact a synthesis that is accomplished by a community’s influential members in a complex process of negotiation and mutual adjustment that is still ill understood and has not yet been empirically investigated.

This brief account illustrates three problems of identifying behavioural or epistemic change and causally attributing it to a particular governance instrument. First, changes in the conduct of research (in decisions of researchers on topics, approaches, collaborations, or publications) or in the content of research including research performance (epistemic change) may not occur at all. This does not necessarily mean that the governance instrument had no effect. For example, it might have prevented epistemic change that would otherwise have occurred. Second, if change occurs, it inevitably has more than one cause because any governance instrument whose effects we are trying to establish is an INUS condition, i.e. an “insufficient but necessary part of a condition which in itself is unnecessary but sufficient for the result” (Mackie 1965: 245).Footnote 7 In other words, social phenomena are caused by more than one condition (multicausality), and several different sets of conditions can cause them (equifinality). This is why the effects of any single governance instrument are unlikely to be identifiable and unlikely to be causally attributable.
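Mackie’s INUS structure can be made explicit with a small, purely hypothetical illustration. The condition names below are invented placeholders, not empirical claims about what actually causes epistemic change.

```python
# Purely hypothetical illustration of Mackie's INUS structure. The condition names
# are invented placeholders, not empirical claims about causes of epistemic change.
def epistemic_change_occurs(evaluation_system: bool, grant_dependence: bool,
                            community_shift: bool, new_instrument: bool) -> bool:
    # Condition set 1: the evaluation system is an Insufficient but Necessary part
    # of this set, which is itself Unnecessary (set 2 would also do) but Sufficient.
    governance_path = evaluation_system and grant_dependence   # multicausality within one set
    # Condition set 2: an equifinal path that produces the same change without governance.
    community_path = community_shift and new_instrument        # equifinality across sets
    return governance_path or community_path

# The same change can occur with or without the governance instrument:
assert epistemic_change_occurs(True, True, False, False)        # governance path
assert epistemic_change_occurs(False, False, True, True)        # non-governance path
assert not epistemic_change_occurs(True, False, False, False)   # instrument alone is insufficient
```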

The distinction between multicausality and equifinality highlights an important tension of causal analysis. The search for alternative explanations and their exclusion addresses equifinality, i.e. the possible existence of a different unnecessary but sufficient condition that produces the same result. Such explanations must indeed be excluded for causal ascription to work. However, other causal factors identified in the investigation may also be partial causes in an explanatory account that includes the governance instrument under study as another partial cause. These factors are other parts of the same unnecessary but sufficient condition and must be included. When should additional causal factors be excluded because they are part of an alternative explanation, and when should they be included as candidate partial causes?

This dilemma in causal analyses points to the third problem of identifying epistemic or behavioural change. The inclusion of other partial causes or alternative causal factors in the assessment of effects of governance instruments requires a middle-range theory that explains changes in the conduct and content of research by systematically linking conditions under which research takes place to processes triggered by these conditions and the outcomes of these processes. Without such a theory, we cannot know whether the governance instrument we are studying can have effects at all, which other conditions (other partial causes) must exist for the governance instrument to have an effect, or whether the effect might be produced without the governance instrument under study (alternative explanations). The difficulty of establishing causality of any kind in studies of the governance of science is largely due to the absence of a theory that could inform us what to look for. Currently, prior empirical research must take its place, but, as I demonstrated in the second section, this research is problematic.

These problems of causal analysis are inescapable implications of an analytical strategy that starts from a particular governance instrument or governance arrangement and tries to ‘forward-trace’ its influence on the conduct and content of research. This strategy faces the problem of a possible dilution of the governance instrument’s influence by translation, superposition, and synthesis in a causal network about which we have no theory yet, and is unlikely to lead to observations of change that can be causally attributed to governance. Researchers respond to a multitude of simultaneous influences, and their responses are overlaid by others’ responses to equally complex situations. Studies attempting to establish effects of governance instruments by searching for the change they cause set themselves an impossible task.

From Tracing Effects to Tracing Causes

Having discussed necessary methodological improvements and the complexity of the causal processes underlying eventual influences of governance on the conduct and content of research, I now turn to the question of how causal analysis could be improved. This question is somewhat optimistic because it presumes that a successful causal analysis is possible, while the preceding section could also be read as an argument against this very possibility. However, given the importance of causal analysis for theory building, particularly in social studies of fields where creating change is a major concern of key actors, I believe we should keep trying. I first discuss three major approaches to causality in the social sciences and point out developments that could improve empirical analyses, and then further develop the argument for qualitative studies by arguing that we need to turn the causal question around.Footnote 8 Instead of asking what change is produced by governance, we should ask how governance contributes to change by starting from observable change and finding out how this change is brought about.

Three Ways of Establishing Causality in the Social Sciences

The literature on social science methodology and on the philosophy of social science discusses three ways in which causality can be established.Footnote 9 The first and most common way is to establish causal relationships, i.e. to identify social phenomena that can be considered as causes and effects. This is achieved by finding associations among variables through statistical analysis and providing grounds on which this correlation can be considered to represent causation (Pearl 2009).Footnote 10 The discussion about the difficulties of drawing causal conclusions from correlational analysis has not only contributed to the rising interest in causal mechanisms but also led to innovations in statistical methods that can better support causal analysis (ibid.). Still, it remains unclear how far innovative statistical methods can go without support from much deeper substantive knowledge of the empirical realm under study (Freedman 2010).
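A minimal simulated example can illustrate why an association among variables does not by itself license a causal interpretation. In the hypothetical sketch below, an unmeasured confounder (‘research capacity’) drives both exposure to a governance instrument and publication output; only substantive knowledge about which variables to adjust for reveals the naive association to be spurious. All names and numbers are invented for illustration.

```python
import numpy as np

# Hypothetical data: a confounder ('research capacity') drives both exposure to a
# governance instrument and publication output, so the raw association between
# exposure and output is spurious. All numbers are invented.
rng = np.random.default_rng(1)
n = 5000
capacity = rng.normal(0.0, 1.0, n)                      # unobserved in a naive analysis
exposed = capacity + rng.normal(0.0, 1.0, n) > 0.0      # better-resourced units adopt the instrument
pubs = 3.0 + 2.0 * capacity + rng.normal(0.0, 1.0, n)   # output driven by capacity alone

naive = pubs[exposed].mean() - pubs[~exposed].mean()

# Coarse adjustment for the confounder: compare exposed and unexposed units within
# capacity strata. This shrinks the spurious association towards zero, but knowing
# what to adjust for requires substantive knowledge, not statistics alone.
edges = np.quantile(capacity, [0.2, 0.4, 0.6, 0.8])
strata = np.digitize(capacity, edges)
adjusted = np.mean([pubs[(strata == s) & exposed].mean()
                    - pubs[(strata == s) & ~exposed].mean() for s in range(5)])
print(f"naive difference: {naive:.2f}, within-stratum difference: {adjusted:.2f}")
```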

A second way of establishing causality, which has enjoyed increasing attention in the last three decades, is the search for causal mechanisms. A causal mechanism (or social mechanism) is understood here as a sequence of causally linked events that occur repeatedly in reality if certain conditions are given and which link specified initial conditions to a specific outcome (Gläser and Laudel 2019: [4]).Footnote 11 The search for causal mechanisms is a response to approaches that identify causes and effects but black-box the process by which the causes produce the effect.Footnote 12 Establishing causality by providing the mechanism that produces effects from causes confronts social science with the task of identifying social mechanisms, which is currently discussed in the political science literature as process tracing (Mahoney 2000: 412, 414). Unfortunately, the burgeoning literature on process tracing does not seem to converge on a coherent understanding of this approach (Trampusch and Palier 2016).

Deriving causal relationships from associations among variables and identifying causal mechanisms are complementary in some respects. Quantitative approaches support the identification of causal relationships within a population through empirical generalisation, i.e. generalisation from samples to the population. They thus provide information on the empirical scope of the causal relationship. In contrast, process tracing provides a generalised description of initial conditions that trigger and maintain the mechanism (causes), the mechanism itself, and the change it produces (effects). While it does not support an empirical generalisation, its abstract description of the conditions under which the mechanism is likely to operate can be considered as the mechanism's theoretical scope, i.e. a generalisation that provides a theoretical description of the conditions under which the mechanism is triggered and maintained. This information could be used to determine the empirical scope by determining the population in which the conditions exist.

The third approach combines the first two and thus creates a much higher bar for establishing causality. According to the Russo-Williamson Thesis (RWT), “a causal claim can be established only if it can be established that there is a difference-making relationship between the cause and the effect, and that there is a mechanism linking the cause and the effect that is responsible for such a difference-making relationship” (Ghiara 2022: 2; see also Russo and Williamson 2007; Shan and Williamson 2021). The difference-making relationship is considered as causal relationship established through interpreting statistical association, while the causal mechanism is identified with qualitative methods. Thus, when

  • a non-spurious correlation between smoking rates and lung cancer rates is established that supports at least one of the following statements: “(i) intervening on smoking behaviours results in a decrease of cancer rates; (ii) when smoking rates decrease, lung cancer rates decrease too; (iii) smoking increases the probability of developing lung cancer” (Ghiara 2022: 3)

and

  • there is “evidence of a sufficiently well understood biological mechanism made of entities (such as proteins and genes) and activities (such as protein expressions or genetic mutations) that links smoking and lung cancer” (ibid.),

the causal claim “smoking causes lung cancer” is supported.

Although research that successfully implements RWT appears to exist (Ghiara 2022), this approach seems very difficult to apply. It requires establishing the non-spuriousness of the difference-making relationship, which is a continuing problem in statistical analysis (Goldenberg 1998); it requires operationalising the same concepts for qualitative and quantitative studies (which appears to be much easier for the health sciences for which RWT was originally developed than for the social sciences); and it requires matching the empirical scope of the difference-making relationship to the theoretical scope of the mechanism(s).

If effects of governance are to be identified, doing so with quantitative methods would be based on the first approach to causality, while qualitative methods would need to adhere to the second. The fact that neither approach has been faithfully applied in empirical research on effects of governance instruments on the conduct and content of research points to the difficulties involved, which I outlined in section “The nature of the causal problem”. However, a possible way forward appears to be a specific variant of process tracing within the ‘mechanismic’ approach.

Causal Reconstruction

A small but consistent body of work suggests starting from an observed phenomenon and working backwards through the mechanism(s) producing it, thereby identifying the phenomenon as an effect of the conditions that trigger and maintain the mechanism(s). Van Evera identifies this approach as process tracing and sees its main utility in arriving at the prime cause:

The investigator traces backward the causal process that produces the case outcome, at each stage inferring from the context what caused each cause. If this backward process-trace succeeds, it leads the investigator back to a prime cause. (Van Evera 1997: 70)

This aim seems to be a bit narrow in the light of the multi-causality already discussed, which is why more recent approaches to this version of process tracing appear to be more promising.

Mayntz (2004: 238, 2009, 2016) introduced the idea of “causal reconstruction” of macro phenomena, an approach that leads to a causal explanation via the reconstruction of causal processes that produce the phenomenon. This strategy is not only applicable to the explanation of macro phenomena. Beach and Pedersen include it both as “theory-building process tracing” for cases in which the outcome is known but the causes are not (Beach and Pedersen 2013: 16) and as “Explaining-Outcome Process-Tracing” (Beach and Pedersen 2013: 18-21; 2016: 308-313). The basic idea of this approach to causal reconstruction is to take a phenomenon and conduct a comparative analysis of the conditions and processes producing it.

Applied to the problem of identifying causal links between governance and epistemic change, using causal reconstruction means that instead of taking any specific governance change and trying to ascertain its effects on knowledge production, we start from a specific epistemic change and try to ascertain how it was produced, and what role various governance instruments played in its causation. Considering all partial causes simultaneously is part of the conscious design of the study rather than done ad hoc.

The reconstruction of causal processes leading to a specified outcome is a well-established approach in the political sciences. The main advantage of causal reconstruction appears to be that it is less dependent on pre-existing theory because it does not require a priori knowledge about possible partial or alternative causes. Instead, partial causes (initial and operating conditions of mechanisms) are empirically identified with open qualitative methods for each link in the causal network.

Although causal reconstruction appears to have major advantages for the analysis of the role of governance for change in the conduct and content of research, it also has disadvantages. Any specific change in governance a policy researcher is interested in might turn out not to be a cause at all because the observed change cannot be traced back to it. However, this risk is not different from the current risk of not finding an effect when starting from a change in governance. It is also very unlikely that the reversal of the causal question will make governance disappear as a cause of change. Applications of causal reconstruction in a study of the development of scientific innovations demonstrated that governance practices do indeed constitute important causes of such change (Gläser et al. 2016; Whitley et al. 2018).

Reversing the strategy of causal analysis is not without problems. If observable change is to be the starting point of the causal analysis, the identification of theoretically significant or politically relevant change becomes crucial. This change needs to be identified and traced backwards to its causes at different levels of aggregation, which requires the integration of quantitative methods for identifying change with qualitative methods of process tracing. While this resembles the RWT approach in its combination of methods, it is much less demanding on empirical research and causal reasoning because no causal relationship needs to be established prior to the search for causal mechanisms.

On Identifying Epistemic Change

Causal reconstruction starts from the phenomenon that is to be explained, which in our case is epistemic change. The question then is how epistemic change, which I understand as change in the content and properties of knowledge production, can be empirically identified. This question must be answered for different levels of aggregation (Table 1).

Table 1 Epistemic change in scientific communities on three levels of aggregation

At the level of international scientific communities, research content may change due to a re-distribution of effort over currently addressed topics or through the emergence of new topics through intellectual innovations. Innovations occur when new findings trigger sustained processes of change of research practices and purposes (Whitley et al. 2018). Examples include the transformation of cancer research by molecular-biological approaches (Fujimura 1988), the experimental realisation of Bose-Einstein condensation, the emergence of evolutionary developmental biology and international large-scale student assessment (Whitley et al. 2018).

Changes in the distribution of effort across topics in a scientific community are difficult to identify because the knowledge a scientific community works with must be delineated, topics be identified, and efforts on topics measured. Unfortunately, bibliometrics has not yet developed robust methods for delineating scientific communities and their knowledge (Held et al. 2021; Held 2022). International scientific communities and their topics are units of analysis that cannot currently be delineated with the necessary validity and reliability.Footnote 13 In contrast, intellectual innovations provide ideal sites for the causal reconstruction of processes leading to epistemic change because they are often highly visible on all levels of aggregation. Their development is based on researchers’ decisions to alter the trajectories of their work, which often leads to discontinuities and rapid change. These changes of individual research trajectories become integrated in the emergence and growth of new topics in universities, national sub-sections of scientific communities and international scientific communities. The discussion of findings triggering an innovation by scientific communities adds to their visibility. Changes of research trajectories often incur high costs in terms of resources and time for learning, which makes them susceptible to governance processes.

Although epistemic properties of fields have been discussed and compared in science studies for a long time, their measurement is still in its infancy. Epistemic diversity is a well-defined concept, but its bibliometric operationalisation is not yet settled (Abramo et al. 2018: 1191-1192). The rate and mode of growth of knowledge might be measurable as the rate and extent of novel and original contributions but valid measures have yet to be developed. Various measures of novelty (e.g. Evans 2010; Azoulay et al. 2011; Wang et al. 2017), innovativeness (e.g. Klavans et al. 2014) and originality (e.g. Shibayama and Wang 2020) have been proposed but have not yet been sufficiently validated.
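As an illustration of what a bibliometric operationalisation of epistemic diversity can look like, the sketch below computes a Rao-Stirling-type diversity score over publication categories, combining variety, balance and disparity. It is one common formalisation, not necessarily among the measures discussed by Abramo et al. (2018); the category shares and the distance matrix are invented, whereas real analyses would derive both from bibliographic data (e.g. field classifications and citation links).

```python
import numpy as np

# Hypothetical example of a Rao-Stirling-type diversity score over publication categories.
shares = np.array([0.5, 0.3, 0.2])          # proportion of a unit's output per category (invented)
distance = np.array([[0.0, 0.4, 0.9],       # cognitive distance between categories (invented)
                     [0.4, 0.0, 0.6],
                     [0.9, 0.6, 0.0]])

# D = sum over pairs i != j of p_i * p_j * d_ij: higher values indicate output that is
# spread over many, evenly weighted, and cognitively distant categories.
diversity = float(np.sum(np.outer(shares, shares) * distance))
print(f"Rao-Stirling-type diversity: {diversity:.3f}")
```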

Assuming that epistemic change can be identified in international scientific communities, it is likely to be of interest mainly as a baseline against which epistemic change at lower levels of aggregation can be assessed. Governance is more likely to influence epistemic change in national sub-communities because major conditions for researchers such as access to positions and resources are provided by national institutions. Epistemic change on this level could occur because the provision of positions and resources is increasingly tied to expectations about the directions and performance levels of research. Therefore, epistemic change in national sub-communities is an interesting starting point for causal reconstruction.

While conceptually important for the study of governance, the level of national sub-communities is difficult to conceptualise sociologically because the knowledge production of many international scientific communities is tightly integrated on the international level and does not vary thematically between national sub-communities. Specific thematic foci—and thus nationally specific epistemic change—may occur in national subcommunities due to a community’s orientation towards applications (which are often developed in national contexts) and nationally specific research objects or research traditions.

National sub-communities are subject to the same kinds of epistemic change as international scientific communities, which pose the same problems of empirical identification. In addition, the contributions by a national sub-community to the international community’s knowledge production may change due to governance processes. Such change may be empirically identified if valid bibliometric indicators can be found. Quantitative indicators of research performance that have been used so far measure the number and visibility of publications rather than the amount and quality of relevant and reliable contributions to particular topics in the knowledge production of international scientific communities.

At the level of research groups and individual researchers, directions of knowledge production depend on research programmes as well as accessible empirical objects and methods. In addition to the properties already discussed, process-level epistemic properties like the epistemic uncertainty of research processes can be included. Such properties are potentially important because they may change if different research problems are chosen or due to external selection processes such as grant funding. Information about these properties can be obtained by qualitative methods. As discussed in the section “The nature of the causal problem”, the conceptual and empirical tools to aggregate these changes at the level of national sub-communities have yet to be developed.

The perspective on epistemic change within scientific communities must be complemented with a perspective on epistemic change in units of analysis that are created by governance. These units are multi-disciplinary because national science systems include all national sub-communities in a country and research organisations comprise research groups from different scientific communities. These units of analysis can thus be understood as maintaining multidisciplinary portfolios whose change constitutes a specific kind of epistemic change. Epistemic properties of these portfolios include their epistemic diversity and ‘performance’, i.e. the aggregate of contributions to the knowledge of various scientific communities (Table 2).Footnote 14

Table 2 Epistemic change in national science systems on two levels of aggregation

The identification of epistemic change in national science systems and public research organisations faces the challenge of comparing and aggregating field-specific epistemic changes. Comparing or aggregating epistemic change across fields seems currently impossible due to both missing conceptual foundations on epistemic properties of fields and a lack of methods for comparative measurement.

Given the methodological challenges involved in the empirical identification of epistemic change, utilising intellectual innovations in the sciences, social sciences and humanities as starting points for the causal reconstruction of influences on epistemic change appears to be the most promising approach. The epistemic change they bring about is easier to identify than changes in epistemic properties, occurs on all levels of aggregation from international scientific communities to individual researchers, and often involves high-cost decision situations that are particularly susceptible to governance.

Conclusions

Among the many fruitful analyses of changes in the governance of science that have been conducted in the last decades, one strand appears to have failed to achieve the goals it set itself. The studies that claim one way or another to have identified effects of governance on the content and conduct of research do not stand up to scrutiny. I identified some methodological problems that could be overcome but argued that the main problem is that current analytical approaches do not do justice to the complexity of the causal problem they need to solve. As a remedy for qualitative studies of governance, I propose to start from observable epistemic change and to apply a strategy of causal reconstruction that identifies the causal processes bringing about that change, the conditions that trigger these mechanisms, and the conditions under which they operate. In this causal reconstruction, the contribution of governance to change can be identified. Starting from observable change requires its empirical identification, which I discussed for the case of epistemic change.

Five conclusions can be drawn from this analysis. First, taking a governance instrument as analytical starting point and forward-tracing its effects is likely to lead to a study design that assumes mono-causality and cannot differentiate between additional partial causes (which need to be included due to the multi-causality of social phenomena) and alternative explanations (which need to be excluded to address the equifinality of causal social processes). These limitations are partly due to the lack of a middle-range theory that explains the contributions of governance to changes in science by linking specific conditions under which governance operates to changes in the conduct and content of research through mechanisms operating under these conditions.

Second, given the complex causal relationships and processes in which governance instruments are embedded, it may not be possible to identify consequences of governance for the conduct and content of research at all—even in the sense of partial, non-deterministic causality advanced in this paper. If this is the case, the field of science policy studies needs to rethink its agenda.

Third, for those who want to keep trying to establish causality, a possible alternative strategy for qualitative causal analysis appears to be the causal reconstruction of processes leading to change, i.e. starting from observable change and tracing it back to partial causes, which may or may not include governance. Although this strategy is not without problems, starting from change and putting governance in its place among the conditions and mechanisms leading to such change is likely to produce more information about the causal role of governance than our current strategy.

Fourth, identifying macro-level epistemic change and the causal mechanisms producing it requires a much higher level of integration of quantitative (bibliometric and survey-based) and qualitative methods than has been achieved so far. Both methods would need to operationalise the same concepts. This would require a theoretical and methodological integration far beyond the combination of methods in current ‘mixed method’ approaches.

Finally, identifying epistemic change and reconstructing the causal processes producing it requires sophisticated comparative strategies. Such studies would need to be simultaneously field-comparative in order to establish the causal role of epistemic and social-structural properties of scientific communities and country-comparative in order to establish the causal role of governance structures and processes. This requires larger teams, longer time horizons and more resources than are currently common in the sociology of science and science policy studies. Applying the new strategy of causal analysis would also require a significant change of research practices.