Measuring effectiveness
The examination of knowledge production and use in policy making revolves around a number of themes. Venues used to produce knowledge for policy include expert advisory bodies, parliamentary select committees, and policy appraisal settings (Jordan and Russel 2014), of which experiments and pilots are variants. Experiments as knowledge producers at the science–policy interface are expected to provide decision-makers with evidence of the effects of a policy, which can have concrete or conceptual effects on its audience. In this context, we focus on how an experiment influences a policy actor's mindset, a conceptual utilisation process described as the gradual sedimentation of ideas into a policy network (Weiss 1977). This focus contrasts with concrete utilisation, where research findings directly influence specific policy decisions (Greenberg et al. 2003) (also known as the knowledge-driven model in Weiss 1977). Although understanding the direct effects of knowledge on policy decisions is valuable, the number of interacting variables involved means that demonstrating any impact on actual decisions would be difficult (Turnpenny et al. 2014). Measuring the perspectives of decision-makers is comparatively straightforward and broadens understanding of how experiments influence policy making. Whether a decision-maker uses experimental evidence in their decisions may depend on how favourably they perceive the experiment.
To assess conceptual influence, we draw on criteria regularly used to assess the effectiveness of a science–policy interface: how credible, salient, and legitimate an interface is perceived to be (Cash et al. 2003). The Cash typology is regularly used to assess the science–policy interface and is similar to the criteria that Weiss (1977) used to assess enlightenment experienced by decision-makers (the perceived technical quality, the relevance of research to policy, and the political acceptability of the research). Credibility refers to the degree to which policy makers consider the experiment authoritative and believable, and to the degree to which they trust its outcomes. It includes the credibility both of the knowledge production processes and of the knowledge holders (Sarkki et al. 2014). Salience refers to the relevance of the experiment as perceived by decision-makers at a given moment in time. It draws attention to the relationship between expert knowledge and decision-making, emphasising that credibility alone will not improve political decisions (Cash et al. 2003). The third criterion is legitimacy, which reflects the perception that the production of information has been respectful of stakeholders' divergent values and beliefs, unbiased in its conduct, and fair in its treatment of views and interests. Legitimacy rests on how the choice was made to involve some actors and not others, and how information was produced and disseminated (ibid.). The three criteria are summarised below (Table 1).
Table 1 Three criteria measuring effectiveness, defined by Cash et al. (2003)
Typology of experiments
Studies have analysed experiments in terms of their characteristics (van der Heijden 2014), purpose (Ettelt et al. 2015), and implications for policy (Greenberg et al. 2003). Here, experiments are assessed in terms of how the organiser “sets” the experiment’s institutional rules, as described in the Institutional Analysis and Development Framework developed by Elinor Ostrom (2005). Ostrom uses the rules to describe an action situation, and they determine who is involved and who is excluded (boundary rules), how responsibilities are distributed (choice rules), what types of information are distributed, how regularly, and to whom (information rules), the extent of buy-in by participants (pay-off rules), and how decisions are made (aggregation rules).
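The five rule categories from Ostrom's Institutional Analysis and Development Framework, as described above, can be sketched as a simple mapping. This is purely an illustrative encoding of the paragraph's content; the variable and function names are our own, not part of the framework itself.

```python
# Illustrative sketch (names are our own): the five IAD rule categories
# described above, mapped to the design question each one answers.
IAD_RULES = {
    "boundary":    "Who is involved and who is excluded?",
    "choice":      "How are responsibilities distributed?",
    "information": "What information is distributed, how regularly, and to whom?",
    "pay-off":     "What is the extent of buy-in by participants?",
    "aggregation": "How are decisions made?",
}

def describe(rule: str) -> str:
    """Return the design question a given rule category answers."""
    return IAD_RULES[rule]
```

An experiment organiser's design choices can then be thought of as selecting a setting for each of these five categories.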
How these rules are set can be understood as design choices made by an experiment's organiser. To facilitate empirical investigation, differences in the settings of each of the rules can be aggregated into three types of experiment: technocratic, boundary, and advocacy. Real-world examples can then be approximated against these types (Weber 1968; Dryzek 1987). The diametrically opposed technocratic and interpretive approaches to policy analysis (Owens et al. 2004) provide a basis for distinguishing such types. The typology is also informed by a model of the science–policy interface that classifies different roles of science (Pielke Jr. 2007): science as arbiter, issue advocate, or honest broker of policy options. The sections below summarise the rule settings for each ideal type (rule settings for each experiment type are given in detail in "Appendix 1").
Technocratic experiment
The technocratic policy experiment resembles the technical–rational model of knowledge production, where an expert elite generates scientific knowledge for policy decisions (Owens et al. 2004). It produces scientific information with little or no connection to the policy process until the end, when the results are presented to decision-makers. The experiment thus plays a supposedly objective and disconnected role in politics as “science arbiter” (Pielke Jr. 2007). Knowledge is produced and verified through processes acceptable to the involved scientific community, with fact finding occurring within the parameters of the goals previously set. This arrangement reinforces the view that science is independent of politics (Koetz et al. 2012).
Boundary experiment
A boundary policy experiment provides an opportunity for actors—state and non-state—to gain access to and possibly influence policy making. The boundary experiment is initiated by a collaboration of actors, and the production of scientific knowledge is supplemented by multiple knowledge systems—relevant contextual, lay, and traditional forms of knowledge, which are considered of value (Koetz et al. 2012). The experiment’s role in policy making resembles the “honest broker of policy alternatives” (Pielke 2007), where it engages with the policy process and develops policy solutions in accordance with multiple value perspectives. It is expected that the engagement results in participants appreciating the different ways the problem can be understood, and in turn designing and testing a mutually beneficial solution (Lejano and Ingram 2009).
Advocacy experiment
By choosing to design their experiment as an advocacy type, an organiser indicates that they have a predefined problem definition and are not open to alternative interpretations. They intend to use the experiment to encourage action in a particular policy direction and to soften objections (compare the “(stealth) advocate” role in Pielke 2007). An advocacy experiment is generally organised by policy makers and includes dominant, traditional actors in coalition. Different actor types might be represented, but they agree with the problem conception and those with contrasting expectations (“outsiders”) are barred from gaining access (Owens et al. 2004). Those in charge retain power and control over design, monitoring, and evaluation procedures, reinforcing the existing structures of power.
To summarise, the three experiment types each represent an aggregate of different rule settings with divergent configurations of information, power distributions, and participants. Individual rule settings could be analysed as independent variables in themselves, but this is not the focus of this analysis (see Leach et al. 2014 for an assessment of how individual design variables affect learning outcomes). The following section outlines the (conceptually derived) expectations of how the types might produce different effects on decision-makers.
Experiment design and effectiveness
The literature suggests several factors that could influence credibility, salience, and legitimacy. On the basis of these factors, we formulate three hypotheses that connect the design of experiments to these criteria.
H1
If an experiment has a technocratic design, it will be considered highly credible and moderately legitimate, but not salient.
For the technocratic type, the emphasis on independent scientific methods and expertise means the experiment is expected to be considered highly credible (Cash et al. 2003; Owens et al. 2004). These experiments maintain a transparent process and transparent reporting of scientific findings, including uncertainties and limitations, which also boosts credibility (Sarkki et al. 2014). Separating the participants (expert actors) from policy makers and excluding discussion of different perspectives means the experiment is less likely to resonate with the needs of policy makers, reducing the possibility of the project being considered salient. The funding for the experiment is likely to come from organisations with a purely scientific interest, which care more about scientific publications than about policy relevance. Finally, the closed character of the technocratic type reduces its legitimacy because the research question, data gathering process, and report writing have not involved stakeholder groups or ordinary citizens and might not address arguments they consider important (Millo and Lezaun 2006); this loss of legitimacy is, however, tempered by the transparency of the experiment's procedures and reporting.
H2
If an experiment has a boundary design, it will be considered highly legitimate yet moderately credible and salient.
In a boundary experiment, wide boundary settings ensure that non-state actors have access to policy making, where they can influence how a public policy problem is solved (Dryzek 1987). This may result in the experiment being perceived as highly legitimate, as the inclusion of different perspectives increases the chance that the evidence resonates with societal needs (Hegger et al. 2012). Boundary experiments are the only type that allows actors to enter the process of their own volition, which improves their legitimacy compared with the other two types. Moreover, open and transparent information transmission between participants allows for the "extended peer review" of the experiment by a range of actors (Funtowicz and Ravetz 1993), rendering the information produced more legitimate (ibid.). The inclusion of different knowledge types, however, may be seen to dilute the independence and reliability of the knowledge produced, so a lower perception of credibility than for the technocratic type is expected. Including a range of actors may ensure salience, although increased inclusiveness can have negative effects because issues may be reframed in ways that make the experiment irrelevant (Cash et al. 2003). Nevertheless, a boundary experiment will strengthen linkages between knowledge producers and users and increase the probability that the experiment is designed around the best question for policy (Sarkki et al. 2014).
H3
If an experiment has an advocacy design, it will be considered highly salient but not very credible or legitimate.
Finally, regarding the expected impacts of an advocacy experiment, credibility is undermined by including policy and non-state actors alongside expert actors, which dilutes the validity of scientific knowledge with the production of practical knowledge. Moreover, if noticed, selective information distribution and a lack of transparency reduce the experiment's perceived reliability. In attempting to show that there is support for a particular proposal, the organiser blocks participation by critical actors and thereby dismisses their concerns, reducing fairness and the perceived legitimacy of the project. However, salience may be high because of the presence of dominant policy actors, which helps when the experiment is used to keep a policy idea alive (Greenberg et al. 2003), and because outcomes are presented when the time is right, carefully gauged and engineered by the policy actors involved.
Table 2 summarises the expected scores associated with the three hypotheses sketched above.
Table 2 Expected scores for the three types
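The expectations in H1–H3 can also be expressed as a small illustrative encoding. The three-level scale and all names below are our own shorthand for the hypothesised scores, not a measurement instrument from the study.

```python
# Hypothetical encoding (scale and names are our own): expected
# perception scores per experiment type, following H1-H3.
HIGH, MODERATE, LOW = 3, 2, 1

EXPECTED_SCORES = {
    # H1: technocratic -- highly credible, moderately legitimate, not salient
    "technocratic": {"credibility": HIGH, "salience": LOW, "legitimacy": MODERATE},
    # H2: boundary -- highly legitimate, moderately credible and salient
    "boundary": {"credibility": MODERATE, "salience": MODERATE, "legitimacy": HIGH},
    # H3: advocacy -- highly salient, not very credible or legitimate
    "advocacy": {"credibility": LOW, "salience": HIGH, "legitimacy": LOW},
}

def strongest_criterion(experiment_type: str) -> str:
    """Return the criterion on which a type is expected to score highest."""
    scores = EXPECTED_SCORES[experiment_type]
    return max(scores, key=scores.get)
```

Read this way, each hypothesis predicts a distinct strongest criterion for its experiment type: credibility for technocratic, legitimacy for boundary, and salience for advocacy designs.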
Intervening variables
Our independent variable (the governance design choices made by experiment organisers) is only one possible explanation for variations in an experiment's effectiveness. Competing factors may explain its impact; for example, what role (if any) the respondent's organisation had in the experiment. Other relevant factors include which government institution respondents work for, and the extent to which they consider the experiment innovative (Weiss 1977). Playing one of these roles might positively bias a decision-maker's survey responses. These variables are also operationalised and examined in the analysis below. Other intervening variables include the extent of change in the political environment external to the experiment and environmental crises such as flooding events or droughts, but these external changes were not controlled for.