The importance of policy design fit for effectiveness: a qualitative comparative analysis of policy integration in regional transport planning

Policy design has returned as a central topic in public policy research. An important area of policy design study deals with effectively attaining desired policy outcomes by aligning goals and means to achieve policy design fit. So far, only a few empirical studies have explored the relationship between policy design fit and effectiveness. In this paper, we adopt the multilevel framework for policy design to determine which conditions of policy design fit (i.e., goal coherence, means consistency, and congruence of goals and means across policy levels) are necessary and/or sufficient for policy design effectiveness in the context of policy integration. To this end, we performed a qualitative comparative analysis of Dutch regional transport planning including all twelve provinces. The results show that no condition is necessary and that two combinations of conditions are sufficient for effectiveness. The first sufficient combination confirms what the literature suggests, namely that policy design fit results in policy design effectiveness. The second indicates that the combination of goal incoherence and incongruence of goals and means is sufficient for policy design effectiveness. An in-depth interpretation of this counterintuitive result leads to the conclusion that, for achieving policy integration, the supportive relationship between policy design fit and policy design effectiveness is less straightforward than theory suggests. Instead, the results indicate that there are varying degrees of coherence, consistency, and congruence that affect effectiveness in different ways. Furthermore, they reveal that under specific circumstances a policy design may be effective in promoting desired policy integration even if it is incoherent, inconsistent, and/or incongruent.


Introduction
How to develop policy designs that effectively address policy problems has been an ongoing topic of research for decades. A policy design is generally understood as a mix of interrelating goals and means that governments employ to give effect to their policies (Howlett & Rayner, 2013; Howlett, 2014a). Even though policy design thinking has expanded considerably over the years, a key component has always focused on bringing about intended policy outcomes by consciously matching goals and means (Howlett & Mukherjee, 2018a), because a good fit between goals and means is said to minimize incompatibilities and exploit synergies, so as to improve policy design effectiveness (Rayner et al., 2017). In this setting, policy design fit is considered to be the sum of coherence of goals, consistency of means, and congruence of goals and means (Van Geet et al., 2019a, 2019b). Contemporary studies on policy design often present these elements (coherence, consistency, and congruence) as the criteria that determine policy design effectiveness (e.g., Howlett & Rayner, 2013; Kern & Howlett, 2009).
To further mature as a field, policy design studies would benefit from methodological innovations and a stronger emphasis on the application and operationalization of the field's theoretical principles (see e.g., Rogge et al., 2017; Schmidt & Sewerin, 2018). This also applies to the relationship between policy design fit and effectiveness which, to date, has undergone limited empirical testing (Rogge & Schleich, 2018). Consequently, the evidence for the positive relationship between policy design fit and policy design effectiveness has been predominantly of a theoretical nature (e.g., Howlett, 2009; Howlett & Rayner, 2013; Rayner et al., 2017). Not only have existing empirical studies hardly addressed all three conditions of policy design fit together, but they also present different findings on the importance of design coherence, consistency, and congruence for achieving desired outcomes (see Kern et al., 2017; Kern & Howlett, 2009; Rogge & Schleich, 2018). A systematic empirical analysis of how all three conditions of policy design fit (coherence, consistency, and congruence) contribute to policy design effectiveness is lacking. This article aims to bridge this research gap by investigating to what extent policy design fit is needed for policy design effectiveness.
Understanding the relationship between policy design fit and design effectiveness is especially relevant in moving toward more comprehensive policy integration. Recently, interest in linking policy design thinking and policy integration research has increased (Peters, 2018). The purposive nature of policy design can help to achieve the increased integration between policy areas and levels of government that is needed to address crosscutting policy problems (Candel & Biesbroek, 2016; Cejudo & Michel, 2017). Several scholars have already highlighted that certain policy instruments can help to address policy issues that span policy fields and levels of government (Jordan et al., 2005). However, few policy design studies have taken this into consideration (Peters, 2018). So far, it has remained unclear whether and how aligning policy goals and instruments helps to effectively bring about desired policy integration. An earlier study by Van Geet et al. (2019b) did reveal that discrepancies in the degree of integration between policy goals and policy instruments give rise to policy design incoherence, inconsistency, and incongruence. It is, however, unclear to what extent these mismatches between policy design elements impede effective policy integration.
The current study addresses these research gaps and investigates whether all three attributes of policy design fit are required, or whether, for example, only coherence or a certain combination of attributes is sufficient for effectively achieving desired policy integration.
Our study focuses on the impact of policy design fit in the domain of transport planning, where the challenge of promoting policy integration has become particularly apparent. Over the course of decades, transport planning evolved from a unimodal approach, to a multimodal approach, to an integrated approach to land use and transport planning (Busscher et al., 2015; Heeres et al., 2012). While the field has progressed toward increased integration, governments are struggling to come up with effective policy designs to support current integrated ambitions on land use and transport (Van Geet et al., 2019a).
To achieve its objective, the current study applies Howlett's (2009) multilevel policy design framework to study policy integration in Dutch regional transport planning and adopts qualitative comparative analysis (QCA) methodology to analyze the policy designs of all 12 Dutch provinces. More specifically, this research design is adopted to investigate whether the coherence of goals, the consistency of means, and the congruence of goals and means (or combinations of these three attributes) are necessary and/or sufficient for effective policy integration. QCA was selected both for its systematic approach to case comparison and for its configurational nature (Gerrits & Verweij, 2018; Rihoux & Ragin, 2009), which means that it allows for analyzing the necessity and sufficiency of conditions or combinations of conditions for achieving certain outcomes. QCA is an appropriate method for studying policy designs and explaining policy outcomes. Even though QCA has been used in public policy studies and planning studies (Verweij & Trell, 2019) before, it offers a new methodological approach to examining the influence of policy design fit on policy design effectiveness.

A multilevel approach to policy design
The literature on policy design has come a long way. In its early stages, policy design thinking revolved around Tinbergen's (1952) notion that the most effective policy design consists of a 1-to-1 goal-means ratio, where one instrument fully addresses one policy goal (Knudson, 2008). Tinbergen himself acknowledged the difficulty of maintaining this 1-to-1 ratio because comprehensive policy goals will require a mix of policy instruments. Yet, it took some time for his policy design approach to develop into the more comprehensive view that a policy design should be understood as a mix of interrelated goals and instruments that are deployed throughout the policy process (Howlett et al., 2015; Howlett, 2014a).
Key principles of current policy design thinking are captured by the nested model introduced by Howlett (2009), building on the work by Hall (1993) and Cashore and Howlett (2007), as visualized in Fig. 1. Since its introduction, this model has been incrementally developed and further established in a series of studies (Howlett & Cashore, 2009; Howlett & Rayner, 2013; Howlett & Mukherjee, 2018b; Peters et al., 2018; Howlett, 2019). The model adopts a multilevel perspective on how mixes of policy goals and means are formed, based on the principle that higher levels of abstraction delineate and shape the features at the lower levels. The highest level of abstraction is the macro-level. This concerns the general mode of governance (e.g., corporatist, market, and network governance) that shapes policy deliberations and decision-making, as well as the preferred type of government regulation mechanisms (e.g., legal, financial, or communicative mechanisms) (Howlett, 2009). Howlett (2018b) describes the macro-level as the contextual features that structure the policy formation and policy implementation practices of governments. The intermediate level of abstraction is the meso-level. This is referred to as the policy level and concerns the generic set of policy objectives of a certain policy sector, as well as the combination of policy instruments that are used throughout the policy process to attain these objectives (Howlett, 2018b). The decisions on policy design that are made at the meso-level set the boundaries for the design choices that can be made at the micro-level, which is the third and lowest level of abstraction. At the micro-level, policy design is operationalized and directly linked to goal attainment. Micro-level policy design is concerned with the delivery of policy outcomes. It is the level of the specific on-the-ground measures that are formed by detailed policy goal settings and specific instrument calibrations.
A key hallmark of current policy design thinking is the conscious effort to bring configurations of interrelating policy goals and instruments into alignment, so as to effectively achieve intended outcomes (Howlett et al., 2015). Multiple scholars have defined attributes to operationalize the alignment of design components. Examples include coordination, complementarity, coherence, consistency, and congruence (Bali & Ramesh, 2018). Some of these concepts partly overlap, as each focuses on one of three possible relations: the alignment between goals, between means, or between goals and means. When it comes to the multilevel understanding of a policy design that is adopted here, the alignment across components is generally expressed as the coherence of goals, the consistency of means, and the congruence of goals and means (Howlett, 2009; Howlett & Rayner, 2013). This is shown in Fig. 1. More specifically, coherence is achieved when goals, objectives, and settings can be pursued at the same time without trade-offs (Kern & Howlett, 2009). Rogge and Reichardt (2016) argued that the consistency of a policy design reflects how well instruments are aligned with each other and how well they contribute to achieving the same policy objective. Their study explained that consistency may range from the absence of contradictions between policy means to the existence of synergies between policy means. This is in line with Howlett and Rayner's (2013) view that the consistency of policy means is reflected by the 'ability of multiple policy tools to reinforce rather than undermine each other in the pursuit of goals' (p. 174). Congruence, finally, reflects the extent to which policy goals and means are mutually supportive and successful at working together to achieve corresponding goals (Kern & Howlett, 2009). The sum of goal coherence, means consistency, and congruence of goals and means may be described as policy design fit (Van Geet et al., 2019b).
Fig. 1 Components of public policy in policy design and conditions for policy design fit. Based on: Cashore and Howlett (2007b), Howlett (2009), and Howlett (2018a)
Policy design fit is a dynamic concept; policy design components develop over time, through processes of layering, drift, conversion, replacement, and exhaustion (Kern & Howlett, 2009; Rayner et al., 2017; Van Geet et al., 2019b). It is important to take these processes of change into account, as they will impact the fit of policy design components (Van Geet et al., 2019b). Cashore and Howlett (2007) provide useful insight into how processes of policy design change unfold between goals and means across different policy levels. They found dynamics to differ across the model's components (goals, objectives, settings, instrument logic, tools, and calibrations) because these evolve irregularly and at varying tempos, depending on the institutional structures that are in place for each component. In general, it is assumed that the macro-level is characterized by long-lasting stability; at the meso-level, instances of policy change will occur at a higher frequency; and the micro-level is most dynamic (Hall, 1993; Howlett, 2009).

Attributes of policy design effectiveness
Effectiveness is widely acknowledged to be the fundamental goal of any policy design and is receiving considerable attention from design scholars. This is not surprising, as effectiveness is generally considered the foundation upon which additional goals, such as sustainability, public value, or justice, may be constructed (Howlett, 2018b; Mukherjee & Bali, 2018; Peters, 2018; Peters et al., 2018). Peters et al. (2018) even argued that effectiveness is why policymakers, either implicitly or explicitly, engage in policy design in the first place. The growing interest in policy design has encouraged scholars to identify a variety of attributes that, in addition to policy design fit, are considered to potentially affect policy design effectiveness. Within this rapidly growing body of literature, a differentiation can be made between effectiveness in terms of process (in which policy design is seen as a verb) and content (in which policy design is seen as a noun) (Howlett & Rayner, 2013; Peters et al., 2018). This study focuses on the effectiveness of a policy design as a noun, i.e., the extent to which the technical specifications of a policy design are successful in attaining desired outcomes.
In recent years, discussions on policy design effectiveness (as a noun, that is) have come to explore a variety of attributes that go beyond the traditional focus of matching goals and instruments. Schmidt and Sewerin (2018), for example, posit that policy designs with a higher intensity (i.e., the amount of resources or activity that is invested or allocated to a specific policy instrument) and a higher balance (i.e., the variety of instrument types within a design) will be more effective. Additionally, Thomann (2018) highlighted that explicitness in the calibration of a policy instrument (i.e., the extent to which desired behavior is encouraged by attributing positive or negative valence to certain actions) in part accounts for its effectiveness. Furthermore, Mukherjee and Bali (2019) argue that the capacity a government has available (i.e., the range of analytical, operational, and political skills) will determine its ability to successfully put instruments to use for achieving desired outcomes. Finally, both Peters et al. (2018) and Capano and Howlett (2019) describe the goodness-of-fit attribute, which holds that the calibration of an instrument needs to be responsive to the context in which it is deployed. Despite these valuable theoretical contributions, there is scant empirical evidence on the interrelationship between these attributes and design effectiveness. Filling this research gap is an important next step for bridging the growing divide between policy design theory and practice. Increasing our understanding of the relationship between policy design fit and effectiveness is an important first step, as this relationship may be considered the foundation of current design thinking and still plays a leading role in contemporary policy design theory (e.g., Howlett, 2009, 2018b; Howlett & Rayner, 2013).
The limited number of empirical studies conducted on the relationship between policy design fit and effectiveness has been confined to a single-level focus on policy design and reveals different outcomes based on divergent research approaches. For example, Kern and Howlett's (2009) single-case study describes how discrepancies in the development of policy goals and means over time resulted in sub-optimal outcomes, as a result of instrumental inconsistency as well as growing incongruence between goals and means. However, they did not explain how incongruence and inconsistency negatively influenced policy design effectiveness. Similarly, Kern et al.'s (2017) study on design dynamics implied that consistency and coherence may encourage goal attainment. However, they did not support this claim by analyzing achieved policy outcomes. Rogge and Schleich's (2018) pioneering explorative quantitative study tested how the perception of German companies (n = 390) regarding the coherence, consistency, and congruence of a policy design was associated with the policy outcome of low-carbon innovation. This study found only weak support for consistency and congruence, while coherence did not contribute significantly to the achievement of intended outcomes. Furthermore, Reichardt and Rogge (2016) conducted a multiple-company case study (n = 6) and found that stable and coherent long-term policy goals, in combination with a consistent mix of policy instruments that were congruent with the long-term goals, led to successful corporate innovation in offshore wind energy in Germany. Taken together, these empirical studies provide initial evidence that design coherence, consistency, and congruence may benefit policy design effectiveness. However, a systematic assessment of the relationship between policy design fit and effectiveness based on the multilevel understanding of policy design is still missing.

Operationalizing policy design effectiveness
When it comes to determining the effectiveness of a policy design, the current body of policy design literature broadly provides the conformance and performance approaches as the two main alternatives. The first, which is also the most widely described, regards effectiveness as the degree to which a policy design achieves intended outcomes. In this approach, effectiveness is determined by comparing policy intentions to outcomes. The conformance approach to effectiveness is in line with the purposive understanding of policy design as a systematic effort to link appropriate means to attain predefined goals. Del Río (2014) argues that, despite the straightforward nature of this conformance perspective, design effectiveness remains a multifaceted notion; any criteria for measuring effectiveness are to be specifically defined for each individual policy design to 'include different criteria and policy goals which are relevant' (Del Río, 2014, p. 269). Multiple scholars have highlighted that policy monitors and policy evaluations may be used for assessing this type of design effectiveness (Del Río, 2014; Doremus, 2003; Howlett, 2018c; Peters et al., 2018). Alternatively, the performance approach to policy design effectiveness, which was introduced by Peters et al. (2018), focuses on how a policy design performs as a 'frame for action […] through which problem, process, and result are collectively defined and accepted' (Peters et al., 2018, p. 14). The performance approach is more geared toward the effectuality of policy processes. From this perspective, effectiveness is determined by analyzing the policy processes that follow the employment of a policy design to determine whether a policy design effectively supports policy actors in making sense of policy problems and addressing them. To date, this performance approach to design effectiveness is still in its infancy; the literature provides limited leads for developing a robust method for performance assessment.
Thus, this study will adopt a conformance understanding of policy design effectiveness that revolves around comparing policy goals to policy outcomes.

Research design
This paper aims to determine whether all three conditions of policy design fit are required, or whether a single condition or a combination of two conditions is sufficient, to ensure policy design effectiveness. To this end, we apply qualitative comparative analysis (QCA). This is a set-theoretic method for analyzing the necessary and sufficient conditions (or combinations of conditions) that explain a certain outcome of interest (Schneider & Wagemann, 2012). As such, it allows testing conditions for effective policy design, while maintaining a qualitative understanding of the specifics of the individual cases from which results are derived. To be able to maintain this in-depth qualitative understanding, QCA works best for an intermediate number of cases. In QCA, studying a small or intermediate number of cases, such as the 12 included here, is common (see e.g., Rihoux et al., 2013; Verweij & Trell, 2019). The 12 cases are naturally sufficient for this study because they constitute the entire population of Dutch provinces. This section describes the design and execution of the qualitative comparative analysis methodology.
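The set-theoretic logic of necessity and sufficiency can be illustrated with a minimal sketch. It assumes the consistency and coverage measures as they are commonly defined in the QCA literature; the function names and the membership scores are hypothetical and are not the data or the software used in this study.

```python
from typing import Sequence


def _overlap(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of case-wise minimum membership (the intersection of sets x and y)."""
    if len(x) != len(y):
        raise ValueError("membership vectors must have the same length")
    return sum(min(a, b) for a, b in zip(x, y))


def _ratio(numerator: float, denominator: float) -> float:
    if denominator == 0:
        raise ValueError("measure is undefined for an empty set")
    return numerator / denominator


def necessity_consistency(condition, outcome) -> float:
    """Degree to which the outcome set is a subset of the condition set."""
    return _ratio(_overlap(condition, outcome), sum(outcome))


def necessity_coverage(condition, outcome) -> float:
    """Empirical importance of the condition as a necessary condition."""
    return _ratio(_overlap(condition, outcome), sum(condition))


def sufficiency_consistency(condition, outcome) -> float:
    """Degree to which the condition set is a subset of the outcome set."""
    return _ratio(_overlap(condition, outcome), sum(condition))


def negate(scores):
    """Membership in the absence of a set (written with a tilde in QCA)."""
    return [1 - s for s in scores]


# Hypothetical crisp membership scores for six illustrative cases.
coher = [1, 1, 0, 0, 1, 0]   # membership in the set of coherent designs
effct = [1, 1, 1, 0, 0, 1]   # membership in the set of effective designs

print(necessity_consistency(coher, effct))
print(necessity_coverage(coher, effct))
print(necessity_consistency(negate(coher), effct))
```

In this illustration neither coherence nor its absence would qualify as necessary, because both consistency values lie well below the thresholds that are conventionally applied.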

Adopting policy design thinking to study policy integration
Policy design fit (i.e., coherence, consistency, and congruence) is a multifaceted concept; its operationalization and assessment will depend on the 'specific job at hand' (Howlett, 2014b). When it comes to the job of achieving policy integration, work by Candel and Biesbroek (2016) provides a base for operationalizing these policy design conditions. More generally, policy integration is understood here as a strategy to overcome the fragmented organization of the public sector in order to address problems that cross established administrative and jurisdictional boundaries (see e.g., Cejudo & Michel, 2017; Trein et al., 2019). Candel and Biesbroek (2016) put forward policy integration as an ongoing process. They distinguish goals and instruments as two of the dimensions on which these processes of integration take place, each varying on a spectrum from a low to a high degree of integration. They argue that integration processes often show discrepancies or asynchronicity across dimensions and that, consequently, goals and means may be of a different degree, or level, of integration. For this study, we use the synchronicity of the integration process across goals and means as a measure of policy design fit.

To determine the level of integration of goals and instruments, Candel and Biesbroek (2016) have formulated specific criteria. On the dimension of policy goals, the degree of integration depends on two aspects: first, the range of policies, both between and within subsystems, that collectively address the same problem (e.g., domains of transport, energy, and maritime all addressing climate change); and second, the extent to which different subsystems embed their policy goals in an overarching strategy directed at solving a collective problem. Additionally, the degree of integration on the dimension of policy means is reflected by three aspects. First, the diversity of instruments that are deployed and support each other in addressing a collective goal. Second, the range of instruments in place that structure interaction and coordinate policy action across administrative boundaries to achieve collective, overarching goals (e.g., interdepartmental working groups, overarching plans, overarching funding programs). Third, the extent to which a mix of cross-subsystem instruments is adopted, tailored to meet an overarching policy goal.
Over time, transport planning has incrementally evolved toward an advanced level of integration (see Fig. 2; Curtis & James, 2004; Heeres, 2017; Van Geet et al., 2019a). Traditionally, transport planning was characterized by a sectoral unimodal approach in which sectoral specialization resulted in the segmented planning of roads, railways, and waterways (Busscher et al., 2015; Owens, 1995). However, as awareness increased of the interrelationships between different modes of transport and of the interactions between land use and transport, multimodal and integrated land use and transport planning approaches were developed (Hull, 2010; Potter & Skinner, 2000). A multimodal approach focuses on the entire transport system and regards the different modes of transport and infrastructure networks as functioning as an integrated whole (Arts et al., 2014; Heeres et al., 2012; Hull, 2005). Integrated land use and transport planning goes one step further and also considers the reciprocal relationship between the multimodal transport system and land use (Hull, 2010; Wegener & Fürst, 1999). It focuses on 'people' and 'places,' by acknowledging that travel is a means to engage in activities such as meeting family, working, and shopping (UN-Habitat, 2013) and that transport infrastructure connects different spatial functions where these activities take place (Heeres et al., 2012, 2016). The latter approach combines transport planning measures (e.g., investment in infrastructure networks) and land use planning measures (e.g., mixed-use planning, urban density, proximity, and distance to public transport) to achieve broad policy goals, such as improving accessibility (Hull, 2010; Straatemeier, 2019; Van Wee et al., 2013) or sustainable mobility (Banister, 2008; Bertolini et al., 2005).
Collectively, these studies on policy integration and transport planning provide a foundation for operationalizing this QCA's four policy design attributes. The three conditioning attributes are assessed based on the synchronicity of the policy design in terms of integration. Table 1 shows how each component may be scored as unimodal, multimodal, or integrated land use and transport. Depending on synchronicity that is consequently observed between the policy design's components, policy design coherence, consistency and congruence will be determined. The outcome attribute will be assessed on the extent to which policy outcomes correspond to the desired degree of integration. This is done, in line with conformance thinking, by comparing the level of integration of meso-level goals to the level of integration of the achieved policy outcomes. This intermediate-level is the key level when it comes to determining effectiveness as these are the objectives you hope to achieve in formulating specific on-the-ground measures.

Table 1

Macro-level
  Unimodal
    Goals: Policy should be developed separately for single modes of transport to address problems that fall within the boundaries of that specific transport mode
    Instrument logic: Instruments are adopted following a logic of single-mode specialization. Policy implementation is achieved using specialized instruments that are directed at specific single modal goals
  Multimodal
    Goals: Policy should be developed from a broader transport perspective; there is general recognition that in governing transport problems, the interrelationships between different modes should be taken into account
    Instrument logic: Instruments are adopted following a logic of intra-sectoral integration; policy implementation is achieved using instruments that coordinate and steer collaboration within the transport system
  Integrated land use and transport
    Goals: Policy should be developed from an integrated land use and transport perspective; there is a general recognition that policy problems should be governed according to a holistic approach on the land use system and the transport system
    Instrument logic: Instruments are adopted following a logic of inter-sectoral integration; policy implementation is achieved using instruments that coordinate and steer collaboration between the transport system and the land use system

Meso-level: Policy level or program level operationalization
  OBJECTIVES: What does policy formally aim to address?
  TOOLS: What types of instruments are used?
  Unimodal
    Objectives: Policy goals aim to address problems on the individual transport network and can be attained through single modal planning (e.g., goals oriented at influencing transport flow, vehicle speed, congestion, and network connectivity). Specific strategic policy plans are developed for each transport network
    Tools: The instrument mix that is in place has a purely sectoral focus and only addresses single-mode problems. There are no instruments that coordinate and steer government action on multiple transport modes
  Multimodal
    Objectives: Policy goals aim to address broader transport problems that require integrated action across different modes of transport (e.g., goals may be targeted at greater overall mobility, intermodal transfer, and the complementarity between different networks). Shared policy goals are embedded within an overarching transport strategy
    Tools: The instrument mix that is in place guides policy action on multiple transport modes to achieve an overarching transport goal. This mix includes a range of instruments that coordinate and steer government action on multiple modes of transport
  Integrated land use and transport
    Objectives: Policy goals aim to address policy problems that require integrated action across land use and transport (e.g., transit-oriented development, sustainable mobility, and accessibility). Shared policy goals are embedded within an overarching spatial planning strategy
    Tools: The instrument mix that is in place guides policy action on transport and land use to achieve an overarching policy goal. This mix includes a range of instruments that coordinate and steer government action on land use and transport

Micro-level: Specific on-the-ground measures
  SETTINGS: What are the specific 'on-the-ground' requirements of the policy?
  CALIBRATIONS: What are the specific ways in which the instrument is applied?
  Unimodal
    Settings: A combination of specific on-the-ground measures is formulated to address problems on a single transport network (e.g., infrastructure development and infrastructure expansion)
    Calibrations: Policy instruments have a specialized focus; they are applied for planning and delivering policy measures on single modes of transport
  Multimodal
    Settings: A combination of on-the-ground measures is formulated for different transport modes to address broader transport problems (e.g., the development of a transport hub or park-and-rides)
    Calibrations: Policy instruments are applied for planning and delivering policy measures on multiple modes of transport. There are instruments in place that coordinate policy measures on different transport modes and combine these to attain overarching transport goals

[...] (2016), Cashore and Howlett (2007), Curtis and James (2004), Howlett (2018a), and Wegener and Fürst (1999)

Unit of analysis
The transport planning policy design is the unit of analysis for each of the twelve cases.
Each of the designs included in this study was adopted between 2003 and 2009. The effectiveness of each design was assessed from its adoption until 2020, that is, over a period of at least ten years. Data were collected in the form of provincial policy documents, provincial websites, internet archives, and online policy monitors. A total of 193 sources were collected and coded in ATLAS.ti. Table 2 provides an overview of the documents and websites. The reference list is given in the Appendix.

Calibration and the data matrix
As part of the QCA, the collected data were calibrated following the guidelines of Basurto and Speer (2012), De Block and Vis (2018), and Gerrits and Verweij (2018). During the calibration, membership scores are defined for each case on every 'set.' In QCA, each condition and the outcome is understood as a 'set.' Our analysis includes four sets: the conditions coherence, consistency, and congruence, and the outcome effectiveness. Calibration involves the transformation of the qualitative case information (in this case the coded documents for the twelve transport planning policy designs) into quantitative set-membership scores (Gerrits & Verweij, 2018). We based our calibration choices on the analytical framework and operationalization presented above. An extensive overview of the data calibration can be found in the supplementary material. The data were calibrated through systematic document coding in ATLAS.ti, following three steps for every case. The first step involved coding the data in line with the policy design components outlined in Fig. 1. The main long-term strategic transport plan of every province was retrieved and used to identify the policy goals and policy means. These data were complemented with additional material regarding the policy means that were described in the strategic plan (see Table 2). This provided an overview of the macro-, meso-, and micro-level transport planning policy goals and means of each of the provinces. In a second step, using the criteria listed in Table 1, the degree of integration for each of the design components was assessed for each province. In other words, each of the design components was qualified as either unimodal, multimodal, or integrated land use and transport. Additionally, the policy outcomes were reviewed to determine whether the policy design was effective. Policy design effectiveness was scored by triangulating evidence from Provincial Annual Reports and from material on monitoring and evaluation.
Table 2 provides an overview of the material that was used in the process of data calibration, and Table 3 presents the output.
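The calibration logic described above can be illustrated with a minimal sketch. It assumes crisp (0/1) membership scores and a simple matching rule: goals are coherent when they share one degree of integration across the three policy levels, means are consistent when they do the same, and goals and means are congruent when they match at every level. The function name `calibrate`, the data layout, and the example province are hypothetical; the calibration rules actually applied are documented in the supplementary material and may differ.

```python
from typing import Dict

# Degrees of integration used to qualify each design component.
DEGREES = ("unimodal", "multimodal", "integrated")
LEVELS = ("macro", "meso", "micro")


def calibrate(goals: Dict[str, str], means: Dict[str, str]) -> Dict[str, int]:
    """Turn coded design components into crisp set-membership scores.

    Assumed rule (not taken from the paper): a condition scores 1 only
    when the relevant components all share the same degree of integration.
    """
    for level in LEVELS:
        if goals[level] not in DEGREES or means[level] not in DEGREES:
            raise ValueError(f"unknown degree of integration at {level} level")
    goal_degrees = {goals[level] for level in LEVELS}
    means_degrees = {means[level] for level in LEVELS}
    return {
        # Coherence: goals share one degree of integration across levels.
        "COHER": int(len(goal_degrees) == 1),
        # Consistency: means share one degree of integration across levels.
        "CONSIS": int(len(means_degrees) == 1),
        # Congruence: goals and means match at every policy level.
        "CONGR": int(all(goals[lv] == means[lv] for lv in LEVELS)),
    }


# Hypothetical province: integrated macro-level goals, all else multimodal.
example = calibrate(
    goals={"macro": "integrated", "meso": "multimodal", "micro": "multimodal"},
    means={"macro": "multimodal", "meso": "multimodal", "micro": "multimodal"},
)
print(example)  # {'COHER': 0, 'CONSIS': 1, 'CONGR': 0}
```

Under these assumptions, a design whose macro-level goals are more integrated than all other components is scored as incoherent and incongruent but consistent.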

Qualitative comparative analysis results
The analysis was carried out using Charles Ragin and Sean Davey's Fuzzy-Set/Qualitative Comparative Analysis 3.0 software. First, a test for necessary conditions was performed. A condition is necessary when the outcome cannot be achieved without it (Gerrits & Verweij, 2018). The results of the necessity analysis are presented in Table 6. A tilde sign indicates the absence of a condition; for example, ~COHER means incoherent. The consistency value in the second column of Table 6 reflects the degree to which the cases (the empirical evidence) support the claim that the set-theoretic relationship exists. The coverage value expresses the empirical importance of the relationship (ibid.). As no condition has a consistency value of 0.9 or higher, we find that no single condition is necessary for policy design effectiveness. Subsequently, the sufficiency of the configurations was determined by using a truth table analysis. The truth table in Table 7 lists all the logically possible combinations of conditions and shows the cases that are covered by these combinations. Truth table analysis involves the pairwise comparison of configurations that agree on the outcome and differ on only one of the conditions. Four configurations had no cases and thus are not included in the analysis. One configuration (i.e., COHER*~CONSIS*~CONGR) has a consistency below 0.75 and thus is not included in the analysis either (Gerrits & Verweij, 2018; Ragin, 2009). In the end, three configurations were selected for the pairwise comparison. The analysis was specified to explain positive outcomes, i.e., policy design effectiveness. Table 8 presents the results of the truth table analysis. The table shows that two configurations are sufficient for policy design effectiveness. The first configuration, COHER*CONSIS*CONGR → EFFECT, confirms the theoretical model of Howlett and Cashore (Fig. 1) and supports the notion that the combination of coherent goals, consistent means, and congruence of goals and means explains policy design effectiveness. The second configuration, ~COHER*~CONGR → EFFECT, states that incoherence in combination with incongruence is sufficient for policy design effectiveness. Furthermore, consistency is redundant in explaining policy design effectiveness for this pathway.
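The pairwise comparison can be sketched in a few lines of Python. This is an illustrative reimplementation of the basic Boolean reduction step, not the routine used by the fsQCA software; the names `merge`, `minimize`, and `show` are hypothetical. The three input rows are the configurations named in the text as passing the consistency threshold, and configurations without cases are left out, as in the analysis.

```python
from itertools import combinations
from typing import Optional, Set, Tuple

CONDITIONS = ("COHER", "CONSIS", "CONGR")
# 1 = condition present, 0 = condition absent, None = condition redundant.
Term = Tuple[Optional[int], ...]


def merge(a: Term, b: Term) -> Optional[Term]:
    """Merge two terms that differ on exactly one condition, else None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) != 1 or a[diff[0]] is None or b[diff[0]] is None:
        return None
    return tuple(None if i == diff[0] else v for i, v in enumerate(a))


def minimize(terms: Set[Term]) -> Set[Term]:
    """Repeat pairwise merging until no further reduction is possible."""
    while True:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms, key=str), 2):
            m = merge(a, b)
            if m is not None:
                merged.add(m)
                used.update({a, b})
        if not merged:
            return terms
        terms = merged | (terms - used)


def show(term: Term) -> str:
    """Render a term in the notation of the text, e.g. ~COHER*~CONGR."""
    return "*".join(
        ("" if v else "~") + name
        for name, v in zip(CONDITIONS, term)
        if v is not None
    )


# Rows selected for the pairwise comparison (order: COHER, CONSIS, CONGR).
sufficient_rows = {(1, 1, 1), (0, 1, 0), (0, 0, 0)}
solution = sorted(show(t) for t in minimize(sufficient_rows))
print(solution)  # ['COHER*CONSIS*CONGR', '~COHER*~CONGR']
```

The two rows that share incoherence and incongruence differ only on consistency, so that condition drops out; the fully aligned row has no partner to merge with and remains as it is.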

Interpreting and discussing QCA findings
As part of the increased interest in policy design thinking, the development of effective configurations of goals and instruments has become a major theme within policy science and practice. A widely accepted assumption is that the fit of policy design components, i.e., the combination of goal coherence, means consistency, and the congruence of goals and means, benefits effectiveness (Howlett & Rayner, 2013; Howlett, 2009, 2018b). Building on the theoretical advancements that have been made in conceptualizing the relationship between design fit and effectiveness, this study provides a first empirical assessment by applying Howlett's (2009) nested model of policy design in the context of policy integration. Our analysis did not find any necessary conditions for achieving policy design effectiveness. It did find two sufficient pathways for achieving policy design effectiveness. The first states that the fit of policy design components is sufficient for effectiveness, whereas the second states that policy design incoherence combined with incongruence is sufficient for effectiveness. This two-sided outcome suggests that the supportive relationship between policy design fit and policy design effectiveness is not as straightforward as theory suggests and provides some interesting footholds for further discussion. This section elaborates and interprets the two pathways to policy design effectiveness and subsequently formulates implications of these findings for the policy design and policy integration literature.

Sufficiency of policy design fit for policy design effectiveness
The first pathway states that the combination of goal coherence, means consistency, and congruence of goals and means is sufficient for design effectiveness. In light of the object of this study, the outcomes suggest that when the goals and means of a policy design are of the same degree of integration across all three policy levels, this will be sufficient to promote desired policy integration. The pathway's consistency score of 1.0 indicates that all cases covered by this configuration support this result. The low coverage score (0.25) indicates that this result is of limited empirical relevance, as the specific configuration accounts for 2 out of 8 instances in which design effectiveness was observed. Interestingly, even though this relationship is widely assumed in the literature (Howlett & Rayner, 2013; Howlett, 2009, 2018b), very few empirical studies other than our analysis have been able to provide empirical verification. So far, only Reichardt and Rogge (2016) have demonstrated, based on interview data, that, in addition to the credibility and stability of policy strategies, policy design coherence, consistency, and congruence were considered by their respondents to be important conditions for effectively promoting desired corporate innovation activities in offshore wind. Due to the considerable differences in design between the current study and that of Reichardt and Rogge (2016), it is hard to draw comparisons and discuss outcomes in relation to one another. Overall, this pathway provides initial empirical evidence in support of the theoretical assertion that a coherent, consistent, and congruent policy design effectively attains desired outcomes.
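The consistency and coverage scores reported for this pathway can be reproduced with a short worked example. The sketch below uses the standard set-theoretic formulas for sufficiency and assumes crisp memberships; the function names and the ordering of cases in the two vectors are hypothetical. Only the totals come from the text: twelve cases, eight of them effective, two of which are covered by the configuration.

```python
from typing import Sequence


def consistency(x: Sequence[float], y: Sequence[float]) -> float:
    """Sufficiency consistency: share of the configuration that shows the outcome."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(x)


def coverage(x: Sequence[float], y: Sequence[float]) -> float:
    """Sufficiency coverage: share of the outcome accounted for by the configuration."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(y)


# Membership in COHER*CONSIS*CONGR: two of the twelve cases (order is illustrative).
in_configuration = [1, 1] + [0] * 10
# Membership in the outcome: eight effective cases, four ineffective ones.
effective = [1] * 8 + [0] * 4

print(consistency(in_configuration, effective))  # 1.0
print(coverage(in_configuration, effective))     # 0.25
```

Both cases in the configuration are effective, which gives a consistency of 2/2, while they represent only 2 of the 8 effective cases, which gives the low coverage.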

Why incoherence and incongruence was sufficient for policy design effectiveness
The second pathway states that the combination of goal incoherence and incongruence of goals and means is sufficient for policy design effectiveness. This pathway, which followed from the pairwise comparison of the configurations ~COHER*CONSIS*~CONGR → EFFECT and ~COHER*~CONSIS*~CONGR → EFFECT (see Table 7), indicates that discrepancies in the degree of integration between policy goals on the one hand, and between policy goals and policy means across policy levels on the other hand, will be sufficient for promoting desired policy integration. Another aspect that stands out is that consistency is redundant in explaining effectiveness for this pathway. From a policy design theory perspective, this outcome is highly unexpected for several reasons. First, it suggests that the negation of policy design coherence and congruence is sufficient for policy design effectiveness. Furthermore, this outcome contradicts the findings of other empirical studies that found policy design coherence as well as congruence (Kern & Howlett, 2009; Rogge & Schleich, 2018) to encourage goal attainment. Additionally, this pathway contrasts with several empirical studies that explicitly underline the importance of instrumental consistency in achieving desired outcomes (Kern & Howlett, 2009; Kern et al., 2017; Rogge & Schleich, 2018). The high coverage score (0.75) indicates that the result is of strong empirical relevance, as it represents a considerable share of the cases. This underlines the relevance of finding a robust explanation for this counterintuitive outcome. To this end, an in-depth empirical and theoretical account is given for the two individual configurations that formed this pathway.
Concerning the configuration ~COHER*CONSIS*~CONGR, two aspects require clarification. First, what stands out already in Table 3 is that the observed incoherence and incongruence in all five cases was the result of the discrepancy between macro-level goals, qualified as integrated land use and transport, and all other policy design components, qualified as multimodal. It is relevant to note that these incoherencies and incongruences were exposed as a result of the multilevel framework that was adopted for this study; a single-level approach would not have found these mismatches. This raises a relevant question: to what extent do policy misfits between goals and means across policy levels impact effectiveness? A possible answer is provided by Howlett (2014a), who argues that the nested nature of the policy levels causes design choices on each policy level to be constrained by higher-order components; high-level governance modes set the outside boundaries for the decisions on the second level of policy or program operationalization, which in turn shape the micro-level operationalization of a policy design. Consequently, this micro-level of policy design has the most significant influence on the outcomes that are achieved (Howlett, 2009). This suggests that if high-level ambitions are not correctly translated into the meso-level components, their influence on policy outcomes will be limited. Furthermore, the impact on effectiveness is expected to be minimal, as policy design effectiveness is determined based on meso-level goals. This clarification provides a plausible explanation as to why in the current study the configuration ~COHER*CONSIS*~CONGR was effective. However, for this explanation to hold, it must account for Limburg, where the same configuration led to a different, contradicting outcome. Table 7 shows that Limburg is the only case where the configuration ~COHER*CONSIS*~CONGR was ineffective.
Following the strategies for resolving contradictions by Gerrits and Verweij (2018), an explanation was found by re-examining the original case-based material. A closer look at the Provincial Annual Reports of Limburg, as well as policy evaluation, revealed that policy action in Limburg was primarily geared toward attaining unimodal policy goals that were laid down in the Provincial Coalition Agreement (Dutch: 'bestuursakkoord') instead of the multimodal goals that were defined in the strategic transport policy document, which was used to perform the QCA. This explains why the desired level of multimodal policy integration was not achieved in Limburg.
The configuration ~COHER*~CONSIS*~CONGR was found effective in the cases Groningen and Overijssel. Looking more closely at the scoring of these individual cases in Table 3, it stands out that both cases managed to effectively promote desired integration across transport modes by using instruments that were designed for integrating land use and transport planning. From a policy design perspective, it is surprising that effectiveness was achieved despite this incongruence. From a policy integration point of view, however, it is sensible that instruments designed to promote integrated policy action within and across the domains of land use and transport can also be effective in promoting collective action within the transport domain.

Implications and opportunities for future research
This study adopted QCA methodology to explore the causal mechanisms behind policy design effectiveness through case-based research. QCA is especially appropriate for testing, refining, and validating theory as it requires researchers to follow a well-structured and transparent analytic procedure that iterates between theory and case-based data (Befani, 2013). It should, however, be taken into account that due to its case-based and explanatory character, QCA findings cannot simply be decontextualized (Byrne, 2013). The results of the current analysis are therefore primarily representative of the Dutch regional transport planning context. Extrapolating case study findings beyond the target population is possible but should be done with care, as it requires a degree of similarity between both cases (Greene & David, 1984). Case-based research seeks to make analytic generalizations by assessing the applicability of theoretical conceptions in explaining observed outcomes within a specific context. In doing so, the current QCA provides powerful evidence on the generalizability of the notion that policy design fit benefits effectiveness. The contradicting outcomes of the QCA highlight that this relationship is more ambiguous than the literature puts forward. The process of interpreting and explaining the outcomes by returning to theory and case study data yielded several implications for the policy design and policy integration literature.

Implications for policy design
The main contribution of this study to policy design can be considered its empirical insights regarding the relationship between policy design fit and effectiveness. Even though the multilevel policy design model that was adopted in this study is receiving much attention in the more conceptual strand of policy design literature, it had so far not been empirically tested. This study found seemingly contrasting evidence on the relationship between policy design fit and effectiveness. On the one hand, our results provide empirical proof that supports the general theoretical consensus that matching goals and instruments across policy levels benefits design effectiveness (see e.g., Howlett & Rayner, 2013; Howlett, 2014b). On the other hand, results show that neither coherence nor consistency nor congruence nor a combination of those features is necessary for policy design effectiveness. Furthermore, we found that under specific conditions, design effectiveness can be achieved despite the presence of incoherence, inconsistency, and incongruence. These outcomes clearly illustrate that the relationship between policy design fit and policy design effectiveness is not as straightforward as theory suggests.
Our multifaceted findings are in line with other empirical studies, which describe, based on different research approaches, various outcomes regarding the relationship between policy design fit and effectiveness. When it comes to promoting effectiveness, Kern et al. (2017) describe the importance of coherence and consistency, Rogge and Schleich (2018) highlight the need for consistency and congruence, while Reichardt and Rogge (2016) indicate that coherence, consistency, and congruence are all needed. However, since these studies have adopted a single, meso-level approach to studying policy designs and use different methodologies, their outcomes are difficult to compare to our findings. Essentially, a systematic assessment based on the multilevel model, like the one in this study, had been missing in current design discussions. The outcomes of this study are therefore a first step toward a more profound understanding of the influence of the fit of policy design components across policy levels on effectiveness. It needs to be noted that the findings are, due to the specific operationalization of the model, closely related to the domain of policy integration. It would be essential to empirically study this model in other policy domains to get a better understanding of the apparently intricate interrelationship between policy design fit across policy levels and policy design effectiveness. The well-developed theoretical body of literature on policy design offers a robust analytical framework for designing well-structured and consistent empirical research, which would allow for triangulation of findings from a wide range of applications (George et al., 2005).
Another possible explanation for the multifaceted findings regarding the relationship between policy design fit and design effectiveness of this and other studies is that there may be other policy design attributes at play that might have influenced design effectiveness. The literature review presented an overview of the various attributes that have been introduced in recent studies. Of these attributes, only temporal influences as described by, e.g., Peters et al. (2018) and Rayner et al. (2017) can be ruled out with certainty, as we tracked the development of the policy designs over time by analyzing annual reports and did not observe any changes in the typology for any of the design components. It is hard to reflect on the possible influence on our findings of the other attributes that have been linked to policy design effectiveness, i.e., policy design balance (e.g., Schmidt & Sewerin, 2018), explicitness (e.g., Thomann, 2018), capacity (e.g., Mukherjee & Bali, 2019), and goodness-of-fit (e.g., Peters et al., 2018), as existing studies on those attributes lack empirical testing and discussions have remained predominantly explorative and conceptual in nature. This once more underlines the need for empirical research that studies the impact of policy design attributes on effectiveness by adopting policy design effectiveness as the dependent variable. This would require conformance and performance approaches to policy design evaluation to further develop into complementary approaches for assessing policy design effectiveness (see e.g., Faludi, 1989; Mastop & Faludi, 1997).
In addition to existing studies on the relationship between policy design fit and effectiveness, the outcomes of this study provide a novel perspective on the concept of policy design fit. Currently, the features that determine policy design fit are presented as a dichotomy: a policy design is either coherent, consistent, and/or congruent or it is not. This goes against our empirical findings, which suggest that when taking different policy levels into account, varying degrees of incoherence, inconsistency, and incongruence affect effectiveness differently. Importantly, these need not result in ineffectiveness. Especially when they are found at the macro-level, they have limited impact on effectiveness. As such, this argues for a much more nuanced conceptualization of policy design coherence, consistency, and congruence. This would be an interesting avenue for further research to explore.

Implications for policy integration
Building on Candel and Biesbroek (2016), this study analyzed how discrepancies in the level of integration of interrelated policy design components affect effectiveness. As shown in Table 2, the current study differentiates between three levels of integration: sub-sectoral fragmentation (unimodal planning), intra-sectoral integration (multimodal planning), and intersectoral integration (integrated land use and transport planning). The individual scoring of our cases in Table 3 shows that, in line with Candel and Biesbroek (2016), discrepancies in the level of integration within a single policy design are 'the rule rather than exception.' Interestingly, it was found that under specific circumstances these a-synchronicities do not necessarily stand in the way of achieving effectiveness. More specifically, results indicate that instrument mixes that are of a higher degree of integration than the related policy goals can be effective in promoting desired integration. This was observed in both Groningen and Overijssel. In these cases, intersectoral instrument mixes were used to effectively promote desired processes of intra-sectoral policy integration. It could, however, be argued that such a-synchronicity is inefficient, since the instruments promote higher levels of collaboration and interaction throughout the public sector than is necessary to achieve desired outcomes. Additionally, the effective policy design of Groningen illustrates that intra-sectoral objectives may be incoherently operationalized into sub-sectoral on-the-ground measures as long as there are instruments in place that help to coordinate sub-sectoral policy action in line with intra-sectoral objectives. This outcome resonates well with work by Cejudo and Michel (2017), who argue that policy integration may be achieved through overarching procedural instruments that guide sub-sectoral policy action in line with a shared overarching integrative logic.
The results of the analysis suggest that the formulation and adoption of a shared overarching logic to guide integrated government action is not straightforward. It stands out from the observed discrepancies in Table 3 that, even though it is widely recognized that policy-making should take into consideration the intersectoral relationship between land use and transport, only a few organizations have successfully translated these integrative ambitions to lower policy levels. This observation is in line with work by Rayner and Howlett (2009), who noted that integrated goals are rarely adopted unless there is widespread dissatisfaction with existing approaches, as a variety of institutional barriers (financial, organizational, cultural, legislative, political, and technical) have to be overcome (cf. Hull, 2010).
The outcomes of this study add to an emerging body of research on the appropriate policy instruments for giving effect to these integrated goals (e.g., Marsden & Reardon, 2017;Mu & de Jong, 2016;van Geet et al., 2021). Although findings have been derived from the context of transport planning, the analytic generalizations that can be drawn from this study (see Polit & Beck, 2010) carry interesting implications for debates on policy integration in other sectors such as health policy, climate policy, environmental policy, and energy policy.
A key insight obtained from the current study is that the design of the instrument mix plays a crucial role in supporting and steering the integrated government action that is required to achieve policy goals that are shared between multiple sectors. In line with Candel and Biesbroek (2016) and Cejudo and Michel (2017), our analysis underlines the importance of overarching policy instruments that make it possible to steer and coordinate sectoral or sub-sectoral action in line with shared goals. Our outcomes indicate that instrument mixes need to be of at least the same degree of integration as the adopted policy goals, i.e., they need to span at least the range of sectors or sub-sectors that are involved.

Conclusion
This study applied the multilevel approach to policy design to determine which conditions of policy design fit (coherence, consistency, and congruence) are necessary or sufficient for policy design effectiveness in the context of policy integration. The QCA that was performed revealed no necessary conditions or combinations of conditions and showed two configurations of conditions to be sufficient for policy design effectiveness. The first configuration confirms that the presence of policy design coherence, consistency, and congruence is sufficient for policy design effectiveness. The second configuration is counterintuitive and states that the combination of incoherence and incongruence is sufficient for policy design effectiveness.
An in-depth theoretical and empirical interpretation of the QCA outcomes leads to the conclusion that when it comes to promoting policy integration, achieving policy design effectiveness is not a matter of simply matching goals and means across policy levels. In specific situations, a policy design is still effective despite being incoherent, inconsistent, or incongruent. For example, mismatches between macro- and meso-level policy design components will not necessarily impede design effectiveness when meso- and micro-level components are aligned. That is, there are different degrees of policy design coherence, consistency, and congruence that impact effectiveness differently. Furthermore, when policy means are inconsistent but show a higher degree of integration, these means can still be effective even though this makes them less efficient in achieving the desired outcomes. Our study thus shows that the relationship between policy design fit and policy design effectiveness is more intricate in practice than theory suggests. More empirical research is needed to complement the initial steps made in this study and to get a better understanding of the relationship between policy design fit and effectiveness from a multilevel policy design perspective.