This paper deals with the quality of integrated scenarios from a futures research perspective. How to measure and assess scenario quality is still a contested question. Although several proposals and lists of criteria have been circulating for decades (e.g., [13]), these criteria are not consensual and are often only weakly defined [4, 5]. In scenario construction and usage, actors and methods from various traditions and social systems, from academia and practice, come together [6]. Different quality criteria meet, and at times come into conflict, confronting different perspectives [7] as well as the academic and practical expectations involved [5]. Currently, comprehensive efforts are being undertaken to define general standards and quality criteria for futures research, as seen for example in research by Gerhold and colleagues [8] as well as Kuusi, Cuhls and Steinmüller [5]. These proposals come from a rather academic perspective and focus on futures research activities in general. They are not specifically tailored to scenario quality, but apply more generally to various forms of “futures maps” [5] produced by futures research.

The general question underlying this paper is how to measure the quality of (integrated) scenarios from a futures research perspective, i.e., how to assess and compare the quality of different forms of methodologies and their resulting scenarios. This question was one aspect addressed by our study exploring new forms of integrated scenario methodologies and their effects (ACCESS).Footnote 1 Integrated scenario methodologies, i.e., those combining “storyline and simulation (SAS)” [9, 10], have become state of the art in developing explorative scenarios of socio-environmental and socio-technical change [11]. An overview of early empirical applications is given by Alcamo [9, 10]. Although these approaches have considerable appeal, they are also fraught with difficulties [12, 13]. Therefore, new forms are called for [11, 14] and are currently being developed, for example, in the fields of climate change [15] and energy research [13, 16–20].

This paper presents the small set of criteria developed by the ACCESS project, namely scenario traceability and scenario consistency. It explains why these two criteria were selected, and how they were defined and empirically applied to new forms of integrated scenarios. Our work suggests that the two criteria allow discussing and assessing whether, how, and on what levels and dimensions new forms of integrated scenario methodologies do (or do not) support scenario consistency and scenario traceability when compared to classical ‘story and simulation’ approaches. In sum, they allow the comparison of scenario quality from an academic perspective. These criteria do not claim to be exhaustive, but rather mark a starting point by focusing on two central quality issues of integrated scenarios.

Based on the identification of challenges of current (integrated) scenarios from an academic perspective, particularly concerning their consistency and traceability (section 2), the quality criteria of scenario traceability and scenario consistency can be more precisely defined (section 3). For illustration, this paper reports on their application to two empirical cases. In these cases, new forms of integrated scenario methodologies are used, combining a qualitative form of systems analysis (cross-impact balance analysis, CIB, developed by Weimer-Jehle [21]) with simulation in order to construct socio-environmental scenarios. A few selected findings are presented (section 4), and, centrally, the criteria are discussed regarding their practical usefulness and in light of the current debate on quality in futures research (section 5). A conclusion sums up the work and findings and points to further research (section 6).

Central quality challenges of integrated scenarios

Integrated scenario methodologies are deeply ‘hybrid’ in character. Their academic quality is challenged mainly by issues of traceability and consistency.

Background: The hybrid character of integrated scenarios

The basic idea of SAS [9, 10] is to construct a set of qualitative storylines covering a range of possible futures, to then translate the driving forces of the storylines into quantitative sets of input data for the numerical model(s), and to use these sets for scenario simulation. The approach relies on the principle of iteration, and recommends revising the storylines after simulation, adapting the input-data sets to the refined storylines and repeating the simulation.

Scenario approaches of the SAS type are ‘hybrid’ with regard to several dimensions: First, SAS methodologies (in the sense of Hinkel [22]) combine methods from very different realms and disciplines, ranging from mathematical modeling and informatics to the facilitation of creativity workshops. They include various types of actors, ranging from scientists, researchers, scenario experts and stakeholders to, at times, even lay people [6, 7, 23]. And they bring together various forms of data, i.e., knowledge, information, assessments of and beliefs on past, present and future developments [24]. Second, at times these elements introduce diverging paradigms and conceptions of the future into these processes, ranging from ‘predict and control’ to ‘create the future’, and thus provide hybrid forms of “modes of orientation” in the sense of Grunwald [25]. Third, SAS results in hybrid scenarios, comprising qualitative (context) descriptions and quantitative model calculations of system consequences. Finally, integrated scenarios mostly have multiple different functions, target audiences and users [26]. Ideally, they are simultaneously intended, and expected, to serve as policy advice and in decision making, as well as to be published and further used in academic contexts. In particular, their numerical modeling is assessed by scientific criteria.

In sum, integrated scenarios are boundary objects [7]. Among the variety of scenario approaches, they can be situated on the ‘academic’ side of futures studies. Their critics mainly stress their ‘unscientific character’ [10], and new forms that are being developed mostly aim at improving their ‘scientific’ quality [13, 27]. In the following, two quality challenges of integrated scenarios are identified, which we consider central from an academic futures research perspective.

The traceability challenge

A first key challenge of integrated scenarios is their traceability.

A standard for scenario communication

Traceability of scenarios is often discussed in terms of ‘transparency’, ‘explicitness’, ‘accessibility’, ‘documentation’, and also ‘reproducibility’. Even if there is little conceptual precision, the literature shows that the idea of what we call traceability is an agreed-upon and fundamental standard in futures research [6, 24]. With respect to scenarios, traceability is considered a substitute for participation during scenario construction [6]. The central idea is that traceability allows those actors who have not been included in the production of scenarios, i.e., the external “recipient users” [28], to “make an informed choice, whether and how to use them” [6]. When engagement of users is not possible, "[t]he only alternative is for developers to provide fully detailed and explicit accounts of scenarios’ underlying reasoning and assumptions” [6] and embedded values.

This standard is “widely advocated”, but rarely achieved [6], as it “requires such a ‘traceable account’ of how each scenario was produced including areas of weakness, low confidence and disagreement" [6]. This in turn requires honest disclosure of all “ingredients” and their “mixture”, in the sense of Grunwald [24], behind a scenario process. This means disclosing expert guesses, tacit knowledge, errors and detours, i.e., going beyond textbook presentations or idealized design descriptions [22, 29]. In sum, it is the traceability of both scenario assumptions and the scenario construction process that is seen as a prerequisite for enabling external users to assess scenario quality.

Traceability of integrated scenarios

In integrated scenarios, both components, i.e., qualitative storylines as well as numerical model-based scenarios, are criticized for not being traceable. This threatens both the use and the usability of these scenarios for external users.

Alcamo himself considers one of the key limits and challenges of SAS to be that qualitative storylines suffer from a lack of what he calls “reproducibility”,Footnote 2 as they are based on “assumptions and mental models of storyline writers [that] remain unstated” [10]. As these assumptions are not transparent and not explicitly documented, the storylines are difficult or impossible to access, to criticize and to reproduce. In consequence, storylines are perceived as unscientific.Footnote 3

However, numerical models, and the scenarios based on them, have issues pertaining to transparency, explicitness and accessibility, too. Parson [6] and Grunwald [24] warn that model- and simulation-based scenario studies in particular suggest scientific quality, but are very difficult for external users to use, as these users do not easily understand what is behind the results. Grunwald [24] criticizes, with reference to model-based energy scenarios, that the underlying models often are not public. Van der Sluijs [30] found that even publicly accessible numerical models are based on hundreds of implicit (internal) assumptions as well as modeling and simulation decisions that are often only partially documented or inaccessible to externals.Footnote 4 Thus, even those numerical simulation modeling results that are traceable in theory often are not so in practice, at least not to externals and to those who are not modeling experts.

The critiques of both components of integrated scenarios, taken individually, can be summed up as accusations from the one side of being ‘unscientific’ and ‘nontransparent’ vs. accusations from the other side of being ‘black-boxed’ and ‘technocratic.’ There is little information in the literature on what happens to traceability when both components come together. As one exception, Kemp-Benedict [31] hopes that integrated approaches will foster traceability. He argues that mathematical modeling forces the narrative to clarify the definitions of its elements and of the interactions between these elements, which leads to more rigor and transparency. Conversely, it also seems plausible to assume that integrated scenarios combine the difficulties of both components and, in addition, might add new complexities and ‘muddling’ to the scenario construction process in the form of further (non-explicit) assumptions.

The promise of consistency

The second quality challenge of integrated scenarios is their consistency.

A principle of scenario construction

Consistency of scenarios is also discussed in terms of ‘coherence’, ‘plausibility’, ‘logics’, ‘realism’, or ‘compatibility’ [4]. The scenario literature shows that consistency has several functions. It is considered a constitutive element of scenarios, i.e., an integral part of the definition of what a scenario is (e.g., [32]Footnote 5 and others). At the same time, it is a fundamental principle of scenario construction and selection.Footnote 6

In our view, consistency is understood as a safeguard against the arbitrariness of scenarios. It is a substitute for empirical validation, which is neither possible nor appropriate with respect to scenarios, because their object is not accessible in the present and because they do not claim to be or to become true. As a scenario construction principle, consistency is a heuristic that forces the scenario builder to reflect on how ‘bits and pieces’ are brought together to form scenarios [4]. Consistency is considered a necessary, but not sufficient, condition for a scenario to be plausible [3]. Plausibility in turn is linked to the ‘possibility’ and ‘credibility’ of scenarios [33].

Beyond this apparent consensus, different consistency concepts, criteria and measures of consistency coexist. Van Asselt and colleagues [29] have shown that different understandings of consistency are circulating: Consistency means being in line with historical trends and developments when the “historic deterministic temporal repertoire” is used; or it refers to the internal consistency of scenarios when the “futurist difference temporal repertoire” is adopted. In different scenario ‘schools’, diverse consistency concepts are applied: Mathematical models can be considered ‘objectively’ internally consistent by definition of their mathematical (causal) logics.Footnote 7 Storylines, however, rely on holistic consistency ‘filters’ such as intuitive gut feelings, i.e., subjective consistency definitions [34]. More systematic, qualitative scenario approaches use the consistency principle to combine future variants with each other to form comprehensive pictures and to select scenario samples (e.g., the so-called consistency analysis, CA [35]). To this aim, different formal consistency algorithms and consistency scales have been developed.Footnote 8 The different consistency measures apply different consistency criteria: CA uses the criterion of co-incidence or co-existence of factor developments. In contrast, CIB uses qualitative causal information considering the direction of influences between developments [36].
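The co-existence criterion of CA can be illustrated with a minimal sketch. The rating scale, factor names and veto rule below are invented for illustration and do not reproduce the published CA algorithm [35]; they merely show how a pairwise, direction-free consistency measure works.

```python
# Illustrative sketch of a CA-style pairwise consistency check.
# Each scenario picks one development per factor; a consistency matrix
# rates each *unordered* pair of developments (co-existence, no causal
# direction), e.g., 1 = inconsistent ... 5 = mutually supporting.
from itertools import combinations

def ca_score(scenario, matrix):
    """Sum pairwise ratings; a single rating of 1 vetoes the scenario."""
    pairs = list(combinations(scenario, 2))
    if any(matrix[frozenset(p)] == 1 for p in pairs):
        return None  # contains a direct contradiction
    return sum(matrix[frozenset(p)] for p in pairs)

# Two illustrative factors with two developments each:
matrix = {
    frozenset({"GDP high", "unemployment high"}): 1,  # contradiction
    frozenset({"GDP high", "unemployment low"}): 5,
    frozenset({"GDP low", "unemployment high"}): 4,
    frozenset({"GDP low", "unemployment low"}): 2,
}

print(ca_score(("GDP high", "unemployment low"), matrix))   # 5
print(ca_score(("GDP high", "unemployment high"), matrix))  # None (vetoed)
```

Because the matrix keys are unordered pairs, the check cannot distinguish whether 'GDP high' promotes 'unemployment low' or vice versa; this is precisely the difference from CIB's directed causal criterion discussed above.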

Currently, the consistency principle is criticized as not being adequate for representing scenarios of complex adaptive systems, drawing on arguments from transition research that inconsistencies point to dynamics and change.Footnote 9 This debate shifts the focus of attention towards (slightly, but not completely) inconsistent scenarios.

Consistency of integrated scenarios

Generally, the SAS approach suggests that modeling and simulation are used to identify inconsistencies in the storylines [9, 10]: “[SAS] can incorporate state of the art computer models for generation of numerical information about environmental changes and their driving forces and […] checking consistency of qualitative scenarios.” This ‘promise of consistency’ has been adopted by many in the literature, seemingly unquestioned and in most cases without further explication of how it works out (cf. e.g., [3, 31, 37–39]).

However, overall, the SAS literature is not very precise with regard to what exactly is meant by consistency and how concretely this “consistency check” [10] can be carried out successfully: First, in descriptions of the SAS approach, different levels of consistency are mentioned, without explicit reflection that these are different levels. For instance, in a text on the methodology of the Millennium Ecosystem Assessment [39], the authors allude to what we identify as at least four different levels of consistency, namely: consistency with current knowledge,Footnote 10 internal consistency of storylines or of assumptions,Footnote 11 consistency between numerical models and storylinesFootnote 12 and, finally, consistency between (input and output data of) different numerical models.Footnote 13 Second, the scenario literature gives conceptual and empirical hints that this promise of consistency is difficult to keep. The consistency check of SAS is limited to those parts of the storylines that are also covered by the numerical systems model [40, 41]. Kemp-Benedict adds that texts on SAS “provide little or no guidance to those responsible for the narratives beyond a dialogue with the model output” [41]. Furthermore, there are empirical hints that this promise of consistency is difficult to fulfill in practice. For instance, Volkery et al. [23] report from their PRELUDE project that problems of consistency occurred on two levels: with regard to consistency between different storylinesFootnote 14 and “problems of ensuring overall consistency between qualitative [storyline] assumptions and [corresponding] quantitative [model] input” [23].Footnote 15 Further empirical hints are provided by Schweizer and Kriegler [40]: Through an ex-post reconstruction of the storylines published by the IPCC within the so-called SRESFootnote 16 [42], they analyzed the storylines regarding their assumptions on interrelations between scenario factors.Footnote 17 They found that the storylines vary widely in their internal consistency, and that several further fully internally consistent scenarios were absent from the sample. These empirical findings indicate that the consistency check promised by SAS is not automatic.

Overall, qualitative storylines are perceived as being problematic in terms of consistency. Therefore, an array of consistency methods and scales has been developed. With regard to integrated scenarios, it remains unclear under what conditions the ‘promise of consistency’ can be effectively fulfilled in practice—and for what levels and understandings of consistency.

Scenario traceability and scenario consistency – working definitions

For qualitatively measuring, assessing and comparing different (integrated) scenario methodologies and their resulting scenarios, scenario traceability and scenario consistency need to be more precisely defined and operationalized. This paper proposes the following working definitions.Footnote 18

Scenario traceability

Based on a transdisciplinary, common-sense understanding, a process is traceable if one can follow what has been done and how the process came to its results. Traceability refers to tracing or tracking results back to their production process, but also resonates with understanding and comprehending the reasons and justifications underlying this process.

Scenario traceability more specifically refers to the process of scenario construction, namely to the ingredients that are used and the process of relating them to each other [24], as well as to the further processing and presenting of them. The ingredients comprise, following Grunwald [24], heterogeneous elements of knowledge, expectations, fears and hopes. These can be summarized rather generally under the term of assumptions on future developments or scenario assumptions. The term assumption explicitly refers to the understanding that these are present statements (in the sense of Grunwald [24]) on scenario uncertainty (in the sense of Walker and colleagues [43]). We distinguish two types of scenario assumptions, namely assumptions on future developments and assumptions on systemic characteristics linking these.

The relating, processing and presenting of the ingredients then refers to the procedures of scenario construction, often structured by specific scenario methods providing specific rules to do so. This centrally comprises two dimensions: on the one hand, the composition of individual scenarios, i.e., the combination of individual scenario assumptions into an overall bundle; on the other hand, the definition and selection of a scenario sample, i.e., the selection of distinct alternative scenarios for the same scenario field and future space.

Furthermore, scenario traceability is understood as a subjective category depending on the access to information about ‘ingredients’ and their ‘mixing’ (e.g., by internals vs. externals). In addition, perceived traceability might also be influenced by the method expertise and the individual background knowledge of a scenario’s user, as well as the effort he or she invests in tracing a scenario construction process. Particularly in combined scenario processes, scenario traceability is assumed to be an issue for internal users too: In integrated scenario processes, scenario-groups, modelers and scenario-experts are, depending on the design of actor inclusion, internal to some of the scenario construction activities but external to others. Therefore, this definition distinguishes between internal scenario traceability, i.e., traceability for internal actors of the entire process,Footnote 19 and external scenario traceability, i.e., traceability for completely external actors, that is, actors that have not participated in any of the scenario construction activities.Footnote 20 Furthermore, it distinguishes between the perceptions of users that are (method) experts (e.g., modeling experts, scenario experts) and those that are rather lay persons with respect to the methods used. In this sense, an internal qualitative scenario expert might be a lay person with respect to the numerical model, for example.

Overall, tracing scenario construction means an (internal or external, expert or lay) user of the scenarios can trace the following four dimensions:

  1) Assumptions on future developments: What alternatives have been included as possible and relevant future developments?Footnote 21

  2) Assumptions on interrelations between future developments: What ‘logics’ or ‘overall system representation’ lie behind the scenarios, and what has been assumed about interrelations between future developments?

  3) Individual scenario composition: How have individual scenarios been composed, how was their composition decided upon, and why do the scenarios look the way they do and not different?

  4) Scenario sampling: Why has this scenario sample been chosen, and why not a smaller, bigger or different one focusing on other scenario features (e.g., extreme scenarios)? In sum, why have these n = x scenarios been chosen, and not, e.g., n = y + 2 scenarios?

Scenario consistency

Based on a transdisciplinary understanding, consistency means that something makes sense and is coherent in itself. This understanding fits the general definition that something is consistent if it does not show inconsistencies, i.e., does not contain contradictions.

First, scenario consistency refers more specifically to scenarios as products, such as scenario texts, films, tables and graphics. A scenario product can be assessed as consistent or not, but not the scenario process leading to it. Nevertheless, it is the scenario construction process that contains the reasons for (in-)consistencies.

Second, scenario consistency is understood as a relational category, meaning something is (in-)consistent with something else: (A) and (B) are (in-)consistent, with A and B respectively both being scenarios, scenario elements, or underlying (numerical, conceptual, mental etc.) models.

Third, scenario consistency depends on the consistency criterion applied, i.e., A and B are (in-)consistent with respect to a specific definition of consistency (x). Regarding scenario consistency, these criteria can be either intuitive (holistic) or systematic (analytic and formal). A scenario can intuitively match one’s ideas, and its intuitive consistency can be judged by subjective assessment. In contrast, a systematic-analytic consistency concept follows formal rules that allow for objectively decomposing and recomposing its logics; examples are coincidence and causality. We assume that different consistency criteria can conflict. A scenario pair consistent in the sense of CA is not necessarily consistent in the sense of CIB, and it is an open question whether a scenario pair consistent with regard to a formal criterion is also intuitively perceived as consistent by (internal or external) users. In sum: ‘(A) and (B) are (in-)consistent under criterion (x)’, with A and B being scenario (elements) or numerical, conceptual or mental models.

We propose to distinguish four levels of scenario consistency:

  1) Internal consistency refers to the question whether an individual scenario is consistent within itself. Or, to turn it into a ‘relational’ formulation: whether the assumed development of each scenario element is consistent with the assumed developments of all other scenario elements.

  2) Consistency within a scenario sample (or scenario set) refers to the question whether all scenarios of one sample are consistent with one another.Footnote 22

  3) Consistency between different forms of one scenario, e.g., between a narrative and a numerical form of a scenario, refers to the translation of scenarios into different forms as it occurs in integrated scenario approaches. The question at this level of consistency is: Are the, e.g., numerical scenarios consistent with their corresponding narrative scenarios? Regarding consistency between numerical and qualitative scenarios, we propose to distinguish two steps:

    a) Is the ‘first half’ of the numerical scenarios, i.e., the quantitative input data sets, consistent with the corresponding sample of qualitative storylines?

    b) If yes, is the ‘second half’ of the numerical scenarios, i.e., the model-calculated indicators (output), also consistent with the corresponding qualitative storylines?

  4) Consistency of underlying models refers to the system representations underlying the different (numerical, narrative etc.) forms of a scenario sample, comprising system boundaries, system elements, and internal and external relations. The question is whether the (e.g., qualitative) system representation underlying one (narrative) scenario is consistent with the (e.g., numerical) system representation underlying the corresponding (numerical) scenario. In principle, this level of consistency refers to all thinkable types of models, i.e., mental models of different actors or actor groups as well as conceptual and numerical models, which can be compared within one group or with each other.Footnote 23

Consistency on different levels can, but does not need to, interrelate. On each level, different consistency criteria can be applied. Note that on all four levels, scenario builders may have very good reasons not to strive for consistency, but instead to explicitly focus on – or to live with—inconsistencies.
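The first of the two translation steps named above (consistency of quantitative input data with qualitative storylines) can in principle be checked mechanically once each qualitative assumption is mapped to an admissible numeric input range. The following sketch is purely illustrative: the factors, levels and ranges are invented, and real SAS translations are rarely this simple.

```python
# Hypothetical sketch of a step-(a) check: are the numerical model inputs
# consistent with the qualitative storyline assumptions they translate?
# All factor names, levels and ranges below are invented examples.

RANGES = {  # (factor, qualitative level) -> admissible input range (lo, hi)
    ("population growth", "high"): (1.5, 3.0),   # % per year
    ("population growth", "low"): (0.0, 1.0),
    ("water tariff", "rising"): (1.0, 5.0),      # % per year, real terms
    ("water tariff", "stable"): (-0.5, 0.5),
}

def check_inputs(storyline, inputs):
    """Return the assumptions whose numeric translation falls outside
    the range admitted by the storyline level."""
    violations = []
    for factor, level in storyline.items():
        lo, hi = RANGES[(factor, level)]
        value = inputs[factor]
        if not (lo <= value <= hi):
            violations.append((factor, level, value))
    return violations

storyline = {"population growth": "high", "water tariff": "stable"}
inputs = {"population growth": 2.1, "water tariff": 2.0}  # tariff contradicts storyline
print(check_inputs(storyline, inputs))  # [('water tariff', 'stable', 2.0)]
```

Step (b), comparing model outputs against the storylines, is harder to mechanize, since output indicators typically have no predefined qualitative counterpart; this is part of why the 'promise of consistency' discussed in section 2 is difficult to keep.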

Empirical application and selected findings

To illustrate how these criteria can be applied, we report on their use in two case studies [44] and present selected findings.

Application in two empirical case studies

These case studies explored new forms of integrated scenario methodologies. They combine cross-impact balance analysis (CIB) [21] with simulation to construct socio-environmental scenarios.

CIB is a systematic yet qualitative form of systems analysis. It starts by building an impact network of future (societal) developments, i.e., a form of conceptual model. This impact network is based on expert judgments on the direction and strength of influences between alternative developments of system elements. System elements and their alternative developments are considered in their double role as influencing factors and as factors receiving influence. Impacts are assessed pairwise using a semi-formalized scale (from strongly hindering to strongly promoting impacts). These assessments are underpinned with textual justifications. The assumptions are stored in a matrix, making the mental model(s) of those using the method explicit. The methodical core of CIB is a specific form of balance analysis, which allows for the identification of internally consistent network configurations, i.e., scenarios. The balance analysis is based on the information on the impact relations between the alternative developments of system elements. In CIB, those scenarios are defined as internally consistent that are in accordance with the impact arguments of the impact network. This function of CIB can be used to support the construction and selection of qualitative scenarios. Overall, CIB has a medium degree of formalization. For instance, for single scenarios, the balance analysis can easily be done with pen and paper [21]. CIB has been applied as a qualitative scenario technique and stand-alone method in various fields, ranging from the future of energy and sustainability to health and innovation issues.Footnote 24
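The balance principle behind CIB's consistency check can be sketched in a few lines. The descriptors, variants and impact scores below are invented for illustration; for the authoritative formulation of the algorithm, see Weimer-Jehle [21].

```python
# Sketch of CIB's internal-consistency check (balance principle, cf. [21]).
# cross_impact[(desc_i, var_i)][(desc_j, var_j)] is a judgment in -3..+3:
# how strongly variant var_i of descriptor i hinders/promotes var_j of
# descriptor j. All names and scores here are illustrative.

def impact_balance(scenario, cross_impact, descriptor, variant):
    """Sum of impacts received by (descriptor, variant) from the
    variants chosen for all other descriptors in the scenario."""
    return sum(
        cross_impact.get((d, v), {}).get((descriptor, variant), 0)
        for d, v in scenario.items() if d != descriptor
    )

def is_consistent(scenario, cross_impact, variants):
    """A scenario is internally consistent if, for every descriptor, no
    alternative variant receives a strictly higher impact balance than
    the variant chosen in the scenario."""
    for descriptor, chosen in scenario.items():
        chosen_balance = impact_balance(scenario, cross_impact, descriptor, chosen)
        for alt in variants[descriptor]:
            if alt == chosen:
                continue
            if impact_balance(scenario, cross_impact, descriptor, alt) > chosen_balance:
                return False  # the impact network 'argues for' another variant
    return True

variants = {"economy": ["growth", "stagnation"], "water demand": ["high", "low"]}
cross_impact = {
    ("economy", "growth"): {("water demand", "high"): 2, ("water demand", "low"): -2},
    ("economy", "stagnation"): {("water demand", "high"): -1, ("water demand", "low"): 1},
    ("water demand", "high"): {("economy", "growth"): 1, ("economy", "stagnation"): -1},
    ("water demand", "low"): {("economy", "growth"): 0, ("economy", "stagnation"): 0},
}

print(is_consistent({"economy": "growth", "water demand": "high"}, cross_impact, variants))  # True
print(is_consistent({"economy": "growth", "water demand": "low"}, cross_impact, variants))   # False
```

Unlike the co-existence criterion of CA, the check is directed: it asks whether the impacts exerted by the chosen variants jointly argue for each variant of the scenario, which is why the matrix stores ordered (source, target) judgments.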

Using CIB in combination with simulation has been labeled ‘CIB and simulation’ [12] or ‘context scenarios’ [13, 20]. Due to the specific characteristics of CIB, this new approach is expected to enhance integrated scenario methodologies, especially with regard to traceability and consistency.

The first case combining CIB with simulation (‘UBA’) was a demonstrator application using CIB to construct framework data sets for a group of environmental models, “Germany 2030”.Footnote 25 The second case (‘Lima Water’) was a full pioneer application of CIB in combination with a numerical water system simulator, resulting in integrated (qualitative-quantitative) scenarios on “Lima’s water management futures 2040”.Footnote 26 To reflect both cases, empirical evidence was collected from three sources, namely participant observation, semi-structured interviews with process participants, and process documents [44].

To measure traceability, which we understand as a subjective category, we relied centrally on semi-structured interviews with internal and external actors of the scenario construction processes. In both cases, all central internal actors were interviewed, i.e., modelers, CIB scenario-experts and members of the scenario-groups, as well as selected external stakeholders. Overall, n = 32 interviews were conducted. Interview records were transcribed, coded [45] and analyzed through qualitative data analysis [46].

To assess the consistency of the resulting scenarios, we centrally relied on process documents, i.e., the interim and final versions of the different scenario forms (raw CIB scenarios; storylines; input data sets; simulation outputs as well as integrated, i.e., qualitative-quantitative, scenarios). Each form of scenarios was analyzed over time and the different forms of scenarios were compared regarding their structure and content. This was supported through qualitative content analysis [47].

In addition, evidence from interviews and process documents was triangulated with findings from participant observation. These were used to identify possible explanations for the development of (in-)consistency and (non-)traceability of scenarios in the course of the integrated processes.

Evidence was first analyzed by case. Each individual case report was validated by n = 2 key informants. Second, a cross-case analysis was carried out to compare patterns of conditions and factors for traceability and consistency in the two methodologies (in the sense of Hinkel [22]). In order to support the interpretation and generalization of findings, an expert workshop was carried out.

Selected findings

The empirical analysis shows that scenario methodologies combining CIB with numerical simulation have some new answers to the traceability and consistency challenges with which more ‘classical’ SAS approaches are confronted—and that some challenges remain.Footnote 27

Tackling the traceability challenge

Table 1 sums up the degree of scenario traceability perceived in the UBA and the Lima Water case.

Table 1 Overall degree of scenario traceability in the UBA and Lima Water case (across actors and forms of scenarios) (own assessments on a 6 point scale: 1 very low, 6 very high)

Overall, in both cases, across actor groups and across the different forms of scenarios (i.e., raw CIB scenarios as well as the derived narrative, numerical and integrated scenarios), assumptions on future developments were perceived as highly traceable. In contrast, again in both cases, the traceability of assumed interrelations, of the composition of individual scenarios and of the sample was perceived as rather low, at least by non-experts or externals, i.e., actors not directly involved in these activities.

In each case, the degree of traceability was related to the methodology in the sense of Hinkel [22], i.e., the individual constellation of methods, actors and ‘data’ influencing different activities and outcomes. This allowed for the interpretation and qualitative ‘explanation’ of these traceability results. The cross-case analysis showed rather similar patterns and factors behind (non-)traceability effects:

First, using CIB within the combined methodologies of both cases fostered scenario traceability, especially with regard to assumptions on future developments: the qualitative scenario part is not based on the mental models of its producers alone, but also on the conceptual CIB model making these explicit [27]. This conceptual model provides—at least for internals and those with method expertise—access to assumptions on future developments and on the interrelations between them. In principle, the mathematical model thus has an (albeit qualitative) ‘model partner’ to which it can be explicitly compared on the level of system elements and interrelations. Nevertheless, for externals, assumptions on interrelations were obscured again in the derived scenario forms (e.g., storylines, numerical scenarios).

Second, by using CIB instead of intuitive approaches to scenario selection (through intuitive logics or through the modelers themselves), the task of composing individual scenarios is taken away from the (intuition of the) modelers or scenario groups and handed over to the CIB analysis and its balance algorithm. It is therefore traceable—at least for those who understand the method. At the same time, CIB is not easy to understand, whether for externals or for internals such as members of the scenario group, who in both cases achieved only a ‘roundabout’ understanding of the approach. This challenges the traceability of assumptions on interrelations, as well as the traceability of the composition of individual scenarios.
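The balance algorithm referred to above can be illustrated with a minimal sketch of the cross-impact balance logic. The descriptors, variants and impact scores below are invented for illustration and are not data from the UBA or Lima Water cases; only the consistency criterion itself (a chosen variant must receive an impact score at least as high as any alternative) follows the CIB method.

```python
from itertools import product

# Hypothetical descriptors and their variants (illustrative only).
variants = {
    "economy": ["growth", "stagnation"],
    "water_demand": ["high", "low"],
}

# Cross-impact judgments: cim[(d1, v1)][(d2, v2)] is the promoting (+) or
# restricting (-) influence of variant v1 of descriptor d1 on variant v2
# of descriptor d2. Values are invented for illustration.
cim = {
    ("economy", "growth"):     {("water_demand", "high"): 2, ("water_demand", "low"): -2},
    ("economy", "stagnation"): {("water_demand", "high"): -2, ("water_demand", "low"): 2},
    ("water_demand", "high"):  {("economy", "growth"): 1, ("economy", "stagnation"): -1},
    ("water_demand", "low"):   {("economy", "growth"): -1, ("economy", "stagnation"): 1},
}

def impact_balance(scenario, descriptor):
    """Sum the impacts that the variants chosen in `scenario` exert on
    each variant of `descriptor`."""
    scores = {v: 0 for v in variants[descriptor]}
    for d, chosen in scenario.items():
        if d == descriptor:
            continue
        for v in variants[descriptor]:
            scores[v] += cim.get((d, chosen), {}).get((descriptor, v), 0)
    return scores

def is_consistent(scenario):
    """CIB consistency criterion: every chosen variant must score at
    least as high as any alternative variant of its descriptor."""
    return all(
        impact_balance(scenario, d)[chosen] >= max(impact_balance(scenario, d).values())
        for d, chosen in scenario.items()
    )

# Scenario composition: enumerate all variant combinations and keep
# only the internally consistent ones.
all_scenarios = [dict(zip(variants, combo)) for combo in product(*variants.values())]
consistent = [s for s in all_scenarios if is_consistent(s)]
```

In this toy matrix, only the mutually reinforcing combinations (growth with high demand, stagnation with low demand) pass the check, which is what makes the composition step traceable in principle: anyone with the cross-impact matrix and the criterion can reproduce the selection.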

Traceability of the scenario sample and of the assumptions on interrelations is also influenced by documentation (and access to it). Following Parson [6], this study was based on the assumption that scenario traceability could be a substitute for participation. Still, in both cases there were difficulties in realizing traceability, even for project internals (e.g., the modelers) who were in the role of externals with respect to (parts of) the CIB. Achieving an appropriate substitute seems very demanding in terms of documentation, method expertise and explication.

Reversing the promise of consistency

Table 2 gives a rough overview of the degree of scenario consistency reached in both cases.

Table 2 Overall degree of scenario consistency in the UBA and Lima Water case (across actors and forms of scenarios) (own assessments on a 6-point scale: 1 = not given at all, 6 = fully given; consistency criterion of CIB if not indicated otherwise)

The internal consistency of scenarios as well as consistency within the sample (both in the sense of the consistency criterion of CIB) are high in both cases. In the Lima Water case, this holds true for all forms of scenarios, namely raw CIB scenarios, storylines, numerical input data sets and integrated scenarios—with the exception of one of the four scenarios. Still, consistency between the content and structure of different forms of scenarios is assured on the level of appearance only, as assumptions on systemic interrelations were obscured again (see above). Furthermore, consistency of the underlying models is cautiously estimated as rather low in the Lima Water case.

Again, these empirical findings have been interpreted and qualitatively explained through the individual methodologies of the cases. A cross-case analysis showed patterns and factors for scenario consistency: First, in both methodologies, contrary to ‘classical’ SAS, the burden of assuring the consistency of qualitative scenarios is handed over to the CIB. The promise of consistency is reversed. Our findings suggest that the internal consistency of the qualitative scenarios is rather easy to assure, namely by the correct application of CIB.

Second, if all scenarios of a chosen sample are based on the same CIB, then consistency within this sample is given, too. In the Lima Water case, the CIB-based scenario structure was consciously given up for one of the four scenarios at the end of the process to assure the scenario group’s support of the sample.

Third, as the Lima Water case shows, the consistency of the raw CIB scenarios can propagate to the narrative storylines and to the sets of numerical input data, and thus support consistency between different forms of scenarios. This propagation of internal scenario consistency does not occur automatically, but can be actively supported by the use of data generated by the CIB, as well as by the active work of CIB advocates, meaning actors representing and following the CIB, for instance during the writing of narrative storylines and during the translation into numerical scenarios. A shared system understanding among the different actors included in the further processing of the CIB scenarios (e.g., modelers, storyline authors, the scenario group) supports the consistency between different forms of scenarios. Otherwise, the propagation of scenario structures and contents is threatened by various types of distortion and bias (e.g., the subjective perspective of storyline authors, as well as model needs and simulation requirements that possibly do not correspond to the assumptions of the CIB scenarios). Consistency between different (qualitative and quantitative) forms of scenarios is thus not automatic either, and was, in the Lima Water case, achieved rather on the level of appearance. This means that input data sets and model outputs were ‘somewhat’ in line with the general ideas of the CIB scenarios and the derived storylines.
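As a hypothetical sketch of how ‘data generated by the CIB’ can support this propagation: variant-specific parameter ranges agreed during the CIB can serve both to derive numerical input data and to let a ‘CIB advocate’ check whether derived values still match the chosen qualitative variants. All descriptor names, parameter names and value ranges below are invented for illustration; they are not taken from the Lima Water simulator.

```python
# Hypothetical variant-to-parameter ranges agreed during the CIB process
# (all names and numbers are illustrative assumptions).
variant_ranges = {
    ("water_demand", "high"): {"lpcd": (150, 250)},        # liters per capita/day
    ("water_demand", "low"):  {"lpcd": (60, 120)},
    ("economy", "growth"):     {"gdp_growth": (0.03, 0.06)},
    ("economy", "stagnation"): {"gdp_growth": (-0.01, 0.01)},
}

def translate(scenario, point=0.5):
    """Derive a numerical input data set from a qualitative CIB scenario
    by picking a value inside each chosen variant's agreed range."""
    inputs = {}
    for key in scenario.items():  # key is a (descriptor, variant) pair
        for param, (lo, hi) in variant_ranges[key].items():
            inputs[param] = lo + point * (hi - lo)
    return inputs

def check_consistency(scenario, inputs):
    """'CIB advocate' check: do the numerical inputs still lie inside
    the ranges implied by the chosen qualitative variants?"""
    return all(
        lo <= inputs[param] <= hi
        for key in scenario.items()
        for param, (lo, hi) in variant_ranges[key].items()
    )
```

The design point is that the same agreed ranges are used in both directions: for generating the input data and for auditing whether later modeling adjustments have drifted away from the qualitative scenario.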

Fourth, achieving deeper degrees of consistency between CIB scenarios and numerical scenarios requires not only explicit and systematic model comparisons, but possibly also the reciprocal adaptation of the conceptual and numerical models. Full consistency of the underlying models was not achieved in either case. In the UBA case, conceptual and numerical models were neither systematically nor explicitly compared or adapted; in the Lima Water case, the CIB and the simulator were compared and adapted only tacitly and selectively.

Overall, our findings suggest that integrated scenario methodologies using CIB offer some new answers to the traceability and consistency challenges ‘classical’ SAS approaches are confronted with—but that some difficulties remain.


Usefulness and refinement of the working definitions

Overall, the criteria of scenario traceability and scenario consistency as defined in this paper were useful in assessing new integrated scenario methodologies and their resulting scenarios. The criteria allowed for empirically measuring and comparing the degrees of scenario traceability and scenario consistency that were reached. The pre-defined sub-dimensions made it possible to distinguish empirically between different dimensions of scenario traceability, and were used to show on what dimensions traceability effects did or did not occur. The proposed sub-dimensions of scenario consistency allowed for a precise separation of different levels of scenario consistency. In particular, it was fruitful to distinguish between consistency on the level of scenarios and consistency on the level of underlying models. In sum, the working definitions provided a small set of precise, distinct and empirically applicable criteria of scenario quality that are relevant from an academic perspective. The criteria aim at supporting external users (or evaluators) in their assessment of scenario processes and scenarios produced by others. At the same time, the criteria could also be helpful to reflexive scenario developers, guiding them during their own scenario development and reporting activities.

Still, the empirical application also revealed that some of the sub-dimensions could be further refined: Regarding traceability, ‘traceability of assumptions on future developments’ is a notably broad category which does not sufficiently distinguish between what has been assumed and why it has been assumed, i.e., what justifications and reasons are given (or not), in other words the ‘assumptions behind the assumptions’. Furthermore, open questions remain concerning what needs to be traceable to whom. To analyze these dimensions more specifically, different types of (internal and especially external) users and their traceability needs must be conceptually and empirically analyzed in more depth than was possible in this study.

Regarding consistency, ‘consistency with current knowledge’ had been explicitly excluded from the pre-defined scenario consistency dimensions. This was decided in order to stress the fundamental openness of the future in scenarios, and to avoid the bias of measuring scenario quality against accordance with knowledge on past and present developments [29]. Empirically, however, this dimension did play a considerable role for scenario producers and users. Therefore, in further studies, this aspect might need to be added to the consistency definition, especially when referring to consistency with current knowledge on future developments.

Also, to analyze apparent consistency (especially in the Lima Water case), more detailed categories were added to compare verbal and numerical (input-related) forms of scenarios. We distinguished between the apparent scenario structure (in contrast to the underlying model structures, i.e., reasoning in terms of interrelations) and the scenario content. Scenario content was further characterized by the type of representation (qualitative vs. quantitative shares), the type of coverage of the translation (full or partial, split into more than one indicator), as well as the direction and spread of variants and time series.

Furthermore, the application showed that scenario traceability and scenario consistency are not fully independent. For instance, comparing, assessing and realizing scenario consistency, e.g., of the underlying models, requires a certain degree of traceability of the assumptions on interrelations.

Finally, the application revealed that the concepts of scenario traceability and consistency were analytical ones. They did not play much of a role for the participants of the case studies themselves (e.g., members of the scenario groups, externals, and also modelers). This holds true for both concepts and for most of the interviewees. This raises the question of the degree to which scenario traceability and consistency are relevant categories for the participants of integrated scenario processes, and the degree to which these users intuitively assess scenario quality by other, perhaps more practice-oriented, criteria that were not considered within the focus of this study.

In the light of the quality debate in futures research

In parallel to this study, the academic futures research communities have intensified their discussions about quality criteria and standards. How can one situate the criteria proposed in this paper within the current debate?

Traceability is demanded by Schüll and Gerhold [48] as a general feature of good academic practice that should be required in futures research, too. They define measures to reach traceability very broadly, ranging from the precise definition of the research question, through the different phases of a study, to the tension between necessary documentation and the need to focus documentation on the most relevant issues. Kuusi, Cuhls and Steinmüller [5] propose a list of six “external validity criteria” of “futures maps,” the last two of which are very close to traceability, namely those asking whether many people and/or relevant experts understand a ‘futures map.’ The working definition of scenario traceability developed by this study is more specific, as it refers to scenario methodologies in particular and not to futures studies in general. Nevertheless, we consider that the three proposals could benefit from each other, by establishing how the ‘validity’ aims required by Kuusi, Cuhls and Steinmüller could be reached by the traceability means proposed by Schüll and Gerhold, and what relevance these have for the four dimensions of scenario traceability defined in this study and for different scenario users.

Consistency is not included in the overarching criteria of either proposal. Still, under the heading of ‘argumentative testability’, Grunwald [49] proposes to apply the principle of consistency together with the principles of internal and external coherence, as well as three different types of transparency. Thus, Grunwald’s proposal emphasizes the links between traceability and consistency that have also been identified within this study. Again, the criterion of consistency as defined in our study relates more specifically to scenario methodologies, not to futures studies in general. We recommend that the notion of ‘coherence’ introduced by Grunwald be further discussed and operationalized with regard to scenarios, too.

This re-consideration of the state of research emphasizes again that the criteria developed within this study clearly fall on the side of scenario construction by futures research. Integrated scenarios have ‘academic’ ambitions, namely to develop exploratory and partly quantitative scenarios, also for use in further research. In new forms of integrated scenario methodologies using CIB, this research orientation becomes even more prominent than in classical SAS. In these new forms, intuitive-narrative approaches to scenarios are complemented or replaced by CIB, a systematic method with mathematical foundations and academic credibility. In these approaches, the hybridity resulting from the combination of two paradigmatically different components (i.e., qualitative scenarios and numerical modeling and simulation) is weaker than in SAS, as the degrees of formalization converge. Thus, in sum, ‘academic’ quality criteria are considered adequate to assess the quality of this type of scenarios (cf. also [16]). They play a role with respect to their credibility, usefulness and acceptability from a futures research perspective. Nevertheless, integrated scenario methodologies are also intended and expected to support policy advice and decision making, and the two criteria proposed in this paper do not explicitly consider the practical perspective of the usability and credibility of integrated scenarios beyond science and research. Thus, scenario traceability and scenario consistency can be considered necessary conditions for assessing the quality of integrated scenarios when these are used for thinking about socio-environmental futures among academic experts. Nevertheless, we expect that they contribute to, but do not suffice for, supporting (external) users in assessing whether and how to use these integrated scenarios to develop policies and strategies.

Conclusion and avenues for further research

The general question of this paper was how to measure the quality of integrated scenarios from a futures research perspective, i.e., how to assess and compare the quality of different forms of methodologies and their resulting scenarios. The starting point was the diagnosis that integrated scenario methodologies are especially challenged by meeting the standard of traceability and by fulfilling their ‘promise of consistency.’ As the scenario literature offered little conceptual clarity regarding these two criteria, scenario traceability and scenario consistency were defined and operationalized more precisely by distinguishing between different sub-dimensions and levels. The paper briefly reported on the empirical application within two explorative case studies combining the systematic yet qualitative cross-impact balance analysis (CIB) with simulation. The empirical analysis showed that the criteria were useful to illuminate whether, and on what dimensions and levels, these new forms of integrated scenario methodologies do or do not support scenario consistency and scenario traceability. The new integrated scenario methodologies combining CIB with numerical simulation do present some new answers for integrated scenario methodologies. First, they propose tackling the traceability challenge through the use of CIB as a shared conceptual model. Second, they reverse the promise of consistency by handing over the task of assuring the internal consistency of qualitative scenarios from the numerical models to the CIB. Still, both effects are not automatic, but depend on specific factors and conditions within the design of the individual integrated scenario methodologies. Overall, the empirical application suggests that the two criteria as defined by this paper provide a small set of precise, distinct and empirically applicable criteria of scenario quality that are appropriate and useful for assessing scenario quality from a futures research perspective.

Considering limitations, this study was rather weak with respect to empirical evidence on the traceability and consistency needs of different internal and external user groups. For further research, we therefore recommend a systematic analysis of who (modelers, scenario group members, scenario experts, different types of externals) needs to trace what at which moment of the process, what can remain black-boxed, and who needs what degree of consistency in which situations. This could be supported by further empirical, but also conceptual and theoretical work. Furthermore, the working definitions of scenario consistency and scenario traceability provided by this study are work in progress. We recommend strengthening them not only through theoretical sources from the scenario literature, but also through conceptual resources available in fields such as modeling, cognition and communication research, philosophy and mathematics. Finally, further research is needed to understand the relation of traceability and consistency to additional quality criteria that influence the practical usefulness and credibility of scenarios from a policy-advice-oriented perspective.