Introduction

Sustainability has long been a popular concept but is hard to quantify. Our study touches on theoretical and practical aspects of sustainability, which we believe are important in order to evaluate and critique the—real or implied—role of simulation techniques for characterising and quantifying agricultural sustainability, and the usefulness of the sustainability concept as a research criterion. It has been frequently proposed that bio-physical systems approaches using simulation techniques are suitable for quantifying agricultural sustainability (Monteith 1996; Hansen 1996; Kropff et al. 2001) in a way that is “literal, system-oriented, quantitative, predictive, stochastic and diagnostic” (Hansen 1996, p. 138). Indeed, simulation models have been widely applied to balance, often conflicting, economic and environmental goals (Bergez et al. 2010; Keating et al. 2003, 2010). Examples are the study of Murray-Prior et al. (2005), who used cropping systems simulation to balance trade-offs between increasing profitability while improving soil fertility, and reducing runoff and subsoil drainage in diverse rotations, including wheat and cotton, and that of Muchow and Keating (1998), who identified irrigation guidelines that maximise sucrose yield whilst minimising water losses and groundwater tapping by simulating a sugar cane farming system.

Simulation models are now mainstream research tools in complex systems science (Peck 2004; Bergez et al. 2010). However, their role in assessing and quantifying sustainability beyond trade-off analyses, as discussed above, remains unclear, despite suggestions or claims to the contrary (e.g. Hansen 1996; Kropff et al. 2001). Reasons for this may be conceptual, logical, methodological or practical. Grammatically, the word ‘sustainability’ is an abstract, uncountable noun. Generic quantifiers such as ‘some’, ‘more’ or ‘not much’ can be used to describe sustainability, but not numbers. Thus, there is incongruity between word properties and the quest for quantification. This adds to the ambiguous nature of sustainability (Cox et al. 1997), which is a hindrance to the development and adoption of a clear assessment framework, although sustainability has long been a popular notion in general terms (e.g. Kane 1999). In the following, we review some of the core issues, many of which arise from the relations between science and values that are frequently contested and ill-defined (Carrier 2008; Allenby and Sarewitz 2011; Meyer 2011; Benessia et al. 2012).

Notions of agricultural sustainability are broadly centred on “the capacity of agricultural systems to maintain commodity production through time without compromising their structure and function” (e.g. Hansen 1996; Ruttan 1999; Bell and Morse 2000). Most people would have an intuitive understanding of this and agree that agricultural sustainability is something desirable. However, broad agreement on such a public value (Meyer 2011) does not preclude conflict over definitions of sustainability, and how its presence or absence can be assessed. Theoretical concepts of agricultural sustainability have been seen as either goal-describing or system-describing (Thompson 1992). The goal-describing concept specifies a priori how the system ought to be, and entails normative judgements about agricultural practices and their sustainability (Cox et al. 1997; von Wirén-Lehr 2001 refers to it as means-oriented). It has been criticised as being logically flawed (Thompson 1992; Hansen 1996). The argument is that an a priori definition of what ‘is sustainable’ (in the sense of a prescription) largely eliminates the need for assessment. However, even a predetermined definition allows evaluation in respect to whether or not the system meets the criteria prescribed by the definition. The system-describing concept seeks to treat sustainability as an objective property intrinsic to a defined system, specifies criteria to predict and explain system behaviour, and is thought to be better suited to form the basis for evidence-based assessments of agricultural sustainability (Hansen 1996; Cox et al. 1997).

In fact, the notion of sustainability itself is strongly influenced by non-empirical knowledge and, hence, any approach to assessing sustainability has normative elements. The question is how and where choices come in and how these choices affect the scientific process. For example, the question that the analyst seeks to explain determines the specification of the system, its external boundaries and internal interactions (Thompson 1992; Kropff et al. 2001). The choice of performance criteria to evaluate system function or dysfunction is closely linked to system specifications (Girardin et al. 1999; Smith et al. 2000; Bouma 2002). As the system specifications and performance criteria depend on the analyst’s perspective, their selection is normative, even if it is embedded in sound reasoning (Hollander 1986; Thompson 1992). Thus, the development and adoption of an approach to assessing sustainability can never be purely ‘scientific’ or ‘objective’, which stands in stark contrast to the classic self-image of the sciences to proceed under the exclusive rule of logic and facts (Carrier 2008).

Likewise, the development and application of suitable performance criteria (indicators) to monitor change and sustainability has been subject to significant debate (e.g. Girardin et al. 1999; Riley 2001; Nortcliff 2002; Büchs 2003). Indicators have been designed to capture ecological, economic and social dimensions of sustainability for different systems and scales (Meyer et al. 1992; Girardin et al. 1999; Smith et al. 2000; Büchs 2003). The sustainability state of a system is typically assessed by comparing current or predicted indicator states with selected reference states. Reference states have been defined by critical limits, margins of tolerance (Gomez et al. 1996; Arshad and Martin 2002) or by a reference system (Abbona et al. 2007). Yet, there is a lack of generality related to the choice and specification of the reference state (Girardin et al. 1999; Arshad and Martin 2002; Büchs 2003). An example of a conceptual problem is the comparison of an ‘unsustainable’ reference state with a ‘more sustainable’ alternative, which would demonstrate some improvement in sustainability, but could hardly be viewed as ‘sustainable’. Indicators should condense and convey complex information in a way that assists with making difficult choices. However, indicators also entail the risk of hiding information, especially if several system attributes are combined to form composite indicators (Kane 1999; Girardin et al. 1999). Correlations between indicators (e.g. crop yield and the profitability of production) can increase the weight of one aspect of a system relative to the others (Smith et al. 2000; Arshad and Martin 2002), which needs to be considered when interpreting results.

Methodological challenges also originate from the temporal nature of sustainability. Some of these can be addressed using simulation modelling, which allows extrapolation beyond the timeframes typically employed in empirical approaches. However, although crop simulation models offer the advantage of capturing temporal variability over the range of the available climatic record (Moeller et al. 2008), value judgement determines how long a system should persist to be rated sustainable. A long time horizon may be important in ecological terms, but could be of little practical value in a rapidly changing economic and policy environment. Similarly, the timing of the assessment can bias the results of the sustainability analysis because system components vary at different scales. For example, the performance criterion ‘crop yield’ fluctuates at higher frequencies than ‘soil organic matter’, requiring a different length of assessment to capture the full range of possible, or even likely, outcomes.

Beyond the theoretical views on sustainability discussed above, practical assessment approaches typically entail both normative and objective elements (von Wirén-Lehr 2001). von Wirén-Lehr (2001) referred to the ‘hybrid’ concept used in practice as “principal goal-oriented concept of sustainability”. Respective studies follow a common, five-step strategy involving: (1) the definition of a sustainability paradigm, (2) the formulation of aspired sustainability goals for a specified system, (3) selection of measurable performance criteria, (4) evaluation and (5) advice on sustainable management practices (von Wirén-Lehr 2001).

We adopted such a principal assessment strategy for an ex-post evaluation of a model-based sustainability assessment using a real-world example. This study considers the usefulness of the sustainability concept and assesses the possible roles of simulation modelling for characterising and quantifying aspects of sustainability. Emphasis is placed on the theoretical and practical implications of our findings.

Model-based sustainability assessment framework

To exemplify a model-based sustainability assessment, we chose a system and environment that is representative of those found in countries of the Middle East and North Africa (MENA) (Cooper et al. 1987; Pala et al. 1999; Ryan et al. 2008). von Wirén-Lehr’s (2001) principal assessment strategy guided our analysis of potentially conflicting sustainability goals in wheat-based cropping systems in a semi-arid Mediterranean environment of northwest Syria using the cropping systems model Agricultural Production Systems sIMulator (APSIM; Keating et al. 2003; Moeller et al. 2007).

A brief outline of the steps taken in our assessment is given at the outset here. (1) We reviewed key issues for agricultural sustainability in MENA, and the specific issues in current wheat-based cropping systems. (2) This review informed the formulation of a sustainability paradigm and provided insights into the sustainability goals for guiding change. To address the sustainability issues identified, we then reviewed alternative management strategies and decided on exploring contrasting tillage systems in simulated wheat–chickpea rotations. These were conventional tillage without and with stubble burning and no-tillage. (3) To assess whether the consequences of the alternative tillage systems were to move towards or away from a sustainability state, we evaluated seven sustainability indicators: crop yield, water-use efficiency (WUE) and the gross margin (GM) of both wheat and chickpea, and the amounts of soil organic carbon (OC) across cycles of the rotation. Other indicators could have been chosen, which underlines our earlier point that the indicator selection can never be comprehensive and, hence, objective. (4) We explored the simulation scenarios of the management practices and used sustainability polygons (ten Brink et al. 1991) to illustrate the sustainability state (as described by the indicators) of an alternative management scenario relative to a reference state. Finally, we discuss the theoretical and practical implications of our findings.

Rationale for the sustainability paradigm

We formulated the sustainability paradigm for the MENA region as “Sustainable agricultural development contributes to improved food security, increases wealth in rural areas, and maintains agriculturally productive land and water resources”.

For over half a century, the MENA region has experienced a decline of per-capita cereal production (Dyson 1999). Production has grown slower than the demand by growing populations. As a consequence, MENA has become the largest food-importing region of the developing world (Pala et al. 1999; Roozitalab 2000). Across the region, the livelihoods of rural populations depend largely on agriculture. Most of the poor live in rural areas, where agricultural workers support their families with an average daily gross domestic product (GDP) of less than 3 US$ (Rodríguez and Thomas 1998; Roozitalab 2000). Small-holder systems with land holdings of less than 10 ha are common. Technological advances (Pala et al. 1999; Ryan et al. 2008) to increase agricultural productivity have aimed at reducing both poverty and the reliance on food imports (Rodríguez 1995; Chaherli et al. 1999).

The most important environmental factor limiting crop productivity in MENA is the highly variable, often deficient, rainfall (Cooper et al. 1987). To reduce climatic risks and boost production, the expansion of irrigation agriculture has been a key strategy (Rodríguez 1995; Rijsberman and Mohammed 2003; Araus 2004). With over 80 % of water resources being used in agriculture, this strategy has led to rapidly diminishing groundwater resources across the region (Araus 2004; Comprehensive Assessment of Water Management in Agriculture 2007). Soil fertility losses due to erosion, soil salinisation, declining soil organic matter and nutrient mining (Pala et al. 1999; Lal 2002) have tightened the dilemma of increasing production in an agro-ecological region where land and water resources are inherently scarce (Agnew 1995). Thus, to meet the imperative for ‘sustainable agricultural development in MENA’ (Rodríguez 1995; Chaherli et al. 1999), improved production systems are needed that maintain the resource base and increase the productivity per unit land and water. The intensification of rain-fed (non-irrigated) systems will play a key role for achieving these goals (Cassman 1999).

Rationale for the sustainability goals

The sustainability goals for wheat-based systems in the MENA region were chosen as “To increase the productivity of rain-fed cropping systems per unit (1) land and (2) water, (3) increase the profitability of production, and (4) maintain or enhance soil fertility”.

Across MENA, wheat (Triticum aestivum L. and Triticum turgidum ssp. durum) is the main staple food. Wheat-based systems dominate the zone delineated by the 350–600-mm isohyets. Typical rain-fed wheat-based rotations include food (Cicer arietinum, Lens culinaris, Vicia faba) and feed legumes (Medicago sativa, Vicia sativa) (Cooper et al. 1987; Pala et al. 1999; Ryan et al. 2008). Fields are commonly left fallow over summer, as insufficient moisture prohibits the reliable production of rain-fed summer crops. Long fallows (winter plus summer) have been largely replaced by cropping to increase production through intensified land use (Tutwiler et al. 1997; Pala et al. 2007).

Conventional tillage includes deep ploughing (0.2–0.3-m depth) with a disc or mouldboard plough, followed by seed-bed preparation with tined implements (Pala et al. 1999, 2000). Some farmers may plough up to five times prior to planting. The rationale is to obtain a fine, weed-free seed bed. Farmers also manage stubble loads by burning (Tutwiler et al. 1990; López-Bellido 1992). Reasons given for stubble burning are to control weeds, pests and diseases, and to facilitate seedbed preparation for the following crop (Pala et al. 2000; Virto et al. 2007). However, these tillage and residue management practices have been shown to degrade soil physical and chemical properties, as indicated by losses in structural stability and soil organic matter (Govaerts et al. 2006; Roldan et al. 2007; Verhulst et al. 2011). Stubble management further includes summer grazing by sheep and goats. Land is rented out to herders following the crop harvest in spring/early summer, which generates additional income for arable farmers in the traditional crop-livestock systems (Tutwiler et al. 1997).

Because of its strategic importance for food security, wheat has become the major irrigated winter crop (Perrier et al. 1991). In Syria, farmers managed to double wheat yields in the 10 years from 1980 through the use of modern technologies, including irrigation, high-yielding varieties and fertilisers (Tutwiler et al. 1997). Meanwhile, the productivity of rain-fed wheat-based systems has remained low. Rain-fed wheat produced in the Syrian governorates Homs, Hama, Ghab, Idleb and Aleppo (1988–1997) yielded, on average, 1.1 t/ha compared to 2.9 t/ha when irrigation was applied (Ministry of Agriculture and Agrarian Reform 1999). Growth conditions are often characterised by low WUE due to suboptimal agronomic practices, including insufficient weed control and non-aligned nutrient management (Pala et al. 2007; Passioura and Angus 2010). The application of fertiliser is often perceived as too risky because of high rainfall variability (Pala and Rodríguez 1993; Pala et al. 1999). Developing the rain-fed systems would not only contribute to food security but may also reduce the pressure on over-exploited groundwater resources (Varela-Ortega and Sagardoy 2002).

Rationale for an alternative tillage/residue management

Conservation agricultural practices, including residue retention and no-tillage sowing, have been successfully adopted in other semi-arid regions such as Australia, where they have become a key component of cereal-based systems (Thomas et al. 2007). As part of the sustainability assessment strategy, we reviewed such practices as possible alternatives to the conventional soil and residue management practised in MENA. In semi-arid environments of the Mediterranean region, wheat and barley yields increased with no-tillage compared to conventional tillage under relatively drier conditions as determined by site and/or season (Lampurlanés et al. 2002; Cantero-Martínez et al. 2003; De Vita et al. 2007). Benefits of conservation agriculture include more efficient crop water use and increased yields through improved soil water infiltration and storage (Bescansa et al. 2006; Verhulst et al. 2011), reduced evaporative losses with residue retention, enhanced soil fertility through higher levels of soil organic matter (Mrabet et al. 2001; Roldan et al. 2007), improved timeliness of sowing and reduced fuel consumption through the use of direct seeding (Knowler and Bradshaw 2007). However, farmers also require the system-specific management skills to overcome pitfalls, including increased susceptibility to stubble-borne diseases (Fernandez et al. 2008), reliance on herbicides for weed control and the risk of herbicide-resistant weed populations (D’Emden and Llewellyn 2006), risk of reduced crop N availability (Angás et al. 2006) and a trade-off between crop residue retention and the need for animal feed (Tutwiler et al. 1997). In other words, conservation agriculture is a knowledge-intensive technology that necessitates in-depth understanding of the possible consequences by farmers. This contrasts with knowledge-embedded technologies (e.g. mineral fertiliser or hybrid seed), which require little, if any, additional knowledge to be applied.

Simulation scenarios

Current and alternative management strategies were simulated with the cropping systems model APSIM. Model details and a comprehensive description of the simulation scenarios are given in Appendix A. Briefly, the simulations captured the most important features of rain-fed wheat-based systems in the target region, and were conducted for Tel Hadya, northwest Syria, using a typical soil type. The climate at the site is semi-arid Mediterranean (Moeller et al. 2007). Continuous simulations of wheat–chickpea rotations (1979–2005) included three alternative tillage/residue management practices. In the simulated conventional tillage (CT) system, straw residues were removed after harvest and the remaining stubble was incorporated into the soil by deep ploughing. With burn-conventional tillage (BCT), all wheat residues were removed by burning prior to conventional tillage. No-tillage (NT) was simulated with complete residue retention. Fertiliser nitrogen (N) was applied at wheat sowing at five rates ranging from 0 to 100 kg N/ha (N0, N25, N50, N75 and N100). The possible tillage system × fertiliser rate combinations lead to 15 simulation scenarios.
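The factorial design described above can be sketched in a few lines of Python. This is only an illustration of how the 15 scenarios arise from the three tillage systems and five N rates named in the text; the record layout and the `label` format are assumptions made here and do not reflect how the scenarios were configured in APSIM.

```python
from itertools import product

# Tillage systems and fertiliser N rates as named in the text.
TILLAGE_SYSTEMS = ("CT", "BCT", "NT")
N_RATES_KG_HA = (0, 25, 50, 75, 100)


def build_scenarios():
    """Return one record per tillage system x fertiliser N rate combination.

    The dictionary layout and label format are illustrative assumptions.
    """
    return [
        {"tillage": tillage, "n_rate": rate, "label": f"{tillage}-N{rate}"}
        for tillage, rate in product(TILLAGE_SYSTEMS, N_RATES_KG_HA)
    ]


scenarios = build_scenarios()
print(len(scenarios))  # number of tillage x N rate combinations
```

In this labelling, the scenario `CT-N50` would correspond to conventional tillage with 50 kg N/ha applied to wheat.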

Sustainability indicators

In outlining our chosen indicators, we highlight the partial nature of our analysis. Their utility as measures of agro-ecosystem function has been discussed elsewhere (e.g. Meyer et al. 1992; Smith et al. 2000; Arshad and Martin 2002; Bouma 2002; Murray-Prior et al. 2005; Passioura and Angus 2010). Briefly, the variable ‘yield per hectare’ integrates all environmental and agronomic aspects of crop production, and is a measure of the efficiency with which resources and agricultural inputs are converted into a single, physical output, namely yield. The agronomic WUE (defined here as the grain yield produced per unit evapotranspiration from sowing until crop maturity) is a measure of the efficiency with which the scarce and variable rainfall is converted into yield. Organic carbon is a key indicator of soil health and function, and integrates agriculturally important soil properties such as aggregate stability, nutrient availability and water retention. The GM measures the degree to which an enterprise activity has covered its variable production costs.
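As a minimal sketch of the WUE definition given above (grain yield per unit evapotranspiration from sowing until crop maturity), the calculation could look as follows. The function name, the units (kg/ha and mm) and the use of a daily evapotranspiration series are assumptions for illustration; the example numbers are hypothetical and are not simulation results.

```python
def water_use_efficiency(grain_yield_kg_ha, daily_et_mm):
    """Return agronomic WUE in kg grain per ha per mm of evapotranspiration.

    daily_et_mm is assumed to hold daily evapotranspiration values (mm)
    for each day from sowing until crop maturity.
    """
    seasonal_et_mm = sum(daily_et_mm)
    if seasonal_et_mm <= 0:
        raise ValueError("seasonal evapotranspiration must be positive")
    return grain_yield_kg_ha / seasonal_et_mm


# Hypothetical season: 2000 kg/ha of grain, 100 days at 2 mm/day.
example_wue = water_use_efficiency(2000.0, [2.0] * 100)
print(example_wue)
```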

Estimates of costs and prices for calculating the GM of wheat and chickpea production reflect those prior to the current political crisis in Syria (Leenders and Heydemann 2012; Seale 2013). We compiled information on prices and markets in Syria from agricultural statistics (Ministry of Agriculture and Agrarian Reform 2000), farmer interviews (Pape-Christiansen 2001), policy documents (Rodríguez et al. 1999; Wehrheim 2003; Huff 2004; Atiya 2008) and personal communications. It is important to note that the economic environment in Syria has been largely that of a centrally planned economy, despite on-going reforms towards greater market liberalisation (Hopfinger and Boeckler 1996; Huff 2004). For example, farm-gate prices for strategic commodities such as wheat and chickpea have been regulated and do not necessarily reflect prices on the world markets (Huff 2004). Until recently, diesel was highly subsidised and traded at about 40 % below the world fuel price (Atiya 2008).

For the purpose of our study, the GM per hectare was calculated as GM = gross revenue − variable costs specific to the three alternative tillage systems (Appendix B). One set of costs and returns was used. Thus, the GM varied only with the range and variability of rainfall. In the CT system, the gross revenue was calculated as grain yield plus recovered straw times the grain and straw price, respectively. The calculation was similar for the BCT system, except that all wheat straw was ‘burned’ and the consequent revenue for straw was zero. With NT, the gross revenue was calculated as grain yield times the grain price. Further details on prices and costs used in the GM calculations are given in Appendix B.
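The revenue rules described above can be illustrated with a short Python sketch. It assumes the rules as stated for wheat (straw sold under CT, burned under BCT, retained under NT); the function names, units and all prices, yields and costs below are hypothetical placeholders and are not the values from Appendix B.

```python
def gross_revenue(tillage, grain_yield, straw_yield, grain_price, straw_price):
    """Gross revenue per hectare under the tillage-specific rules in the text.

    CT: grain plus recovered straw are sold.
    BCT: straw is burned, so straw revenue is zero (as stated for wheat).
    NT: residues are retained, so only grain is sold.
    """
    if tillage not in ("CT", "BCT", "NT"):
        raise ValueError(f"unknown tillage system: {tillage}")
    straw_sold = straw_yield if tillage == "CT" else 0.0
    return grain_yield * grain_price + straw_sold * straw_price


def gross_margin(tillage, grain_yield, straw_yield, grain_price, straw_price,
                 variable_costs):
    """GM per hectare = gross revenue - variable costs for the tillage system."""
    revenue = gross_revenue(tillage, grain_yield, straw_yield,
                            grain_price, straw_price)
    return revenue - variable_costs


# Hypothetical inputs: yields in t/ha, prices per tonne, costs per hectare.
gm_ct = gross_margin("CT", 2.0, 3.0, 200.0, 50.0, variable_costs=150.0)
gm_bct = gross_margin("BCT", 2.0, 3.0, 200.0, 50.0, variable_costs=150.0)
print(gm_ct, gm_bct)
```

With identical yields and costs, the difference between the two results is exactly the forgone straw revenue, which is the mechanism the text describes for the BCT system.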

Sustainability criterion and reference system

We specified the sustainability criterion as “A management system is sustainable if its sustainability state (as described by the sustainability indicators) is similar or enhanced in comparison to a reference state”.

To assess whether or not this criterion was met, we illustrated the long-term average values of the sustainability indicators for an alternative management system relative to the values obtained with a reference system in sustainability polygons (ten Brink et al. 1991). In this visual reference-based assessment, the reference (baseline) system was a wheat–chickpea rotation subjected to CT in which wheat received fertiliser N at a rate of 50 kg N/ha at sowing, and represents agronomic practices that are typical for the study region (Pala et al. 1999). For the purpose of our study, we chose to illustrate the long-term average of all indicators. However, different aggregations for different types of indicators could have been chosen (e.g. start and endpoints for data showing a trend or running averages to illustrate state changes over time).
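The reference-based comparison behind the polygons can be sketched as follows: each long-term average indicator is expressed as a percentage of the reference value (reference = 100 %), and the result is labelled ‘better’, ‘neutral’ or ‘worse’. Note that the text does not define numerically what counts as ‘similar’; the `tolerance_pct` parameter below is purely an assumption introduced for this illustration, as are the function names and the example values, which are hypothetical and not taken from Table 1.

```python
def relative_to_reference(alternative, reference):
    """Express each indicator as a percentage of its reference value."""
    return {name: 100.0 * alternative[name] / reference[name]
            for name in reference}


def classify(relative_pct, tolerance_pct=5.0):
    """Label each indicator relative to the reference (100 %).

    tolerance_pct is an illustrative assumption for what counts as 'similar'.
    All indicators are treated as higher-is-better.
    """
    labels = {}
    for name, value in relative_pct.items():
        if value > 100.0 + tolerance_pct:
            labels[name] = "better"
        elif value < 100.0 - tolerance_pct:
            labels[name] = "worse"
        else:
            labels[name] = "neutral"
    return labels


def meets_criterion(labels):
    """Criterion from the text: state is similar or enhanced for all indicators."""
    return all(label != "worse" for label in labels.values())


# Hypothetical long-term averages for three of the indicators.
reference = {"wheat_yield": 2.0, "chickpea_gm": 100.0, "soil_oc": 30.0}
alternative = {"wheat_yield": 2.5, "chickpea_gm": 90.0, "soil_oc": 30.0}

relative = relative_to_reference(alternative, reference)
labels = classify(relative)
print(relative, labels, meets_criterion(labels))
```

Changing `tolerance_pct` changes which scenarios pass, which illustrates how a normative choice by the analyst enters even a fully quantitative comparison.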

Assessment results

The sustainability polygons (Fig. 1) illustrate the results simulated for an alternative management scenario relative to those obtained in a reference scenario, and visualise whether the consequences of the simulated management practices were to move towards or away from the sustainability goals. This integrated assessment showed that NT addressed all sustainability goals by improving yield, the efficiency with which scarce rainfall was converted into yield, profitability and soil quality in the rain-fed wheat-based system.

Fig. 1 Sustainability polygons to assess the sustainability of wheat–chickpea rotations at Tel Hadya (1980–2005): average indicator values (bullet with dash) with a, c, e no-tillage (NT) and b, d, f burn-conventional tillage (BCT) relative to the values (set 100 %; bullet with dash) obtained with conventional tillage (CT) and the application of 50 kg N/ha to wheat. In the NT and BCT systems, the amounts of fertiliser N applied to wheat were a, b 0 (N0), c, d 50 (N50) and e, f 100 (N100) kg N/ha. Indicators: wheat (W) and chickpea (CP) yield, water-use efficiency (WUE), gross margin (GM) and soil organic carbon in 0–0.3-m depth (OC)

Specifically, NT performed better than CT and BCT in all sustainability indicators, except when no fertiliser N was applied to wheat (Fig. 1; Table 1). Enhanced sustainability with NT was, first of all, a consequence of soil water conservation with the residue mulch. Residue retention also improved levels of OC, except when no fertiliser N was applied. The minimum N rate required for the NT system to outperform the reference system was N25 (not shown). When no fertiliser N was applied, N limitations reduced wheat yield, GM and WUE (but not OC), and, ultimately, the sustainability of all tillage systems. However, chickpea benefited somewhat from residual soil moisture left from a preceding N-limited wheat crop, which explained why the chickpea indicators yield, WUE and GM performed slightly better than in the reference system (CT with N50). The modelling showed that burning wheat stubble in the BCT system constrained sustainability by reducing revenue (consequently GM) at N rates of N0, N25 and N50 (Fig. 1d). Revenue was lost primarily by missing out on the productivity benefits from soil water conservation and by not selling straw as animal feed (Table 1). Application of high N rates (N75 and N100) compensated for revenue losses incurred by burning wheat stubble (Fig. 1f). Detailed diagnostic evaluations of causes and effects, and variability and trend of the indicator values complemented the integrated assessment using sustainability polygons. These are presented in Appendix C.

Table 1 Average grain yield, water-use efficiency (WUE), gross margin (GM), gross revenue (GR) from grain and straw sales, and soil organic carbon (OC) in wheat–chickpea rotations (1980–2005) simulated with conventional tillage (CT), burn-conventional tillage (BCT) and no-tillage (NT)

Discussion

We explored aspects of sustainability by modelling a particular system consisting of a manageable number of entities that are arguably well understood and described structurally and mechanistically in APSIM. The sustainability polygons enabled an integrative view on sustainability by collapsing the range of quantitative data (Appendix C) into simple graphs visualising numerous responses (Fig. 1). Correlations between indicators (e.g. yield and gross margin) are revealed in the sustainability polygons. This is an advantage over composite indicators, which can be biased by hidden correlations. The polygons allow an instantaneous judgement of the system’s sustainability: ‘better’, ‘neutral’ or ‘worse’. These descriptors are neither quantitative nor exact. In fact, the assessment results are deliberately qualitative and vague; there can be different degrees of ‘better’, influenced by norms and values of the analyst. However, this qualitative property is derived from highly quantitative simulation data. The demonstration of vagueness echoes the discourse on contested values embedded in the concept of sustainability (e.g. Bell and Morse 2000), and is a strength of the approach because the human experience of ‘what constitutes sustainability’ cannot be fully internalised in, and represented by, a model. In contrast, an exact measure of sustainability would be paradoxical, and unlikely to be meaningful for practical decision-making; in fact, it is illogical to answer a fuzzy question (‘what constitutes sustainability?’) with a precise number. Or, by paraphrasing Adams (1979): “the answer to [sustainability,] life, the universe and everything equals 42”, which is a very precise but an utterly meaningless answer.

Based on our analysis, we argue that vagueness is a core property of sustainability, and that system-specific vagueness can be denoted using descriptive quantifiers (e.g. ‘greater’). However, the detailed, diagnostic evaluations (Appendix C) also demonstrate the power of bio-physical modelling to quantify, predict and diagnose constraints to sustainability that are important for wheat-based systems in the semi-arid study environment, and identify management practices that can address defined sustainability goals related to land and water productivity, profitability and soil fertility (Appendix C). Key bio-physical (crop growth and water) and chemical (N and C) processes can be numerically described in time (by simulating responses across seasons) and space (by simulating responses for contrasting soils; e.g. Moeller et al. 2009) using models such as APSIM. Thus, individual system components can be quantified and predicted, while there is vagueness at a higher level of integration in our framework. It follows that sustainability is better described as an ‘emergent property’, which “arise(s) out of more fundamental entities and yet (is) novel and irreducible with respect to them” (O’Connor and Wong 2012).

It is valid to argue that the bio-physical modelling presented here is a form of ‘organised simplicity’ inapt to truly capture sustainability as, for example, human choices and decision-making are not explicitly included in the modelling. Intimately linked to such valid critique of the approach and framework are the questions of which system components to choose, the specifications of system boundaries, the context in hierarchy and the criteria for judging success or failure. However, to elicit such critique and concrete questions is precisely the purpose of the approach. Indeed, it is a characteristic of research in complex systems that, as more entities and processes are considered, uncertainty increases and predictability decreases. Thus, there is a clear need to specify and define the target system for analytical reasons (Hansen 1996; Monteith 1996; Peck 2004). Implicit to this is a natural sciences’ view of scientific rigour and complexity we can describe and, hence, grasp (Allenby and Sarewitz 2011). In this context, the elements of sustainability as characterised here by the model manifest themselves as deterministic knowledge, whereby all outcomes and the probabilities of these outcomes (e.g. Fig. 5 in Appendix C) are ‘known’. In reality, however, systems are interrelated at various scales, uncertainty confines predictability and the human experience of sustainability extends beyond the in silico environment. Hence, it is exactly this property that constitutes the real value of the framework and our analysis: policy-makers and practitioners will have to accept that fuzzy answers—as exemplified in the sustainability polygons (e.g. ‘greater’ or ‘not much’ sustainability)—may be the best expression of expertise; scientists will have to learn that the identification of the fuzzy space between deterministic knowledge, perception and ignorance may be the sign of real competence (Walker and Marchau 2003).

Based on our evaluation, we argue that the separation of the goal-describing and system-describing concepts of sustainability (as reviewed in the Introduction) is, in its core, artificial and practically irrelevant. Intrinsic to any sustainability concept and subsequent assessment must be some a priori understanding of success or failure of a predefined system. It is the very process of specification and definition of a target system, as detailed here, which demonstrates that sustainability can never be an ‘objective system property’ (Hansen 1996, p. 134). In statistics, objective properties are mean, median, standard deviation, among others. Simulation models are based on objective bio-physical principles (Bergez et al. 2010; Keating et al. 2003). In contrast, the criteria for evaluating success or failure in the sustainability of a defined agricultural system (e.g. wheat-based systems in MENA) are a matter of choice and the consequence of a societal discourse. Useful sustainability indicators are valid, rather than true or false.

Change towards sustainability is arguably the leitmotif in any sustainability assessment, with the endpoint typically being the provision of advice to decision-makers and the presentation of findings as a fait accompli (as described in the review by von Wirén-Lehr 2001, but not included here). Implicit to this approach is a very specific, linear epistemological model that often fails to deliver desirable changes because of the disconnect between the generation of new knowledge, and the needs and values that inform the sustainability goals of individual decision-makers in the farming community. An example from developing countries is the enthusiastic promotion of conservation agricultural practices for sustainability by researchers (e.g. Kassam et al. 2012; Lal 2000, and some literature reviewed as part of our assessment strategy), and the reluctance or refusal of many farmers to adopt this knowledge-intensive technology, which highlights that important agro-ecological and socio-economic constraints and complexities have not been considered in the research (see Giller et al. 2009 for a review on the suitability of conservation agriculture in small-holder systems in Africa).

So, the question arises as to how to connect the in silico knowledge generated by our model-based assessment framework with the needs, values and the consequent sustainability goals of individual decision-makers. Firstly, sustainability should be viewed as a process rather than an endpoint of assessment. Secondly, viewing sustainability as a process implies a cyclic epistemological model (in contrast to the linear knowledge model discussed above), which evolves through time, as do the needs and sustainability goals of individuals (see also the ‘adaptation cycle’ described by Meinke et al. 2009). Research that straddles the generation of new knowledge and the various perceptions of what constitutes reliable and relevant knowledge in the face of complex and changing political, economic, social and bio-physical environments has been described as “boundary work” (Guston 2001; Clark et al. 2011) or “participatory action research” (Carberry et al. 2002; McCown 2001, 2002). Boundary work using bio-physical modelling has been applied successfully in Australia, where it involved iterative learning cycles in which the participating researchers, policy-makers and farmers (re-)designed and (re-)evaluated simulation scenarios as informed by practical experience and empirical observations (Meinke et al. 2001; Kokic et al. 2007; Nelson et al. 2007, 2010a, b). Such participatory, reflective modelling can cater for the various perceptions of sustainability (other than the single perception put forward in this study), as well as changes in perceptions throughout the participatory learning process.

Conflicts and contradictions with respect to “what constitutes a sustainable social, environmental, and economic outcome” that extends beyond the modelled system must be anticipated. In our analysis, the use of N fertiliser improved the values of all sustainability indicators in systems without stubble burning (Fig. 1). Nitrogen fertiliser is a means to increase productivity (Appendix C) and therefore contributes to food security in MENA (Pala and Rodríguez 1993; Rodríguez 1995; Tutwiler et al. 1997; Ryan et al. 2008). However, N fertiliser is also a non-renewable, emission-intensive agricultural input, and an environmental pollutant (Erisman et al. 2013). Similarly, there are sustainability trade-offs associated with alternative choices and priorities in conservation agriculture. For example, recent research conducted in Syria and Iraq instigated farmers’ interest in affordable, locally made no-tillage seeders, a success for researchers who had identified potential benefits of the technology for the region. Farmers responded to opportunities related to reduced fuel consumption (environmental and socio-economic benefits) and labour input (socio-economic benefit for a farmer and socio-economic loss for a farm worker) but remained sceptical about the long-term benefits of residue retention because residues are a feed resource for both arable farmers and livestock herders (Tutwiler et al. 1997; Jalili et al. 2011; Kassam et al. 2011). The socio-economic fabric of the traditional crop-livestock systems (Tutwiler et al. 1997) is likely to be affected in some way by changes in residue use. Embedded in a boundary approach, our model-based framework can assist in exploring, and reflecting on, sustainable solutions for such difficult, applied problems that influence the triple bottom line. However, there is limited knowledge about the effectiveness of boundary work using bio-physical modelling in small-scale farming systems of MENA, although some successful applications have been reported from developing countries in other regions (Whitbread et al. 2010; Clark et al. 2011).

In formulating our sustainability paradigm, we acknowledged that ‘what constitutes sustainability’ is scale-dependent. Constraints to sustainability related to, for example, resource endowment, population growth and political change (e.g. Agnew 1995; Rodríguez 1995; Chaherli et al. 1999; Araus 2004; Bank and Becker 2004; Leenders and Heydemann 2012; Seale 2013) are outside of the system being modelled but impact on sustainability at the farm/field scale in profound ways that are often surprising and unpredictable. For example, the disruption of the largely state-controlled economy (Hopfinger and Boeckler 1996; Bank and Becker 2004; Huff 2004) in concert with the current political crisis in Syria (which was unforeseeable just a few years ago) means that previously highly subsidised diesel prices (Appendix B; Table 3) are now up to seven-fold higher than in 2008 (Atiya 2008). Much of the diesel is traded via increasingly important black markets (personal communications). With diesel being a critical agricultural input, farmers will have reviewed their priorities and choices (e.g. ploughing more shallowly or less frequently) and attempted to adapt to this and other novel circumstances over which they have no control. This example demonstrates that sustainability can be an issue of wicked complexity in which “a system’s makeup and dynamics are dominated by differing (or even antagonistic) human values and by deep uncertainty not only about the future but even about knowing what is actually going on in the present. Any solution to a wicked problem should be expected to create unanticipated but equally difficult new problems […].” (Allenby and Sarewitz 2011, p. 109). The consequent sustainability concept would be a ‘wicked concept of sustainability’, which acknowledges that there is no universally accepted answer to the question of sustainability. This may be viewed as a rather sobering conclusion. And yet, while there is no finite resolution, socially desirable outcomes can emerge from a commitment to confronting and working with the perceptions and contested values embedded in the concept of sustainability.

Conclusions

We outlined that vagueness is a core property of sustainability, and that system-specific vagueness can be denoted using descriptive quantifiers. The model can be used to assess trade-offs and constraints to sustainability in ways that would be impossible in vivo. It is a quantitative, predictive and diagnostic tool for characterising important, but partial, aspects of sustainability in wheat-based systems of the Middle East and North Africa (MENA). We stress that inherent values and individual choices cannot be fully internalised in a model. Hence, sole reliance on a model (any model) in sustainability assessments would be a rather technocratic confinement attempting to understand sustainability outside of the wider societal discourse and context. Yet, the model-based assessment framework has value when it serves as a powerful, exploratory core element in conversations with diverse stakeholders. It is a research approach that embraces and connects clearly with the needs and values of decision-makers in the farming community. In light of our analysis, we conclude that sustainability is a vague, emergent system property of often wicked complexity. This property applies within the realm of methodologically grounded norms, values and constraints that are inherent to any assessment strategy. Rather than being the endpoint of an assessment, a ‘wicked concept of sustainability’ may guide a research process within an adaptive framework that integrates thinking, traditions and practices of both the natural and social sciences.