A Delphi Approach to Develop Sustainable Food System Metrics

Recurrent food crises and global environmental change are critical issues that pushed food security and sustainability to the top of the policy agenda. Policy-makers need assessment tools that help them decide what actions they should take to achieve these goals. This paper proposes a new metric system assessing the sustainability of food systems and diets at a subnational level adapted to the context of the Mediterranean area. Recognizing the systemic dimension of sustainability, the proposed information system builds on a vulnerability/resilience conceptual framework and considers the interactions between a set of biophysical and socioeconomic drivers of vulnerability and a number of context-specific food and nutrition security issues. A three-round iterative Delphi survey was conducted to involve a number of selected experts in the indicator selection process. 18 indicators were finally identified for eight preselected causal models of vulnerability and resilience at the interactions between a set of four drivers of change (water depletion, biodiversity loss, food price volatility, and changes in food consumption patterns) and four food and nutrition security outcomes (nutritional quality of food supply, affordability of food, dietary energy balance, and satisfaction of cultural food preferences). Each interaction was disentangled in exposure, sensitivity and resilience. The exercise allowed discussion of a conceptual and dynamic framework for food systems, and identification of indicators that gather consensus among the expert community.


Introduction
The scientific and international debate on the sustainability of food systems is gaining increasing recognition within the international community (Garnett 2016; IPES-Food 2016; Tilman and Clark 2014). On one side, the coexistence of undernourishment, nutrient deficiencies, overweight and obesity, known as the triple burden of malnutrition (Gómez et al. 2013), is encouraging us to reconsider health and nutrition as essential goals of food systems. On the other side, while improving food and nutrition security, agriculture and the food industry have generated unintended consequences including environmental losses (UNEP 2016). Simultaneously, several regions are experiencing unprecedented weather events caused by climate change and habitat depletion, further destabilizing global food and nutrition security (Thompson and Cohen 2012; Dora et al. 2014). This confluence of food crises with increasing environmental degradation suggests an urgent need for novel analyses and new paradigms to describe and understand the causes and facilitate adaptation and mitigation (Barrett and Palm 2016).
Participants at the 2010 international conference organized by the FAO and Bioversity International agreed on a common definition of Sustainable Diets that emphasizes the food and nutrition security purpose of food systems, and the need to maintain or enhance this outcome over time, and across generations, by preserving essential human assets and the flows of services they provide (FAO/Bioversity International 2012). However, a clear consensus on metrics of sustainable diets and food systems is still lacking, and a host of efforts are being implemented towards this goal (Fanzo et al. 2012; Johnston et al. 2014; Gustafson et al. 2016; Rutten et al. 2016). Understanding what constitutes sustainable food systems and diets, and how to assess them, is key for decision- and policy-making.
The general concept of sustainability aims at allowing present and future generations to meet their needs within the limits of the earth's capacity. It highlights the importance of the dynamics that regulate the interconnections within social and ecological systems (Carpenter et al. 2009; Rockström et al. 2009; Waas et al. 2011; Whitmee et al. 2015). Modern societies depend indeed on complex agro-ecological and socio-economic systems to provide food; the move to sustainable diets calls for changes in the agricultural and food systems. Yet policy-makers and stakeholders need evidence-based information and assessment tools to design implementable and efficient policy interventions. Indicators in particular are crucial. Building on research jointly conducted by Bioversity International and CIHEAM-IAMM over several years, this paper aims to identify a set of metrics assessing the sustainability of diets and food systems.
A sound conceptual framework is the starting point in constructing metrics (OECD 2008). Building on the above-mentioned project and related publications (Allen et al. 2014; Allen and Prosperi 2016), we assume for the purpose of this paper that sustainability assessments aim at capturing the ability of a system to maintain and enhance its essential functions over time (Conway 1985; Hansen 1996). Sustainability addresses threats to the preservation of life support systems, including their capacity to withstand and adjust (Turner 2010). It is then key to assess stocks of and changes in human and natural assets (Stiglitz et al. 2010). Derived from sustainability sciences, the vulnerability approach, complemented by inputs from the resilience literature (Turner et al. 2003), is here mobilised to analyse the sustainability of critical food and nutrition security outcomes (Ericksen 2008; Eakin 2010; Tendall et al. 2015). The vulnerability/resilience framework is concretely operationalized through the selection of a set of drivers of change and food and nutrition issues that are relevant to the Western Mediterranean area, the study area of the project.
The selection of indicators within this framework, which is the focus of this paper, is a crucial step and needs to be carried out beyond individual subjectivity and value judgments. The transparency of the whole exercise is essential in constructing credible indicators (OECD 2008). It should also be based on what is desirable to measure and not on which indicators are available. This paper reports the iterative expert knowledge-based Delphi method that was thus used to identify a set of indicators of sustainable diets and food systems. On top of the indicator selection, participants were also asked to discuss and complement the framework and underlying assumptions.
The first section of this paper describes the Delphi approach adopted in this study, the participant selection, and the process undertaken. The second section illustrates the main findings of the study including consensus on indicators, drivers and issues, and on the vulnerability/resilience interactions proposed. The last section reviews the usefulness and limitations of this study with respect to the results obtained and the methodology applied.

The Delphi Technique
The Delphi technique is "a method of structuring a group communication process so that the process is effective in allowing a group of individuals as a whole to deal with a complex problem" (Hugé et al. 2010). It is a procedure that organizes and structures dispersed expert group debate. It provides a systematic method to involve experts in problem analysis and discussion on complex issues, helping convert diverse views and opinions into one or more communal notion through an iterative feedback process (Benitez-Capistros et al. 2014).
A Delphi survey generally starts with an initial questionnaire addressed to a group of experts or participants, which then leads to a series of rounds of discussion between the group members through a controlled feedback process organized by a facilitator. Concretely, the facilitator prepares a questionnaire that is sent to and completed by the experts. At the end of this first round, each expert is provided with common feedback on the group responses. The facilitator usually indicates to each participant his/her position within the panel. Provided with this information, participants complete the survey questionnaire again, confirming or amending their initial responses. In each round the participants are thus asked to judge the opinions and elements that were suggested by the group during the preceding round. The process can be reiterated several times until a consensus emerges or is achieved. Consensus is usually reached following two to four rounds. Larger numbers of rounds often generate an important decline in participation. For a more detailed review of Delphi surveys please refer to Linstone and Turoff (1975), Hasson et al. (2000), Landeta (2006) and Frewer et al. (2011).
There are a variety of approaches to conducting a Delphi survey. The common building blocks are (1) an iterative process of rounds of discussion, allowing participants the option to amend their responses, (2) a systematic and transparent management of group dynamics and of the feedback process, and (3) participants' anonymity. Anonymity helps to avoid the bias problems typical of group dynamics, preventing the bandwagon effect and keeping verbal agility or authority figures from dominating the debate, thus allowing experts to freely express their opinions (de França Doria et al. 2009).
The Delphi technique has been extensively applied in sciences and engineering, as well as in social sciences. It has been applied in a wide range of research domains related to the issues addressed in this paper, spanning from medicine and public health (Hwang et al. 2006;Boulkedid et al. 2011), food safety and policy (Wentholt et al. 2009;Frewer et al. 2011), food security (Wolfe and Frongillo 2001) and its governance (Moragues-Faus et al. 2017), farm sustainability (Etxeberria et al. 2014), aquaculture (Fezzardi et al. 2013), agrienvironmental indicators to sustainability assessment (Bélanger et al. 2012), agro-based bioenergy (Rikkonen and Tapio 2009), meat consumption (Vinnari and Tapio 2009), to sustainability metrics (Benitez-Capistros et al. 2014), sustainable tourism (Choi and Sirakaya 2006), climate change (Mastrandrea and Schneider 2004;Webster et al. 2003;Arnell et al. 2005;Prato 2008), adaptation to climate change and adaptive management (Plummer and Armitage 2007;de França Doria et al. 2009), landscape and conservation management (Mehnen et al. 2013) and vulnerability assessment (De Lange et al. 2010). Yet, as far as we could determine, it has not been applied at the nexus between sustainability and food security issues, specifically to identify sustainability indicators of food systems and diets.
The present Delphi study focuses on a list of indicators of sustainable diets and food systems over three iterative rounds. Experts were asked to select indicators from a list to inform a number of criteria. As already mentioned, these criteria were based on a conceptual framework presented and discussed extensively in Allen and Prosperi (2016). It has been the subject of two preliminary focus groups. The key elements of this framework are briefly recalled here for the comprehension of the Delphi process and design decisions. See Fig. 1 for a schematized representation of the overall sequential research approach.

The Background Conceptual Framework
The framework behind the Delphi survey is based on concepts and scientific approaches from the broad social-ecological system literature. More specifically, the vulnerability and resilience framework was applied, building upon the interactions between identified drivers of change and specific food and nutrition security issues, and disentangling these interactions into exposure, sensitivity and resilience. Vulnerability, defined as the degree to which a system is likely to experience harm due to exposure to a perturbation or stress, is a function of exposure, sensitivity, and resilience. 3 Several global and regional drivers of change affect the structure and processes of the food systems (Brunori et al. 2009), putting at risk context-specific food and nutrition security outcomes (Ericksen 2008). The vulnerability and resilience framework was proposed in particular because of its suitability for linking socioeconomic and biophysical causal factors within a given system (Turner et al. 2003).
Given the geographical area of interest, 4 it was possible to identify four geographically related drivers of change impacting the food system 5 and four context-specific outcomes of the food system (food and nutrition security issues) likely to be vulnerable to these changes. 6 A set of eight interactions between these two categories (of drivers and issues) was then identified. Therefore, the base structure for identifying the indicators was a matrix framework displaying the interactions between the "impacting" drivers of change and the "affected" food and nutrition security issues. The indicators were organized (for each vulnerability interaction) into the three components of vulnerability: exposure, sensitivity, and resilience (Füssel and Klein 2006; Schröter et al. 2005). See Fig. 2.
Fig. 1 Research steps
3 Exposure is the nature and degree to which a system is likely to be affected by the occurrence of a change; sensitivity is the degree to which a system is affected, either adversely or beneficially, by a change; resilience is the ability of a system to anticipate, absorb, accommodate, or recover from the effects of a potentially hazardous event in a timely and efficient manner, through ensuring the preservation, restoration, or improvement of its essential basic structures and functions (IPCC 2012).
4 Western Mediterranean European countries, i.e. France, Italy and Spain.
5 The selected drivers of change were adapted from the drivers proposed by the second report of the European Union Standing Committee on Agricultural Research at the European level (Brunori et al. 2009).
6 The vulnerable context-specific outcomes of the food system were identified within the general food and nutrition security issues (availability, access, utilization), as main outcomes of a food system, following the main understanding brought by the definition of food and nutrition security (UN 1996) and the framework of the food system in the context of global change (Ericksen 2008; Ingram et al. 2010).
Within a set of sixteen (i.e., 4 drivers for each of 4 issues) potential interactions between the drivers of change and the food and nutrition security issues, only a selected subset of eight direct interactions was studied, analysed, and then presented to the Delphi participants, accompanied by a detailed list of 136 indicators, for the selection of metrics (see Annex 1 of electronic supplementary material). These eight interactions involve direct and potential impacts of the drivers of change on food and nutrition security issues, as well as the related recovery potential of the food system. These potential impacts and recovery potential, which are considered essential attributes of the food system characterizing its sustainability, are captured by the three components of the vulnerability/resilience framework. This resulted in 24 components (i.e. [exposure + sensitivity + resilience] for each of 8 interactions) presented in Table 3.
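The counting behind the matrix framework can be illustrated with a minimal Python sketch. The driver and issue names are those given in the abstract; the function names are hypothetical and introduced only for illustration. Because the text names only two of the eight selected interactions explicitly, the example expands just those two rather than guessing the remaining six.

```python
from itertools import product

# Drivers of change and food and nutrition security issues, as listed in the abstract.
DRIVERS = [
    "water depletion",
    "biodiversity loss",
    "food price volatility",
    "changes in food consumption patterns",
]
ISSUES = [
    "nutritional quality of food supply",
    "affordability of food",
    "dietary energy balance",
    "satisfaction of cultural food preferences",
]
# The three components of the vulnerability/resilience framework.
COMPONENTS = ["exposure", "sensitivity", "resilience"]


def potential_interactions(drivers, issues):
    """Return every (driver, issue) pair of the matrix framework."""
    return list(product(drivers, issues))


def framework_components(selected_interactions):
    """Expand each selected interaction into its three components."""
    return [
        (driver, issue, component)
        for driver, issue in selected_interactions
        for component in COMPONENTS
    ]


all_pairs = potential_interactions(DRIVERS, ISSUES)
print(len(all_pairs))  # 16 potential interactions

# Only the two interactions named explicitly in the text are used here (assumption:
# the other six selected interactions are documented in the paper's figures and tables).
named_interactions = [
    ("biodiversity loss", "nutritional quality of food supply"),
    ("changes in food consumption patterns", "dietary energy balance"),
]
print(len(framework_components(named_interactions)))  # 6 components for 2 interactions
```

With eight selected interactions, the same expansion yields the 24 components for which participants were asked to choose an indicator.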
The suggested framework and the initial shortlist of indicators were discussed with selected experts in two preliminary focus-group feedback sessions as recommended (Frewer et al. 2011). The goal of these focus groups was to discuss the framework, test the questionnaire and to anticipate the likely perception of the survey by the large panel of experts. The focus groups explored the practical applications of the protocol in the large survey panel of experts. The exploratory focus groups allowed adjustment of the components of the framework, augmentation of the list of indicators provided, improvement of the ergonomics of the online survey, enlargement of the contacts of potential participants, and verification that the food system issues under discussion were relevant.

Conduct of the Delphi Survey
The survey took place between March and July 2014. As previously mentioned, two exploratory focus groups were gathered in October and December 2013 to discuss the framework, but also the questionnaire and the initial list of 136 indicators before starting the Delphi process in March 2014 (see Fig. 1). The survey was conducted via e-mail and SurveyMonkey®, a web-based survey platform. The online-based Delphi was adopted to improve the efficiency of the questionnaire by facilitating the process and saving time for the participants. The combination of the web-based survey platform and the questionnaire simplified the statistical analysis, avoiding the demands of paper-based surveys and limiting data entry and computing errors that are frequently reported in Delphi studies (Cam et al. 2002).
Fig. 2 Matrix of interactions between global and regional drivers of change and context-specific food and nutrition security issues
First, an extensive list of potential experts was developed by reviewing academic publications. An electronic letter of invitation was sent to the identified experts. Following the process used in Dalkey et al. (1970), the invitation letter explained the goals and protocol of the study, and allowed potential participants to self-assess their expertise and suitability for the study. Two weeks later, a general email was sent to all identified experts containing a link to the questionnaire and background material. This material included a document explaining the conceptual background, the specific aim and purpose of the Delphi study, and summary details of the proposed indicators.
Participants were asked to provide their responses within 2 weeks of receiving the questionnaire. However, as predicted, reminders had to be sent to encourage responses from the maximum number of participants. Extra time was necessary for some experts to complete the questionnaire; therefore, the actual timeframe was 3 weeks for the first round, and 4 weeks for the second and third rounds. After the first and the second rounds, 2 weeks were needed to run the statistical analysis, provide participants with feedback and amend the questionnaire. Feedback reports providing each participant with the group results and their individual previous responses were sent via email after each of the three Delphi rounds. Overall, the final results were given 4 months and 10 days after sending the first letter of invitation.
The questionnaires were composed of three sections: (1) selection (and/or proposition) of indicators; (2) appraisal (and/or proposition) of drivers and issues; and (3) open-ended questions on food system sustainability assessment. In each round, participants were asked to select their preferred indicator for each of the 24 components of the framework from a menu of five to eight preselected indicators. Participants had the opportunity to propose new indicators. Indicators that did not receive any participant preference were excluded from the following rounds. New indicators were added if at least two participants proposed the same, or a similar, variable. A "Don't know" option was included in the menu to allow experts to express their lack of knowledge on a specific component.
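The menu update rule described above (drop indicators with no votes, add indicators proposed by at least two participants) can be sketched in Python. This is an illustration under stated assumptions, not the authors' procedure: the function name, the data layout and the indicator labels are hypothetical, and proposals are assumed to have already been mapped to a common label, since the judgement of which proposals count as "similar" was made by the research team.

```python
from collections import Counter


def next_round_menu(menu, votes, proposals, min_proposers=2):
    """Build the indicator menu of one component for the following round.

    menu      -- indicator names offered in the current round
    votes     -- one selected option per participant (may include "Don't know")
    proposals -- (participant_id, indicator_name) pairs for newly proposed
                 indicators, already normalized to a common label (assumption)
    """
    tally = Counter(votes)
    # Indicators that received no participant preference are excluded.
    kept = [indicator for indicator in menu if tally[indicator] > 0]

    # Count the distinct participants behind each proposal.
    proposers = {}
    for participant_id, name in proposals:
        proposers.setdefault(name, set()).add(participant_id)

    # A new indicator enters the menu if enough participants proposed it.
    added = sorted(
        name
        for name, ids in proposers.items()
        if len(ids) >= min_proposers and name not in kept
    )
    return kept + added


# Hypothetical example: labels are placeholders, not indicators from the study.
menu = ["indicator A", "indicator B", "indicator C"]
votes = ["indicator A", "indicator A", "indicator C", "Don't know"]
proposals = [(1, "indicator X"), (2, "indicator X"), (3, "indicator Y")]
print(next_round_menu(menu, votes, proposals))
# ['indicator A', 'indicator C', 'indicator X']
```

The "Don't know" option is counted in the tally but is never part of the menu list itself, so it is neither dropped nor duplicated.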
On top of these closed-ended questions related to the indicators, experts were asked a number of open and appraisal questions related to the drivers of change and issues of food and nutrition security composing the framework, in order to confirm, or not, the importance of the food system dynamics proposed, and to further open the analysis to other key aspects related to the sustainability of food systems. Over the different rounds, participants were also offered the possibility to augment the framework by suggesting new drivers and issues, and to assess these new propositions. Relevance of drivers of change and interactions was assessed along a four-option rating scale ("Not at all important"/"Not that important"/"Important"/"Very Important") in order to encourage participants to make clear-cut choices. The appraisal of the food and nutrition security issues used a nine-point Likert scale (with one being the lowest and nine being the highest).
Feedback reports to the experts following each round reported basic descriptive statistics. For the selection of indicators, in the three rounds, the results were given for each indicator as a percentage of participants. For the closed-ended questions on the appraisal of drivers and interactions, the results were presented as ratings in percentage form. Apart from the indicator selection, the appraisal of the elements structuring the framework (drivers, issues, and the related interactions) was conducted through single assessments not reiterated in the Delphi process.

Identification and Selection of Delphi Participants
Selection of expert participants for a Delphi survey is critical. Experts are "[…] persons […] who are particularly competent as authorities in a certain matter of facts" (Flick 2009, p. 165 with reference to Deeke 1995, pp. 7-8). Identifying an expert based on his/her expertise and knowledge is challenging (Burgman et al. 2011; Failing et al. 2007), and selection must be performed thoroughly so that the group composition shapes the diversity of valuable knowledge (Okoli and Pawlowski 2004). For the purpose of this research, we considered an expert as a person who, by verifiable means (such as their scientific role, expertise/knowledge, publications etc.), is known to have information or has access to information with regard to the issues under investigation. In this study, we opted for a two-stage selection procedure.
In a first step, potential participants were selected by reviewing academic publications and participation in scientific conferences using broad keywords in an online search engine, and through professional networks. The sampling procedure identified experts working or conducting research on issues related to the sustainability of the food systems. The group included experts from a multiplicity of disciplines to guarantee a heterogeneous array of opinions. Initially, 213 suitable candidates were identified. Information on academic discipline, age and gender was collected. All potential participants were invited to participate by e-mail.
On the last day, 41 (19.3%) questionnaires had been completed. In a second step, 18 experts belonging to underrepresented disciplines and age or gender groups were then contacted by telephone or, if this was not possible, by another personalized email in order to balance the sample to better reflect the diversity of the scientific community. An extra week was given to complete the questionnaire. At the end of this process 51 (23.9%) experts returned the questionnaire.

Sample Characteristics and Participation
In this study, 51 experts returned the first round questionnaire, representing a response rate of 72% of those who acknowledged receiving the invitation. In rounds two and three, given the relatively small samples in Delphi surveys, achieving and maintaining a certain degree of participation was crucial to ensure the validity of the study. Overall, 39 experts (out of 51) logged in again and answered the second round of the questionnaire, giving, therefore, a response rate of 76.5%, above the highest response rate encountered in the literature (Akins et al. 2005). Then, 36 experts (out of 39 of the second round) responded to the third and last round of the survey; a response rate of 92% (see Table 1).
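The participation figures reported here can be checked with a short Python sketch. The counts are those stated in the text (213 invited experts, then 51, 39 and 36 respondents); the helper name is hypothetical. The 72% figure is not recomputed because the number of experts who acknowledged receiving the invitation is not reported.

```python
def response_rate(responded, base):
    """Percentage of `base` experts who returned the questionnaire."""
    return 100.0 * responded / base


# (label, respondents, base population) as reported in the text.
rounds = [
    ("round 1, out of all invited experts", 51, 213),
    ("round 2, out of round 1 respondents", 39, 51),
    ("round 3, out of round 2 respondents", 36, 39),
]

for label, responded, base in rounds:
    print(f"{label}: {response_rate(responded, base):.1f}%")
# round 1, out of all invited experts: 23.9%
# round 2, out of round 1 respondents: 76.5%
# round 3, out of round 2 respondents: 92.3%
```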
Representation of the different academic disciplines was relatively stable over the three rounds (see Table 2). In the first round, the largest group (33%) were economists (although from different sub-disciplines ranging from agricultural and food economics to resources and environmental economics), followed by nutritionists (22%). Agronomists and environmental scientists represented 10 and 8% respectively. About 12% of the panel could be associated with food policy and governance, and another 10% worked specifically on food systems and sustainability science(s). Only one panellist brought in food technology expertise (see Table 2). The requested confirmation about the scientific field of the experts confirmed the multidisciplinary composition of the panel. The final participation rate indicated that women represented about one third of the sample (31%).

Delphi Round One
In this initial stage, no dimension or indicator stood out as achieving full consensus. A majority (≥ 50%) was reached for only four indicators and dimensions, while in 12 dimensions one or more indicators reached 35%, and in eight dimensions all indicators were below 35%. Four initial indicators were not selected and were then excluded from the list for round two. On the other hand, 14 new indicators were proposed by participants and introduced for selection in round two.

Delphi Round Two
In round two, consensus was clearly emerging for 10 of the 24 desired indicators (i.e., more than 60% agreement on one indicator); however, no indicator had yet met the defined high threshold consensus criterion (80%). For some interactions and components, the majority of panellists appeared to be balancing between two main options: for seven components out of 24, two main indicators were preferred, displaying more than 30% agreement each. Thirty-one indicators proposed in round two were not selected and were then excluded from the list for round three. Experts suggested four new indicators that were integrated into the questionnaire for selection in round three.

Delphi Round 3: Final Results
Consensus was finally reached in round three for 14 of the 24 desired indicators. Eight indicators met the high threshold consensus criterion (80%), three other indicators met the medium threshold criterion (70%) and another three met the low threshold criterion (60%). Four further indicators were selected by the majority of the participants (above 50%).
For five dimensions (out of 24), clear bipolarity can be reported (two indicators above 35%). In some of these cases, several experts recommended constructing a composite indicator. Three dimensions remain unresolved with a wide dispersion of expert opinions among indicators and little improvement of the consensus through the rounds. Moreover, "Don't know" rates (the default option) are high for these three dimensions only. Table 3 summarises the results of the indicator selection.
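The thresholds used to report the results (80%, 70%, 60%, a simple majority at 50%, and bipolarity when two indicators exceed 35%) can be expressed as a small Python sketch. The function names and example shares are hypothetical, and treating the consensus cut-offs as inclusive (≥) is an assumption, since the text uses both "≥ 50%" and "above 50%". Bipolarity is returned as a separate flag because the text does not state how it relates to the consensus levels.

```python
# Consensus cut-offs reported in the text, highest first.
THRESHOLDS = [
    (80.0, "high"),
    (70.0, "medium"),
    (60.0, "low"),
    (50.0, "majority"),
]


def consensus_level(shares):
    """Classify one dimension from the percentage of votes per indicator.

    `shares` maps indicator name -> percentage of participants selecting it.
    Cut-offs are treated as inclusive (assumption).
    """
    top_share = max(shares.values())
    for cutoff, label in THRESHOLDS:
        if top_share >= cutoff:
            return label
    return "none"


def is_bipolar(shares, cutoff=35.0):
    """True when two or more indicators are above the 35% cut-off."""
    return sum(1 for share in shares.values() if share > cutoff) >= 2


# Hypothetical shares for two dimensions; labels are placeholders.
clear_case = {"indicator A": 83.0, "indicator B": 11.0, "indicator C": 6.0}
split_case = {"indicator A": 44.0, "indicator B": 39.0, "indicator C": 17.0}

print(consensus_level(clear_case), is_bipolar(clear_case))  # high False
print(consensus_level(split_case), is_bipolar(split_case))  # none True
```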
For the analysis of the stability of the consensus over the three rounds we observed that for dimensions acquiring at least 50% consensus among participants, 93% of the experts were consistent with their choice of preferred indicator from round two to round three. Although there is no specific statistical test to measure the stability of responses between rounds for qualitative nominal variables, this observation indicates that a certain degree of stability of the consensus has been achieved. Furthermore, 75% (18 out of 24) of dimensions reached, at least, a majority consensus (≥ 50%) on one indicator, and in 100% of these cases the most popular indicator in round three, was also the most popular in rounds one and two. This additional observation also demonstrates a certain degree of stability of consensus. The progression of consensus was thus ascending over the three rounds (see Fig. 3).
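The stability observation above amounts to counting how many experts kept the same preferred indicator between two rounds. A minimal Python sketch follows, assuming responses are stored as a mapping from an anonymous expert identifier to the chosen indicator; the function name, identifiers and labels are hypothetical. It handles a single dimension, whereas the 93% figure in the text aggregates the dimensions that reached at least 50% consensus.

```python
def choice_stability(previous_round, current_round):
    """Percentage of experts who kept the same choice between two rounds.

    Both arguments map an anonymous expert identifier to the indicator that
    expert selected for one dimension. Only experts present in both rounds
    are compared.
    """
    common_experts = previous_round.keys() & current_round.keys()
    if not common_experts:
        raise ValueError("no expert answered both rounds")
    unchanged = sum(
        1 for expert in common_experts
        if previous_round[expert] == current_round[expert]
    )
    return 100.0 * unchanged / len(common_experts)


# Hypothetical responses for one dimension.
round_two = {"e1": "indicator A", "e2": "indicator B", "e3": "indicator A", "e4": "indicator C"}
round_three = {"e1": "indicator A", "e2": "indicator B", "e3": "indicator A", "e4": "indicator A"}

print(choice_stability(round_two, round_three))  # 75.0
```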

Appraisal of Interactions, Drivers, and Issues
The survey was also the occasion to discuss the framework and its operationalization in the Western Mediterranean context. A number of closed- and open-ended questions were asked to assess and complement the initial set of drivers of change, and food and nutrition security (FNS) issues.

Drivers
In the first round, experts made 139 propositions of additional drivers of change likely to affect each of the suggested FNS issues. Of these, 25% were already part of the framework and 75% were original suggestions. Each driver was classified according to the typology of drivers likely to significantly impact food systems proposed in the 2nd European Commission Standing Committee on Agricultural Research (SCAR) Foresight Exercise (Brunori et al. 2009). We were able to associate most of the participants' propositions with a shortlist of 12 revised broad drivers of change.

Interactions
In the second round, experts ranked the importance of the eight initial interactions. All the interactions proposed were judged "important" or "very important" by more than 80% of the panellists. Agreement on importance ranged from 85% (impact of biodiversity loss on nutritional quality of food supply) to 97% (impact of changing food consumption patterns on dietary energy balance) (Fig. 4).
In the second round, for each issue, the first two proposed drivers were submitted to participants to be ranked in order of importance. Overall, the two newly proposed drivers did not reach the same high degree of consensus on importance for food and nutrition security as the initial set did. Three drivers attained comparable levels for some issues (near or above 80%), namely changing agrofood patterns, policy actions and technological innovations (see Fig. 5).

Food and Nutrition Security Issues
In the second round, participants proposed new FNS issues judged important for the geographical context considered, in addition to the four food and nutrition security context-specific issues initially proposed in the framework. 18 experts (45%) answered this non-compulsory question and respondents suggested five additional issues. Two of these (environmental externalities and social equity) are not generally considered to be conventional food and nutrition security issues, as is usually encountered in the literature (Pangaribowo et al. 2013).
Following a text analysis, illustrated in Box 1, the propositions of the experts were analysed and categorized into five additional issues, namely: (physical) accessibility, food safety, governance, environmental externalities and (social) equity.
In the third round, participants rated the priority of the overall nine issues (four initial + five newly proposed) using a Likert scale of one to nine (with one being the lowest and nine being the highest). The four initial issues appear within the first five priority challenges for food and nutrition security identified in the region. Two new challenges, "environmental externalities" and "social equity", come second and third (see Fig. 6). As already highlighted, these two crucial challenges are not conventional food and nutrition security issues, as usually encountered in the literature (Pangaribowo et al. 2013). These could arguably be interpreted as the very concept of "sustainable food and […] Speth (1993) with the inclusion of environmental and social issues at the same level as the more commonly accepted economic and nutritional dimensions of food security.

Box 1 Text analysis for FNS issues
Physical accessibility has been suggested as another food and nutrition security issue by three participants, "next to affordability". One expert specified that "physical access" was another important issue. Another stated that "allocation of food within society/household" is an issue when discussing accessibility. Food safety was another issue mentioned. One expert mentioned "food system sovereignty and governance (transparency in decision-making, ability of consumers and producers to influence food system drivers and outcomes)". Other panellists talked about "food production patterns" and "re-orientation of industry [organization] or technological improvement" along the value chains. Several experts added "environmental impacts" or "externalities". Others specifically mentioned "greenhouse gas emissions" or "biodiversity" or "water use" or "energy consumption", as food and nutrition security issues. An expert highlighted the importance of "increased inequality in wealth/income distribution", while others added "inequitable (and unethical) healthy food distribution" or "equity" as food system outcomes that need to be considered. Another expert mentioned labour regulations and corporate social responsibility as crucial elements for the future of the food systems.

Methodological Considerations
Disadvantages of the Delphi technique have been clearly identified in the literature and scholars have questioned the reliability, validity and credibility of this research methodology. Sackman (1974), for example, has noted that anonymity may lead to a lack of accountability because responses may not be traced back to the individual. Today's Internet approaches allow tracking responses of participants, but for evident scientific and ethical reasons facilitators are not allowed to divulge this information. In addition, it has been suggested that a consensus approach can lead to a diluted version of the best opinion and the result represents the 'lowest common denominator' (Powell 2003, p. 378). It could then be argued, however, that all approaches (for example, working groups and nominal groups) to gaining consensus could be vulnerable to this effect. Others have argued that this approach is time-consuming, labour-intensive and therefore expensive (Fitzsimmons and Fitzsimmons 2006), although there is no agreement on this (Powell 2003). A number of methodological issues arising in respect to Delphi have the capacity to threaten the credibility of the study, and these include issues around panel expertise, number of rounds, questionnaire development, analysis and achievement of consensus (Hanafin 2004).
The team involved in the present study has tried to overcome some of these main limitations by (1) organizing focus groups to design and test the questionnaire; (2) carefully controlling the selection of experts; and (3) maintaining an adequate rate of participation over the three rounds. Delphi studies generally involve between 15 and 60 experts; within this range a Delphi panel is considered valid (Hasson et al. 2000). In a recent Delphi study by Benitez-Capistros et al. (2014), only ten experts participated in the first round and only five responded to the third and final round. The rate of participation by the initially invited experts is also an important aspect to consider. In de França Doria et al. (2009), 61% of the invited experts effectively participated in the first round despite 85% of the invited experts declaring acceptance to participate. In Wentholt et al. (2009), 22% of the invited experts actually participated in the first round. In this study, the results obtained for participation, both in absolute number (51 and 39 in, respectively, the first and third rounds) and in percentage (72% of the invited participants who acknowledged receiving the invitation participated in the first round; 24% of the overall invited participants), largely fall within the standard recommendations for the implementation of the Delphi technique. A balanced composition of the panel over the rounds, as well as the participation of qualified participants, is also important (Powell 2003). In our Delphi study, the multidisciplinary composition of the Delphi panel was closely monitored over the three rounds. The rate of participation of each disciplinary group remained constant from the first through the third round.
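The panel-size and participation checks discussed above amount to simple arithmetic, which the following Python sketch illustrates. The function names are hypothetical and not part of the study; the 15 to 60 range is the one attributed to Hasson et al. (2000) in the text, and the round sizes passed in are those quoted in the paragraph above.

```python
def within_recommended_range(panel_size: int, low: int = 15, high: int = 60) -> bool:
    """Return True if a Delphi panel size falls inside the recommended range.

    The default bounds (15 and 60) are the range quoted in the text;
    treating both bounds as inclusive is an assumption of this sketch.
    """
    return low <= panel_size <= high


def retention_rate(first_round: int, later_round: int) -> float:
    """Percentage of first-round participants still responding in a later round."""
    if first_round <= 0:
        raise ValueError("first_round must be a positive number of participants")
    return 100.0 * later_round / first_round


if __name__ == "__main__":
    # Round sizes quoted for the present study (first and third rounds).
    for size in (51, 39):
        print(size, within_recommended_range(size))
    # Round sizes quoted for Benitez-Capistros et al. (2014), for comparison.
    for size in (10, 5):
        print(size, within_recommended_range(size))
    print(round(retention_rate(51, 39), 1))
```

Retention between rounds is not a statistic reported in the study; it is computed here only to show how panel attrition could be tracked alongside the panel-size check.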
A number of lessons can be drawn in terms of methodology to enhance participation and consensus for further Delphi studies.
• Having institutional support, so that participants could perceive a beneficial purpose for society rather than an exclusively profit-seeking aim for the team running the study;
• Holding a face-to-face meeting, by announcing that a technical workshop (held in Montpellier in November 2014) would be organized at the end of the Delphi study and that participants would be invited for further scientific discussion and involvement;
• Setting up a diverse and appropriate expert team with a very good knowledge and understanding of the problems of the sustainability of food systems;
• Holding two preliminary focus-group sessions as a pilot application for properly managing, motivating and administering feedback;
• Sending qualitative personalized feedback with comments, explanations, and suggestions from the experts for real interaction within the group;
• Consulting large, geographically dispersed expert communities through internet technology;
• Using a common language (e.g. English) to enable international and multidisciplinary participation;
• Providing participants with relevant scientific content and materials without overloading them;
• Structuring the survey to make each round progressively less time-consuming.

Discussion on the Indicators
The purpose of this section is to discuss the level of consensus reached on the indicators, their computability and their interpretation within the vulnerability/resilience framework.
Consensus on the indicators was categorized according to the thresholds proposed by Keeney et al. (2011): majority (50-59%), low (60-69%), medium (70-79%) and high (> 80%) consensus. According to Hasson et al. (2000) and de França Doria et al. (2009), a Delphi-derived consensus between 51 and 80% should be considered acceptable. In this Delphi study, as described in the results section, an acceptable consensus on indicators was obtained for 18 out of 24 interactions, that is, in 75% of cases. The 12 best indicators, i.e. half of the targeted indicators, attained 81% consensus on average. Consensus was thus clearly reached for some indicators.
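As an illustration of the categorization above, the following minimal Python sketch maps a percentage of agreement to a consensus label and reproduces the 18-out-of-24 share. The function names are hypothetical. Treating the bands as half-open intervals (so that 59.5% counts as "majority" and exactly 80% as "high") is an assumption made here, since the quoted thresholds leave these boundary cases unspecified.

```python
def classify_consensus(percent_agreement: float) -> str:
    """Map the share of experts agreeing on an indicator to a consensus label.

    Bands follow the thresholds quoted in the text (majority, low, medium,
    high). Boundary handling is an assumption of this sketch: each band is
    closed on the left and open on the right, and 80% itself counts as high.
    """
    if not 0 <= percent_agreement <= 100:
        raise ValueError("percent_agreement must lie between 0 and 100")
    if percent_agreement >= 80:
        return "high"
    if percent_agreement >= 70:
        return "medium"
    if percent_agreement >= 60:
        return "low"
    if percent_agreement >= 50:
        return "majority"
    return "no consensus"


def share_with_consensus(n_with_consensus: int, n_total: int) -> float:
    """Percentage of interactions for which an acceptable consensus was reached."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_with_consensus / n_total


if __name__ == "__main__":
    print(classify_consensus(72))        # medium
    print(share_with_consensus(18, 24))  # 75.0
```

The 72% value used in the usage example is the consensus level reported in the text for the indicator "% of diets locally produced".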
Conversely, no satisfactory level of consensus was attained for six indicators. In three cases, none of the proposed indicators convinced even half of the expert panel. These interactions need to be carefully re-examined. For some of the dimensions that presented a manifest bipolarity, a number of experts proposed integrating the two indicators into a composite one. This proposition could potentially be extended to all the interactions that presented bipolarity. Yet, this suggestion would have had to be validated through consensus in a fourth round.
Although consensus seemed to be reached for some indicators, the question remains as to whether or not the experts share the same paradigm as to how to interpret the indicators selected. Table 4 provides the interpretation that would derive from a literal understanding of the different components of the vulnerability/resilience framework according to Allen and Prosperi (2016), Gbetibouo et al. (2010) and Nazari et al. (2015). The selection of indicators does not allow confirmation of any single justification behind each indicator.
Although an indicator list was provided, including a precise definition for each indicator as well as details about units and bibliographical references, the hypotheses for each indicator were purposely excluded from materials provided to experts. First, the inclusion of hypothesis proposals would have overwhelmed participants with information and exposed the process to the risk of early termination (Landeta 2006). Second, the formulation of these hypotheses was judged as the core element on which the experts were to make a decision.
The interpretation behind the choices of the experts is rather straightforward for some selected indicators. It is, however, puzzling in some cases. For instance, the indicator "% of diets locally produced" (CIHEAM/FAO 2012) reached a medium consensus (72%) as a proxy variable for the sensitivity of the satisfaction of cultural food preferences in a context of biodiversity loss. How should this choice be interpreted? A literal understanding would be that the higher the share of diets that are locally produced, the more vulnerable the system. Should we thus understand that, where a high share of diets is provided for by local food production, the fulfilment of food preferences (strongly dependent on this local production) would likely be more affected by an erosion of biodiversity in the agrofood system? If so, locally produced diets would be assessed as being detrimental, contrary to current common understanding. Alternatively, it is possible that some experts selected the indicator independently of the framework or that the other proposed indicators were even less convincing.
Furthermore, experts selected the "% of diets locally produced" indicator even though it is not yet clear how to compute it. Among the indicators not selected for this interaction, some are quite easy to apply. For instance, the "Mediterranean Adequacy Index" (Alberti-Fidanza et al. 1999; Bach-Faig et al. 2011) and the "Consumer interest in ethical imports" (DEFRA 2009) are two indicators that have been measured, are widely acknowledged and are conceptually well structured. These decisions suggest that, despite the lack of evidence to support a specific indicator in terms of validity or computability, some experts were still willing to select it as a potential candidate for use.
Searching for appropriate metrics implies finding those that are desirable to measure and not simply choosing from those that are available (OECD 2008). In this Delphi study, several indicators were proposed in the original list although no data application or validity tests had been reported. Following Tapio (2003), panellists were left free to make "some tacit knowledge explicit". Experts were thus purposely left to decide whether data availability was a selection criterion. Some experts did indicate in the open-ended comments that they had considered this information in making their choice.
This selection of indicators provided acceptable results in terms of consensus for a number of indicators. Yet, application would require further research in particular for the indicators that have not yet been applied. The next steps would be to classify the selected indicators into categories that distinguish validated indicators from promising indicators, and define how to proceed with the not-yet validated indicators.

Discussion on the Framework
Experts proposed a number of driver/issue interactions and the two most quoted per dimension were submitted to be ranked based on importance. Hence, an opportunity was given to propose new interactions, beyond those provided by the framework. These propositions represent new research hypotheses to explore when assessing the sustainability issues of the food systems.
A number of the proposed new drivers deserve to be highlighted:
• Experts proposed several drivers that can be associated with the broad set of "Changing agrofood patterns". These relate to the dynamics that shape the structure of the food systems and the relationships within the food value-chain elements and stakeholders (Brunori et al. 2009);
• Participants also suggested analysing the impact of technological innovation on the nutritional quality of food supply;
• And the interactions between soil degradation and nutritional quality of food supply.
The reflection that emerged from the proposition and the appraisal of these new interactions shows that there is no unique interpretation of the impacts of a driver of change on food and nutrition security. Impacts can often be positive, negative, or a mix of both, depending on the context. The geographical specificity of food and nutrition security issues is key to determining the final outcome. For example, water depletion will very likely have impacts on food and nutrition security in Mediterranean countries, but maybe not to the same extent everywhere else in the world. Soil degradation might be a particularly important concern in some parts of the world, and less so in others. Experts in a Delphi survey can significantly help to identify and understand these specificities.
Additionally, respondents suggested five additional issues, including environmental externalities, social equity, physical accessibility, food safety, and food governance. In a following step, experts assessed the initial and these newly proposed food and nutrition security issues.
Environmental and social welfare are two recognized food system outcomes, on top of food and nutrition security issues such as availability, access and utilization (Ericksen 2008; Ingram et al. 2010). Yet, these two newly proposed issues, environmental and social, are generally not considered as conventional food and nutrition security issues (Pangaribowo et al. 2013). Since the mid-80s, several scholars and international organizations have proposed the concept of "sustainable food security" (UN 1987; Speth 1993; UN 1996) or "sustainable food and nutrition security" as an enlarged concept of food and nutrition security considering environmental and social issues. Simultaneously, the term "sustainable food and nutrition security" has been used literally as a forward-looking concept, characterizing the ability of food systems to sustain food and nutrition security, "to address the longer term, root causes of hunger and malnutrition" (Thompson and Scoones 2009). The connections between these two understandings and uses of "sustainable food and nutrition security" are not clear. Further analysis and debate would be necessary to acknowledge, not only conceptually but also in practice, the introduction of environmental and social concerns into food and nutrition security issues, and the related trade-offs (Prosperi and Peri 2014; Barrett and Palm 2016), to adopt in order to reach sustainability goals.
A few lessons can be learned as to the identification of sustainable food system metrics.
• Given the diversity of views and understandings of what sustainability means, discussions need to be guided through structured methods, in particular if metric systems are the final outcome of the discussions. In this regard, iterative approaches are appropriate;
• Transparency and multi-disciplinary participation are crucial in the development of sustainability indicators, but present the risk of weaker consensus and a longer timeframe;
• Agreeing on a detailed background framework is essential for the development of indicators, but unlikely to happen if the framework is too specific. The objectives of informing stakeholders and aiding decision-making should be the driving principles when reducing the framework to its core elements;
• When it comes to sustainable food system metrics, whatever the different interpretations of the framework, some key aspects and indicators were identified by a large majority of the participants, namely (1) water use, with the different possible combinations of the Water Footprint; (2) agrobiodiversity, with an indicator such as the Crop Agrobiodiversity Factor; (3) dietary diversity, with the Household Dietary Diversity Score; (4) households' food costs, with the share of household expenditure on food; and (5) price volatility, with the Sensitivity to price volatility indicator, which needs to be clarified and validated by its proponents. Nutrient adequacy could be a sixth aspect to consider, as many indicators using nutrient-density scores attained an appropriate level of consensus. It is difficult to assess to what extent these dimensions were pre-determined by the framework.

Conclusions
The aim of this work was to identify a reduced set of indicators for the assessment of sustainable diets and food systems. This goal was achieved through combined use of the participatory Delphi method and the vulnerability and resilience framework. The indicators were intended to assess the essential properties that allow a food system to maintain 'food and nutrition security' over time, while under the influence of global and regional drivers of change.
The Delphi protocol proved useful in exploring the complexities of the assessment of sustainable diets and food systems. With 51 participants in the first round and 36 in the last round, including economists, nutritionists, natural scientists, agronomists and food security specialists, acceptable participation rates and a balanced composition of the panel of experts from different disciplines were achieved.
Consensus on indicators was reached for a number of dimensions. High, medium, low and majority consensus was progressively obtained on 75% of the indicators. 18 indicators were specifically identified and need to be further tested in the context of sustainable food systems. The remaining dimensions, originating from the initial framework or proposed by the participants, need to be re-examined for full completion of the exercise. It emerged in particular that participants proposed different possible combinations of the Water Footprint to measure issues related to water use. The Household Dietary Diversity Score was selected several times as a measure related to the nutritional quality of food. Furthermore, different nutrient-density scores were chosen to assess nutrient adequacy. As regards the dimensions for which no satisfactory level of consensus was reached, interesting avenues for future research have been outlined by respondents, such as combining multiple indicators and deriving composite measures.
The survey also provided interesting inputs for the framework that organises the different dimensions of sustainable food systems. The proposed drivers of change and food and nutrition security issues, as well as their interactions, were widely judged important or very important. All the initial driver/issue interactions reached the highest level of consensus, with the relationship between consumption patterns and dietary balance attracting particular attention from the experts. A number of participants proposed adding environmental externalities and social equity to the four issues of the original framework. Experts also suggested several new drivers such as "Policy actions", "Technological innovations", "Soil degradation" or "Changing agrofood patterns", the last of which refers to the dynamics that shape the structure and the relationships between the activities and actors within the food value chains.
This study supports the view that structured participatory and iterative approaches, such as the Delphi method, are powerful tools for gathering opinions and forging group consensus. Delphi proved to be an efficient and versatile method capable of integrating the knowledge and expertise of a diverse panel of researchers by guaranteeing transparency and multi-disciplinary participation. However, evaluation of the actual value of Delphi outputs for policy-making remains a topic that needs additional attention, as the actual effectiveness and applicability of some of the indicators need to be demonstrated. The limitations of this study concern mainly the reduced number of interactions proposed in the framework and the availability of the data for some selected indicators. Some indicators had not been estimated in the literature and existed mainly as concepts, such as the indicator "% of diets locally produced". While some experts declared that they considered data availability a main criterion of choice, others were still willing to select such indicators. This highlights the need for collecting new data. Further analysis on the socioeconomic and biophysical aspects of the sustainability of food systems could use or adapt the results of this Delphi study by testing the indicators, but also by adapting the framework to other contexts and food and nutrition-related issues.
Finally, a number of lessons can be drawn from this specific experience to enhance participation and consensus for further Delphi studies. It matters in particular that organisers (1) are able to demonstrate the benefits for society and science of the proposed survey; (2) pay particular attention to the selection of a diverse and appropriate set of experts; (3) properly design the first questionnaire (in that respect, organising preliminary focus-group sessions as tests proved very successful); (4) provide respondents with clear and concise personalized feedback; (5) make use of new technologies to ease the process; and (6) if possible, organise a final face-to-face encounter to dissipate remaining uncertainties and possible misunderstandings. Validity of the Delphi survey is conditional on maintaining a good level of participation until the last round.