Introduction

Argumentative Delphi surveys have been around since the first experiments with Delphi procedures, which initially took place in two or more rounds (Dalkey, 1968, 1969a, b; Dalkey et al., 1972; Dalkey & Helmer, 1962, 1963; Gordon & Helmer, 1964; Brown, 1968; Brown & Helmer, 1964; Brown et al., 1969). Most of them were used to assess and estimate the future of science and technology, but increasingly the embeddedness of science, technology and innovation came into play (Abadie et al., 2010; Cuhls et al., 2022; Eriksson & Weber, 2008) and was also judged upon in Delphi surveys. For reasons of efficiency, easier programming of the software and fast surveys, numerical answers, judgements on scales and statistical analysis were at the forefront of Delphi analyses between the 1970s and the 2010s. As most Delphi surveys are nowadays conducted online, there are many more opportunities to include large numbers of people (experts, often defined in a broader sense) in the surveys and to feed back not only quantitative analyses such as percentages, means or medians, but also arguments.

I take the opportunity of Christian Dayé's symposium to recall the history and first variants of Delphi surveys, which tried to organise predictive work and create knowledge about futures through controlled interaction (Dayé, 2020:41). This was also the starting point of my own dissertation (see Cuhls, 1998), looking back at the "original Delphi surveys" conducted by Helmer and Dalkey and reported in 1962/1963, and the larger one by Gordon and Helmer in 1964, mainly concerned with science and technology but also related to societal developments. For a long time, Delphi surveys were mainly used for expert judgements, quantitative estimations and purely statistical analysis of the findings. Delphi gained momentum when applied to science and technology futures, especially in Japan, where it became an institution to conduct a Delphi survey every five years to update the data about the future of science and technology (see e.g. Cuhls, 1998; Kuwahara et al., 2008; NISTEP 2019). Sociologists were not much interested in this kind of futures research, but they can also profit from the methodological ideas and the findings of the different studies (see for example Bell, 1996, 2003, 2004). Sociologists also argued that working with self-fulfilling prophecies can be an interesting aspect (Merton, 1948); such prophecies are part of Delphi surveys, in which issues are raised and their time horizons estimated. Delphi surveys are one of the established methods of Foresight and Futures Research (Cuhls, 2008; Martin, 1995; Slaughter, 1990) or, later, Anticipation (Poli, 2017, 2019).

In the large Delphi surveys, researchers mainly used figures and statistics (for an overview see Belton et al., 2021). The value of the qualitative argumentation, often about societal developments, was not acknowledged while fast analyses and new software tools were being tried out in different contexts, mainly science and technology. The early practice of asking for reasons when respondents strongly disagreed with the majority (see Dayé, 2020: 184) was for a long time seen as a problem rather than an opportunity in Delphi surveys. Even Gordon and Helmer later asked themselves whether they should have insisted more on asking for reasons and qualitative "data" (Gordon & Helmer, 1964:61). In most cases, however, including arguments was regarded as a disturbance to the search for consensus (especially considered for Gordon & Helmer, 1964; Brown & Helmer, 1964, also cited in Dayé, 2020:184). Comments were often collected, or respondents were asked for new ideas or arguments, but these newly given arguments were rarely published.

This changed a few years ago, when software applications allowed for new ways of considering argumentations:

There are new variations of Argumentative Delphi surveys, and whether this kind of Delphi survey is a "success" in achieving enough answers and good results depends on the programming of the software. The variant of the Dynamic Argumentative Delphi method (DAD) is new and goes beyond the pure "justification" of experts' estimations and assessments (as addressed in Dayé, 2020:59–60, referring back to Dalkey and Helmer, 1962). The DAD was developed in Romania and first tried out there in national projects (UEFISCDI, 2013). It is a Real-time Delphi, meaning that it works with a real-time feedback function. On a large scale, it was tested at the European level in the BOHEMIA project (European Commission 2018). Other, non-dynamic Real-time Delphi surveys can be used on a small or medium scale; recent examples include a German Delphi survey on the future of African-European relations (BMZ 2020), one on the future of language learning, and one on R&I issues derived from post-Covid-19 scenarios for the European Commission.

My contribution to the discussion newly started by Dayé explains the new variants of Argumentative Delphi surveys, presents two examples and summarises the lessons learned from and the limits of argumentative surveys with feedback. The arguments draw on the experts' knowledge and should be taken into account much more than they have been so far.

Delphi Surveys Online

The Delphi method is one of the subjective-intuitive methods of Foresight and is based on structured consultations. It uses the intuitively available information of the respondents, who are usually "experts", a notion often defined very broadly (Cuhls, 2000, 2009, 2012, 2019; Dayé, 2020). The Delphi method provides qualitative and quantitative results and has normative as well as explorative and even prognostic elements. There are diverse variants of application (very early ones, also mentioned in Dayé, 2020 and Cuhls, 1998, are Dalkey, 1968, 1969a, b; Dalkey et al., 1972; Dalkey & Helmer, 1962, 1963; Helmer & Gordon, 1964; Brown, 1968; Brown & Helmer, 1964; Brown et al., 1969) and a consensus that the "Delphi method is an expert survey in two or more rounds, in which the results of the previous round are fed back in the second or later rounds of the survey."

Thus, from the second round of the survey onwards, the experts judge under the influence of their expert colleagues' opinions. The Delphi process can therefore be described as "a comparatively highly structured group communication process in which experts judge on issues about which uncertain and incomplete knowledge exists," according to Häder and Häder's (1995, p. 12; Häder, 2009) early working definition. This kind of controlled group interaction, in which experts judge on certain issues without being in each other's presence, was reported early on (see Dayé, 2020: 40 ff., referring to Kaplan et al., 1959 and earlier studies). Many use a pragmatic characterisation (see also Niederberger and Renn 2019/2023) similar to Wechsler's (1978, p. 23 f., translation) "standard Delphi method": "it is a monitor-group-driven, multi-round survey of a mutually anonymous group of experts for whose subjective-intuitive forecast consensus is sought. After each round of questioning, a statistical group judgement informs about the median and interquartile range of the individual forecasts and, as far as already possible, the arguments and counter-arguments of the extreme, i.e. outside the interquartile range, individual forecasts are fed back in a standardised way." Whether consensus is sought or merely identified varies in each case. For a long time, Delphi surveys mainly fed back quantitative data in the second or later rounds.
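To make this standard feedback loop concrete, the following minimal sketch (in Python, not taken from any of the cited studies) computes the statistical group judgement that Wechsler's definition describes: the median and interquartile range of the individual forecasts, plus the "extreme" forecasts lying outside the interquartile range, whose arguments would be fed back to all participants.

```python
import statistics

def group_feedback(forecasts):
    """Statistical group judgement of one Delphi round.

    forecasts: list of numerical estimates (e.g. estimated years of realisation).
    Returns the median, the interquartile range and the indices of the
    'extreme' forecasts outside the interquartile range, whose arguments
    would be fed back in the next round.
    """
    median = statistics.median(forecasts)
    q1, _, q3 = statistics.quantiles(forecasts, n=4)  # lower and upper quartile
    extreme = [i for i, value in enumerate(forecasts) if value < q1 or value > q3]
    return {"median": median, "iqr": (q1, q3), "extreme_indices": extreme}

# Hypothetical estimates of nine experts for the year of realisation of one thesis
print(group_feedback([2025, 2025, 2030, 2030, 2030, 2035, 2035, 2040, 2050]))
```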

The reasons for this are efficiency, pragmatism and time: it takes a lot of time to feed back arguments and to analyse them between rounds. With new digital possibilities and software for fast statistical analysis, it became rather easy to feed back quantitative data, and more and more software was able to calculate it directly (see Aengenheyster et al., 2017). That means that in the second round only pure data were fed back, for example percentages, numbers, means or medians. The participants then judged on the basis of the aggregated results of the previous round and just had to tick boxes. In some cases, but not all, they also received feedback on their own previous judgement. Researchers struggle here with the question of whether it is better to remind participants of their previous judgement (and send it with the reminder) or to let them rethink without knowing it.

The first Delphi surveys at the RAND Corporation offered the possibility to give arguments, explanations or comments. That was reduced later. In most Delphi cases, only general comments could be given (see most of the surveys analysed in Belton et al., 2021).

Back to Arguments in Delphi Surveys

As indicated, argumentative Delphi methods are not new. They were already used in Dalkey's experimental studies in the 1960s (Dalkey, 1968, 1969a, b; Dalkey et al., 1969; Dalkey & Helmer, 1963; Dalkey et al., 1972). They are based on the usual Delphi procedures in two or more rounds with feedback and the possibility to reconsider one's own assessments in the second or later round, taking into account the aggregated answers of other experts, and to revise them if necessary without having to justify oneself or "lose face" (Cuhls, 1998, 2019; Häder, 2009). In argumentative Delphi procedures, participants are asked to give arguments or reasons for their own assessment, i.e. to justify, for example, why one assumes an early or late realisation of the topic or why one considers the topic to be particularly promising for the future. The question of justification is linked in each case to a quantitative assessment, which is later evaluated statistically.

The "Dynamic Argumentative Delphi" (DAD) goes one step further. It builds on the advantages of the Real-time Delphi and has been successfully tested several times. The main aim of the approach is to enable online Delphi surveys with a large number of participants (hundreds or more) while retaining the interactive "argumentative", i.e. justification-based, character of traditional Delphi. Since processing respondents' arguments typically involves a lot of manual effort over several rounds and is thus the main obstacle to a large number of participants, this process is automated in DAD by introducing some simple rules. A DAD is only possible online, as a Real-time Delphi.

A Real-time Delphi is characterised by the fact that feedback on the answers given by previous participants is provided directly and in real time, i.e. immediately (see Gerhold, 2019; Cuhls, 2019; Gnatzy et al., 2011; Gordon and Pease 2006), so that participants can reconsider and change their own opinions based on the results, or not. As a rule, they can even do this several times and "come back" as often as they like. The first Real-time Delphi studies were already conducted in 2006 (Gordon and Pease 2006; Cuhls et al., 2007; Friedewald et al., 2007; Zipfinger, 2007). Ted Gordon is the same Ted Gordon who paved the way for the "classic" Delphi survey, the form that made use of statements and questions, in 1964 (Gordon & Helmer, 1964; see also the discussion in Dayé, 2020, Chap. 6). Real-time Delphi studies (RTD) are basically only possible online if the anonymity of the participants is to be preserved. RTDs therefore take advantage of the special group dynamics in Delphi studies and combine them with much more direct communication among the participants without compromising the anonymity of the survey.

Several software packages for conducting online Delphi studies already exist (for an overview see Aengenheyster et al., 2017), but only very few allow for arguments, let alone their dynamisation. In most cases, the software has to be individually programmed and adapted to the purpose.

Special Features of Argumentative Delphi Surveys

Delphi surveys demand a lot from the participants: they have to read, grasp and assess the theses (assumptions about the future), and they have to do so several times, i.e. at least twice, in order to be able to change their opinion. Otherwise it is not a Delphi but a simple (future) survey. The assessment requires a full understanding of the respective topic. Moreover, each person's assessment is made under certain premises and biases. The arguments for the assessment, the premises, or "what the expert was thinking", are to be recorded and made evaluable in the Argumentative Delphi. This is more time-consuming for the participants because they have to write it down rather than just note or click on a value or box. It is also more time-consuming for the evaluators because only content-based, sometimes fragmentary evaluations are possible. Text mining approaches and the use of MAXQDA have been tried out by the author, but have only been of limited help so far. In the evaluation, the biases of the evaluators can also come into play.

Unlike the aggregated statistical evaluations, the arguments and comments are individual opinions that - when fed back - can become dominant opinions again, contrary to the Delphi principle. Statistics can be automated in the evaluation and are thus easier to handle. Nevertheless, the arguments offer substantive clues and justifications for the respective assessments and thus valuable content or additional information.

Why now also introduce dynamics, and how does it work? The main aim of the approach is to enable online Delphi consultations with a large number of participants without losing the interactive "argumentative" (i.e. justification-based) character of traditional Delphi surveys in smaller groups. Since processing respondents' arguments typically involves a lot of manual effort over several rounds (and is thus the main obstacle to expanding the number of participants), the Dynamic Argumentative Delphi automates this process by introducing some simple rules. Coming back to the discussion in Dayé (2020), this is an occasion not only to predict or estimate a time horizon, but also to argue about likelihood, feasibility, possibility, importance and other issues not directly asked for.

In an Argumentative Delphi or DAD, each Delphi statement to be evaluated quantitatively (e.g. on probability, importance, impact, time horizon for realisation etc.) is linked in the online questionnaire with two to three "start" arguments, which, together with all arguments added later by the respondents, are always visible to the participants. This set of arguments, which expands with each participation, serves as the rationale for the quantitative estimates, as in traditional Delphi formats. Respondents are asked to enter their quantitative estimate and justify it by selecting at least one existing argument, providing at least one new argument, or both. The maximum number of arguments that can be added or selected by each respondent is usually limited to two or three.

The list of arguments and their frequencies, updated with the newly selected or added arguments, is always visible to subsequent respondents (real-time updating). The arguments in the list are ordered by the number of votes collected during the exercise (counts are shown). In contrast to the constantly updated list of arguments, the quantitative estimates are only visible to the individual participant who made them.
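The following is a minimal sketch of the argument handling just described, assuming a simple in-memory data structure: each statement carries a pool of arguments with vote counts, a respondent justifies an estimate by selecting existing arguments and/or adding new ones (limited to two or three in total), and the list shown to the next respondent is re-ordered by votes. All names and limits are illustrative; this is not the actual DAD software.

```python
from dataclasses import dataclass, field

MAX_ARGUMENTS_PER_RESPONDENT = 3  # "usually limited to two or three"

@dataclass
class Argument:
    text: str
    votes: int = 0  # number of respondents who have selected this argument

@dataclass
class DelphiStatement:
    text: str
    arguments: list = field(default_factory=list)  # starts with 2-3 'start' arguments

    def respond(self, estimate, selected_indices, new_argument_texts):
        """Record one response: a quantitative estimate justified by selecting
        existing arguments and/or adding new ones."""
        if not selected_indices and not new_argument_texts:
            raise ValueError("at least one argument is required as justification")
        if len(selected_indices) + len(new_argument_texts) > MAX_ARGUMENTS_PER_RESPONDENT:
            raise ValueError("too many arguments for one respondent")
        for i in selected_indices:
            self.arguments[i].votes += 1
        for text in new_argument_texts:
            self.arguments.append(Argument(text, votes=1))
        self._store_private_estimate(estimate)  # estimates are never shown to others

    def visible_arguments(self):
        """Argument list as shown to the next respondent, ordered by vote count
        (real-time updating)."""
        return sorted(self.arguments, key=lambda a: a.votes, reverse=True)

    def _store_private_estimate(self, estimate):
        pass  # would be persisted per participant in a real survey platform
```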

With this approach, it is possible either to limit the Delphi to a single "round" and give the opportunity to respond as many times as desired during this round, or to add a second round (e.g. only for those respondents whose estimates are above or below a certain threshold compared to the group average or median). In both cases, the advantages of the traditional Delphi method remain. It is also possible to keep the number of arguments at a manageable level, as the structure discourages participants from duplicating arguments already introduced, and from digressions, excessively long texts or irrelevant considerations. Essentially, the arguments are "self-bundling". This means that the number of participants can be very high without a proportionate manual editing effort on the part of the organisers of the survey, while at the same time much more content can be generated.
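As a small illustration of the optional second round mentioned above, the snippet below selects only those respondents whose estimate deviates from the group median by more than a chosen threshold; the field names and the threshold value are assumptions for the sake of the example.

```python
import statistics

def second_round_invitees(responses, threshold):
    """Return the respondents whose estimate lies further than `threshold`
    from the group median; only they would be invited to a second round."""
    median = statistics.median(r["estimate"] for r in responses)
    return [r["respondent_id"] for r in responses
            if abs(r["estimate"] - median) > threshold]

responses = [
    {"respondent_id": "A", "estimate": 2025},
    {"respondent_id": "B", "estimate": 2030},
    {"respondent_id": "C", "estimate": 2045},
]
print(second_round_invitees(responses, threshold=5))  # -> ['C']
```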

However, it is also possible to conduct a simple future survey with dynamic arguments (Dynamic Future Survey). This is the case when the participants are not given the opportunity to answer a second time or to revise their assessments. In practice, this means that each participant sees the assessments of those who have answered before him or her as soon as a certain number of answers is available. Done in this way, each person receives different feedback and can be influenced differently at a certain stage of the argumentation, which is problematic without the Delphi function (the possibility of changing one's own previous answer). Some even call these future surveys a Delphi, which often confuses recipients and later readers of reports and papers.

It is therefore important in a dynamic, argumentative Delphi survey that the advantages of the traditional Delphi format are retained. Respondents reflect on the statistical data as well as on the justifications for their quantitative estimate. They continue to answer anonymously and can be influenced, but do not have to be. In doing so, they draw on the arguments of the respondents answering before them, thus ensuring a certain degree of intersubjectivity in the exercise. Participants are not forced to subscribe to a particular opinion. They can, and are encouraged to, return to the online questionnaire to consult the updated argument lists and review their own arguments in the light of the new ones. They can reassess if they feel it is necessary.

In short, the main advantages of DAD are assumed to be the following (assumptions):

1. The number of participants can be significantly expanded compared to traditional Delphi and even typical online exercises. Participants come into more direct exchange.

2. Compared to some online formats, the consensus test on the quantitative variable is explicitly linked to arguments, so the usual pitfalls of averaging opinions are somewhat mitigated. There is not only statistical feedback.

3. Participant interaction focuses on the most important or 'popular' issues or arguments, preventing digressions.

4. It prompts respondents to select existing arguments as originally formulated, rather than unnecessarily adding 'new' reasons that partially or fully duplicate reasons already entered.

5. It facilitates the interpretation of the "meaning" or significance of the quantitative result by highlighting which arguments are more in favour or against.

6. Nevertheless, everyone can reconsider and change their own opinion without having to justify themselves. Anonymity is guaranteed.

In the concluding section, I will come back to these advantages and assess them in the light of the experience gained so far. In the following, two examples of argumentative Delphis are explained.

First Example: Project BOHEMIA

BOHEMIA is the acronym of a project for the European Commission: "Beyond the Horizon: Foresight in Support of the Preparation of the European Union's Future Policy in Research and Innovation". BOHEMIA was a comprehensive Foresight project that combined several methods: megatrend identification, Horizon Scanning, interviews, scenarios, the Delphi method, targeted scenarios, and a consultation. This Foresight approach was intended to support the design (especially topic identification) of the European Union's 9th Framework Programme for Science and Technology, now called "Horizon Europe", in an early preparatory phase starting in 2016. The steps in the project built on each other. The first phase was dedicated to the collection of megatrends, which were combined into scenarios. In cooperation with the newly founded Foresight Correspondents' Network (FCN, now Horizon Europe Foresight Network, HEFN), consisting of representatives from all departments of the EU Commission, a perseverance scenario (business as usual) and a change scenario (a more desirable one) were described (European Commission 2017a).

The Delphi study marked the second phase of the BOHEMIA project (European Commission 2017b and 2018). It started with interviews with people who had a broad overview of possible futures and with a semi-automated query on techno-scientific futures. From these and the results of a Horizon Scanning (an approach based especially on publications), theses were generated, discussed and selected in a moderated, so-called "scoping workshop" with participants of the FCN and external experts with broad knowledge. The theses were reviewed several times by the responsible officers of the European Commission and the project team and then subjected to assessment in the field (see below).

In the third phase of the project, the results of the Delphi survey were evaluated and the theses that were among the most important were combined into so-called "mini-scenarios" or "targeted scenarios". These 17 targeted scenarios marked research and technology directions to be particularly focused on, often formulated in terms of needs, e.g. No. 6 Defeating Communicable Diseases, No. 7 Emotional Intelligence Online, No. 8 Human Organ Replacement, or No. 10 Low Carbon Economy (European Commission 2018; they were also published as individual scenarios). The scenarios were subjected to a further Delphi-like evaluation (a consultation with arguments, but each participant could only participate once). The final results included recommendations for policy implementation and were published in an overall report in 2018 (European Commission 2018). In the following, only the Delphi study is discussed in more detail.

Respondents were invited to participate in the BOHEMIA Delphi survey by the project team in a letter sent by email. Specifically, they were asked to visit the survey website (bohemia-consultation.eu) and set up a personal account to allow them to start, complete and exit the survey at their convenience. After the registration step, participants received an email with a personalised link to the questionnaire. When accessing the questionnaire, participants were asked to select, based on their expertise and interests, one of several fields under which the Delphi statements were clustered. (The fields of knowledge were divided into two broad classes, S&T Developments and R&I Policy Statements, with different questions.) Participants were also asked to return at any time after completing the first field to select further fields (up to a maximum of three).

Fig. 1: Example of the assessment of the time horizon including arguments

After selecting the field(s), the first Delphi thesis was made available (Fig. 1). After the assessment was done on one browser page, the respondent could move on to the next page, and so on until all theses in the selected field had been worked on. In the top half of the page, the thesis was visible in bold orange letters, with the option below it to skip it if the person did not feel informed enough to assess it. Below the Delphi thesis, the respondent was asked to estimate the "time of realisation" of the statement, for which he or she could select one of the given options from a drop-down list with the possibilities "2025", "2030", "2035", "2040", "After 2040", "Never" or "I don't know". After selecting an assessment, the respondent was asked to support this assessment with arguments, which could either be selected from a list of existing arguments or added by the respondent. The arguments already on the page could be selected or deselected by clicking on them. New arguments could be written into a field and then edited or removed. The screen that then appeared looked similar to the one in Fig. 1. The numbers behind the arguments show how many participants had already selected the respective argument.

Then, the participants were asked to rate the importance of research and innovation for the realisation of the statement (as in Fig. 2). The rating system (1 to 5 stars from "not significant" to "highly significant") was briefly explained. Again, arguments could be selected or added. In this case, the thesis had to be evaluated; otherwise it was not possible to move on to the next page.

Fig. 2: Assessment of significance

Note: For the type of statement shown in the figure above, the importance of "R&I for realisation" was assessed. For the other type of statement, the assessment was different.

After assessing a single Delphi thesis, respondents could move through all statements related to the field they had selected. Those who had finished with the statements of a field were also asked to access the questionnaire a second time to see the assessments of the previous participants and, if necessary, to revise their own assessments in light of the new information. Feedback was only displayed once at least 15 people had responded. If this was not the case, participants were informed that they would be invited again at a later date. When the questionnaire was called up again, a message at the top of the page (shown against a greyed-out background) briefly explained the content of the new page and the new tasks.

Fig. 3: Revision of statements

As in Fig. 3, each participant was shown the distribution of answers individually (orange bars for all answers, blue bar for their own answer). The participant was then asked either to keep or to revise his or her original assessment of the realisation date and the previous arguments. After completing the first or second "round" of the questionnaire, respondents were taken to a section where they were asked to fill in information about their "profile" (age, gender, etc.) as well as their expertise. In this project, the reminder management and questionnaire guidance thus ensured a combination of Real-time Delphi (real-time feedback) and a kind of second round (an invitation, also sent in targeted e-mails, to start again from the beginning).
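A minimal sketch of this feedback step follows, illustrative only and not the BOHEMIA code: the revision view shows the distribution of all answers together with the participant's own previous answer, but only once at least 15 responses exist.

```python
from collections import Counter

MIN_RESPONSES_FOR_FEEDBACK = 15  # threshold used in the BOHEMIA survey

def revision_view(all_answers, own_answer):
    """Feedback shown when a participant re-enters the questionnaire: the
    distribution of everyone's answers plus the participant's own previous
    answer, or a notice that feedback is not yet available."""
    if len(all_answers) < MIN_RESPONSES_FOR_FEEDBACK:
        return {"feedback": None,
                "message": "Not enough responses yet - you will be invited again later."}
    return {"feedback": dict(Counter(all_answers)),  # e.g. {'2030': 12, '2035': 7}
            "own_answer": own_answer}
```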

The final results could be seen immediately: the Delphi statements were divided into two broad classes according to their content and the way they had to be assessed. They were statistically analysed according to the judgements (numbers and percentages) and clustered according to their argumentations. The results can be found in the final reports (European Commission 2017b and 2018), with the plain results for each statement and some rankings in tabular form. The theses were grouped as follows (a small sketch of the classification rule follows the list):

1. theses that are expected to become reality by 2030 at the latest: at least 60% of the participants chose either 2025 or 2030 as the realisation date;

2. theses that are likely to become reality in 2035 at the earliest: at least 60% of the participants selected 2035, 2040 or beyond as the realisation date;

3. theses that are unlikely to become reality: at least 60% of the respondents selected "never" as the answer; and

4. theses with unclear realisation time: all theses that are not included in the above classes.
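Under the assumption that each answer is one of the time-horizon options from the questionnaire, the grouping rule can be sketched as follows (illustrative code, not part of the project):

```python
def classify_thesis(answers):
    """Classify one thesis by its realisation-time answers, following the 60% rules
    described above. `answers` is a list of strings such as '2025', '2030', '2035',
    '2040', 'After 2040', 'Never' or "I don't know"."""
    n = len(answers)
    share = lambda options: sum(a in options for a in answers) / n

    if share({"2025", "2030"}) >= 0.6:
        return "expected by 2030 at the latest"
    if share({"2035", "2040", "After 2040"}) >= 0.6:
        return "expected in 2035 at the earliest"
    if share({"Never"}) >= 0.6:
        return "unlikely to become reality"
    return "unclear realisation time"

# Hypothetical answers from ten respondents
print(classify_thesis(["2025", "2030", "2030", "2030", "2030",
                       "2035", "2030", "Never", "2025", "2030"]))
```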

Cross-cutting analyses and statistical tests were also carried out but not published, as the European Commission only wanted to use the data on the 147 individual theses and the overview rankings. These were used extensively for its own analyses. As part of the BOHEMIA project, the most important topics were filtered and re-clustered, and so-called "Targeted Scenarios" were compiled and enriched (European Commission 2018).

Second Example: The Future of African-European Relations

Africa and Europe are very close neighbours. With their shared history, interests and challenges, there seems to be an obvious need for Africa and Europe to cooperate in tackling climate change, implementing the 2030 Agenda, dealing with migration or fighting pandemics, among many other areas. The questions in this Real-time Delphi survey with arguments (not a DAD), conducted on behalf of the German Federal Ministry for Economic Cooperation and Development (BMZ), were: Will the two continents continue to be close partners, or will other players take over Europe's role? Will the two continents be able to take their partnership to the next level? Where are the opportunities for good relations? What form will they take in the long term?

The project was carried out by the German Gesellschaft für Internationale Zusammenarbeit (GIZ) and the Fraunhofer Institute for Systems and Innovation Research (ISI). The results were published in a joint report (BMZ 2020). The Delphi survey was conducted to generate expert knowledge and explore different perspectives from Africa and Europe. The results contributed to the political discourse of Germany’s Presidency of the European Council, including in the Africa Forum in September 2020.

The survey was based on the following questions:

  • How can cooperation between Africa and Europe be made fit for the future?

  • How can a strategy be designed to promote sustainable cooperation between Africa and Europe?

  • What are the main drivers, opportunities and uncertainties or 'blind spots' that are relevant to the relationship?

This study had several steps:

Step 1: Identification of experts. 38 African and European experts from politics, administration, business, academia and civil society were identified for personal interviews. The experts were selected with a view to gender, regional and sectoral balance.

Step 2: Semi-structured expert interviews were conducted in February and March 2020 in order to identify issues and opportunities of and obstacles to future African-European relations.

Step 3: Formulation of theses. Based on the analysis of the expert interviews, 22 theses on the future of African-European relations were formulated by Fraunhofer ISI and GIZ. Each statement stands for itself and may contradict another thesis.

Step 4: Real-time Delphi survey - assessment of theses. The theses were subsequently assessed in an online Real-time Delphi survey in March and April 2020, involving 90 participants who provided anonymous input on possibility, time horizon, influence and desirability. For each thesis, participants were asked:

1. Do you think this is possible?

1a. Why do you think this is possible/ not possible?

1b. If you consider it possible, until when?

2. If this development happens, how influential will it be on the African-European relationship?

2a. Please give reasons for your rating.

3. Is this development desirable from your personal perspective?

Step 5: Analysis of the Real-time Delphi survey. The evaluations of the theses, the explanations and the comments were analysed statistically and qualitatively. In this context, differences in the assessments of different groups (gender, region, sector) were examined and reported when they were statistically significant (a small sketch of such a check follows below).
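As an illustration only (the report does not specify which tests were used), such a group comparison can be done with a chi-square test on the answer distributions of two groups, reporting a difference only when it is significant; SciPy and the example counts are assumptions.

```python
from scipy.stats import chi2_contingency  # assumes SciPy is available

def significant_group_difference(group_a_counts, group_b_counts, alpha=0.05):
    """Chi-square test on the answer distributions of two respondent groups
    (e.g. female vs. male, or African vs. European participants).
    Returns the p-value and whether the difference is significant at `alpha`."""
    table = [group_a_counts, group_b_counts]  # rows: groups, columns: answer options
    _, p_value, _, _ = chi2_contingency(table)
    return p_value, p_value < alpha

# Hypothetical counts per answer option ('definitely possible' ... 'definitely not')
print(significant_group_difference([30, 10, 5, 2], [12, 15, 14, 6]))
```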

The survey was prepared and performed during the very first phase of the Covid-19 pandemic, in lockdown times. The pandemic has affected all political, economic and societal spheres in Africa and Europe, so it could also fundamentally impact future cooperation, among other areas in terms of doing business, fighting hunger, ensuring joint pandemic preparedness, managing conflict and adapting social security systems. A supplementary question was therefore added during the course of the survey: ‘What could be the short-term impacts of the Covid-19 crisis on African-European relations?’

Figure 4 illustrates the aggregated responses to the question 'If this development happens, how influential will it be for African-European relations?' The results suggest that the topics covered by the survey are very important for future African-European relations: all 22 theses were rated as influential by the vast majority of the participants. Among these, shaping the international order 'the African-European way' by strengthening soft power and rules-based multilateralism is seen as having a particularly large influence. Moreover, a group of nine theses is seen as 'fundamentally influential' by more than 40% of respondents. Interestingly, all of them indicate rather positive 'visions' of the two continents' future relations, e.g. Africa and Europe speaking with a common voice, having intense cultural and societal exchange and enjoying legal and unrestricted mobility both within and between the two continents.

Fig. 4: Influence and time horizon

The survey also asked 'Do you think this (thesis) is possible?' Most of the theses were rated as 'definitely possible' or 'rather possible' (Fig. 5). For six theses, respondents rated the possibility as high, above 80%. The two theses with the highest possibility rating within this group were very different, however.

First, and very encouragingly, the thesis on cooperation on climate change receives the highest possibility ratings of all theses. Tackling climate change jointly via an African-European response is perceived as representing real momentum for enhanced cooperation between the two continents. However, the future does not look as promising in relation to the thesis on populism, xenophobia and nationalism, which respondents perceive, with ratings almost equal to those on climate change cooperation, as 'definitely possible' or 'rather possible' in terms of seriously undermining relations between the two continents.

The theses with a short time horizon that are rated highest on influence include populism, xenophobia and nationalism, security collapse, and an Africa that sees little value in deepening relations with Europe. These trends have serious potential for a deterioration in African-European relations. At the same time, the issue of the EU recalibrating its development aid approach is also considered to be highly influential and achievable within a short time horizon.

On the other hand, some of the theses are regarded as less likely or more uncertain (Fig. 5). Legal and unrestricted mobility within and between the two continents is rated as 'rather not' or 'definitely not' possible by almost half of the respondents. A paradigm shift in the migration policy of the two continents thus appears to be one of the least imaginable scenarios. The same applies to Africa and Europe speaking with a common voice and implementing joint strategies, which also rates relatively low on possibility compared with the other theses.

Participants were also asked to assess the time horizon for realisation of the thesis in response to the question ‘If you think (the thesis) is possible, until when?’ (Fig. 4). Here, there were three different groups of responses (Figs. 4 and 5).

  • A majority consider the time horizon for the first seven theses to be short-term (realisation by 2025). These theses are led by threats like populism, xenophobia, nationalism, security collapse and authoritarian alliances but also by the scope for Africa and Europe to join forces to forecast pandemics.

  • More than 80% consider the last five theses to be medium- to long-term developments (realisation by 2035 to 2045). They include ‘big’ topics like an ‘African-European model’ for re-shaping the international order or legal and unrestricted mobility.

  • Assessments regarding the time horizon for the ten remaining theses vary. Some respondents regard them as short-term, others as medium- or even long-term.

  • None of the theses attracts complete consensus on the time horizon, but there is particular disagreement concerning the thesis on a new African Security Architecture, with an almost equal distribution of all possible responses.

Those with a long time horizon include some ‘big’ topics, such as an African-European model for reshaping the international order, intense societal and cultural exchange between the two continents, legal and unrestricted mobility, African solutions, and unquestioned ‘African agency’ (Figs. 4 and 5). Those issues will require long-term structural transformation, for which the foundations need to be laid today but whose implementation will take time.

Fig. 5: Possibility and time horizon

The relationship between time horizon and possibility (Fig. 5) shows that the theses rated as long-term projects are also rated as less possible than others, for example structural challenges such as reforming global trading mechanisms and institutions. Some other key findings are:

With a collapse in security, pandemics, populism, economic crisis and climate change, the future of Africa and Europe can, at times, look bleak and characterised by challenges that are as diverse as they are impactful. However, the opportunities, too, are diverse. With political will, credible leadership and tangible efforts towards implementation, Africa and Europe could not only make a virtue of necessity by joining forces to address common threats but also engage in shaping future cooperation on key global issues and set out a path towards a more peaceful, sustainable and just future. This was the central narrative around which the interviews and Delphi theses on future African-European relations took place.

However, during the course of the survey, it became apparent that opinions and assessments of possibility, influence and time horizon diverge, and that setting priorities involves as many options as there are fields of expertise. Nevertheless, some issues turned out to be particularly important in the view of the experts, as raised in the arguments. Based on the analysis of the expert interviews and the Real-time Delphi survey, some key results have therefore been identified (BMZ 2020: 9f): The time is now to start a new era of cooperation. Especially the pandemic and post-pandemic times will create momentum and will define the immediate future of African-European relations, and whether there will be more collaboration with Europe or whether other partners will step in. If the two continents manage to join forces, they could make safeguarding multilateralism a common future endeavour through an African-European path to international cooperation. The most controversial thesis in this study is whether each continent manages to find a common voice. It will be key whether 'African agency', the African continent's capacity to act, is achieved and whether the colonial heritage is dealt with.

Africa's young people represent more than a 'demographic dividend'. A generation of young, self-confident and innovative future leaders is increasingly asserting its place in society and demanding a say in politics. Their vision for the continent's future will profoundly influence the shape of African-European relations, and greater attention should be paid to societal and cultural exchange and to fostering personal contacts in order to tackle stereotypes, build trust and alter perceptions. Migration is likely to remain a politically sensitive topic in the years to come, but it is framed differently: as 'the burden of migration' on the European side and as 'beneficial mobility' on the African side. Though the narratives are different, the questions to be answered remain the same. Providing platforms for professional and academic exchange, both for Africans in Europe and for Europeans in Africa, could be a practical way of harnessing the benefits of mobility in the short term. Shared challenges such as climate change, energy supply and pandemics are the immediate opportunities for cooperation. Joining forces on concrete problem-solving might foster African-European relations and can have spill-over effects for cooperation in other policy areas.

Conclusions: Pros and Cons of Expert Argumentations in Delphi Surveys

With an argumentative Delphi study, the collection of statistical-quantitative and argumentative-qualitative data can be used in an optimal way. As with any study in the social sciences or sociology, however, the survey must be well prepared and tested by experts and organisers. Unscientific or poorly designed questionnaires have in the past discredited all kinds of Delphi studies (Sackman, 1975; Coates, 1975), and it took a long time to restore confidence in the method. The possibilities of misuse, incorrect data collection or flawed evaluation exist for Delphi surveys just as for all social science studies. However, if the survey is well prepared and conducted according to scientific criteria (Gerhold et al., 2015; Mayring, 2015 or similar), it can be carried out very quickly (feedback in real time) and can also be analysed and evaluated in a fast way.

Since the organisers or moderators of the process have insight into the data at all times, they can intervene immediately in the event of misuse or one-sided responses with manipulative intentions. The author has not yet observed any direct manipulation of the results in her own projects, but this does not mean that it is impossible. Through the online platforms, the raw data can be downloaded and checked as well as evaluated immediately at any time in the process and also at the end of the field phase. This means that the speed of the process can be used optimally, both in terms of feedback and at the end of the field phase. Graphs and tables can be created almost immediately and, with good preparation, can even be available in the desired design.

A particular advantage is that a great deal of additional, qualitative information can be collected via the arguments, which otherwise often remains hidden. With purely statistical information, the question of “why” and the intentions of the participants often remain unanswered.

But the disadvantages should not be concealed either. For example, it is difficult to understand how the mutual influences take place, especially since in a Real-time Delphi not all persons answer several times and the level at which they answer varies. The scientific situation is even less clear in the case of a dynamic, argumentative future survey, which is not a "real Delphi" but allows each person to judge only once. In such an argumentative future survey, each participant answers at a different stage and does not have the opportunity to revise their own judgement.

As with all Delphi variants, the theses must be formulated briefly, concisely and unambiguously. This requires great precision in the formulation of the Delphi theses and excludes subject areas that require detailed descriptions and explanations (many from social science fields) because these cannot be made explicit in sufficient brevity. In such cases, other methods (e.g. scenario methods) are more appropriate. Just as with all surveys, the results are not “the future” but a working material that can be used for further priority setting, planning or decision-making purposes (Cuhls, 2003).

Unfortunately, DAD software is not (yet) available off the shelf; the current software tools have to be adapted for arguments in each case. In some cases, or for higher demands, complete reprogramming is necessary. The import of the first predefined arguments for "learning purposes" and the pretests are also an effort in the preparation of the field phase that should not be underestimated. If biased predefined arguments are chosen, the subsequent arguments by participants can also be biased or directed into a certain argumentative direction, leaving out other paths of thought. This is a danger, but we have not observed it so far. The predefined arguments in our DAD were in nearly all cases ranked rather low or medium, and other arguments came to the forefront.

The handling of a personalised link or password-protected access ("account"), which allows everyone to answer several times and to resume where he or she left off or to start again and navigate freely, is time-consuming, also for the participants, and requires advanced programming. Data protection requirements must be taken very seriously here.

The main disadvantage is that an argumentative Delphi can quickly become very time-consuming, both for the participating experts and for the organisers and analysts. Many participants also mean many arguments that have to be read each time by the participants despite the ranking (even more so, of course, if they are not dynamically ranked). The questionnaire becomes longer and longer due to the newly added arguments. In the process, the participant quickly loses the overview and is tempted to simply choose the most frequently selected arguments as well; here, the consensus principle is almost at work again.

The same applies to the analysis. Here, too, the Delphi organisers quickly lose the overview, so that tools like MAXQDA are sometimes used, which evaluate neutrally but do not show the sometimes important nuances and details. Or the ranked list of findings leads to only the top ranks, the frequently chosen arguments, being looked at, although the rarely mentioned statements sometimes also offer interesting additional information but receive less attention. Therefore, only "qualitative evaluations" are possible here.

Another disadvantage is that the qualitative results, which are so interesting, often no longer find a place in the reports and publications, which are already very long due to the tables and graphs. So much information is often generated that it can no longer all be accommodated. This goes hand in hand with the trend towards ever shorter and more visual reports, combined with ever shorter attention spans among recipients. The detailed findings can then no longer be mentioned - which is unsatisfactory for the participating experts when they have made such an effort. Paradoxically, the advantages of the new methodology in terms of information generation can at the same time be disadvantages of the method.

Argumentative future surveys and especially the dynamic argumentative Delphi surveys offer many new possibilities to generate information about possible futures. They thus go back to the original idea of Kaplan et al. or Dalkey and Helmer to make use of expert opinions and their argumentations - as reported in Dayé, 2020. Argumentative surveys do not only collect statistical data (numbers, percentages), but also the associated rationales, which in turn contain further information or explanations. Coming back to the initial assumptions, we have to consider:

1. Yes, the number of participants can be significantly expanded compared to traditional Delphi and even typical online exercises, but the list of arguments becomes too extensive once a certain number of contributors is reached. This limits the participants' direct exchange.

2. Yes, if the survey is well organised, the consensus test on the quantitative variable is explicitly linked to arguments, so the usual pitfalls of averaging opinions are somewhat mitigated. But the organisers of a Delphi survey should be aware that this effect never fully disappears. It is a huge advantage that there is not only statistical feedback and that the analysts understand better why participants judge the way they do.

3. Yes, participant interaction focuses on the most important or 'popular' issues or arguments, preventing digressions. But sometimes these popular issues are not the most interesting ones, so it is important to look at the 'outlier's opinion', too.

4. Yes, it prompts respondents to select existing arguments as originally formulated, rather than unnecessarily adding 'new' reasons that partially or fully duplicate reasons already entered. But there are still duplicates or similar reasons, and analysts have to check for them.

5. Yes, it facilitates the interpretation of the "meaning" or significance of the quantitative result by highlighting which arguments are more in favour or against. This additional information is very useful, not only for writing the text but mainly for interpreting the Delphi results in general. It also brings the "mainstream" topics and arguments to the forefront.

6. Yes, everyone can still reconsider and change their own opinion without having to justify themselves. Anonymity is guaranteed.

In order to use this potential, an online platform is needed, which unfortunately still requires a lot of programming. Current online tools make surveys more detailed and adaptable to the participants, but adaptation and testing become more difficult and take longer.

Deliberate misuse is possible with a DAD, just as with all real-time Delphi studies or social science surveys - but has not been known so far. Biases can occur just as in other surveys, but are more often revealed through argumentation and communicative exchange. Nevertheless, it cannot be ruled out that misuse of the increasingly simple tools and misinterpretation of the statistics will increase.

However, the main problem that all surveys have today cannot be solved by a DAD either: In a time of information overload, it is becoming increasingly difficult to attract and retain knowledgeable people (experts) to participate in the survey, to convince them to read through everything and to come back several times to revise their own assessment if necessary. Whereas in the early days of national Delphi studies (Kuwahara et al., 2008; Cuhls et al., 2002, 1998; Cuhls, 2016, 1998; BMFT 1993) it was still possible to score points with the incentive that participants would receive the results automatically and earlier than others, such incentives no longer work today. Even in Japan, a country with more than 50 years of experience and national Delphi studies every five years (latest version: NISTEP 2019), this decline in participation can be observed. Scientific studies with many participants ("big Delphis") like BOHEMIA will therefore remain rare. However, the potential of "communicating with many" about very different future issues is there. With DAD, swarm intelligence (see also Surowiecki, 2004, 2017) could indeed be used and activated, if it really exists.

In terms of content, Dynamic Argumentative Delphi studies offer immense potential not only to provide assessments of future issues, but also to generate qualitative information for a better classification and understanding of the (statistical) results in cases of topics, theses and issues that have to be judged for and under uncertainty. For the judgement of relatively clear and certain issues, a Delphi is not necessary.