1 Introduction

In recent years, there has been an increase in situations in Germany that pose “a serious threat to […] the fundamental values and norms of a social system, which—under time pressure and highly uncertain circumstances—necessitates making critical decisions” (Rosenthal et al. 1989, 10). This increase is very likely attributable to climate change, and the associated risks will very likely continue to rise (Otto et al. 2018; IPCC 2023). Of course, this is nothing new, at least from the perspective of security research. The recent accumulation of crises and the growing awareness of potential climate crisis impacts have amplified the need for functioning crisis management. In the practice of some organizations, however, things look different, probably due to the country's low-crisis history (Dombrowsky 2014) and the predominance of blue-light organizations such as the police and fire departments in managing disasters. Situations such as the 2015/2016 refugee movements into Germany, as well as the COVID-19 pandemic, were not classic “blue light” or disaster management situations. Rather, they required the involvement of public administrations at various levels, and in particular the local crisis management of municipalities and communities. Our data from previous research projects on the crisis management of public administrations show that although these situations were not completely unexpected, they confronted the administrative actors with unexpected challenges. Crisis management was often not perceived as a “natural task” by some public administration departments or representatives; some did not even see themselves as responsible in this area. Due to the duration of these situations, their dynamic developments, e.g., in political and legal terms, and the simultaneous involvement of various administrative subsystems, some public administrations faced difficulties that they had not previously encountered (Schönefeld et al. 2023; Schütte et al. 2022b).

Consequently, German public administrations have taken on the unfamiliar role of crisis management. One reason for this unfamiliarity is probably that a large number of them had neither experienced such a crisis in real life nor regularly practiced their crisis management in any form for years (Schmitz 2021). In addition, the crisis management of public administrations in Germany is characterized by many “grey areas” and a diffuse diversity due to the federal system and the near absence of clear legal regulations in this area; exceptions are decrees and ordinances in a few federal states (Schönefeld et al. 2023). For example, the composition of crisis teams and the implementation of their work is usually the responsibility of such teams themselves (Klinger et al. 2022). At the latest with the COVID-19 pandemic and parallel situations such as the 2021 floods in parts of Germany, awareness of issues of mitigation, precaution and preparedness for crises seems to have grown. Crisis management as a central task, a self-image as a crisis-managing organization, and the recognition of the need to maintain the ability to act in and for crises are being accepted by more and more representatives of public administrations (Schönefeld et al. 2023). This is accompanied by the realization that something must be done for this purpose, i.e., practiced or exercised (Thielsch et al. 2023). To this end, exercises subject people, technology, organization, and their interactions to endurance tests, thereby identifying weak points early on, such as those in infrastructures, organizational forms or practicability, and optimizing them for emergency situations. This is especially true for inter-organizational exercises in which different organizations are involved, as in crisis (management) teams (CTs) (Bach et al. 2023). Such exercises, especially when they are comprehensive and as close to reality as possible, like CT and full-scale exercises, are often not implemented by public administrations alone, because the latter lack the time, personnel and specialist resources to do so. They are usually supported by external parties, such as private companies and scientists, who design, (scientifically) accompany and evaluate such exercises. Even if there are guidelines for crisis management in general, such as the “Crisis Communication Guidelines” (BMI 2014), which contain information on the individual phases of crisis management and the relevance of inter-organizational exercises, the way in which such exercises can be evaluated remains a blank spot. There are almost no standards for what such an evaluation might look like (Drews et al. 2019; Kern et al. 2021). The question that emerges is whether (fully or partially) standardized instruments are necessary, and to what extent such standards can guarantee a heightened level of objectivity in the assessment or evaluation process. As it stands, it is often a matter of negotiation between the practitioners and the person designing and/or accompanying or supervising the exercise.

This was also the case for a public administration in Germany in 2022. Together with a consulting enterprise, it developed a CT framework exercise with a complex scenario to practice its administrative crisis management and instruments. The aim was to determine the suitability and applicability of the underlying staff regulations. The interdisciplinary author team of this text, with backgrounds in the social sciences and safety engineering, provided scientific support for the exercise. Their interest was to examine how communication and leadership processes did or did not function during the exercise against the background of the newly introduced staff regulations. They developed an evaluation approach for the exercise in exchange with the administration representatives and the exercise managers.

This paper describes the evaluation approach developed, discusses some exemplary results, and considers the potential of a partially standardized approach. Finally, it assesses to what extent a multi-method approach to exercise evaluation, one that provides a standard framework but also leaves scope for individual adaptation, is sensible and sustainable, and what practical and scientific added value results from it. Before introducing the concrete approach, however, the state of the art of such exercises and their evaluation is examined in more detail in order to identify methodological and content-related points of orientation.

2 Basic Points of Orientation in Terms of Content and Methodology for Evaluating Crisis Team Exercises

The field of research on exercises, especially in security contexts such as staff framework or crisis management exercises, is not particularly large (Beerens 2021). The subfield of exercise evaluation is even smaller and more specialized. In order to find a content-related and methodological basis for the development of an evaluation approach, a brief (basic) search was carried out, which primarily covered German-language studies and was later expanded to include English-language research. A first summarizing finding from the analyzed literature is that although the relevance of crisis management exercises is repeatedly emphasized in science and practice, this is not equally reflected in research papers. Beerens (2021) goes into this in more detail in his work and makes it clear that the field of exercise research is relatively fragmented. Above all, this means that there is rather little (published) research on exercises, which is probably even more true for interorganizational exercises, such as those common in the CTs discussed here. Another point is that concrete approaches to planning, implementation and evaluation are rarely presented. The design of exercises and its potential impact on their success is rarely explicitly presented and investigated. The same applies to the methods and approaches of corresponding evaluations, the added value of their use, and the opportunities to learn from corresponding findings. There tend to be gaps in the literature here (Bach et al. 2023; Beerens 2021; Berlin and Carlström 2014; Grunnan and Fridheim 2017). A possible reason for these gaps is that “[p]lanning and running exercises is in many ways a ‘practitioner’s game’” (Grunnan and Fridheim 2017, 82). Additionally, the field tends to be dominated by the private sector, in particular consultancies, and less by scientific actors. Probably for competitive reasons, among others, specific instruments and exercise designs are not published, and evaluation results, if an evaluation is carried out at all, remain internal (Grunnan and Fridheim 2017).

From the published studies, however, some indications can be derived regarding (1) the content orientation and (2) the methods of evaluations, both of which have been incorporated into the design of the present evaluation approach. Regarding (1) content, some topics can be identified, especially in the somewhat broader literature on crisis management and staff work, that are repeatedly emphasized as relevant and critical to success: leadership or management (performance) in CT, decision-making and communication, interaction in CT, (interorganizational) work and coordination in terms of common goals, clear structures and processes, responsibilities and competencies, and role clarity, to name the ones most addressed (e.g., Adam and Schaller 2021; Berlin and Carlström 2014; Curnin et al. 2022; Gißler 2021; Grunnan and Fridheim 2017; Heumüller et al. 2014; Högl 2013; Hofinger 2008; Laurila-Pant et al. 2023; Moon et al. 2023; Rehfeld 2022; Sørensen et al. 2020; Son et al. 2020; Strohschneider 2011; Thieme and Hofinger 2008; Trauboth 2022).

Regarding (2) methodological aspects, there is much less evidence and agreement, as Kern et al. (2021) point out: “However, the benefits and impact of such exercises are currently often limited by inconsistent and non-systematic approaches to exercise evaluation”. Beerens (2021) confirms this in his comprehensive research on disaster risk management exercises. There are hardly any evaluation standards, especially as only a few studies report on specific methods and their implementation, and he points out that many aspects are lacking here. According to his research, the usability of evaluations in crisis management is relatively limited because their results remain anecdotal and the approaches are often not very systematic in terms of documentation and evaluation, are rarely objectively validated, and are merely perceived as “paper-pushing activities” (ibid., 22). Although there are almost no standards, there is a “common methodology” (Jacquinet et al. 2022). An often-used instrument for exercise evaluations is observation. Observations of CT are used to study the obvious or visible, such as communication and interaction between staff members and the behavior of individuals, but also of whole groups for comparison (e.g., Cipolat and Blanche 2010; Drews et al. 2019; Gißler 2020; Halwachs and Sifferlinger 2023; Helfgott et al. 2021; Heumüller et al. 2014; Moon et al. 2023; Osarek and Künzer 2022; Pettersson et al. 2022; Reuter and Pipek 2009; Son et al. 2020; Son et al. 2022; Starke 2006; Steinke 2018; Strohschneider 2008; Unger 2010). In the context of computer-supported exercises and simulations, system-immanent documentation or data collection as well as video recordings are also used more frequently, since they are usually collected automatically in the process. Other more technology-based approaches include evaluations of email communications and digital mission diaries (Duvillard 2018; Ellebrecht et al. 2013; Haddington et al. 2022; Hofinger and Zinke 2013; Jacquinet et al. 2022; Laurila-Pant et al. 2023; Pettersson et al. 2022; Son et al. 2022, 2020; Starke 2006; Starke and Strohschneider 2005; Strohschneider 2003; Unger 2010). Although survey and interview instruments are generally popular methods for capturing individual perceptions as well as personal assessments of performance, success, etc., in addition to observations, they are less frequently found in the exercise evaluation literature. Some studies mention (quantitative or standardized) survey instruments and/or interviews that could be used before and/or after an exercise, though surveys seem to account for a larger share than interviews (Crisanti et al. 2022; Fiedrich et al. 2012; Gißler 2020; Helfgott et al. 2021; Sørensen et al. 2020; Son et al. 2022; Thielsch and Hadzihalilovic 2020; Unger 2010). Equally rare are elaborated mixed-methods designs that combine, let alone triangulate, more than two different methods and/or data (sources) (Bruns et al. 2022; Drews et al. 2019; Gißler 2020; Reuter and Pipek 2009; Unger 2010).

Against the background that there is no standard evaluation approach for the monitoring of exercises, the author team developed its own mixed-methods evaluation approach based on the commonly used methods mentioned above; it is presented in the following section.

3 Mixed-Methods Crisis Team Exercise Evaluation: Background and Approach

Exercises in the context of civil protection, disaster and emergency management have been scientifically accompanied for many years at the Institute of Public Safety and Emergency Management (IPSEM), a higher education institute. Both large, inter-organizational and small, intra-organizational exercises have been examined and evaluated. Owing to this interdisciplinary orientation, safety engineering and social science approaches as well as experiences from prior evaluations were incorporated into the respective evaluations and into the approach described in this text.

3.1 Background and Design

During the discussions between the author team and the exercise leaders of the above-mentioned exercise, communication and leadership emerged as topics of interest, in addition to the evaluation of the newly introduced staff regulations. The main question was how these two aspects were influenced by the staff regulations during the exercise. As the literature review above shows, leadership and communication are two relevant topics when crisis management is practiced. On the one hand, everything within staff work ultimately takes place in some form of communication; on the other hand, staffs are themselves a management or support tool for leaders (Behrmann et al. 2022; Gißler 2020, 2021). Both aspects prove to be all the more challenging in the case of interdisciplinary and inter-organizational CT such as those of public administrations (Bach et al. 2023). This is, among other things, due to different organizational logics coming together. The members of (interorganizational) CT are recruited from parts of the public administration, fire department, police, aid organizations and other affected stakeholder groups (e.g., energy and electricity operators and water suppliers) (Behrmann et al. 2022). In order to prepare well-founded decisions, they generally have to develop a joint picture of the situation and, ideally, a shared mental model. To do this, they exchange information on the situation from their original organizations within the crisis management team in (usually structured) situation reports, bilateral communication, e-mail exchange, telephone and radio where necessary, etc. (based on recommendations such as those in the FwDV 100). By creating temporarily valid common organizational and sense-making structures, such as goals, clearly defined responsibilities, and coordinated communication channels and modes, as is the case in staff exercises, the clash of organizational logics is cushioned (e.g., Hofinger 2008), but potential for friction and conflict still remains.

Verbal, non-verbal and technically mediated communication, in the sense of processes in which information is transferred between sender and receiver, thus forms the basis of this evaluation (Behrmann et al. 2022). To make the topics of interest more amenable to research, an additional sociotechnical analysis approach was taken. This approach has become established in the context of event security and safety research at IPSEM, following the assumption that inter- and intraorganizational safety and security production can be made visible and analyzable in terms of the human, technological and organizational elements of systems and their interactions (Pasmore et al. 2019; Schütte et al. 2022a; Schütte and Willmes 2022). Human elements (category “people”) stand for the people themselves, their behavior, human factors, personal characteristics and traits, knowledge and qualification, social ties, social measures, etc. Technological aspects (“technology”) comprise technologies, engineering, technical systems, building constructions, rooms, infrastructure and premises. Structures and processes, i.e., (formalized) organizational design, are organizational components (“organization”) (Schütte et al. 2022a; Schütte and Willmes 2022). These three elements were used as the central analysis categories for communication and leadership within the survey, interview and observation instruments (see below).

Ultimately, and for the first time in this CT framework exercise, the research team used a design with five phases, alternating quantitative and qualitative methods (see Fig. 1). The goal was to examine the subject of the staff regulations as comprehensively as possible. To this end, quantitative and qualitative research methods were combined quasi-sequentially, and the data obtained and, above all, the results were triangulated with one another (Flick 1992, 2011). More details can be found in the following explanations of the individual steps.

Fig. 1 Evaluation approach for crisis team exercises; own illustration

3.2 Instruments for Data Collection

3.2.1 Steps 1 and 3: Standardized Online Survey “Before” and “After”

The idea of the preliminary survey (“Before”) was to gain a largely unbiased view from those involved in the exercise on the staff regulations and the exercise itself. It was therefore conducted before the exercise (from one week before until the day before the exercise). The focus was on the participants' state of preparation and their expectations in general, in particular regarding technical, social/personal and organizational aspects. To this end, they were asked about the following: (1) experience in crisis management, real operations, exercises, and preparations; (2) emotions related to the exercise; (3) the extent of their knowledge of the staff regulations; (4) expectations regarding room and technology as well as social (e.g., personnel composition, togetherness) and organizational (e.g., staff regulations, information management, coordination) aspects. The latter are based on headings in the staff work manual Handbuch Stabsarbeit by Hofinger and Heimann (2022), which presents them as significant components of CT. In addition to a majority of closed questions, some open questions were also included, for example on an individual assessment of the staff regulations, in order to gain deeper insights. The survey was distributed among exercise control members, participants of the staff framework exercise, and those of the operational-tactical staffs exercising alongside, but remotely, on the same scenario. In total, this encompassed a population of around 150 individuals.

The second survey (“After”) took place immediately after the exercise over the course of one week. As in the pre-exercise survey, the same topics (1–4) were asked again in order to see to what extent changes in views and lessons learned, e.g., about the staff regulations, could be determined. Topic (4) was expanded to include questions on communication and leadership. Questions on the latter topic were oriented to accounts of success-critical factors of leadership performance (e.g., Gißler 2020, 2021; Hofinger and Heimann 2022; Linnenbürger 2020). The topic of communication was underpinned with excerpts from a tested and validated survey tool from a previous project at IPSEM (Behrmann et al. 2022; Hofinger and Heimann 2022; Linnenbürger 2020). The evaluation was carried out using descriptive statistics. The open questions were evaluated via qualitative category-based content analysis, analogous to the interviews described in Step 4.
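To illustrate the descriptive evaluation of the closed items, the following minimal sketch compares the response shares of one 5-point item before and after the exercise. It is not the authors' actual analysis script; the response labels and example counts are illustrative assumptions (only the sample sizes n = 29 and n = 42 are taken from the text).

```python
# Minimal sketch (not the authors' script): descriptive before/after
# comparison of one 5-point Likert item; labels and counts are illustrative.
import pandas as pd

LIKERT = ["fully applies", "applies", "rather does not apply",
          "does not apply", "I don't know"]

def likert_shares(responses: pd.Series) -> pd.Series:
    """Percentage share of each response option, in a fixed order."""
    counts = responses.value_counts().reindex(LIKERT, fill_value=0)
    return (100 * counts / counts.sum()).round(1)

# Illustrative responses, sized like the real samples (n = 29 and n = 42).
before = pd.Series(["fully applies"] * 10 + ["applies"] * 12 +
                   ["rather does not apply"] * 4 + ["I don't know"] * 3)
after = pd.Series(["applies"] * 18 + ["rather does not apply"] * 11 +
                  ["I don't know"] * 13)

print(pd.DataFrame({"before (%)": likert_shares(before),
                    "after (%)": likert_shares(after)}))
```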

3.2.2 Step 2: Exercise Observation

The observation represented a peculiarity in this case. At IPSEM, open observations supported by a very rough guideline are usually conducted in the field, e.g., at major music events (Schütte et al. 2022a; Schütte and Willmes 2022). This was not possible in the present case, as the exercise developers and those responsible from the public administration wanted to limit potential disturbances by external persons as much as possible. Therefore, only two “observers” were allowed to participate silently at places (initially) assigned to them. Due to the size of the room and the up to 48 participants during the three exercise days, it was hardly possible to closely monitor the communicated content in the room in a qualitative manner. It was therefore decided to implement a quantitative observation with possible qualitative additions. The observers recorded in a prepared Excel list who communicated with whom at what time (communication could take place verbally, by telephone or by radio). In addition, situational characteristics (e.g., situation presentations, video interventions, accumulations of people somewhere in the room) were written down in more detail on classic observation sheets, allowing for comments, e.g., on individual stress situations, peculiarities in communication processes or qualitative observations of leadership. Thereby, communication in the room could be counted and complemented by observations of general behavior and interactions. These observations were used to add exercise-specific questions to the guided interviews (see Step 4). To facilitate the combination of quantitative and qualitative observation during an exercise or in “real life” CT settings, IPSEM developed an app in which communication processes between CT members can be recorded even more quickly, notes can be added at any time, and each input (quantitative or qualitative) receives its own automated time stamp. The app was tested and improved during exercise evaluations between May and December 2023.
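Conceptually, a single observed communication process is a small record of sender, receiver, channel, time stamp and an optional qualitative note. The following sketch models such a record; the field names are hypothetical, since the exact layout of the Excel list and the app is not published.

```python
# Sketch of one observed communication process as a record; field names
# are hypothetical, the published text does not specify the exact layout.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CommunicationEvent:
    sender: str                  # e.g., "S3" or "expert advisor 7"
    receiver: str                # person or group addressed
    channel: str                 # "verbal", "telephone" or "radio"
    note: str = ""               # optional qualitative remark
    timestamp: datetime = field(default_factory=datetime.now)  # automated stamp

log: list[CommunicationEvent] = []
log.append(CommunicationEvent("S2", "lead", "verbal",
                              note="query following the situation report"))
```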

The communication processes were analyzed in three different ways: anonymized, partly anonymized and in full. The (partly) anonymized results do not allow conclusions to be drawn about individual participants; the distinction simply describes the level of detail used for the evaluation. The participants can be divided into four groups: “lead”, “subject groups”, “administrative services” and “expert advisors”. In the anonymized analysis, the communication processes between these four groups were considered without taking the different subgroups into account. The subgroups were part of the partly anonymized analysis, whereas the full assessment allows for an individual assessment of the members of the subgroups. Because most subgroups consist of only one to three members, the latter findings could be used to draw conclusions about individuals and are therefore not used in publications or other presentations of results. All results can be displayed either for a certain point during the exercise (in 5-min time stamps) or cumulated up to a certain point during the exercise. The former is especially important for the analysis of leadership and leadership processes, the latter for an analysis of communication processes and channels in general. Examples of both use cases are given in the data evaluation section of this text, and a sketch of the two display modes follows below.
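The two display modes boil down to filtering the time-stamped observation log in two different ways before aggregating it into a sender/receiver matrix at the anonymized group level. A minimal sketch, assuming events are stored as (minute, sender group, receiver group) tuples:

```python
# Sketch of the two display modes, assuming the observation log is a list
# of (minute, sender_group, receiver_group) tuples for the four groups
# "lead", "subject groups", "administrative services", "expert advisors".
from collections import Counter

def to_bin(minute: int, width: int = 5) -> int:
    """Map a minute of the exercise to its 5-min time stamp (bin index)."""
    return minute // width

def count_events(events, up_to_minute=None, only_bin=None):
    """Sender/receiver matrix, either cumulative or for one 5-min snapshot."""
    matrix = Counter()
    for minute, sender, receiver in events:
        if up_to_minute is not None and minute > up_to_minute:
            continue                          # cumulative mode cutoff
        if only_bin is not None and to_bin(minute) != only_bin:
            continue                          # snapshot mode filter
        matrix[(sender, receiver)] += 1
    return matrix

events = [(3, "subject groups", "lead"),
          (4, "subject groups", "subject groups"),
          (7, "expert advisors", "subject groups")]
print(count_events(events, up_to_minute=5))   # cumulated until minute 5
print(count_events(events, only_bin=1))       # snapshot of minutes 5 to 9
```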

3.2.3 Step 4: Guided Interviews

After the exercise, guided interviews were conducted with volunteers from among the exercise participants. For this purpose, the representatives present from all subject groups, administrative services, expert advisors and the lead were approached (48 people) and asked for an interview. The interview guideline was aligned with the surveys (Steps 1 and 3). However, only open-ended questions or narrative prompts were formulated here, so that the interviewees had the opportunity to speak comprehensively and to decide independently on the direction and depth of their statements. The interviews were analyzed using qualitative category-based content analysis (Gläser and Laudel 2010; Mayring 2015). For this purpose, a deductive-inductive approach was used. The deductive categories, i.e., categories determined on the basis of preliminary theoretical considerations (see above), were communication and leadership with the respective subcategories of people, technology, and organization. Inductive categories were formed on the basis of passes through the interview material, as sketched below. In the last step, the categories were condensed into essential statements against the background of the text passages found. Ultimately, this was always oriented to answering the question of the usefulness of the newly introduced staff regulations.
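The deductive part of the category system can be pictured as a fixed two-level grid of main and subcategories to which passages are assigned, with inductive codes attached as they emerge from the material. The following is an illustrative data structure only, not the authors' codebook or coding software:

```python
# Illustrative data structure for the deductive-inductive category system;
# category names follow the text, everything else is an assumption.
categories = {
    (main, sub): []
    for main in ("communication", "leadership")
    for sub in ("people", "technology", "organization")   # deductive grid
}

def code_passage(passage: str, main: str, sub: str,
                 inductive_code: str | None = None) -> None:
    """Assign a passage to a deductive category, optionally with an
    inductive code formed while working through the material."""
    categories[(main, sub)].append({"text": passage,
                                    "inductive": inductive_code})

code_passage("Some information simply arrived in duplicate and triplicate.",
             "communication", "organization",
             inductive_code="information duplication")
```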

3.2.4 Step 5: Triangulation

In order not to let the collected data and obtained results stand on their own, the research team planned a triangulation of methods, data and results (after the elicitation and the individual evaluations), i.e., a kind of method, data or result linkage. The team wanted to cover the subject under investigation as comprehensively as possible and to enrich and complete the findings (Denzin 1989, 2008; Denzin and Lincoln 2000; Flick 1992, 2011). The dominating triangulation logic was therefore that quantitative and qualitative research can support and complement each other (Bryman 1988, 1992; Flick 2011).

This use of synergies shows in various aspects of the evaluation of the results. Using the same categories for the qualitative content analysis of the surveys and the interviews is one step, allowing for extended personal views and individual perceptions by the participants. Furthermore, expressing these views in the face-to-face, rather spontaneous situation of an interview possibly appeals to some people more than writing down their individual thoughts in a structured way during a survey, and vice versa. Synchronizing the survey questions with the interview guideline, as stated earlier, is another important aspect, acknowledging the advantages of both qualitative and quantitative research methods and backing up or questioning results from the one with those from the other. The comparison between individual perceptions before and after an exercise can be used to differentiate between expectations and retrospective assessments. The individual perceptions of the participants (29 people completed the “Before” survey, 42 the “After” survey, and 17 participants agreed to interviews) were contrasted with the results of the observations. Do these individual impressions match the actually observed communication processes? Do leadership processes and tools, as assessed in the surveys and interviews, have an impact on the observable activities in the CT?

To give a clearer picture of what the combination of the methods and their results can mean in this context, the following section shows exemplary results combining the methods used.

3.3 Data Evaluation: Exemplary Results and their Meanings

As described, IPSEM works with four different methods (a survey before the exercise, a survey afterwards, the observation of the exercise itself, and interviews with the participants afterwards) with the ultimate goal of describing and evaluating communication and leadership aspects during an exercise. Since the evaluation is still work in progress, this passage focuses on the communication category for the exemplary results. The interaction of these methods is shown here using two examples from the above-mentioned three-day exercise for a public administration in Germany that we evaluated in 2022: (A) (experienced) communication and (B) communication in relation to leadership tools. 44 to 48 persons attended the exercise each day: 21 to 23 working in the subject groups, 7 to 8 in the administrative services, 11 to 13 as expert advisors, and 4 in the exercise lead. The exercise control (management, white cell, etc.) comprised approximately 40 persons, seated in adjoining rooms.


Example A “Communication”.

Starting with two matching survey questions, “The conditions for functioning communication among the participants are good” (before) and “The staff regulation provides a sufficient framework for communication” (after), Fig. 2 compares the results.

Fig. 2 Results for two matching survey questions regarding communication in the staff during the exercise, asked before the exercise (blue) and afterwards (orange); n(before) = 29, n(after) = 42; own illustration

The data show that before the exercise, the participants were rather optimistic regarding functioning communication: 76% responded “fully applies” or “applies” to the statement. The familiar facilities and some reading knowledge of the newly introduced staff regulations went into these assessments. Also, 83% of the participants thought beforehand that the composition of the staff was sensibly chosen, indicating that the majority of the participants were familiar with each other, either from a work context or from the workshops carried out in preparation for the exercise.

After the exercise, when the participants had experience with using (not only reading) the staff regulations, people were slightly more negative and, perhaps most importantly, more uncertain (31% stated “I don't know” here) about whether communication was well addressed. This indicates at least some negative experiences of communication processes during the exercise.

Consulting the corresponding quantitative data from the observation of the exercise can help to interpret these statements. Figure 3 shows the anonymized, cumulative communication processes divided into the different communication channels for a general overview. For the same purpose, Table 1 provides some statistics on the overall communication processes.

Fig. 3 Cumulative communication processes at the end of the exercise (810 min in total, 44 to 48 persons per day); darker cell backgrounds = more communication processes; own illustration

Table 1 Overall statistics of communication processes during the three exercise days; own illustration

Communication processes during a CT meeting (be it for an exercise or an actual crisis) are not evenly distributed across the participant groups, due to their different tasks and responsibilities. During the exercise at hand, as in most exercises the IPSEM team has been allowed to attend, by far the most communication processes took place among the members of the subject groups (S1 to S6, summarized here): accumulated over the exercise days, they communicated 509 times with each other, gave information to or discussed with the lead 85 times, and discussed with or asked the expert advisors and administrative services 164 + 51 = 215 times. They were also the group most frequently addressed by the other participants.

From the figures in Table 1 it becomes obvious that the participants were able to increase their interactions with other participants over the course of the days. The so-called “chaos phase” at the beginning is a rather well-known phenomenon during exercises, and sometimes even during real crises, in which people have to adjust to the situation and are more preoccupied with themselves and their respective tasks. But the fact that the participants were able to increase the communication processes per hour by a factor of 1.6 between days 1 and 2 and of 1.8 between days 1 and 3 demonstrates the good communication skills of the CT. Assuming a mean of 46 participants, the figures show 3.5 communication processes per participant per hour on day 3. Taking into account that most of these processes are not quick question-and-answer exchanges but rather complex discussions, and that the subject groups, which have the most responsibilities in the CT, took part in 75% of these communication processes (extracted from Fig. 3) while accounting for only approximately 50% of the participants, a high communication load for this group can be stated. From a quantitative perspective, these figures do not suggest insufficient conditions for communicating. However, they cannot evaluate the quality of the communication processes.
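As a plausibility check of these figures, the following back-of-the-envelope sketch reproduces the reported per-participant rate from the stated growth factors. The day-1 rate is a hypothetical round figure chosen only so that the factors of 1.6 and 1.8 can be applied; the actual day totals are in Table 1.

```python
# Back-of-the-envelope check of the reported rates; the day-1 value is an
# assumed figure, only the factors 1.6/1.8 and n = 46 are from the text.
participants = 46                      # mean of 44 to 48 persons per day

per_hour_day1 = 90                     # hypothetical starting rate
per_hour_day2 = per_hour_day1 * 1.6    # 144 processes/hour (day 1 -> day 2)
per_hour_day3 = per_hour_day1 * 1.8    # 162 processes/hour (day 1 -> day 3)

print(round(per_hour_day3 / participants, 1))   # 3.5 per participant per hour
```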

To get to the bottom of these seemingly inconsistent results, further information from the participants can be taken into account. From the “After” survey we know that 38% of the participants found that all information they received was relevant to them; for 60% this rather did not or did not apply. Likewise, only 24% found that a lot of the communication with other participants was necessary for their work; for 67% this rather did not or did not apply.

It can therefore be assumed that the participants were right to think that the premises (and other parameters known beforehand, e.g., familiar participants) provided good communication conditions, but once the “chaos phase” was over and communication processes started picking up, it became clear that not everybody knew what to communicate with whom. The sheer number of communication processes, especially involving the subject groups, therefore indicates a tendency towards ineffective communication.

The interviews also provide evidence for this assumption. The interview guideline asked specifically about the staff regulations: whether and where they were helpful for crisis management during the exercise, and where their weaknesses lay.

For communication processes, the answers were ambivalent: most interviewees stated that after the “chaos phase”, from the second day on, communication became “better”; they “learned” which information was important for whom, how to present information during situation reports, and which other staff members had potentially important information for them, i.e., the communication processes gained in quality or effectiveness. But the aforementioned ineffectiveness also manifests itself among the interviewees:

“After a while, I also noticed what was important and what wasn't with the situation report […].” (Interviews, “subject group 3”, 2022)

“But certain information actually did not accumulate with me. […] that you can't really get a good feel for how the development actually is on the ground right now.” (Interviews, “expert advisor 7”, 2022)

“In my opinion, this was not yet fully developed. Because some information simply arrived in duplicate and triplicate.” (Interviews, “subject group 2”, 2022)

Regarding one of the main goals of the exercise evaluation, the assessment of the staff regulations, it becomes equally clear that staff regulations can only provide a framework, and the participants mostly seem to have a realistic view of what staff regulations can or cannot provide:

“However, this can only be presented so well in the staff regulation. It just has to get into people's heads. […] Simply to train […] accordingly. So that they do know: it's in the staff regulation, but they've simply internalized it.” (Interviews, “subject group 1”, 2022)

“The staff regulations should provide for substitution rules and information on which subject groups have to consult with each other” (Interviews, “subject group 2”, 2022)

“The staff regulations need to be internalized […]. […] smaller exercises just for information in- and output make sense” (Interviews, “subject group 1”, 2022)

“[…] more knowledge about the expert advisors” (Interviews, “expert advisor 7”, 2022)

The “wishes” regarding the staff regulations that we elicited during the interviews seem both realistic and reasonable. In conclusion, it can be said that the observed and perceived inefficiency in internal staff communication was realistically assessed by the participants and decreased over the course of the exercise days. The staff regulations were helpful only to a certain extent, and the participants provided some useful suggestions for improvement. But “overregulating” has to be avoided, and many of the participants concluded that exercising is preferable to an overly detailed and therefore no longer generally applicable framework.

Example B “Leadership tools and their effect on communication”.

A small but very specific example of how leadership tools affect communication processes is the comparison of those processes before and after so-called situation reports.

From the observation it is known at what times situation reports took place. The communication processes (again anonymized to the four groups) in the half hour before and the half hour after a report provide information about the impact of these reports (see Figs. 4 and 5).

Fig. 4 Cumulative communication processes during the 30 min before a situation report (in the middle of day 2 of the exercise, 45 persons; please note the different color code from Fig. 2); own illustration

Fig. 5 Cumulative communication processes during the 30 min after a situation report (in the middle of day 2 of the exercise, 45 persons; please note the different color code from Fig. 2); own illustration

Especially for the “subject groups”, already identified as the most actively communicating members of the CT, the difference between the two 30-min situational snapshots is evident. In total, the subject group members address more than twice as many people after the situation report as before (47 vs. 21). Of course, there is no information on the quality of these interactions: are they questions? Information given? Discussions? Since the staff lead ends each situation report with new (or repeated) tasks for the subject groups and some other staff members, it is likely, though, that many of the interactions are information exchanges.
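Extracting such a before/after comparison from the observation log amounts to counting, for one group, the processes it initiated within two 30-minute windows around the report. A minimal sketch under the same assumed event layout as above (time stamps and counts are illustrative):

```python
# Sketch of the before/after window comparison for one group; report time
# and events are illustrative, not taken from the actual observation data.
from datetime import datetime, timedelta

def window_counts(events, group, report_time, minutes=30):
    """Count processes initiated by `group` in the windows before/after."""
    before = after = 0
    delta = timedelta(minutes=minutes)
    for ts, sender, _receiver in events:
        if sender != group:
            continue
        if report_time - delta <= ts < report_time:
            before += 1
        elif report_time < ts <= report_time + delta:
            after += 1
    return before, after

report = datetime(2022, 11, 15, 11, 0)   # hypothetical report time
events = [(report - timedelta(minutes=10), "subject groups", "lead"),
          (report + timedelta(minutes=5), "subject groups", "expert advisors"),
          (report + timedelta(minutes=12), "subject groups", "subject groups")]
print(window_counts(events, "subject groups", report))   # (1, 2)
```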

This can be combined with statements such as those already cited above: “After a while, I also noticed what was important and what wasn't with the situation report […].” (Interviews, “subject group 3”, 2022) or

“[…] these situation reports that came in […] with a lot of accessories, but also a lot of important things, where you have to sort out what is important […].” (Interviews, “expert advisor 5”, 2022) or

“The situation reports. […] we can definitely get better at this” (Interviews, “subject group 1”, 2022).

Sorting out the important information from the situation reports and giving stringent, short and concise situation reports were two aspects described as rather difficult in the interviews. The sharp increase in communication processes may therefore be partly due to the fact that the information received had to be reviewed and classified.

The “After” survey included two questions regarding the quality of the situation reports during the exercise. The results of these questions are shown in Fig. 6 and represent another step in assessing the question at hand: how does a leadership tool such as a situation report affect communication?

Fig. 6 Results of two survey questions (“after”) regarding the situation report; n = 42; own illustration

Again, similar to the triangulation in Example A, the results send slightly different signals that have to be addressed. 71% and 83%, respectively, of the survey participants had a rather positive image of the situation reports. Here, the drastic rise in communication processes after a report indicates that the CT members knew their tasks and were eager, willing and able to dive into them once the situation report was finished.

Of course, the truth most probably lies somewhere in the middle. Summarizing information and making it accessible for everybody is both a helpful and a difficult task that therefore has to be practiced. And obviously, the more practiced staff members are, the more effectively they can present the relevant information during their own reports and distinguish relevant from less relevant information in the reports of others.

The example shows that triangulation helps to provide a “360° view” of a certain topic, such as communication in a CT. Looking only at the results displayed in Fig. 6 would give an incomplete, mainly positive picture. The objectively measurable communication processes help with classification, and the interviews show nuances and capture personal, internal assessments.

4 Conclusion, Impact and Outlook

What do those findings say about the quality of the proposed evaluation approach?

Neither of these examples shows surprising results. Statements like “communication has been ineffective”, “I didn't know who to address” or “I didn't get all the information I needed for a comprehensive situational picture” are often heard after exercises as well as during review meetings of actual CT. But even if individual findings from the respective individual methods are not surprising in themselves, their extended consideration through the results of the other methods certainly leads to interesting insights. In the concrete example here, it was only through the interviews and the questionnaires that it became clear why an increased level of communication was observed during the exercise. In this case study, it was partly due to the administrative backgrounds and logics of the individual members, for whom, for example, the staff regulations were not formulated clearly enough and offered too little concrete (and legally secure) guidance. Additionally, some interviewees also lacked clear structural guidelines for meetings and exchanges from the CT leaders. The survey results underlined this by showing that in many cases the right information was not transferred to the right places, and they also added the aspect of technical difficulties. Such triangulated results are an initial indication that a multi-method evaluation approach with a triangulation of methods and results can deliver empirically sound findings. In addition, it levels out biases that are inherent in the nature of the individual methods. The methods used have long been validated and established in empirical social research. Beyond that, however, the (combined) use of different methods enables a comprehensive view of a topic such as communication or leadership (or their combination) in CT. And the sociotechnical analysis framework is open enough to be supplemented by different topics and thus enables individual adaptation to specific training needs. In the opinion of the authors, the triangulation method is particularly valuable in this context because it allows an evaluation to become comprehensive and complete with regard to the subject matter. But it is still rare in evaluation processes and also not yet fully discussed in the field of empirical social research. More research and elaboration with regard to the development of standards, reliable measurement and quality criteria still needs to be done (Beerens 2021).

The experience of the author team in accompanying exercises also shows that, in terms of science-to-practice transfer, practitioners appreciate the combination of results from different data sources, particularly for the follow-up of exercises, and ask for them in order to have a valid basis for possible changes. The authors believe that the partially standardized approach described here is not only able to support this transfer by allowing for a standardized, easy-to-apply presentation of results. It can also improve the quality of results due to its high objectivity, the immanent learning process of the scientists, and the comparability of results over a large span of time (and across a large set of executing scientists).

Such an approach has further implications for research and practice in this field as well. The systematic survey, analysis and practice-oriented elaboration have the potential to increase the acceptance of scientists as companions of exercises and confidence in the scientific foundation itself. In the case of the authors, this can be seen, for example, in the fact that they are repeatedly asked by certain stakeholders to accompany exercises and that they are also recommended for exercises of other stakeholders. However, this is only possible if the practical side also considers and implements a thorough follow-up, which is not always the case (Bach et al. 2023). Only then are exercise leaders, for example, able to identify results that can help to improve crisis management, design even more suitable exercises, or refine staff regulations to make them even more fitting to their context. Otherwise, there is a risk that the results will remain in the scientific “drawer”, be forgotten, or be misinterpreted. On the other hand, for scientists it also means developing a certain feel for the field of practice and its requirements for the presentation of results (Baroutsi 2023). What can make scientific evaluations uninteresting for practitioners are, for example, a potential lack of practical relevance, a lack of clarity about their added value and benefits (especially in relation to the (transactional) costs incurred by practitioners), and the long time it takes to produce them. The authors worked out the present evaluation approach in the course of requested scientific exercise support. In the situation described above, it worked well. One reason is certainly that the chosen evaluation approach was at least partially tailored to the exercise at hand after consultation with the exercise planners and the involved administrative management. It also helped that the scientists already had experience with exercise evaluations and the corresponding methodological skills. The exercise discussed here served as a kind of practice field to test the instruments in detail and the approach as a whole.

The latter point is, of course, a central limitation of this work. At the time this article was written, the instruments had not yet been tested in other exercises. It is therefore not clear to what extent the approach actually provides meaningful opportunities for comparison. In addition, after the first trial, the need for adjustments became apparent; these are currently being implemented. The first step is to define and elaborate the individual evaluation modules as a reasonably fixed framework. One example is the quantitatively oriented observation. Based on this, an app has already been designed for subsequent exercises. It not only facilitates the recording of communication activities but also provides additional information: a time stamp is automatically stored for each activity, and it is possible to select the sender and receiver as well as special communication forms (e.g., situation report) and to make notes on them. The next tool to be adapted will be the survey. Here, the questions should be further homogenized to allow for even better “before-after” comparisons. This means that more questions will be included that can be asked both before and after the exercise in order to recognize developments. For example, the pre-exercise survey asks about expectations in terms of technical, analogue, personnel and spatial equipment during the exercise, and the post-exercise survey asks about what the exercise participants actually encountered. This has already been implemented in other surveys. The sociotechnical perspective as a theoretical framework is retained as the common anchor of all instruments, while the topics to be examined are to remain interchangeable. Thus, it is planned to prepare topics such as communication, leadership, information management and cooperation as question blocks so that they can be selected according to practice interests. Briefings could then be used to explain the evaluation framework and its use to other researchers for their projects and the questions within them. Although the idea was to develop a modularized, phase-related evaluation design consisting of quantitative and qualitative research tools or modules that can be used primarily formatively, i.e., accompanying the process, either as individual self-contained modules, in diverse module combinations, or as an overall design, it still makes sense to check for each exercise which measures are suitable on the one hand and feasible on the other. This makes the approach relatively resource-intensive at present. The triangulated evaluation also still requires some time, which practitioners often do not have until they need results. It will therefore be necessary to reduce the amount of resources and time required in the future, possibly with the help of practitioners in the context of transdisciplinary exchange formats.

In the near future, other exercises will be evaluated using parts of the approach. Further exercise accompaniments with the whole approach have already been scheduled. On this basis, the organizational and working forms of CT can be compared, and similarities and differences in crisis management can be identified, as well as success factors, e.g., for communication, leadership and collaboration in CT exercises. The next step after testing, adaptation and evaluation by the authors will be to have the approach tested throughout their own institute and then also by external scientists and, if possible, to develop common standards for scientific exercise evaluations in the crisis management field beyond the consulting industry.