1 Introduction

Cyber security is of growing concern for society, especially in relation to critical infrastructures. Critical infrastructure is defined by the European Union as assets or systems that are “essential for the maintenance of vital societal functions, health, safety, security, economic or social well-being of people” [1]. Critical infrastructures are rapidly becoming industrial cyber-physical systems with the convergence of information and operational technology [2]. This makes critical infrastructure increasingly vulnerable to cyber incidents, highlighting the need for improved cyber security and training across critical sectors [3]. Meanwhile, there is a growing realization within research that human factors are essential aspects of cyber security. This is highlighted by investigations into the causes of cyber security incidents, which estimate that half of all incidents are in some way caused by human error [4]. This indicates a large potential for improving security by understanding human performance in this domain. Achieving cyber security is not merely a matter of developing and implementing technical solutions; we must also consider the decisions and actions of the people responsible for using the tools that are developed. Situation Awareness (SA) is a large field of research specifically aimed at investigating the processes that can lead to human error in contexts of critical importance. The research on how SA affects performance spans many different operational contexts [5]. Examples include how SA is essential to preserving power system security [6], how it predicts surgery performance [7], and how it can explain and prevent aviation accidents [8]. SA has many definitions, but can most generally be described as “the process of gathering information about a situation and converting this information into an awareness that can differentiate between the suitability of potential actions” [9].

The amount of research on SA within cyber security is growing, but we do not have sufficient in-depth knowledge of SA processes in this domain. Several reviews have pointed to a lack of empirical evidence regarding SA in cyber security [10, 11]. Much of the existing research is aimed towards developing tools that will improve human performance through SA. This research often assumes that such tools improve SA, although this has largely not been empirically tested [9]. The lack of empirical research leaves a considerable gap in our knowledge of how SA impacts cyber security. One recent review concluded that to fill this gap the research community should “(1) understand what cyber SA is from the human operators’ perspectives, then (2) measure it so that (3) the community can learn whether SA makes a difference in meaningful ways to cybersecurity, and whether methods, technology, or other solutions would improve SA and thus, improve those outcomes.” [11]. If we are to close this gap, we need research that specifically investigates the mental processes of SA for those working within cyber security. Then we can identify what constitutes good SA in this context, and how we can ensure and improve it.

One of the major challenges for SA research on cyber security is access to respondents. To investigate SA mechanisms, researchers must gain access to operators and their working environments. There have been relatively few attempts to directly investigate SA within cyber security, and several studies report challenges due to the lack of access to respondents [12,13,14]. Much of the existing research is conducted in educational settings or in exercises held as part of public conventions. One proposition is to investigate SA in defined groups of cyber security specialists responsible for specific networks and services [9]. Such groups are often referred to as security operations centers (SOCs). SOCs within critical infrastructure share many characteristics with settings where SA has been researched before. In fields such as aviation, control-room operations, and first response, research has resulted in the operationalization of SA mechanisms [15]. This has in turn enabled empirical testing of SA’s impact on human performance [5]. The results of SA research in other fields hold the promise of increased human performance, but it is only through an in-depth understanding of SA mechanisms that such results can be realized. This in-depth understanding can only be achieved with sufficient access to the human operators and their environment.

The first step in investigating SA in SOCs is an analysis of the goals that are pursued, the decisions that are made, and the information operators require to gain sufficient SA during incidents. Methods of task analysis (TA) have been developed to achieve this first step but have not yet been rigorously applied within SOCs. When TAs aimed at SA are conducted in new contexts, it is recommended to perform a goal-directed task analysis (GDTA), which also maps and prioritizes the goals and decisions within the specific context [16]. This study aims to conduct a full-scale GDTA establishing the requirements for SA in a SOC for critical infrastructure during incidents. This includes investigating the goals, the decisions, and the information required by the SOC operators. In addition, this study investigates different timelines of decisions and goal completion based on different types of incidents. The study aims to answer the following research questions:

  • RQ1: What are the goals of the SOC and how are they prioritized?

  • RQ2: What decisions are made by the operators during incidents and what are the related SA requirements?

  • RQ3: How do the prioritization of goals and order of decisions differ between types of incidents?

The performed GDTA answers RQ1 and RQ2. An additional investigation of incident timelines answers RQ3.

The contributions of this study are as follows:

  • It is the first to complete a full GDTA within the context of SOCs for critical infrastructure.

  • It provides empirically based knowledge of the SA requirements for SOC operators during incidents. This gives unique insights into the SA mechanisms of operators responsible for cyber security within critical infrastructure.

  • It provides maps of the SOC operators’ goals, timelines of how they handle incidents, and a detailed description of how they gather, process, and utilize information to gain SA. This is achieved by performing interviews, reviewing incident reports, and observing SOC operators in their actual working environment. The gained insight is discussed and compared with the current theoretical foundations of Cyber SA.

  • The results shed light on how different theories and models of Cyber SA can be related to different SA processes rather than being different explanations of the same phenomena.

The remainder of the paper is structured as follows: Sect. 2 presents the background of the study and related work. Section 3 describes the research methodology, whilst Sect. 4 presents the results. Section 5 discusses the results and Sect. 6 summarizes the conclusions.

2 Background and related work

In the following, the relevant research related to the study at hand is presented. First, the concept of SA is presented in Sect. 2.1. This includes a presentation of the most recognized theoretical model, which explains SA as a cognitive process consisting of 3 levels of human information processing. The section also gives a short presentation of how SA has been conceptualized in groups and at a systemic level. Section 2.2 describes how SA has gained growing interest in cyber security research. It further describes how much of the existing Cyber SA research has focused on developing tools intended to support human operators’ SA processes, while empirical SA measurements are largely lacking. A short description of recognized methods of measuring SA is also given. In Sect. 2.3 the existing research on SA in teams of cyber security operators is presented. It is explained how such groups are comparable to other contexts where SA research has proven useful before. Lastly, it is explained how the lack of in-depth analyses of SA processes within this context is a major barrier to empirically investigating the impact of SA, and thus how such analyses are key to bridging the recognized research gap this study addresses.

2.1 The concept of SA

SA is a widely researched topic within human factors. The challenge of gaining and maintaining a good awareness of a situation is intrinsic to many domains, and it has been exacerbated by information technology enabling humans to monitor and control complex systems. The research on SA was mainly developed within piloting and air traffic control. The theory and methods of SA have since been adapted to other domains where human performance is essential, such as control rooms of nuclear power plants, military command, and surgery [17].

The most recognized theory of SA was developed by Endsley and is based on cognitive psychology [17]. The theory focuses on individual information processing and defines SA as “the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future” [18]. This definition is based on the recognition of 3 levels of information processing, namely perception, comprehension, and projection. Human operators first perceive elements in their situation; these perceptions are then comprehended to gain understanding. This understanding is in turn used to project the situation’s future status and assess the suitability of different actions. The process is influenced by external factors related to the tasks and the systems being operated, as well as by individual factors related to differences in mental information processing. Figure 1 presents Endsley’s model of SA, showing how information in the environment is processed in 3 levels, leading to decisions and actions which in turn feed back into the operator’s environment.

Fig. 1 Endsley’s three-level SA model [18]
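To make the three levels concrete, the following minimal Python sketch represents one pass through the perceive–comprehend–project–decide loop of Fig. 1. It is purely illustrative; the example content (alerts, nodes, actions) is hypothetical and not taken from the studied SOC.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SituationAwareness:
    """Illustrative container for the three SA levels in Fig. 1."""
    perception: List[str]   # Level 1: elements perceived in the environment
    comprehension: str      # Level 2: comprehension of their meaning
    projection: str         # Level 3: projection of near-future status

def decide(sa: SituationAwareness) -> str:
    """Toy decision step: the projection drives the choice of action,
    which in Endsley's model feeds back into the environment."""
    return "isolate segment" if "breach spreading" in sa.projection else "continue monitoring"

# Hypothetical example loosely inspired by the SOC setting described later.
sa = SituationAwareness(
    perception=["IDPS alert: high severity", "node X unreachable"],
    comprehension="possible intrusion on node X",
    projection="breach spreading to adjacent segment",
)
print(decide(sa))  # -> "isolate segment"
```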

Theoretical development within SA research has led to several models that explain SA at different levels of operation. The different theories of SA can be categorized into 3 groups based on the level at which they conceptualize SA [9]. At the individual level, Endsley’s cognitive model is still the most recognized and is by many regarded as the de facto standard. At the group level, Team SA focuses on the aggregate of individuals’ SA, while Shared SA focuses on the overlap between individuals’ SA [15]. These group-level SA models are largely based on Endsley’s individual model but are extended to explain SA in groups. Distributed SA conceptualizes SA at the systemic level as a product of the interactions between both human and technical agents [19]. Distributed SA is a systemic theory and does not adhere to the notion that SA resides only in the operators as mental processes. There has been some contention between Distributed SA and theories based on Endsley’s cognitive model [20].

2.2 SA in cyber security

Improvement of SA has been highlighted as a promising contributing factor to cyber security [21]. Within cyber security, there is growing interest in SA, but the theoretical foundation of the SA research is sometimes unclear. Most of the available research refers to Endsley’s theoretical model, with the addition of some more technically oriented theories of SA related to data triage [9]. The term Cyber SA gained adoption from 2009 and is defined as a subset of SA relating to operator tasks aimed at cyber security [22]. A review of Cyber SA from 2014 [10] showed that the research was mainly aimed at developing tools that could improve Cyber SA. At the same time, the review pointed to a clear lack of empirical research assessing Cyber SA and its impacts. The knowledge gap regarding an in-depth understanding of Cyber SA processes and their relation to human performance in this domain was confirmed in a later paper [11]. Most Cyber SA research trends towards automating processes using technical tools that could relieve human operators of SA-related tasks. Still, systemic SA theories are rarely used as a basis for this research. This implies a mismatch between the goals of Cyber SA research and the theoretical models used [9].

There are many available methods for measuring SA, but they have not been sufficiently applied in the context of cyber security. The methods are mostly developed within Endsley’s theoretical framework [23]. Measuring SA is challenging because it requires assessing the quality of the information-processing mechanisms themselves. Direct observation through freeze probes like SAGAT [24] is one of the most valid types of measurement of human SA [5]. SAGAT establishes a realistic simulation of tasks that is “frozen” at set intervals, at which participants are probed about their awareness of relevant aspects of the situation [24]. Observer rating is an alternative method in which Subject Matter Experts (SMEs) rate the participants’ SA based on observation during a simulation [5]. In self-rating, participants rate their own SA, but this method is criticized for its susceptibility to bias. Proxy measurements like eye-tracking [23] are sometimes used, but these depend on validation through comparison with more direct SA measurements. Lastly, performance measures are often used as a supplement to SA measurement, so that relevant performance data can be compared with corresponding measurements of SA [23]. Reviews of Cyber SA research show that there is a distinct lack of empirical testing where SA is measured using recognized methods [10, 11].
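As an illustration of the freeze-probe logic behind SAGAT-style measurement, the sketch below freezes a simulated trial at random points, asks probe questions, and scores the answers against a ground truth. The probe questions, answers, and the `ask_operator` callable are hypothetical stand-ins; deriving realistic probes is exactly what a task analysis such as the GDTA reported here enables.

```python
import random

# Hypothetical probe questions and ground truth; a real SAGAT study derives
# these from a task analysis of the operational context.
PROBES = {
    "Which node triggered the last alert?": "node-17",
    "What is the current alert severity?": "high",
    "Which service will be affected next?": "billing-api",
}

def run_freeze_probe_trial(ask_operator, n_freezes=3, seed=0):
    """Freeze the simulation at random points and score probe answers.

    `ask_operator` is a stand-in for collecting an answer from the
    participant while displays are blanked; here it is any callable
    mapping a question to an answer string.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_freezes):
        question = rng.choice(list(PROBES))
        answer = ask_operator(question)
        scores.append(answer == PROBES[question])
    return sum(scores) / len(scores)  # fraction of correct probe answers

# Example with a dummy "operator" that always answers "high".
print(run_freeze_probe_trial(lambda q: "high"))
```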

2.3 SA in SOCs

Within critical infrastructure, it is the SA of the personnel responsible for protecting digital systems that is most relevant for cyber security [25]. These personnel are often organized as groups of specialists responsible for the security of a defined set of networks, services, and equipment. In this article, such groups are termed SOCs. Within the research on SOCs, there is a growing realization that the human aspect of SOCs needs to be better understood. A recent review pointed out that the interactions between the human operators and the technology developed for SOCs need to be researched further to realize the full potential of SOCs [26]. There is likewise a growing realization that the performance of human operators is highly reliant on correct mental models, and that training and exercises can benefit the operators in this respect. The development of cyber ranges illustrates this trend [27]. One could argue that SA is a well-suited concept for investigation in SOCs because this context is highly comparable to other contexts where SA research has proven fruitful before.

As for Cyber SA more generally, the SA research within SOC settings is dominated by tool development that promises improved SA. In a recent review, we found that these promises are mostly based on assumptions. Very few studies have empirically investigated SA within SOCs, and the studies that do assess SA mostly rely on performance measures or proxy measures [9]. There are a few noteworthy exceptions where multiple measurements are used, including freeze-probe measurements [28, 29].

The lack of SA measurement in the context of SOCs can be attributed to the absence of in-depth analyses of SOC tasks. In order to perform specific SA measurements like freeze probes or observer rating, one must first establish criteria for what constitutes higher or lower quality of SA. This is highly context-dependent and calls for an in-depth analysis of the SA processes within the specific context of SOCs. Methods of Task Analysis (TA) have been developed to gain such an understanding of SA mechanisms within a specific context [30]. TAs are qualitative methods that map relevant tasks, decisions, and SA requirements for human operators. Such TAs are therefore an important first step towards developing rigorous measurement of SA in SOCs, and they are largely missing within this context.

Research attempting to conduct TAs in SOC environments has been restricted by limited access to participants and to observation of their working environments. There are several examples of mapping SOC-related tasks [31,32,33,34,35,36,37,38,39,40,41,42], but only a few studies document complete TAs aimed at SA [12, 13, 43,44,45,46]. One study from 2005 analyzed cyber defense tasks of information assurance analysts across several organizations. The few more recent studies that have conducted such TAs had restricted scopes and only analyzed SA in a small set of tasks or roles. One series of studies investigated the tasks and team communication of cyber analysts [13, 44], one study investigated the network defense tasks of cyber analysts [12], and a series of studies analyzed the tasks of log analysts [45, 46]. Many of the studies report that they had to make compromises regarding scope and choice of methods because of restricted access. Several also stated that they would have conducted GDTAs if they had gained sufficient access to do so [12, 13]. GDTA is a recognized method for establishing an in-depth understanding of SA processes in new contexts [16]. There exists one reference to, and partial results from, an unpublished GDTA for cyber defenders conducted in 2010 by Connors et al. [47]. Apart from this, to the best of the author’s knowledge, no complete GDTA investigating the SA processes in SOCs has been published.

3 Methodology

The research setting is described in Sect. 3.1. The methodology of this study then consists of two parts. The first part is a GDTA conducted according to existing guidelines, described in Sect. 3.2. The second is an additional analysis of the variation in how goals are achieved by the SOC during incidents, described in Sect. 3.3. The presentation of the methodology is concluded with an analysis of its limitations in Sect. 3.4.

3.1 Research setting

This study was conducted over one year in a SOC operating within Norwegian critical infrastructure. The SOC was responsible for network management and cyber security for large customers within the energy and manufacturing domains. This included monitoring networks and security systems as well as responding to incidents on a 24-h basis. Over 30 operators were employed at the SOC, with experience ranging from 1 to over 15 years. Their roles varied in level of responsibility and content, e.g., security operator, network operator, network technician, operations coordinator, security executive, technical executive, and SOC director. Nevertheless, all of them were counted as SOC operators with overlapping tasks regarding incident response.

The SOC had one main location serving critical infrastructures distributed geographically at a national scale. The main location had one large operations control (OC) room with 8 workstations, each with several monitors, and a wall of larger monitors in view from all workstations. There was also one smaller, similarly configured OC room with fewer stations used for operations coordination. The location also had conference rooms equipped with retractable workstations. These were used for incident response and allowed groups to discuss incidents while seamlessly continuing their work at the workstations. Apart from this, there were 15 offices with one or two workstations each. Only the SOC employees and additional necessary staff had access to the SOC facilities. Research access to the SOC was ensured through employment as a researcher in the organization, with the necessary security clearances to discuss the operators’ work in depth and observe their work in situ.

All respondents in the study gave informed consent, and all information revealed in this article was reviewed and risk-assessed by the SOC regarding unwanted disclosure before publication.

3.2 GDTA method

The first part of the study was a GDTA conducted following recognized guidelines [16]. This method can be described as an extensive qualitative process for establishing an in-depth understanding of SA processes within a new context [47]. The GDTA method comprises a sequential series of semi-structured interviews with subject matter experts (SMEs). The process is iterative: the results from one interview are incorporated into the preliminary GDTA and used as the basis for the next interview. The GDTA can also be complemented with reviews of documents describing routines or documenting previous relevant events [16]. The guidelines prescribe 8 steps for conducting GDTAs, which were followed throughout this first part of the study [30]. These 8 steps are presented in Fig. 2.

Fig. 2 The steps of the GDTA method [30]

The GDTA method gains most of its empirical data from interviews with SMEs. The guidelines of the GDTA method give only general descriptions of how the interviews are to be conducted [16, 30, 48]. In this study, the interviews were conducted with deliberate methodological variations, as described in Table 1. The empirical data gathered from the interviews were complemented by reviews of documents and reports as well as by observation of work within the SOC. Each of the conducted steps of the GDTA is presented in Table 1, including what questions were asked, the number of respondents, and what data was collected.

Table 1 Overview of methods used, and data gathered in the GDTA

Step 1 was initiated by a review of all relevant work descriptions and manuals for the SOC operators. This included role specifications and security guidelines as well as routines for incident escalation, incident management, and external communication. Then, in Step 2, a series of 3 one-hour unstructured interviews was conducted with the SOC’s director. These interviews were not audio-recorded, but notes regarding the preliminary goal hierarchy were made and discussed during the interviews. The SOC director, who had 15 + years of experience, was chosen as the respondent in this step to gain the best possible overview of the SOC’s goals. In Step 3, observations were conducted in the main OC room of the SOC. These observations were not made during escalated incident response but during regular monitoring and planned work. This allowed the observations to include informal probing about how the operators worked and the systems they used. The data gathered in Steps 1–3 was used to develop a preliminary goal hierarchy, which together with the rest of the gathered data formed the basis for the preliminary GDTA in Step 4.

An interview guide was made based on recommended GDTA guidelines [16], and this was used during the interviews in Step 5. Step 5 was conducted as six semi-structured interviews with SOC operators. The interviews lasted a total of over 6 h and were audio-recorded to ensure maximal information retention. Respondents in this step were chosen to cover a wide array of expertise areas within the SOC. The six respondents had different roles in the SOC, and their experience in their roles ranged from 3 to 10 years. Three of the respondents worked as network operators, while the other three worked as security operators. The respondents’ responsibilities had some overlap, but they all had different specialist areas, e.g., Intrusion Detection and Prevention Systems (IDPSs), Security Information and Event Management Systems (SIEMs), firewalls, network architecture, network diagnosis, and information security management. All the participants had considerable experience with incident response. The interview guide was gradually complemented by updated GDTAs, which were used as the starting point for consecutive interviews. After each interview, the GDTA was updated based on the new information given by the respondents, as recommended by existing guidelines [30]. In Step 6, all the gathered data was reviewed and a revised GDTA was established.

Step 7 was conducted as two unstructured interviews with two of the SOC executives, who were responsible for network and security operations, respectively. The executives both had 15 + years of experience in their roles. Respondents in this step were chosen to enable a revision of the whole GDTA by people who had both an overview of operations and knowledge of the details involved. Step 7 also involved a revision of the GDTA based on feedback from the initial peer review of the reported results. The GDTA was finally validated in Step 8. This involved the observation of 3 real-time escalated incidents, two of which were network-related and one security-related. Asking specific questions during the incident evaluation meetings enabled a comparison between the observations and the revised GDTA, which confirmed the match between the established GDTA and the actual observed incident responses. Finally, a series of 3 interviews with a total of 4 respondents was conducted to validate the final GDTA. Two of the respondents in Step 8 had also been respondents in Step 5. In total, the final GDTA was based on the review of 200 pages of documents, almost 15 h of interviews with 11 different SMEs, and over 25 h of in situ observation and conversations with the SOC operators.

3.3 Timeline method

In addition to the GDTA conducted as presented in Sect. 3.2, timelines describing variations in the prioritization of goals were developed. The timelines provide specific examples of how SA requirements are used to gain SA during incidents. This additional analysis is an original methodological step developed specifically for this study.

Using the developed GDTA as a basis, a review was conducted of 34 SOC reports from escalated incidents dating back at most 3 years prior to the study. Different prioritizations of the goals in the GDTA during incidents were identified. This review resulted in a goal map showing the different possible orders in which goals were pursued. The different pathways through the goal map were then compared with different types of incidents, and the identified incident types were exemplified by two realistic and illustrative timelines. The development of the goal map and timelines was also aided by many unstructured conversations with SOC operators throughout the study.
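The core of this analysis step can be thought of as aggregating goal-order sequences across reports, as in the sketch below. The sequences shown are hypothetical placeholders; the real review extracted them manually from the 34 incident reports.

```python
from collections import Counter

# Hypothetical, simplified goal-order sequences "extracted" from incident
# reports; the real review covered 34 reports and the full goal hierarchy.
observed_sequences = [
    ("1", "3", "2", "4"),        # e.g., a network incident
    ("1", "2", "3", "2", "4"),   # e.g., a security incident with a mitigation loop
    ("1", "3", "2", "4"),
]

# Count how often each pathway occurs and which goal-to-goal transitions exist.
pathway_counts = Counter(observed_sequences)
transitions = Counter(
    (a, b) for seq in observed_sequences for a, b in zip(seq, seq[1:])
)

print(pathway_counts.most_common())
print(sorted(transitions))
```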

The goal map and the exemplified timelines were verified alongside the GDTA during Steps 7–8 in the GDTA method described in Sect. 3.2.

3.4 Methodological limitations

Although the study was performed following the prescribed GDTA method [16, 30], it still has some methodological limitations. As in many other studies, the empirical base could have been stronger if the number of respondents had been larger; some comparable studies have had more respondents [49, 50]. However, the empirical base of this study was complemented by an extensive review of documents and incident reports. Furthermore, the observation of work within the SOC and the additional observation during actual escalated incidents, including debriefings, strengthen the findings of the study. Compared to the other available studies that have performed TAs in SOC environments, this level of access to participants and their working environment is unique.

This unique access to participants was ensured by the author’s employment as a researcher by the SOC. This might be unconventional and calls for some consideration. One could argue that the author would be biased by being employed by the SOC while doing research. Yet it is difficult to argue that such employment would sway the findings of this study in any particular direction. The study aims to describe the SA processes present in the SOC and does not make any judgment on quality, nor does it promote a specific approach that benefits the SOC in question. Further, employment was the only way to gain access to the respondents and their environment. This is not only a practical question but also a legal one. Even though it is important to consider the potential bias of the connection between the researcher and respondents, it is equally important to note that the study would not have been possible at all without this connection. This is confirmed by the previous attempts at conducting GDTAs in such environments [12, 13].

4 Results

In this section, the results of the study are presented and connected to the defined research questions. First, the goals within the SOC are described and presented in a goal hierarchy; this answers the first research question. Second, decisions and SA requirements are described and presented in tables of decisions with corresponding ideal SA requirements; this answers the second research question. Third, variations in goal prioritization and different timelines through incidents are described and visually presented; this answers the third research question. The results related to the first two research questions are presented according to the GDTA guidelines [16]. The goal map and timelines, meanwhile, are additional results developed specifically for explaining SA processes within this context, and these are presented in the format found most suitable. The results, including the goal map and timelines, are further used in Sect. 5 for discussing SA theory and levels of conceptualization within the context of SOCs.

4.1 Goals within the SOC

When investigating the goals of the SOC, one main goal was identified, which is accomplished through a set of interconnected major goals and subgoals. The goal hierarchy is rich with partially conflicting subgoals, which are negotiated based on the specifics of the situation. How the goals are prioritized varies with the nature of the incident and sometimes shifts throughout an incident based on the changing awareness of the operators. Still, all the subgoals are completed before the SOC concludes the management of an incident. In Fig. 3 the goals are presented and categorized in the form of a goal hierarchy.

Fig. 3 Goal hierarchy

The main goal of the SOC was operationalized and accomplished through the following 4 major goals:

  1. Monitor, detect, and escalate incidents includes monitoring network status and security alerts. Potential incidents are identified and escalated to mobilize incident response from the SOC.

  2. Mitigate incidents includes subgoals to assess damage potential and to implement mitigations that minimize the damage caused by the incident. The mitigations are temporary and often adjusted according to the progression of the incident. Communicating mitigations effectively is an important subgoal for rendering them effective and minimizing negative consequences.

  3. Determine cause of incident includes the localization of the incident and the assessment of hypotheses regarding its cause. Further subgoals are to verify the correctness of the identified causes and to communicate them both internally in the SOC and to relevant stakeholders.

  4. Re-establish secure system operation includes implementing the necessary lasting changes to systems and communicating these effectively to reduce further vulnerabilities. Another important subgoal is to communicate the conclusions from the incident in order to prevent future failures or security incidents.

The goals of the SOC are heavily interconnected, and their prioritization is situation-specific. Usually, it is the completion of goal 1.2 Escalate incident and communicate it effectively that triggers the other goals in the hierarchy. Mitigation and the identification of causes are often intertwined, and the goals are met through iterative processes where the partial completion of one goal serves as an SA requirement for another goal. One example is when a preliminary, coarse-grained topological localization of a security breach triggers the isolation of a large portion of a network as a mitigation measure. Consecutive fine-grained localizations of the incident then serve as updated SA requirements, leading to moderated isolations in the network. Other goals are more loosely connected. Subgoal 4.3 Communicate incident conclusions provides information for decision-making on other goals in the future. Likewise, the identification of incident causes might trigger temporary changes to the goals of escalating incidents through heightened alertness or focused monitoring. The goal hierarchy presented in Fig. 3 is therefore best described as a general presentation of the SOC’s goals. The specific prioritization of goals varies and is better understood when exemplified by the timelines presented in Sect. 4.3.
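For readers who prefer a structured view, the fragment below encodes the part of the goal hierarchy that is named explicitly in the text as a nested Python data structure. It is a partial, illustrative reconstruction of Fig. 3: subgoal wording is paraphrased, and the remaining subgoals of the full hierarchy are omitted.

```python
# Partial, illustrative reconstruction of the goal hierarchy in Fig. 3,
# using only goal and subgoal names mentioned in the text; wording of
# subgoals is paraphrased and the full hierarchy has more entries.
GOAL_HIERARCHY = {
    "main": "Keep systems operative and secure",
    "major_goals": {
        "1": {"name": "Monitor, detect, and escalate incidents",
              "subgoals": {"1.2": "Escalate incident and communicate it effectively"}},
        "2": {"name": "Mitigate incidents",
              "subgoals": {"2.1": "Assess damage potential",
                           "2.2": "Determine mitigation to minimize damage",
                           "2.3": "Communicate mitigation"}},
        "3": {"name": "Determine cause of incident",
              "subgoals": {"3.1": "Localize incident",
                           "3.2": "Assess and verify causes",
                           "3.3": "Communicate causes"}},
        "4": {"name": "Re-establish secure system operation",
              "subgoals": {"4.1": "Implement lasting system changes",
                           "4.3": "Communicate incident conclusions"}},
    },
}

def subgoal_name(goal_id: str) -> str:
    """Look up a subgoal name, e.g. subgoal_name('4.3')."""
    major = GOAL_HIERARCHY["major_goals"][goal_id.split(".")[0]]
    return major["subgoals"][goal_id]

print(subgoal_name("4.3"))  # -> "Communicate incident conclusions"
```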

4.2 Decisions and SA requirements

The results from the GDTA include a mapping of all decisions related to the goals in the goal hierarchy. For these decisions, ideal SA requirements were identified and categorized based on their level of SA. The study identified a total of 15 decisions related to the subgoals of the operator tasks during incidents. The number of decisions per subgoal ranged from one to at most 3. Table 2 presents the identified decisions the operators need to make during the completion of the subgoals presented in Fig. 3.

Table 2 Goals and decisions for operators during incidents

Each of the decisions had several SA requirements related to it, although the number of ideal SA requirements per decision varied substantially. In total, 136 unique SA requirements were related to the 15 decisions. Several of them serve as requirements for more than one decision. 83 unique level 1 SA requirements were identified, including 13 groups of information that were often used together; these are presented as callouts, as recommended in the GDTA guidelines [16]. One example is the callout named Network Management System (NMS) alerts, consisting of Alert type, Alert severity, and Node name. Table 3 presents all the unique level 1 SA requirements and the grouped callouts numbered 1–13. Table 3 also presents a thematic categorization of the type of information the requirements consisted of.

Table 3 Level 1 SA requirements categorized by type of information

All the SA requirements were also differentiated into the 3 levels of SA. The requirements on level 1, as presented in Table 3, provide the perceived information that is further used to gain comprehension and projection according to the 3-level model of SA [13]. The categorization of requirements by SA level is presented in Table 4.

Table 4 SA requirements categorized by SA level

A complete overview of decisions and SA requirements is presented in Table 5. The table presents all the identified SA requirements related to each decision. The SA requirements are also categorized by SA level. Many of the decisions are interconnected through the SA requirements. In addition, one decision may be dependent on another in unpredictable ways. One example is the connection between the 3 decisions: 3.2.2 What is the verified cause of the incident?, 2.1.1 What is the damage potential of the network incident?, and 2.2.1 How should incident be mitigated? In some incidents, one must first verify the cause of the incident (3.2.2) to mitigate (2.2.1), and only later recognize the true damage potential (2.1.1). In other incidents, the damage potential (2.1.1) is assessed first to determine the mitigation (2.2.1), and the actual cause of the incident (3.2.2) is only discovered later. Such connections between decisions and differences between incidents are investigated further in Sect. 4.3.

Table 5 Goals, decisions, and SA requirements
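One way to read Table 5 is as a simple data model linking each decision to its ideal SA requirements per SA level, with callouts grouping level 1 items that are used together. The fragment below sketches this model in Python for a single decision; the pairing of this particular callout with decision 2.1.1 is illustrative, and the complete mapping is the one given in Tables 3–5.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Callout:
    """A group of level 1 information items that are often used together."""
    name: str
    items: List[str]

@dataclass
class Decision:
    """A decision with its ideal SA requirements per SA level (1-3)."""
    decision_id: str
    question: str
    requirements: Dict[int, List[str]] = field(default_factory=dict)

# Illustrative fragment only; names are taken from the text, and the
# complete mapping of 15 decisions and 136 requirements is in Table 5.
nms_alerts = Callout("Network Management System (NMS) alerts",
                     ["Alert type", "Alert severity", "Node name"])

d_2_1_1 = Decision(
    "2.1.1", "What is the damage potential of the network incident?",
    requirements={
        1: nms_alerts.items,
        2: ["Determined technical impact of network incident"],
        3: ["Projected damage potential of network incident"],
    },
)

# Requirements can serve more than one decision, so the same callout may
# reappear under other decisions in the full GDTA tables.
print(d_2_1_1.requirements[3])
```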

4.3 Goal map and timelines

The identified decisions and SA requirements demonstrate the complex nature of SOC operator tasks. This complexity is partly due to the interconnected nature of the goals and, consequently, of the decisions associated with each goal. To present the complexity identified in the GDTA, an additional analysis of incident timelines is presented in this section. Although this is not required in the GDTA method, the author believes that it provides a more complete understanding of the SA mechanisms present during incidents in SOCs.

It was pointed out in many of the interviews that the prioritization of goals was context-dependent. Through further questioning about the specifics of this variation, different possible paths through the completion of the 4 identified major goals and their subgoals were sampled. The sampled paths were then compared with the reviewed incident reports. This confirmed the presence of some of the assumed paths and complemented the developed map with additional pathways. Finally, a complete map of possible pathways through the goals was validated, resulting in the goal map presented in Fig. 4. The coloring of the goals in Fig. 4 matches that of Fig. 3 to facilitate comparison.

Fig. 4 Goal map

As the goal map in Fig. 4 shows, there are several possible paths for completing the subgoals during an incident. The incident response is in most cases initiated by completing the goal of 1. Identify and escalate incidents. From this point, there are several different possibilities for how to prioritize consecutive goals. 2. Mitigate incidents and 3. Determine cause of incident are often pursued in tandem, with loops of iterative goal completion. Some clear patterns in the choice of pathways between different types of incidents were identified in this study. Although a complete explanation of the contextual considerations behind the choice of pathway is beyond the scope of this study, some of the identified patterns are described and presented.

The most prevalent pattern regarding the choice of goal paths was related to differences between network and security incidents. Network incidents are here defined as incidents causing parts of the network to be unavailable because of physical, technical, or logical failures. Security incidents are, on the other hand, defined as incidents causing confidentiality or other security attributes within the network to be threatened or compromised. The review of the incident reports showed that in network incidents the identification of the cause was prioritized before mitigation. The interviews revealed that this was because it was often impossible to perform mitigation in network incidents before the cause was identified. This was not the case in security incidents, where mitigation was often prioritized early in the incident and modified in tandem with the development of SA regarding the cause.
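The difference can be summarized by encoding the goal map as a small directed graph and the two typical orderings as paths through it, as in the sketch below. This is a simplification at the level of the four major goals; the actual Fig. 4 contains more pathways and operates on subgoals.

```python
# Simplified, illustrative encoding of the goal map in Fig. 4: nodes are the
# four major goals, edges are transitions described in the text.
GOAL_MAP = {
    "1 Monitor, detect, and escalate": ["2 Mitigate", "3 Determine cause"],
    "2 Mitigate": ["3 Determine cause", "4 Re-establish secure operation"],
    "3 Determine cause": ["2 Mitigate", "4 Re-establish secure operation"],
    "4 Re-establish secure operation": [],
}

# Typical orderings reported for the two incident types.
NETWORK_PATH = ["1 Monitor, detect, and escalate", "3 Determine cause",
                "2 Mitigate", "4 Re-establish secure operation"]
SECURITY_PATH = ["1 Monitor, detect, and escalate", "2 Mitigate",
                 "3 Determine cause", "2 Mitigate",
                 "4 Re-establish secure operation"]

def is_valid_path(path, goal_map=GOAL_MAP):
    """Check that each step follows an edge in the goal map."""
    return all(b in goal_map[a] for a, b in zip(path, path[1:]))

assert is_valid_path(NETWORK_PATH) and is_valid_path(SECURITY_PATH)
```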

This main difference in goal paths is further explained by presenting two different timelines. One timeline presents a realistic network incident (Fig. 5), and one presents a realistic security incident (Fig. 6). The timelines show why and how the completion of goals is prioritized differently and, consequently, why decisions are made in different orders in the two cases. The realism of the timelines was validated during Steps 7–8 of the GDTA method described in Sect. 3.2. The respondents found the timelines to be representative of the two types of incidents.

Fig. 5 Network incident timeline

Fig. 6 Security incident timeline

The timelines explain the relations between goals, decisions, and SA requirements in more detail. They show that not all identified SA requirements for a given decision are relevant in all incidents. In Table 5, decisions and SA requirements are presented following the existing guidelines for the GDTA method; Table 5 thus presents the totality of SA requirements relevant across all types of incidents. In contrast, the timelines in Figs. 5 and 6 present only the SA requirements relevant in the given situations. The timelines therefore give a complementary understanding of how SA requirements may vary from incident to incident. The specific SA requirements related to each decision are presented throughout the exemplified incidents.

To facilitate comparison between figures and tables, the decisions in the timelines are colored according to the overview of goals in Fig. 3. Furthermore, the path taken through the goal map of Fig. 4 is indicated at the top of each timeline, further aiding such comparison.

When analyzing these two timelines, we see examples of why goals are prioritized differently between incidents. In a typical network incident, the partial assessment of the cause (marked in orange) is prioritized early in the incident response. In a typical security incident, the mitigation (marked in red) is prioritized earlier, before this assessment of the cause. There is a simple explanation for this prioritization: in many network incidents, mitigation is only possible after the cause is identified. In many security incidents, it is the other way around, i.e., mitigation must be done early to maintain security attributes, and also because the investigation of the cause can only be performed in a safe setting. The results are discussed further in Sect. 5.

5 Discussion

As explained in the introduction, a complete GDTA within SOCs for critical infrastructure has not been conducted before. It is therefore important to consider whether this study shows that GDTAs are possible and relevant to conduct in this context. How one can understand SA from the human operators’ perspective is still an open question within the field of Cyber SA [11]. The results of this study show that the goal hierarchy of SOC incident response is possible to identify and that it is comparable to goal hierarchies identified in other fields [48,49,50]. This suggests that SOC environments, such as the one studied here, are comparable to environments where SA research has proven useful before, and consequently that such SOC environments are compatible with the established methods of assessing SA for human operators.

When we consider the goal hierarchy identified in this study, as presented in Fig. 3, it becomes clear that the operational scope of the SOC investigated here might be larger than that of the typical internal SOC within an organization or of SOC services for hire [51]. Other SOCs are often exclusively focused on cyber security aspects [52]. The SOC investigated in this study was responsible for responding to both network incidents and security incidents. During the interviews, the respondents clearly stated that the combined responsibility for both aspects was inherently necessary because the SOC served critical infrastructure. They argued that if these two aspects are not considered in tandem, one cannot respond adequately to incidents within critical infrastructure.

The main goal of the SOC was to “Keep systems operative and secure”. This two-part goal of keeping the systems both operative and secure highlights the complicated negotiation of SOC operator goals. These two considerations were not always aligned. Keeping systems secure will sometimes demand reducing their operative status and vice versa. For SOCs in critical infrastructure, the availability of networks and services can in given circumstances be of the highest priority. One example is the operation of networks that control physical processes in critical infrastructures like power generation or manufacturing. The loss of availability can potentially be more severe than a security breach leading to the disclosure of sensitive information. The SOC operators must therefore constantly negotiate between keeping the networks operative and secure. Often one aspect is given priority first and then the other is prioritized at a later point in time. The tension between these aspects demonstrates the complexity and time-sensitive nature of SOC operators’ tasks within critical infrastructure.

Another interesting aspect of the results was how flexibly the SA requirements were used. As explained in the results, many SA requirements served several decisions, but in different ways. One example is the SA requirements in callout 4 Contextual documentation of affected systems. At an early stage of an incident, these SA requirements serve to gain an understanding of how severe the situation is. This information is crucial for projecting the situation into the future in subgoal 2.1, specified by the level 3 requirements Projected damage potential of network incident and Projected damage potential of security incident. Meanwhile, when the secure operation of systems is re-established through system changes in subgoal 4.1, this documentation mostly serves SA perception regarding the status quo of the system. At that stage, it is the requirements in callout 13 State-of-the-art descriptions that are most necessary for projecting into the future.

The flexible use of SA requirements points toward another feature of the SOC operators that is not common in other settings. Throughout the study, it became clear that SOC operators often tweak and develop the interfaces to their available information themselves. Many of the operators were accomplished programmers, and several of the tools they used to gain information were developed in-house or adapted as needed. One example was how they used the Network Management System (NMS). During an observed incident caused by network overload, the operators quickly scripted a customized query that identified network patterns similar to the one at hand. Based on this information they projected the probable peak of the network load, and based on this projection they scripted a stepwise, timed denial of specified network services that would cause minimal disruption of availability. The available NMS tool was not developed with this in mind, but the operators extended its use to meet their SA requirements.
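The sketch below gives a rough impression of what such an ad hoc script might look like. Every function and parameter here is hypothetical; it does not reflect any real NMS API or the operators' actual code, only the sequence of steps described above: query for similar load patterns, project the peak, and schedule a staged denial of low-priority services.

```python
import datetime as dt

# Hypothetical helpers standing in for the SOC's in-house NMS scripting;
# none of these names correspond to a real NMS API.
def fetch_load_history(hours=24):
    """Pretend query returning (timestamp, load) samples from the NMS."""
    now = dt.datetime.now()
    return [(now - dt.timedelta(minutes=5 * i), 40 + (i % 12) * 5)
            for i in range(hours * 12)]

def project_peak(history, current_load):
    """Naively project the peak load by scaling the worst historical sample."""
    worst = max(load for _, load in history)
    return max(worst, current_load * 1.2)

def plan_stepwise_denial(projected_peak, capacity, services):
    """Schedule a staged, timed denial of low-priority services until the
    projected load fits within capacity."""
    plan, load = [], projected_peak
    start = dt.datetime.now()
    for i, (service, saving) in enumerate(services):
        if load <= capacity:
            break
        plan.append((start + dt.timedelta(minutes=10 * i), service))
        load -= saving
    return plan

history = fetch_load_history()
peak = project_peak(history, current_load=90)
print(plan_stepwise_denial(peak, capacity=80,
                           services=[("bulk-backup", 15), ("report-sync", 10)]))
```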

Many of the respondents argued that experience was the most important factor in handling the task complexity. They did confirm that the SA requirements at levels 1 and 2 were necessary, but when discussing level 3 SA they often argued that experienced intuition was the most important requirement. One example was the great challenge of triaging the available information to recognize the information of relevance. The author’s first assumption was that the benefit of experience was the ability to filter through a large amount of information in a short time. Based on the respondents’ answers, this was not the case. They argued that experience helped in knowing where to look for the right information based on the situational context, ignoring large parts of the total available information. This challenges the notion of ideal SA requirements: across incidents, the ideal is to have all possible information available, but in a specific case, one only wants the relevant information. There is an interesting parallel finding in research on human intelligence and cognitive performance. Neural efficiency theory shows that higher intelligence or cognitive performance is associated not with more activity in the brain, but with less. This finding holds across tasks where training improves performance [53].

When considering the results from the GDTA, it is useful to compare the findings to previous research analyzing tasks in SOC settings. Previous TAs have investigated SA for operators handling network incidents [12] and security incidents [13, 43, 45, 46]. Their findings align with the findings of this study in many respects. Two common findings are the importance of experience and of collaboration between team members. The respondents in this study all worked in flexible teams assembled based on the need for expertise in each specific incident. According to the respondents, this maximized the positive effect of experience across different incidents. In other studies, such flexible roles seem less common in operation centers that focus exclusively on cyber security than in those also responsible for network incidents. If we take the goal map and timelines into account, we can start to reflect on why this is the case. In network incidents, the cause of the incident is of immediate and critical importance. The response involves fast generation of cause hypotheses and a consequent pruning of possible causes through targeted information gathering. In many security incidents, this somewhat creative process can be done later, without strict time constraints, through digital forensics [54]. Such delayed cause verification arguably also allows for more specialized operator roles. This study points out a strong argument against such a specialist approach: one cannot always sacrifice availability for confidentiality or integrity in critical infrastructure.

The two timelines presented in Figs. 5 and 6 highlight the differences between network incidents and security incidents. The interviews revealed that many incidents involved complex combinations of network and security aspects. One illustrative example was incidents related to firewalls for critical networks that had some IDPS functionality integrated. When the SOC experienced such components being unavailable, the situation had characteristics of both a network incident and a security incident. Balancing availability against security aspects was often highly complex and challenging. A network-focused mitigation would be to circumvent the faulty firewall, but this would compromise the security enabled by the firewall. In such incidents, the operators had to consider the possibility that the device had been taken down by an adversary aiming to provoke a response that would open an unprotected link into the network in question. The rise of Advanced Persistent Threats [55], including such targeted network attacks, is yet another argument for combining responsibility for both network and security aspects in SOCs for critical infrastructure. Recent research has argued that such integration of network and security operations centers is beneficial [52].

An earlier review of SA research in SOCs showed that there is some mismatch between the goals of the research and its theoretical underpinnings. Much of the existing research is aimed at automating SA processes in SOCs, while the referenced theoretical framework of Endsley is at odds with this approach [9]. The additional analysis of SA processes performed in this study, resulting in the goal map and timelines, may be used as a basis for investigating the compatibility of the different theoretical frameworks for SA within this context. Furthermore, the goal map and the timelines help in understanding which processes might match different theories and operationalizations of SA in SOCs. A complete analysis of the compatibility of different SA models with different parts of, and paths through, the goal map is outside the scope of this article. Still, a preliminary reflection regarding such an analysis is made here.

  1. Monitor, detect, and escalate incidents is marked in blue in Figs. 4, 5 and 6. Many of these SA processes may be good candidates for automation. It is often the information systems themselves that present the operator with potential incidents, and the consideration of escalation is to a large degree rule-based. When considering theoretical models of SA, the dependence on alerts from systems during this first stage of the incident response could indicate that the systemic perspective of Distributed SA would be a good match [19]. This perspective is also well aligned with the aim of developing more autonomous systems. The identification of potential security incidents is often done by systems like IDPSs, which are inherently prone to a high rate of false positives. A rule-based automation of escalation therefore requires integrating the information in callouts 4 Contextual documentation of affected systems and 5 Connected services documentation (a hedged sketch of such escalation logic is given after this list). One could imagine an effective system based on Artificial Intelligence (AI) serving such a function. With current technology [56], one would need a large set of relevant training data. Such a dataset must connect alerts from existing systems like IDPSs with contextual documentation and include verified escalation interpretations. The lack of availability of such datasets, especially within critical infrastructure, might prove a significant barrier to establishing such AI systems. One notable exception to the match with a systemic SA approach is the decision 1.2.1 How should escalation be communicated? This decision would arguably be better approached with the Shared SA perspective at the group level [9], because it aims to achieve a common understanding of the specifics of the incident among the involved parties.

  2. Mitigate incidents is marked in red in Figs. 4, 5 and 6. Here, several SA models seem needed to explain the processes. 2.1.1 What is the damage potential of the network incident? and 2.1.2 What is the damage potential of the security incident? can partially be understood through Distributed SA [19]. The level 2 requirements Determined technical impact of network incident and Determined technical impact of security incident would arguably be possible to automate because they mostly rely on the synthesis of information already present in the technical systems. Determined contextual impact of network incident, Determined impact of external factors, and Determined contextual impact of security incident require much more human involvement. Here the Shared SA model is not beneficial, because different operators and stakeholders understand different aspects of the contextual damage potential. One can therefore argue that an aggregate approach to individual SA through Team SA would be most fitting [15]. The same argument can be made for the two decisions under 2.2 Determine mitigation to minimize damage. This subgoal also involves negotiation between stakeholders regarding the positive and negative consequences of the mitigations themselves. Such negotiations demand the consideration of conflicting viewpoints, which indicates Team SA as the right approach. 2.3.1 How should mitigation be communicated? would benefit from the Shared SA perspective [15], following the same logic as explained for decision 1.2.1.

  3. Determine cause of incident is marked in orange in Figs. 4, 5 and 6. This process also includes a variety of SA mechanisms best explained by different SA models. 3.1 Localize incident could be automated to a large degree and thus benefit from the Distributed SA model [19]. 3.2.1 What are potential causes of the incident? is a creative collaborative effort best understood through the Team SA model [15]. Within decision 3.2.2 What is the verified cause of the incident?, the level 2 SA requirement Assessed verification method of incident cause entails the goal of a common understanding of how to verify the cause, which points towards the Shared SA model [15]. Determined verification of cause, on the other hand, was in the interviews explained as a process performed individually through prioritized delegation. Here the classic individual SA model of Endsley [18] would be most fitting. 3.3.1 How can the cause be communicated effectively? again suggests the Shared SA model [15].

  4. Re-establish secure system operation, marked in green in Figs. 4, 5 and 6, would probably prove difficult to automate. Within this part of the goal map, the Shared SA model seems most appropriate, except perhaps for 4.1.1 What system changes should be done?, which could benefit from the more diverse SA approach of Team SA [15].
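As referenced in point 1 above, the following is a minimal, hypothetical sketch of rule-based escalation logic that combines an IDPS alert with contextual documentation of the affected system. The rules, field names, and documentation entries are invented for illustration and do not describe the studied SOC's actual escalation routines.

```python
from dataclasses import dataclass

@dataclass
class IdpsAlert:
    alert_type: str
    severity: str      # e.g. "low" | "medium" | "high"
    node_name: str

# Hypothetical contextual documentation (cf. callouts 4 and 5): criticality
# of the affected system and the services connected to it.
CONTEXT_DOCS = {
    "fw-edge-01": {"criticality": "high", "connected_services": ["scada-link"]},
    "test-vm-07": {"criticality": "low", "connected_services": []},
}

def should_escalate(alert: IdpsAlert) -> bool:
    """Toy rule base combining alert severity with contextual documentation.

    A sketch of the kind of rule-based escalation discussed above, not a
    description of any real escalation routine.
    """
    ctx = CONTEXT_DOCS.get(alert.node_name, {"criticality": "unknown",
                                              "connected_services": []})
    if alert.severity == "high":
        return True
    if alert.severity == "medium" and (ctx["criticality"] == "high"
                                       or ctx["connected_services"]):
        return True
    return False

print(should_escalate(IdpsAlert("port-scan", "medium", "fw-edge-01")))  # True
print(should_escalate(IdpsAlert("port-scan", "medium", "test-vm-07")))  # False
```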

The reflections regarding the differentiated use of models for understanding SA processes in SOCs should be investigated further. Such an investigation should include measurements of SA within the context of SOCs. This has until now proven difficult [11], but as this study provides the necessary in-depth understanding of SA processes in SOCs, such research is now possible to conduct. In further research, one could perform SA measurements on different parts of the goal map while emphasizing the different approaches of the SA models. If this research shows that different processes in the goal map benefit measurably from different SA models, it can lead to improved performance through both automation and enhanced human operator performance. This would also form the basis for a synthesis of SA approaches in SOC environments and contribute to bridging the knowledge gap of Cyber SA [11]. If such further research is successful, it can ultimately contribute to a synthesis of opposing SA theories in general [20].

6 Conclusions

This study is the first to conduct a GDTA in SOCs for critical infrastructure. This was done to gain an in-depth understanding of the SA processes in the SOC throughout incidents. Following the prescribed methods, the study completed a GDTA by conducting a targeted set of unstructured and semi-structured interviews as well as an extensive review of documents. In addition, the GDTA was aided by in situ observation of the work within the SOC. This was further complemented by an analysis of different types of incidents and how they resulted in different prioritizations of goals and decisions during the incidents. Different pathways through the goal hierarchy were identified based on the review of 34 reports of previously escalated incidents and were validated alongside the GDTA.

The results of the GDTA showed that the goal hierarchy consisted of 4 major goals and 11 subgoals. The 11 subgoals involved 15 different decisions, with a total of 136 unique ideal SA requirements related to them. The 89 level 1 SA requirements were also categorized based on the types of information they contained, such as logged information on systems/operations, information from sensor/analytical technology, and descriptions of routines or requirements. All SA requirements were categorized by the level of SA they served. A complete overview of the ideal SA requirements for each of the decisions in the goal hierarchy was presented. This gives a complete overview of the SA requirements relevant to incident response in a SOC for critical infrastructure.

It became clear throughout the study that the SA processes during the handling of incidents are complex and highly dependent on context. Therefore, the GDTA was complemented by a goal map showing the different identified paths through the goal hierarchy and by timelines exemplifying two types of incidents. Two main types of incidents were identified, namely network incidents and security incidents. These followed different patterns through the goal hierarchy and had somewhat different SA requirements associated with them. The additional analysis gives an even more in-depth understanding of SA mechanisms within SOCs for critical infrastructure.

The main contribution of this study is the conducted and presented GDTA. This is a unique contribution to closing the knowledge gap regarding Cyber SA [5] and to enabling the direct measurement of SA in SOC environments. Moreover, the goal map and timelines provide a foundation for further research into how different SA processes might best be understood through different SA models. This study discusses how some of the SA processes may be automated, and which SA processes might best be understood and facilitated by different existing SA theories and models. This shows the potential of explaining SOC incident response through a synthesis of different SA models, where different models explain SA mechanisms in different parts of the goal map of SOCs for critical infrastructure. Such a synthesis of SA models could focus research on which processes to automate and which to optimize for human performance. This study can thus be a stepping stone in a coordinated effort to improve both human performance and automated processes within SOCs.