1 Introduction

A security assurance case (a.k.a. security case, or SAC) is a structured set of arguments that are supported by material evidence and can be used to reason about the security posture of a software system. SACs represent an emerging trend in the secure development of critical systems, especially in domains like automotive and healthcare. The adoption of security cases in these industries is compelled by the recent introduction of standards and legislation. For instance, the upcoming standard ISO/SAE 21434 on “Road Vehicles—Cybersecurity Engineering” includes the explicit requirement to create ‘cybersecurity cases’ to show that a vehicle’s computing infrastructure is secure.

The creation of a security case, however, is far from trivial, especially for large organizations with complex product development structures. For instance, some technical choices about the security case might require a change of the development process. For example, the security case shown in Fig. 1 (and discussed in Section 2) requires that a thorough threat analysis be conducted throughout the product structure and at different stages of development. If this analysis is not already performed during development, either the way of working must be thoroughly re-organized or the security case must be structured differently. Also, the construction of a security case often requires the collaboration of several stakeholders in the organization, e.g., to ensure that all the necessary evidence is collected from the software and process artifacts.

Fig. 1 An example of a SAC

Companies thus face the conundrum of making urgent yet challenging decisions about the adoption of SACs. To facilitate this endeavor, this paper presents a systematic literature review (SLR) of research papers on security cases. Academic research has produced a relatively large number of papers on the topic in recent years, and this SLR summarises them to give practitioners an overview of the state of the art. To the best of our knowledge, this is the first study of this kind in this field. The SLR collects the most relevant resources (51 papers) and analyses them according to a rich set of attributes, such as the types of argumentation structures proposed in the literature (threat identification, used in Fig. 1, being one option), the maturity of the existing approaches, the ease of adoption, and the availability of tool support.

Ultimately, this paper presents a reading guide geared towards practitioners. To this aim, we have created a workflow describing the suggested activities that are involved in the adoption of security cases. Each stage of the workflow is annotated with a suggested reading list, which refers to the papers included in this SLR. We remark that the SLR also represents a useful tool for academics to identify research gaps and opportunities, which are discussed in this paper as well.

The rest of the paper is structured as follows. In Section 2 we provide some background on assurance cases and discuss the related work. In Section 3 we describe the research questions and the methodology of this study. In Section 4 we list the papers included in this study and present the results of the analysis. In Section 5 we present a workflow for SAC creation and a reading guide for practitioners who want to adopt them. In Section 6 we further discuss the results and the lessons learnt from them. In Section 7 we discuss the threats to validity of this study. Finally, Section 8 presents the concluding remarks.

2 Background and Related Work

In this section, we first present background information about SACs, their main elements and their application areas as well as a simple example of a SAC. Afterwards, we discuss the related work.

2.1 Assurance Cases

Assurance cases are defined by the GSN standard (Group 2011) as “A reasoned and compelling argument, supported by a body of evidence, that a system, service or organisation will operate as intended for a defined application in a defined environment.” Assurance cases can be documented in either textual or graphical forms. Figure 1 depicts a very simple example of an assurance case and its two main parts, i.e., the argument and the evidence. The case in the figure follows the GSN notation (Spriggs 2012), and consists of the following nodes: claim (also called goal), context, strategy, assumption (also called justification), and evidence (also called solution). At the top of the case, there is usually a high-level claim, which is broken down to sub-claims based on certain strategies. The claims specify the goals we want to assure in the case, e.g., that a certain system is secure. An example of a strategy is to break down a claim based on different security attributes. Claims are broken down iteratively until we reach a point where evidence can be assigned to justify the claims/sub-claims. Examples of evidence are test results, monitoring reports, and code review reports. The assumptions made while applying the strategies, e.g., that all relevant threats have been identified, are made explicit using the assumption nodes. Finally, the context of the claims is also explicitly defined in the context nodes. An example of a context is the definition of an acceptably secure system.
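The node types described above can be sketched as a small tree structure. The following is a minimal illustration, not part of the GSN standard or of any tool surveyed here: the `Node` class, the `unsupported_claims` helper, and the example case are all hypothetical, and the node texts simply reuse the examples given in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One element of a GSN-style assurance case (hypothetical model)."""
    kind: str  # "claim", "strategy", "context", "assumption", or "evidence"
    text: str
    children: List["Node"] = field(default_factory=list)


def unsupported_claims(node: Node) -> List[str]:
    """Return the text of every claim that is neither broken down nor backed by evidence."""
    found = []
    if node.kind == "claim":
        has_evidence = any(c.kind == "evidence" for c in node.children)
        has_breakdown = any(c.kind in ("strategy", "claim") for c in node.children)
        if not has_evidence and not has_breakdown:
            found.append(node.text)
    for child in node.children:
        found.extend(unsupported_claims(child))
    return found


# A top-level claim broken down by security attribute (illustrative only).
case = Node("claim", "The system is acceptably secure", [
    Node("context", "Definition of an acceptably secure system"),
    Node("strategy", "Argue over security attributes", [
        Node("assumption", "All relevant threats have been identified"),
        Node("claim", "Confidentiality is preserved", [
            Node("evidence", "Test results"),
        ]),
        Node("claim", "Integrity is preserved"),  # no evidence assigned yet
    ]),
])

print(unsupported_claims(case))  # ['Integrity is preserved']
```

Walking the tree in this way mirrors the iterative breakdown: a claim is complete only once it is either decomposed further or justified by evidence.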

Assurance cases have been widely used for safety-critical systems in multiple domains (Bloomfield and Bishop 2010). An example is the automotive industry, where safety cases have been used for demonstrating compliance with the functional safety standard ISO 26262 (Palin et al. 2011; Birch et al. 2013; International Organization for Standardization 2011). However, there is an increasing interest in using these cases for security as well. For instance, the upcoming automotive standard ISO/SAE 21434 (International Organization for Standardization and Society of Automotive Engineers 2018) explicitly requires the creation of cybersecurity arguments. SACs are a special type of assurance case in which the claims concern the security of the system in question, and the body of evidence justifies the security claims.

2.2 Related Work

To the best of our knowledge this study is the first systematic literature review on SACs. However, there have been studies covering the literature on safety assurance cases.

Nair et al. (2013) conducted a systematic literature review to classify artefacts which can be considered as safety evidence. The researchers contributed with a taxonomy of the evidence, and listed the most frequent evidence types referred to in literature. The results of the study show that the structure of safety evidence is mostly induced by the argumentation and that the assessment of the evidence is done in a qualitative manner in the majority of cases in contrast to quantitative assessment. Finally, the researchers list eight challenges related to safety evidence. The creation of safety cases was the second most mentioned one in literature according to the study. In our study, we focus on security rather than safety cases. We also review approaches for creating complete assurance cases, meaning that we look into both the argumentation and the evidence parts, in contrast to the study of Nair et al. (2013) which focuses on the evidence part only.

Maksimov et al. (2018) contributed with a systematic literature review of assurance case tools, and an extended study which focuses on assurance case assessment techniques (Maksimov et al. 2019). The researchers list 37 tools that have been developed in the past two decades and an analysis of their functionalities. The study also includes an evaluation of the reported tools on multiple aspects, such as creation support, maintenance, assessment, and reporting. In our study, we also review supporting tools for the creation of assurance cases, but we focus on the reported tools specifically for SAC.

Gade and Deshpande (2015) conducted a literature review of assurance-driven software design. The researchers review 15 research papers and explain the techniques and methodologies each of these papers provides with regard to assurance-driven software design. This work intersects with ours in that assurance-driven software design can be used as a methodology or approach for creating assurance cases. However, unlike the review of Gade and Deshpande, our study focuses on SAC and is conducted in a systematic way.

Ankrum and Kromholz (2005) created a non-deterministic workflow for developing a structured assurance case. However, the proposed flow does not include anything related to tools or patterns usage. It does not consider the preliminary stage of considering a SAC either. Cyra and Gorski (2007) present the life-cycle, derivation procedure, and application process for a trust case template. All these artifacts, however, build on the argumentation strategy being derived from a standard, which is not always the case.

3 Research Method

We conducted a systematic literature review following the guidelines introduced by Kitchenham et al. (2007).

3.1 Research Questions and Assessment Criteria

This study aims at answering the following four research questions.

[RQ1] RATIONALE—In the literature, what rationale is provided to support the adoption of SAC?

In particular, we are interested in whether there are statements that go beyond the intuitive rationale of using SAC “for security assurance”. For instance, our initial research (Mohamad et al. 2020) indicated that compliance with security standards and regulations is also an important driver. As shown in Table 1, to answer this research question we analyze the surveyed papers and extract two characteristics:

  • Motivation, i.e., the reason for using SACs as stated by the researchers. We used two criteria for determining whether a certain study provides a motivation for using SAC. That is, the wording has to be explicit (i.e., there must be a reference to usage or advantage) and specific (i.e., providing some details).

  • Usage scenario, i.e., scenarios in which SAC could be used to achieve additional goals, next to security assurance. We used the same criteria (explicit and specific mention) used for the motivation.

Table 1 Assessment criteria for RQ1 (rationale)

[RQ2] CONSTRUCTION—In the literature, which approaches are reported for the construction of SACs and which aspects do the approaches cover?

This question aims at inventorying the existing approaches for creating SAC, which is a challenging task for adopters. As shown in Table 2, we also assess the coverage of the approach, i.e., whether it can be used for creating the argumentation, for collecting the evidence, or both. Finally, for each covered part of the SAC, we summarise the approach with respect to the suggested argumentation strategy and the types of evidence to be used in creating SACs.

Table 2 Assessment criteria for RQ2 (construction)

[RQ3] SUPPORT—In the literature, what practical support is offered to facilitate the adoption of SAC?

The purpose of this question is to understand the practicalities of creating and working with SAC. With reference to Table 3, first we study the approaches and identify the conditions (i.e., prerequisites) that have to be met in order for the outcome of the paper to be applicable. Second, we check whether the papers propose libraries of patterns or templatized SAC, as these are extremely useful for non-expert adopters. Third, we analyze the tool support. We check whether the paper suggests the usage of a tool for any of the activities related to SAC. In case it does, we extract the description of that tool, and whether it was created by the researchers or if it is a third party tool used in the paper. The last characteristic in this research question is the notation used to represent the SAC. The most common ones are GSN (Spriggs 2012), and CAE (Adelard 1998), but there are other notations such as plain text.

Table 3 Assessment criteria for RQ3 (support)

[RQ4] VALIDATION—In the literature, what evidence is provided concerning the validity of the reported approaches?

Our interest is to understand how the approaches and usage scenarios of SAC are validated (or supported by evidence). With reference to Table 4, we aim at identifying:

  • The type of validation conducted in the study, e.g., case study, or experiment. Note that ‘case study’ is a widely used term to refer to worked examples (Easterbrook et al. 2008; Runeson and Höst 2009). In this work, we consider a validation conducted in an industrial context to be a case study (Yin et al 2003), and those done within a research context to be illustrations. Experiments are studies in which independent variables are manipulated to test their effect on dependent variables (Easterbrook et al. 2008).

  • The domain (i.e., application area) in which the validation is conducted.

  • The source of the data used for the validation, e.g., a research project or a commercial product.

  • Whether or not a SAC is created as part of the validation process.

  • In case a SAC is created, we look for its creators. This characteristic has three possible values: academic authors, authors with industrial background, or third-party experts.

  • The validators, i.e., the parties that conducted the validation with values the same as for creators.

Table 4 Assessment criteria for RQ4 (validation)

3.2 Performing the Systematic Review

We searched for papers related to SAC using three scientific search engines: IEEE Xplore, ACM Digital Library, and Elsevier Scopus. We did not include Google Scholar because, in our prior experience (confirmed by a preliminary search), its results overlap with those of the three engines we selected.

3.2.1 Constructing the Search String

To maximize the chance of obtaining all relevant papers in the field, the search string used in the search engines must contain keywords that are commonly used in said papers. Therefore, prior to constructing the search string, we familiarized ourselves with the specific terminology used by researchers in the field of SAC. To do so, we conducted a manual search for papers related to SAC that were published in the past five years in the following venues: SAFECOMP, CCS, SecDev, ESSOS, ISSRE, ARES, S&P, Asia CCS, and ESORICS. The selection of the venues was based on their high visibility in the security domain.

Next, we created the search string for the selected libraries to identify papers that are potentially relevant for this study. In particular, we used two groups of keywords. The first group (line 1 below) is meant to scope the area of the study, while the second group (lines 2–4) included the terms referring to the parts of an assurance case. As a result, we formed the search string as follows:

[Search string: presented as a figure in the original and not reproduced here]
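To make the two-group structure concrete, the sketch below assembles a boolean query from keyword groups. It assumes that terms within a group are joined with OR and that the groups are joined with AND, which is a common convention but is not confirmed by the text. The keywords and the `build_query` helper are hypothetical placeholders and are not the terms used in this study.

```python
def build_query(groups):
    """Join terms within a group with OR, and the groups with AND (assumed convention)."""
    clauses = []
    for terms in groups:
        quoted = " OR ".join(f'"{t}"' for t in terms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Hypothetical keywords for illustration only, NOT the study's actual search terms.
scope_terms = ["security", "cybersecurity"]                # group 1: scopes the area
part_terms = ["assurance case", "argument", "evidence"]    # group 2: parts of a case

query = build_query([scope_terms, part_terms])
print(query)
# ("security" OR "cybersecurity") AND ("assurance case" OR "argument" OR "evidence")
```

Each library has its own query syntax, so a string built this way would still need adapting to the search engine at hand.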

As a quality check, we verified that the search string retrieves three relevant studies known to us beforehand (Finnegan and McCaffery 2014a; Ben Othmane et al. 2014; Xu et al. 2017). We ran the query in IEEE Xplore and confirmed that all three papers were returned, which supports the validity of the search string.

3.2.2 Inclusion and Exclusion Criteria

The inclusion and exclusion criteria are shown in Table 5. This list was created and fine-tuned by means of a calibration exercise involving two authors. We invested significant time in performing an initial search of papers (prior to the systematic search) and discussing which papers should be included or excluded, and why. This calibration made the application of these criteria straightforward later on, when filtering the results of the systematic search (as discussed below). The inclusion criteria are rather straightforward, considering the nature of this SLR. Concerning the exclusion criteria, we decided to consider only studies written in English, as this is the common language among the authors of this SLR. Further, SAC have been the focus of research only in recent times (although assurance cases, in general, have been around for much longer) and the field is rapidly evolving. Hence, we restricted our SLR to the past 15 years to avoid outdated results. We also excluded short papers, as answering our research questions requires studies with results rather than only ideas. Finally, exclusion criteria 4–8 exclude studies that focus on topics that are marginally related to SAC but would not help us answer our research questions.

Table 5 Inclusion and exclusion criteria

3.2.3 Searching and Filtering the Results

We executed the query on three libraries (IEEE Xplore, ACM Digital Library, and Scopus) in January 2019, and got the results shown in Table 6. In the case of Scopus, we limited the search to the domains of computer science and engineering. Also, because of the high number of returned results from Scopus, we decided to limit the included studies to the first 2000 after ordering the results based on relevance. We believe that the considered studies were sufficient, as the last 200 papers of the retained set from Scopus (i.e., papers 1801–2000) were all excluded when we applied the first filtering round (see below).

Table 6 Number of included studies after each round of filtration

After the systematic search had been applied, one author, who has been working in industrial projects about SAC with multiple partners in multiple domains (automotive and medical), performed an initial filtering based on the inclusion and exclusion criteria and recorded a confidence level (high, medium, or low) for each inclusion or exclusion decision. For the decisions made with medium or low confidence, we held a series of meetings after each filtration round, in which the three authors jointly discussed whether the papers in question should be included or excluded.
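The triage step described above can be sketched as follows. This is an illustrative model only: the `Screening` record, the `needs_discussion` helper, and the paper identifiers are hypothetical and do not reproduce the authors' actual tooling or data.

```python
from dataclasses import dataclass


@dataclass
class Screening:
    """One screening decision made by the first reviewer (hypothetical record)."""
    paper_id: str
    include: bool
    confidence: str  # "high", "medium", or "low"


def needs_discussion(decisions):
    """Return the papers whose decision was not made with high confidence."""
    return [d.paper_id for d in decisions if d.confidence in ("medium", "low")]


# Made-up decisions for illustration.
decisions = [
    Screening("P1", True, "high"),
    Screening("P2", False, "medium"),
    Screening("P3", True, "low"),
    Screening("P4", False, "high"),
]

print(needs_discussion(decisions))  # ['P2', 'P3']
```

Note that the confidence level, not the include/exclude outcome, decides whether a paper goes to the joint meeting.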

In the first filtering round, we applied the inclusion and exclusion criteria to the titles and keywords of all results (8440 papers). As shown in Table 6, this round reduced the number of studies to 211 papers. In the second filtering round, we applied the inclusion and exclusion criteria to the abstracts and conclusions of the 211 remaining studies. After this step, the number of studies was reduced to 49. In the last filtering round, we fully read the remaining 49 papers, applied the inclusion and exclusion criteria to the whole text, and ended up with 43 included studies.

We also looked at the references mentioned by the included papers and performed backward snowballing (Wohlin 2014). In this step, we did not restrict the search to peer-reviewed studies, so that potential gray literature could be included. This resulted in seven additional papers (including two technical reports) being included in our review. We looked into the references of these seven papers, but this did not result in the inclusion of additional papers, and we terminated the snowballing.

Finally, the authors kept monitoring the literature on the topic of SAC after the search was performed. This led to the inclusion of one additional paper, which is accessible through Scopus. In total, we thus included 51 studies.
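The selection funnel can be checked with simple arithmetic. The short sketch below uses only the counts reported in this section; the variable names are our own.

```python
# Counts reported in Section 3.2.3.
rounds = [
    ("initial search results", 8440),
    ("after title/keyword filtering", 211),
    ("after abstract/conclusion filtering", 49),
    ("after full-text filtering", 43),
]
snowballing = 7   # papers added through backward snowballing
monitoring = 1    # paper added through continued monitoring of the literature

# Report how many papers each filtering round kept and excluded.
for (_, previous), (name, current) in zip(rounds, rounds[1:]):
    print(f"{name}: {current} kept, {previous - current} excluded")

total = rounds[-1][1] + snowballing + monitoring
print(total)  # 51
```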

3.3 Analysis of the Included Papers

Once the final list of included studies was ready, we started the analysis phase. This was done in an iterative manner, where one author used the infrastructure provided in Tables 1, 2, 3, and 4 to prepare the analysis of a batch of papers (approximately 10 at a time). The outcome was then discussed by the three authors as a means of quality control and calibration for the next batch.

4 Results

In this section, we provide a descriptive analysis of the included papers in this SLR, and then present the results and answers to our four research questions.

4.1 Descriptive Statistics

Figure 2 shows the years when our 51 included studies were published. The graph shows a peak of 10 publications in 2015, which indicates an increase in interest in the research field compared to previous years, especially the period between 2005 and 2012, when the number of publications was three or fewer each year. We decided to exclude the studies from 2019 in Fig. 2, as our search was conducted in that year and the results would necessarily be only partial. Including the results from that year would thus give a false indication of the trend compared to previous years.

Fig. 2 Publication year of the included studies

Figure 3 shows the venues where the included studies were published. The graph shows that most of the publications were in conferences and workshops (18 and 17 respectively). 13 of the papers were published in journals, and three were technical reports.

Fig. 3 Types of publication of the included studies

We also looked into the authors of the selected papers to determine the proportion of papers with at least one author from industry. We found that fewer than 25% (12 papers) (Cockram and Lautieri 2007; Goodger et al. 2012; Netkachova et al. 2015; Netkachova and Bloomfield 2016; Xu et al. 2017; Gacek et al. 2014; Rodes et al. 2014; Bloomfield et al. 2017; Netkachova et al. 2014; Gallo and Dahab 2015; Cheah et al. 2018; Ionita et al. 2017) included at least one author from industry.

To get an overview of the quality of the papers, we looked at the ranking of the venues for both conference and journal publications. We used CORE, which has search portals for conferences and journals. The site gives the following ranking categories: A* (flagship venue in the discipline), A (excellent venue), B (good venue), and C (other ranked venue). The ranking is based on the ERA ranking process (Australian Research Council 2018). For journals that were not ranked in CORE (8 studies), we compared their impact factors to those of similar journals listed in CORE and assigned a ranking accordingly. Figure 4 shows the rankings of the venues that could be found in the portal’s database. The column NA refers to conferences that were not found in the database.

Fig. 4 Ranking of the venues of included journal papers according to the Core ranking portal

4.2 RQ1: Motivation

To find the rationale reported in the literature for the adoption of SAC, we looked into motivations and usage scenarios, as explained in Section 3.1. Some of the identified motivations in RQ1 could also be seen as usage scenarios. For example, compliance with standards and regulation could be seen as a motivation for using SAC, but also as a purpose for which SAC could be used.

4.2.1 Motivation

In the literature, papers often refer to the use of SAC as a means to build security assurance, which is a generic (and rather obvious) motivation. Instead, we looked for more specific motivations. In some of the papers, the motivation was made explicit in a separate section, or as the focus of the whole study (e.g. Knight 2015; Alexander et al. 2011). However, in most papers, this was briefly discussed either in the introduction and background sections, or as a part of motivating the used or suggested approach for creating SAC. If a study discusses only the generic SAC benefits, or is not being specific about the motivation (e.g., states that SAC provide security assurance in general), then we have categorised this paper as one that does not discuss any motivations for using SAC.

Table 7 shows all motivations found in our 51 sources. The results show that about 73% of the studies included at least one motivation for using SAC.

Table 7 RQ1—Papers stating the motivations for using SAC

Categorizing the motivations resulted in the following categories:

  • External forces: Compliance with standards and regulation (9 mentions), and compliance with requirements in case of suppliers (4 mentions).

  • Process improvement: SAC helps in integrating security assurance with the development process (6 mentions). Moreover, they help factoring work per work items, and analyzing complex systems (2 mentions).

  • Structure and documentation: The structure of SAC implies a way of work that reduces technical risks, and enhances security communication among stakeholders (7 mentions).

  • Security assessment: SAC help in assessing security and spotting security weaknesses in the systems in question (6 mentions). Hence, they help in building confidence in those systems (3 mentions).

  • Knowledge transfer: Assurance cases are a proven approach in safety, where they have been used effectively for a long time, and they could be used similarly in security (5 mentions).

4.2.2 Usage Scenarios

While SACs are usually used to establish evidence-based security assurance for a given system, researchers have reported cases where SAC could be used to achieve different goals. We looked into studies that focus on using SAC for a purpose other than security assurance, or for a purpose that is specific to a certain domain (e.g., security assurance for medical devices) or context (e.g., security assurance within the agile framework).

Table 8 shows the usage scenarios of SAC found in the literature. We were able to extract usage scenarios from 14 different papers (28% of the total number of papers). The usage scenarios we found show a wide range of applications of SAC. Seven of the papers suggest using SAC for evaluating different parts of the system or its surroundings. For five papers, the use of SAC can be categorised as providing process and life-cycle support. One paper suggests using SAC to communicate between organisations involved in developing and using medical devices, and one paper uses SAC to teach students about information security.

Table 8 RQ1—Papers relevant to understanding the usage scenarios

4.3 RQ2: Approaches

We were able to find 26 different approaches in the literature. These studies focus on creating either one part of SACs (argumentation or evidence) or both parts. Table 9 shows these approaches, which part/s of SAC they cover, which argumentation strategies they use to divide the claims and create the arguments, and the evidence used to justify the claims in the approaches. We categorize the approaches as follows:

  • Integrating SAC in the development life-cycle: These approaches suggest mapping the SAC creation activities to the development activities to integrate SACs in the development and security processes (Agudo et al. 2009; Ben Othmane et al. 2014; Ray and Cleaveland 2015; Vivas et al. 2011), as well as assurance case driven design (Sklyar and Kharchenko 2016, 2017a, b, 2019). In general, these approaches suggest that the different stages of software development (requirements, design, implementation, and deployment) correspond to different abstraction levels of the security claims that can be made on the system. The hierarchical structure of SAC makes it possible to document these claims at every development stage as well as the dependencies to claims in the later or earlier stages (Vivas et al. 2011). This also applies to incremental development, e.g., using the SCRUM method (Ben Othmane et al. 2014). Updating SACs during the development life-cycle is, however, essential for these approaches to work. Hence, conducting these updates has to be included as a mandatory activity in the security life-cycle of the system under development (Sklyar and Kharchenko 2016).

  • Using different types of AC for security: These approaches suggest using types of assurance cases other than SAC for security assurance. These types are: (i) trust cases, which are based on assurance case templates derived from the requirements of security standards (Cyra and Gorski 2007); (ii) trustworthiness cases, which focus mainly on addressing users’ trust requirements (Górski et al. 2012; Mohammadi et al. 2018); (iii) combined safety and security cases (Cockram and Lautieri 2007), which combine safety and security principles with the main goal of achieving acceptable safety and have separate top claims for safety and security, each followed by separate argumentation; (iv) dynamic assurance cases (Calinescu et al. 2017), an approach for generating arguments and evidence based on run-time patterns for the assurance cases of self-adaptive systems; (v) multiple-viewpoint assurance cases, where security is treated as an assurance viewpoint (Sljivo and Gallina 2016), an approach that reuses AC artefacts by building multiple-viewpoint ACs using contracts and introduces an algorithm for a model transformation from a contract meta-model into an argumentation meta-model; and (vi) dependability cases with a focus on security (Patu and Yamamoto 2013a).

    Table 9 RQ2—Papers presenting approaches to construct SAC (A: Argument, E: Evidence, TR: Test Results, TVA: Threat and Vulnerability Analysis, CA: Code Analysis, BA: Bug Analysis, PA: Security Standards and Policies, RA: Risk Analysis, LA: Log Analysis, PD: Process Document, SA: Security Awareness and Training)
  • Documenting and visualizing SAC: These studies give guidelines of how to document a SAC, and visualize it (Poreddy and Corns 2011; Coffey et al. 2014; Weinstock et al. 2007). In this category there are papers that focus on a specific part of SAC. These are:

    Argumentation-centric: These approaches focus on the argumentation part of the SACs. Different strategies are suggested in literature: security standards-based argument (Finnegan and McCaffery 2014a, b; Ankrum and Kromholz 2005), and satisfaction argument (Haley et al. 2005). Structures of argumentation found in literature are: model-based (Hawkins et al. 2015), and layered structure (Netkachova et al. 2015; Xu et al. 2017). Moreover, we have one study which suggests an automatic creation of argument graphs (Tippenhauer et al. 2014). As we can see, there is a variety of argumentation strategies used in these approaches, which shows that SAC arguments can be flexible and fit for most security artefacts present at organizations. However, this is not necessarily a positive characteristic when applied in industry, as it might result in heterogeneous SACs created in different parts of an organization. In consequence, it would be hard to apply quality metrics to the SACs and to combine SACs created for sub-systems. Hence, companies need to find a way to choose a suitable approach, but there is a lack of comparison of SAC creation approaches in literature, especially for different industries and in different contexts. This is further discussed in Section 6.2.

    Evidence-centric: These approaches focus mainly on different aspects of SACs’ evidence. These aspects are: searching for evidence (Chindamaikul et al. 2014), collecting and generating evidence (Shortt and Weber 2015; Lipson and Weinstock 2008), and rating potential artifacts to be used as evidence (Cheah et al. 2018). We conclude that even though the approaches cover the main evidence-related activities, i.e., searching, locating, and rating, essential parts are still missing, for example assigning the evidence to claims, storing the evidence, and updating it over time. Similar to the argumentation-centric studies, the evidence-centric ones need to be more focused on the contexts in which they are applicable. Apart from the work of Cheah et al. (2018), which is done in the automotive domain, there is no focus on domain-specific SAC evidence work. We discuss this further in Section 6.5.

4.3.1 Coverage

As shown in Table 9, 16 of the found approaches cover the creation of SACs including both argument and evidence, six focus on argument, and the remaining four on evidence.

Five of the 16 studies that cover both the argument and the evidence of security cases did not include any examples of evidence to justify the claims.

In general, the level of detail in the studies varies significantly. For example in the studies which cover the creation of argument and evidence of SAC, we found papers providing a very high-level description of both how to create them and what to use them for (Ray and Cleaveland 2015; Poreddy and Corns 2011), while other papers had very detailed descriptions of how to extract the claims and divide them to create the arguments. However, the latter is often related to a specific context, e.g., self-adaptive systems (Calinescu et al. 2017). We also observed that these studies focus significantly more on the argument part than the evidence part. This is further discussed in Section 6.5.

4.3.2 Argumentation

Argumentation is a very important part of SAC. The argumentation starts with a security claim, and continues as the claim is being broken down into sub-claims. The strategy is used to provide a means by which claims are broken down. Each level of the argumentation could be done with a specific strategy. Hence, one SAC might have one or more argumentation strategies as is the case in some of the included studies in this SLR, e.g., Agudo et al. (2009) and Mohammadi et al. (2018).
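The decomposition described above can be pictured as a small tree in which every claim may carry the strategy used to break it down. The following is a minimal sketch and is not taken from any of the reviewed studies; the class names `Claim` and `Strategy`, the helper `strategies_used`, and the example claim texts are all illustrative assumptions. It shows how one SAC can apply a different strategy at each level.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Claim:
    """A security claim; claims without a strategy are leaves awaiting evidence."""
    text: str
    strategy: Optional["Strategy"] = None  # how this claim is broken down, if at all


@dataclass
class Strategy:
    """The means by which a parent claim is decomposed into sub-claims."""
    name: str
    sub_claims: List[Claim] = field(default_factory=list)


def strategies_used(claim: Claim) -> Set[str]:
    """Collect the names of all strategies used at or below a claim."""
    if claim.strategy is None:
        return set()
    names = {claim.strategy.name}
    for sub in claim.strategy.sub_claims:
        names |= strategies_used(sub)
    return names


# Hypothetical example: assets at the first level, threats at the second.
top = Claim(
    "The system is acceptably secure",
    Strategy("argue over assets", [
        Claim("User data is protected",
              Strategy("argue over identified threats", [
                  Claim("Eavesdropping on user data is mitigated"),
                  Claim("Tampering with user data is mitigated"),
              ])),
        Claim("Firmware is protected"),
    ]),
)

print(sorted(strategies_used(top)))
```

In this hypothetical case the top claim is decomposed by assets and one of its sub-claims by threats, so the SAC uses two argumentation strategies.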

We looked for an explicit mention of the used strategy. If none was provided, we analysed the example cases to find the used argumentation strategy. Table 9 shows the approaches we found in literature with the respective argumentation strategies used in each of them.

When regarding argumentation strategies in the context of the different approaches, we could not find any correlation between the two. For instance, different approaches which integrate SAC within the development life-cycle use different argumentation strategies (e.g., requirements Agudo et al. 2009 and development phases Ray and Cleaveland 2015). The most common strategy depends on the output of a threat, vulnerability, asset or risk analysis (8 papers) (Cockram and Lautieri 2007; Coffey et al. 2014; Cyra and Gorski 2007; Mohammadi et al. 2018; Patu and Yamamoto 2013a; Vivas et al. 2011; Xu et al. 2017; Weinstock et al. 2007). Other popular strategies are breaking down the claims based on the requirements or more specifically quality requirements and even more specifically security requirements (5 papers) (Agudo et al. 2009; Calinescu et al. 2017; Haley et al. 2005; Netkachova et al. 2015; Sklyar and Kharchenko 2017b), and arguing based on security properties, e.g., confidentiality, integrity and availability (5 papers) (Chindamaikul et al. 2014; Finnegan and McCaffery 2014a; Mohammadi et al. 2018; Poreddy and Corns 2011; Sklyar and Kharchenko 2017b). Additionally, researchers also used system and security goals (4 papers) (Agudo et al. 2009; Ben Othmane et al. 2014; Mohammadi et al. 2018; Tippenhauer et al. 2014), software components or features (3 papers) (Agudo et al. 2009; Hawkins et al. 2015; Sklyar and Kharchenko 2017b), security standards and principles (2 papers) (Ankrum and Kromholz 2005; Sljivo and Gallina 2016), pre-defined argumentation model (1 paper) (Górski et al. 2012), and development life-cycle phases (1 paper) (Ray and Cleaveland 2015).

4.3.3 Evidence

Even though evidence is a very important and complex part of SAC, only four of 26 included approaches focused on it. Even in the approaches which cover argument and evidence of SACs, there was a much deeper focus on the argumentation than the evidence, which explains why five out of these did not even include an example of what evidence would look like. We found evidence either by looking for explicit mentions in the articles or by extracting the evidence part from the reported SACs. Table 9 shows the approaches we found in literature with the respective evidence types used in each of them.

The most common types of evidence reported in literature are test results (TR) (12 papers) (Ben Othmane and Ali 2016; Calinescu et al. 2017; Cheah et al. 2018; Chindamaikul et al. 2014; Lipson and Weinstock 2008; Poreddy and Corns 2011; Shortt and Weber 2015; Sklyar and Kharchenko 2016, 2017a, b, 2019; Sljivo and Gallina 2016) and different types of analysis. These analyses include threat and vulnerability (TVA) (Cockram and Lautieri 2007; Finnegan and McCaffery 2014a, b; Patu and Yamamoto 2013a), code (CA) and bug (BA) (Chindamaikul et al. 2014; Ben Othmane and Ali 2016; Sklyar and Kharchenko 2016, 2017a, b, 2019), security standards and policies (PA) (Agudo et al. 2009; Netkachova et al. 2015), risk (RA) (Mohammadi et al. 2018), and log analysis (LA) (Mohammadi et al. 2018; Patu and Yamamoto 2013a). Cheah et al. (2018) present a classification of security test results using security severity ratings. This classification can be included in the security evaluation, which may be used to improve the selection of evidence when creating SACs. Chindamaikul et al. (2014) investigate how information retrieval techniques and formal concept analysis can be used to find security evidence in a document corpus. Shortt and Weber (2015) present a method to apply fuzz testing to support the creation of evidence for SACs.

Other types of evidence reported in literature include process documents (PD) (Lipson and Weinstock 2008), design techniques (DT) (Mohammadi et al. 2018), and security awareness and training (SA) (Patu and Yamamoto 2013a; Lipson and Weinstock 2008; Weinstock et al. 2007). Lipson and Weinstock (2008) describe how to understand, gather, and generate multiple kinds of evidence that can contribute to building SAC.

4.4 RQ3: Support

In this section, we list our results from reviewing the practical support to facilitate the adoption of SAC reported in literature. Specifically, we report on the tools used to assist in any of the SAC activities, e.g., creation and maintenance, the prerequisites of the approaches, and patterns for creating SAC.

4.4.1 Tools

We found 16 software tools which have been used in one way or another in the creation of SAC in literature. Seven of the found tools were created by researchers. Four of these seven target assurance cases in general (Fung et al. 2018; Gacek et al. 2014; Hawkins et al. 2015; Tippenhauer et al. 2014), while the remaining three are created to be used in the creation of SAC specifically (Ben Othmane and Ali 2016; Cheah et al. 2018; Shortt and Weber 2015). Table 10 shows the tools and the respective studies in which they are used. It also gives a brief description of the main functionalities of each tool and indicates whether the tool was created or used by the authors. There are four main types of reported tools. In the following, we list the tools of each type, and we discuss the main features of each tool as reported in the studies:

  • Creation tools: used to create and document assurance cases in general.

  • Argumentation tools: focus mainly on the creation of the argumentation part of SAC.

  • Evidence tools: focus on the creation of SAC evidence.

  • Support tools: several studies reported supporting tools to assist the creators of SAC in the analysis needed for creating them, e.g., by helping users determine the relevance of a given document to be used as evidence (Chindamaikul et al. 2014).

Table 10 RQ3—Tools supporting the creation, documentation, and visualization of SAC (U: Used, C: Created)

4.4.2 Prerequisites

Prerequisites are the conditions that need to be met before an approach presented in a study can be applied. We found prerequisites in the included studies by checking the inputs of the proposed outcomes (approaches, usage scenarios, tools, and patterns). If an input is not a part of the outcome itself, we considered it to be a prerequisite to that outcome. Table 11 shows the prerequisites we found along with the respective type of study for each. There are 17 reported prerequisites. The majority belong to approaches (11) (Chindamaikul et al. 2014; Cockram and Lautieri 2007; Cyra and Gorski 2007; Hawkins et al. 2015; Patu and Yamamoto 2013a; Ankrum and Kromholz 2005; Cheah et al. 2018; Sljivo and Gallina 2016; Tippenhauer et al. 2014; Vivas et al. 2011; Xu et al. 2017) while the remaining ones belong to usage scenarios (2) (Bloomfield et al. 2017; Goodger et al. 2012), patterns (2) (Patu and Yamamoto 2013b; He and Johnson 2012), and tools (1) (Gacek et al. 2014). We categorize prerequisites as follows:

  • Usage of specific format (Gacek et al. 2014; Hawkins et al. 2015; Sljivo and Gallina 2016): In this category, studies require the use of artefacts which have specific formats to achieve the purpose of the study.

  • Usage of specific documents and repositories (Chindamaikul et al. 2014; Cockram and Lautieri 2007; He and Johnson 2012; Patu and Yamamoto 2013a; Tippenhauer et al. 2014; Vivas et al. 2011): The studies in this category use specific repositories and documents for retrieving required data for building or using SAC.

  • Usage of security standards (Ankrum and Kromholz 2005; Cyra and Gorski 2007): The studies in this category require the use of security standards to create SAC or make use of them.

  • Existence of analysis and modelling (Cheah et al. 2018; Goodger et al. 2012; Patu and Yamamoto 2013b; Xu et al. 2017): The studies in this category require that certain analyses and models exist, or are performed, to achieve their purpose.

  • Existence of special expertise (Bloomfield et al. 2017): The one study in this category relies on expertise provided by an external safety regulator.

Table 11 RQ3—Papers discussing the prerequisites of SAC approaches, usage scenarios, and tools

4.4.3 Patterns

Recurring claims and arguments in SAC can be subsumed in patterns. They can save the creators of SACs a lot of time and effort. We found ten studies which deal with patterns. Six of these create their own argumentation patterns (Finnegan and McCaffery 2014a, b; He and Johnson 2012; Patu and Yamamoto 2013b; Poreddy and Corns 2011; Xu et al. 2017). The remaining four include usage of patterns (Hawkins et al. 2015; Tippenhauer et al. 2014), a guideline for creating and documenting security case patterns (Weinstock et al. 2007), and a catalogue of security and safety case patterns (Taguchi et al. 2014). Since we only considered patterns created and used for SAC, we excluded those studies in which patterns are borrowed from the safety domain, e.g., Calinescu et al. (2017).

Table 12 shows the studies that deal with SAC patterns. While the created patterns cover an important aspect, namely abstraction, it is not clear how reusable or generalizable they are. Some patterns are derived from various security standards, e.g., Finnegan and McCaffery (2014a) and Taguchi et al. (2014) (these are usually from the medical domain where security standardization is more mature compared to other security-critical domains), and one from lessons learned from security incidents (He and Johnson 2012), but none is derived from previous applications of SAC in industry. Another observation we made is that the patterns focus heavily on the argumentation part of SAC in contrast to the evidence part. Only a few studies provided examples of evidence that can be used in a given pattern (Poreddy and Corns 2011; Taguchi et al. 2014; Weinstock et al. 2007). However, these examples are specific to the context of the studies and leave the abstraction to the reader, with the notable exception of the examples provided by Weinstock et al. (2007).

Table 12 RQ3—Papers presenting patterns

4.4.4 Notations

Out of 51 studies, 41 specify at least one notation to be used for expressing and documenting a SAC. Table 13 shows the number of studies that use each notation, and lists them. The most common notation is the Goal Structuring Notation (GSN) (Spriggs 2012), which is suggested by 27 studies. Another popular notation is the Claim Argument Evidence (CAE) (Adelard 1998) notation, which is suggested by nine studies. Other notations are: text (6 studies), concept maps (Coffey et al. 2014) (1), and the Claim-Argument-Evidence Criteria (CAEC) (Netkachova and Bloomfield 2016; Netkachova et al. 2014, 2015) notation, which is an extension of the CAE notation (3 studies by the same authors).

Table 13 Studies which use each notation

4.5 RQ4: Validation

We consider validation to be the process to show that an approach or tool for creating SAC works in practice or that an SAC can actually be used for a suggested usage scenario. In case validation is performed in a selected study, we looked for the type of validation, the domain of application, the source of data, whether a SAC is created during the validation, the creators of the SACs, and who performed the validation.

Table 14 shows these different aspects for the 36 studies which include a validation of the outcome. The majority of the outcomes were validated using illustrative cases (21), 11 were validated using case studies, and the remaining four used experiments (3) and observation as a part of an Action Design Research (ADR) (Sein et al. 2011) study.

Table 14 RQ4—Papers presenting a form of validation

The data sources vary among the validations, as can be seen in Table 14. We categorize these sources into three main categories:

  • Research, open source, and in-house projects (20) (Ankrum and Kromholz 2005; Chindamaikul et al. 2014; Cockram and Lautieri 2007; Coffey et al. 2014; Gacek et al. 2014; Haley et al. 2005; Hawkins et al. 2015; Mohammadi et al. 2018; Netkachova et al. 2015; Patu and Yamamoto 2013a; Poreddy and Corns 2011; Ray and Cleaveland 2015; Rodes et al. 2014; Shortt and Weber 2015; Sklyar and Kharchenko 2019; Sljivo and Gallina 2016; Strielkina et al. 2018; Tippenhauer et al. 2014; Vivas et al. 2011; Gallo and Dahab 2015)

  • Commercial products / systems (9) (Ben Othmane and Ali 2016; Ben Othmane et al. 2014; Calinescu et al. 2017; Cheah et al. 2018; Goodger et al. 2012; Górski et al. 2012; Masumoto et al. 2013; Xu et al. 2017; Netkachova et al. 2014)

  • Standards, regulation, and technical reports (7) (Bloomfield et al. 2017; Cyra and Gorski 2007; Finnegan and McCaffery 2014b; Fung et al. 2018; Graydon and Kelly 2013; He and Johnson 2012; Sklyar and Kharchenko 2017b)

SACs were presented in 31 out of the 36 validations. Representing a complete SAC is mostly not possible even in small illustrative cases due to the amount of information required to build one. However, how much of an SAC is represented in the included validations varies to a large extent. Some validations present an example of a full branch of SAC, i.e., a claim all the way from top to evidence (e.g., He and Johnson 2012), while others present very brief examples of SACs (e.g., Gallo and Dahab 2015).

Table 14 also shows who created the SACs in each study. In only two cases, experts were used to create the SACs. In the majority of the studies (28), the authors created the SACs. However, eight of the studies included authors from industry. These are shown in Table 14 as “Authors*” in the Creators column.

Table 14 also shows the domains in which the validation was conducted. The most common domains are Software Engineering (7) and Medical (7).

The last column in Table 14 shows who performed the validation in each study. Out of the 36 included validations, only five used third parties to validate the outcomes. These were industrial partners in two cases (Ben Othmane and Ali 2016; Cheah et al. 2018), an external regulator (Bloomfield et al. 2017), one security expert (Coffey et al. 2014), and a group of security experts (Finnegan and McCaffery 2014b). In the remaining 31 validations, the authors performed the validation. However, eight of the studies included authors from industry. These are shown in Table 14 as “Authors*” in the Validator column.

5 SAC Creation Workflow

Based on the results of this systematic literature review, we have found that the outcomes described in the literature fall into one or more parts of the workflow depicted in Fig. 5.

Fig. 5 Flowchart of SAC creation

We also realised that there is agreement in the literature that SACs are to be created in a top-down manner. This means that one starts from a top claim, which represents a high-level security goal, and works through strategies and sub-claims all the way to the evidence. We have not seen approaches that, e.g., start from the existing evidence of a certain system and construct claims out of it in a bottom-up fashion. However, this agreement is not yet expressed in a sufficient level of detail in any single paper.

Hence, we have synthesised the existing knowledge into a generic workflow for the construction of SAC. Even though the literature might have some gaps and fallacies, this workflow is useful as a contextual learning guide for the readers to familiarize themselves with the different aspects of SAC creation.

There are five main blocks in the workflow. We list and describe them in the remainder of this section. Additionally, Table 15 lists recommended papers from our systematic literature review for practitioners wanting to adopt SAC as well as researchers wanting to conduct studies in a specific area of SAC. The recommended papers focus on aspects related to the individual blocks and together provide a thorough investigation of each.

Table 15 Recommended reading material for each block of the SAC creation workflow

Study and Understand SAC

Building SACs is not trivial and requires significant effort. Hence, before going ahead and creating them, it is important to understand what they are and what they can be used for. This step includes studying the structure of SACs, their benefits, what needs to be in place to create them, and their potential usage scenarios, e.g., standards and regulation compliance. Block number 1 in Fig. 5 shows the corresponding entity in the workflow.

Argumentation

This block includes selecting the top claim to achieve, and the strategy to decompose this claim into sub-claims. This is a very important step, as the choice of argumentation strategy determines to a large extent which activities are needed to complete the SAC. For example, if a strategy using decomposition based on vulnerabilities is adopted, a vulnerability analysis of the system in question has to be conducted. A sub-block of the argumentation is the usage of patterns. Patterns help the creators of SACs to save time and effort by using pre-defined and proven structures. The creators could, however, decide not to use a pattern, and create their own unique structure if the situation requires that. A pattern is created based on the knowledge gathered while creating SACs. The creation of patterns is outside the scope of this workflow, but it is discussed in the recommended papers in Table 15.

Evidence

This block includes locating, collecting, and assigning evidence to the claims of the SAC. In some cases, the evidence is not present when the SAC is being built; hence, it needs to be created. In our workflow, this would be a part of the collect evidence activity. Moreover, these activities might be done in an iterative manner, including the assessment.

Assessment

This block focuses on assessing SACs. This is done to check the quality of the created SAC, and, e.g., to determine whether a claim needs extra evidence to reach a certain confidence level. Assessment starts after the claims have been identified and the evidence is assigned to the corresponding claims. The result of this step might require the creators of the SAC to go back to the point where they assess a claim and decide whether to decompose it further or assign evidence to it. Since there is a lack of studies that focus on quality assurance of SAC, we have recommended studies which include some metrics to help assess SACs in Table 15.
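One simple structural check that fits this block is to look for claims that have been neither decomposed nor supported by evidence, since these are the points where the creators would have to go back. The sketch below is only an illustration under our own assumptions: the `Node` class, the `undeveloped` function, and the example claims are hypothetical and do not come from the reviewed studies, and a real assessment would also consider the confidence in the evidence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A claim with optional sub-claims and optional assigned evidence."""
    claim: str
    sub_claims: List["Node"] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)


def undeveloped(node: Node) -> List[str]:
    """Return leaf claims that have no evidence assigned to them."""
    if not node.sub_claims:
        # A leaf is acceptable only if at least one piece of evidence backs it.
        return [] if node.evidence else [node.claim]
    found: List[str] = []
    for sub in node.sub_claims:
        found.extend(undeveloped(sub))
    return found


# Hypothetical SAC fragment with one unsupported leaf claim.
sac = Node("The system is acceptably secure", sub_claims=[
    Node("Threat T1 is mitigated", evidence=["penetration test report"]),
    Node("Threat T2 is mitigated"),
])

print(undeveloped(sac))
```

Each claim returned by the check calls for the decision named in the workflow: decompose it further or assign evidence to it.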

Documentation

This block includes deciding whether or not to use a tool for modelling the argument and documenting the SAC. If a tool is used, then the notation to be used is limited to the one(s) supported by the tool. If the documentation is created manually, the creators have the freedom to use an existing notation, extend one, or even create their own.

6 Discussion

In this section we discuss the main findings and insights we gathered while reading the papers included in this study. In summary the main observations are the following:

  • There are potential benefits of SAC adoption, but further investigation is needed.

  • There is a rich variety of approaches, with room for improvement.

  • Knowledge transfer from the safety domain should take into consideration differences between safety and security.

  • There is a lack of quality assurance of the outcomes, which should be avoided in future studies.

  • There is imbalanced coverage in literature, which requires more academic research.

  • There is room for improvement when it comes to support, which requires companies or the open-source community to step up.

  • There is a lack of mature guidelines for SAC adoption, which might require a standardization activity.

6.1 Potential for a Wide Range of Benefits

The literature is full of motivations for using SAC, as well as suggestions for where to use them, as our results of RQ1 show in Sections 4.2.1 and 4.2.2. We consider this to be a positive factor. However, our impression is that these motivations are on a high level and lack detailed studies to show how realistic and applicable they are. For example, many papers motivate the adoption of SAC as a way to establish compliance with regulation and standards without pinpointing the regulations’ and standards’ specific constraints for using SAC. Without having an in-depth knowledge of the specific regulation or standard, it is hard to determine whether SACs are explicitly required, or rather just recommended as a way to create a structured argument for security. Incidentally, in our own previous work we have tried to demystify this issue in the context of automotive systems (Mohamad et al. 2020).

Furthermore, some studies suggest that SACs can be used for evaluating the level of security of a system by assigning measurements to the elements of the cases, e.g., the evidence. However, these studies lack detailed guidelines on how to create these quantitative attributes. An example is the usage scenario that suggests using attribute values for evaluating an architecture (Yamamoto 2015). The approach suggests assigning values to evidence, ranging from −2 to 2, based on how well the evidence satisfies the claims, i.e., to what degree the provided evidence justifies the claims. However, there are no specific criteria for determining the attribute value, making the exercise subjective and nontransparent.

Another example is when SACs are suggested as tools to aid in information security education (Gallo and Dahab 2015). This very interesting concept is not supported by a discussion on the required level of detail in the SACs presented to students.

We believe that there is a substantial gap between the potential of SAC reported in literature and their application in industry. An obvious question to ask is: why are SACs not more widely adopted in industry even though there are so many motivations and usage scenarios for them in literature? It has already been shown that adopting SACs is non-trivial (Mohamad et al. 2020). It requires a substantial amount of effort and time, which grows as the systems become more complex. It also comes with many challenges, such as finding the right expertise to create them. Furthermore, the challenges do not stop at the creation of SACs, but are extended to updating, maintaining, and making them accessible at the right level of abstraction to the right users. We believe that these matters need to be addressed in studies that suggest the usage of SACs in different domains.

6.2 Wide Variety of Approaches

The literature includes a rich variety of studies which explore approaches for creating SACs, especially when it comes to the argumentation part, as shown in the results of RQ2 in Section 4.3. This gives organizations the possibility to choose those approaches that fit their way of working and the security artefacts they produce. For example, a company that works according to an agile methodology could choose to adopt an SAC approach for iterative development (Ben Othmane and Ali 2016). However, this choice has to consider constraints of the applicability of the approach, including benefits and challenges of its adoption. These aspects are not discussed in the literature and the burden is left to the adopter.

Another example is the question of conformance with different standards. While this has been discussed in literature, there is a lack of studies which systematically assess different approaches based on their ability to help achieve conformance with a certain standard. More generally, we observed that there is a lack of studies which compare different approaches in different contexts. In consequence, from an industrial perspective, organizations need to select suitable approaches in an exploratory way, which can be confusing.

The studies presenting new approaches also lack a discussion of the granularity level that is possible, or required, with each approach. We believe that future studies should take into consideration the possible usages for SACs created using different approaches, and discuss the required granularity level based on that. For example, would a SAC created through the security assurance-driven software development approach (Vivas et al. 2011) be useful to companies which outsource parts of their development work to providers? In that case, on which level should these cases be created, e.g., on the feature level or on the level of the complete product?

Lastly, we believe that there is room for exploration of hybrid approaches which combine two or more of the approaches reported in literature. This becomes especially important when different approaches target individual parts of SAC, e.g., argumentation and evidence.

6.3 Security Might Differ from Safety

We have seen in many cases that the approaches presented in literature treat security and safety cases as the same, e.g., Chindamaikul et al. (2014), Graydon and Kelly (2013), Hawkins et al. (2015), Sljivo and Gallina (2016), Goodger et al. (2012), Fung et al. (2018), Ankrum and Kromholz (2005), Gacek et al. (2014) and Sklyar and Kharchenko (2017b, 2019). We believe that since assurance cases in general are mature in the safety domain and have been used for a long time, it is natural to consider the gained knowledge and transfer it into other domains, such as security. However, this knowledge transfer has to take into consideration the differences between safety and security, e.g., in terms of field maturity and nature. For example, in safety there is usually wide access to information, in contrast to security, where threat and risk analyses are considered sensitive information (Piètre-Cambacédès and Bouissou 2013).

Alexander et al. (2011) provide a discussion on the differences between safety and security from both theoretical and practical perspectives. Other studies combine security and safety assurance by creating combined arguments or security-informed safety arguments (Taguchi et al. 2014; Netkachova and Bloomfield 2016; Cockram and Lautieri 2007; Netkachova et al. 2015). We have also seen in Section 4.3 that some studies use different types of assurance cases to argue for security. The results do not show any noticeable differences from SAC. This means that we were not able to find any special characteristics in the different types of ACs that distinguish them from SAC when they are applied to security. However, the approaches for creating the argumentation part differ among the types according to their focus, e.g., trustworthiness and dependability.

6.4 Lack of Quality Assurance

Quality assurance is the weaker part of the literature reviewed in this study. We discuss three main issues here. The first is the quality of the outcomes when it comes to their applicability in practice. We have seen in the results of RQ4 in Section 4.5 that illustrative cases are often preferred over more empirically grounded types of validation. This indicates scarcity of industrial involvement. The reason might be a lack of interest, which contradicts the reported motivations and usage scenarios, or simply that it is hard to get relevant data from industrial companies to validate the outcomes, as security-related data is considered to be sensitive (as we mentioned earlier). Furthermore, with the exception of a few cases, the creation and validation of SAC in literature is done by the authors of the studies. We believe that this contributes heavily to the lack of information addressing challenges and drawbacks of applying SACs in a practical context.

The second issue is the generalizability of the approaches with regard to the argumentation strategies they use. The approaches we reviewed use a wide variety of argumentation strategies, e.g., based on threat analysis, requirements, or risk analysis. However, they lack validations and critical discussions as to whether the approaches work only with the used strategies or can use other strategies as well. We suggest validating these approaches with different types of strategies in future research.

The last point is the lack of mechanisms for building quality assurance into the SACs. We believe that it is essential for the argumentation provided in SACs to be complete in order for them to be useful. For that, there needs to be a mechanism to actively assess the quality of the arguments to gain confidence in them. This is not addressed in literature apart from a few studies, e.g., Chindamaikul et al. (2014) and Rodes et al. (2014). Similarly, the evidence part also needs to be assessed, e.g., by introducing metrics to assess the extent to which a certain piece of evidence justifies the claim it is assigned to. The inter-relation between claims and evidence needs to be addressed. For example, each claim can have a certain saturation level to be achieved, and each piece of evidence provides a degree of saturation. Hence, it would be possible to assess whether the claim is fully satisfied or not by the assigned evidence.
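To make the saturation idea concrete, the following minimal sketch computes whether a claim reaches its required level. It rests on assumptions of our own that the reviewed literature does not prescribe: saturation values lie between 0 and 1, the contributions of individual pieces of evidence are simply added, and the total is capped at 1. The names `Evidence`, `claim_saturation`, and `is_satisfied`, as well as the numbers used, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    name: str
    saturation: float  # assumed contribution to the claim, between 0 and 1


def claim_saturation(evidence: List[Evidence]) -> float:
    """Add up the contributions of all evidence, capped at 1.0 (assumed model)."""
    return min(1.0, sum(e.saturation for e in evidence))


def is_satisfied(evidence: List[Evidence], required: float) -> bool:
    """A claim is satisfied once its evidence reaches the required saturation."""
    return claim_saturation(evidence) >= required


# Hypothetical claim that requires a saturation level of 0.8.
assigned = [Evidence("static code analysis report", 0.5),
            Evidence("fuzz testing results", 0.25)]
print(is_satisfied(assigned, required=0.8))   # 0.75 is below 0.8

assigned.append(Evidence("penetration test report", 0.25))
print(is_satisfied(assigned, required=0.8))   # capped total of 1.0 reaches 0.8
```

An additive model is the simplest possible choice; whether contributions should instead overlap or be weighted is exactly the kind of question the missing metrics would have to settle.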

6.5 Imbalance in Coverage

The coverage of matters related to SAC in literature is imbalanced to a large extent. When it comes to the approaches, our results in Section 4.3.1 indicate a tendency towards covering the argumentation part more than the evidence part. This indicates a weakness in the approaches, as elements of SAC cannot be evaluated in silos. For example, if we take an approach to create security arguments, how would we know which evidence to associate with these? Moreover, we will not be able to assess whether we actually reach an acceptable level of granularity for the claims to be justified by evidence. The same applies to the evidence part. If we only look at the evidence, we will not be able to know which claims the suggested evidence can help justify. To be able to evaluate the evidence, it has to be put in context with the rest of the SAC. When reviewing the studies that focus on one element of SAC, we were not able to find any links to related studies focusing on the remaining elements, which indicates incompleteness of the approaches, especially for putting them into practice.

When it comes to other areas, the assessment and quality assurance of SAC is rarely covered, as we discussed in the previous sub-section. Furthermore, there is a lack of studies covering what comes after the creation of SAC. In particular, for SACs to be useful, they have to be updated and maintained throughout the life-cycles of the products and systems they target; otherwise, they become obsolete (Mohamad et al. 2020). This requires traceable links between the created SACs and the artefacts of these products and systems. Many SAC approaches use GSN, which allows referencing external artefacts using the context and assumption nodes. However, these nodes are rarely exploited in the examples provided in the studies we reviewed.
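As an illustration of what such traceable links could look like, the sketch below records a fingerprint of each referenced artefact when the link is created and later reports the links whose artefact has changed or disappeared. This is our own hypothetical design and not a feature of GSN or of any reviewed tool; the names `TraceLink`, `fingerprint`, and `stale_links` and the artefact identifiers are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


def fingerprint(content: str) -> str:
    """SHA-256 digest of an artefact's content, used to detect changes."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass
class TraceLink:
    claim: str                 # the claim that relies on the artefact
    artefact_id: str           # hypothetical identifier of the artefact
    recorded_fingerprint: str  # fingerprint at the time the link was created


def stale_links(links: List[TraceLink], artefacts: Dict[str, str]) -> List[TraceLink]:
    """Return links whose artefact is missing or has changed since recording."""
    stale = []
    for link in links:
        current = artefacts.get(link.artefact_id)
        if current is None or fingerprint(current) != link.recorded_fingerprint:
            stale.append(link)
    return stale


# Hypothetical artefact store: identifier -> current content.
artefacts = {"TARA-v1": "threat analysis, revision 1",
             "TEST-42": "test report for build 42"}
links = [TraceLink("Threat T1 is mitigated", "TARA-v1", fingerprint(artefacts["TARA-v1"])),
         TraceLink("Input validation is tested", "TEST-42", fingerprint(artefacts["TEST-42"]))]

artefacts["TARA-v1"] = "threat analysis, revision 2"  # the artefact is updated
print([link.claim for link in stale_links(links, artefacts)])
```

Claims attached to stale links are candidates for re-assessment, which is one way to keep a SAC from becoming obsolete as the product evolves.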

6.6 Room for Support Improvement

The tools reported in literature cover activities related to AC, such as creation, documentation and visualization, as shown in the results of RQ3 in Section 4.4. Some of these tools have features such as the validation of AC based on consistency rules related to the used notation, or even user-specified rules (Adelard 2003). Other tools assist in the maintenance of AC through change impact analysis (Fung et al. 2018), and assessment of AC (G.U. of Technology 2010). When it comes to automatic creation of SAC, there was coverage only for the argumentation part (Tippenhauer et al. 2014; Hawkins et al. 2015), which reflects the imbalance in coverage we discussed earlier.

What we observed is that most of the tools were originally created to support safety cases, and not SAC in particular. As a consequence, they lack specific features which can be very helpful while building SAC. In particular, we note that security assurance cases need to be treated as living documents (more so than their safety counterparts) due to a continually shifting threat landscape. For example, there is no tool that integrates with other security tools, e.g., an intrusion detection system, to actively update evidence. In general, we note that the tools lack integration with other systems, which agrees with what Maksimov et al. reported in their study (Maksimov et al. 2018).
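As an illustration of what such an integration could look like, the sketch below flags evidence as stale when a security tool reports an alert on the same component after the evidence was collected. This is a sketch under stated assumptions, not a description of an existing tool: the `Evidence` class, the `mark_stale` function, and the alert format (a dictionary with `component` and `time` keys) are hypothetical and do not correspond to the interface of any real intrusion detection system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Evidence:
    """A piece of evidence in a SAC (hypothetical, simplified model)."""
    evidence_id: str
    component: str          # system component the evidence is about
    collected_at: datetime  # when the evidence was last produced
    stale: bool = False


def mark_stale(evidence: list[Evidence], alerts: list[dict]) -> list[str]:
    """Flag evidence that predates an alert on the same component.

    Each alert is assumed to be a dict with "component" and "time" keys.
    Returns the ids of the evidence items that were flagged.
    """
    flagged = []
    for item in evidence:
        for alert in alerts:
            if alert["component"] == item.component and alert["time"] > item.collected_at:
                item.stale = True
                flagged.append(item.evidence_id)
                break  # one newer alert is enough to flag this item
    return flagged


# Hypothetical data: one alert on the gateway after its evidence was collected.
evidence = [
    Evidence("E1", "gateway", datetime(2020, 1, 10)),
    Evidence("E2", "infotainment", datetime(2020, 3, 1)),
]
alerts = [{"component": "gateway", "time": datetime(2020, 2, 1)}]
print(mark_stale(evidence, alerts))  # ['E1']
```

Flagging rather than automatically replacing evidence keeps a human in the loop, which seems a reasonable starting point given how little usage of such tools has been reported.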

Moreover, even though some studies have reported the demonstration of created tools using a case study, e.g., Tippenhauer et al. (2014), it is not clear how flexibly they can be tailored to the specific needs of a certain organization and integrated with its tool-chain. We believe that in order for practitioners to use these tools, there needs to be a certain amount of confidence, which is absent due to little reported usage or replication in industry. The same applies to the reported patterns. For a specific artefact to qualify as a pattern, it needs to be used in several studies and in several contexts, which is not the case. Additionally, as we discussed earlier, some important aspects, e.g., traceability, are not covered in the literature, and this is also the case when it comes to the reported supporting tools.

We also believe that there is room for creativity in the development of the tools. For instance, there are no supporting tools which use machine learning techniques to predict whether a requirement or test case qualifies to be a part of a SAC. This opens up opportunities for companies and the open-source community to step up and close the gap between the potential and the current support.

6.7 Need for a Guideline

Finally, we believe that there is a need for an explicit guideline for on-boarding a SAC-based approach in an industrial context. We believe that with the current level of maturity in the related literature, companies which want to adopt SAC approaches have to account for a high cost, as they have to learn, experiment and develop a lot internally. This is due to the lack of reported validation and lessons learned from industry, as well as the lack of tool support specific to SAC (as mentioned above).

Standardization bodies are aware of the importance of SAC, as SACs are mentioned as requirements in some security standards and best practice documents, e.g., the upcoming standard for cyber-security in automotive, ISO/SAE 21434 (International Organization for Standardization and Society of Automotive Engineers 2018). However, these standards do not provide any specific guidelines or constraints for how SAC should be created and used. It is important that key players in selected domains (e.g., automotive and healthcare) join efforts to standardize the scope and requirements related to SAC. We believe that this would elevate the maturity of the field.

7 Validity Threats

In this study, we consider the internal and external categories of validity threats as defined in Campbell and Stanley (2015), and described in Wohlin et al. (2012) and Kitchenham et al. (2007). The work of conducting the review was done by one researcher. This means that applying the inclusion / exclusion criteria in each of the four filtering rounds was done by one person. This imposes a risk of subjectivity, as well as a risk of missing results, which might have affected the internal validity of this study. To mitigate this, a preliminary list of known good papers was manually created and used for a sanity check of the selected and included papers. Additionally, a quality control was performed periodically by the other authors to check the included and excluded studies.

Restricting our search to three digital libraries could have increased the risk of missing relevant studies. This was mitigated by performing the snowballing search to find papers that are not necessarily included in the databases of the three considered libraries.

Another threat to validity is publication bias (Kitchenham et al. 2007). This is due to the fact that studies with positive results are more likely to get published than those with negative results. This could compromise the conclusion validity of this SLR, as in our case we did not find any study that, e.g., argues against using SAC or reports a failed validation of its outcome. In our study, we have partially mitigated this threat by also including a few technical reports (i.e., non-peer-reviewed material). These papers were identified as part of the snowballing, as we did not restrict the search to peer-reviewed papers.

External validity depends on the internal validity of the SLR (Kitchenham et al. 2007), as well as the external validity of the selected studies. We did scan gray literature to mitigate publication bias, but our exclusion criteria ruled out studies shorter than 3 pages as well as old studies, in order to mitigate the risk of including studies with high external validity threats.

When it comes to the reliability of the study, we believe that any researcher with access to the used libraries will be able to reproduce the study, and get similar results plus additional results for the studies which get published after the work of this SLR is done.

8 Conclusion and Future Work

In this study, we conducted a systematic review of the literature on security assurance cases. We used three digital libraries as well as snowballing to find relevant studies. We included 51 studies as primary data points, and extracted the necessary data for the analysis.

The main findings of our study show that many usage scenarios for SAC are mentioned, and that several approaches for creating them are discussed. However, there is a clear gap between the usage scenarios and approaches, on one side, and their applicability in the real world, on the other side, as the provided validations and tool support are far from sufficient to match the level of ambition. Based on the results of this systematic literature review, we created a workflow for working with SAC, which is a useful tool for practitioners and also provides a guideline on how to approach the study of the literature, i.e., which paper is relevant in each stage of the workflow.

Based on our results and findings, in the future we will be working to close the gap between research and industry when it comes to applying security assurance cases. We will be looking into the exact needs and challenges for these cases in specific domains, e.g., automotive. We believe that introducing SAC in large organizations needs appropriate planning to, e.g., find suitable roles for different tasks related to SAC and integrate them with current activities and ways of working. Hence, we see a potential direction of future work in that area.

When it comes to the technical work, we believe that there is room for improvement in the approaches for SAC creation, especially when it comes to the evidence part. For instance, a possible future work direction is to look into ways to automatically locate, collect, and assign evidence to different claims.

Finally, we believe that quality assurance of SAC has not been addressed sufficiently in the literature. As future work, we will look into ways to ensure the completeness of a security case when it comes to the argumentation, as well as the confidence in how well the provided evidence justifies the claims.