1 Introduction

In the digital age, security has become increasingly important, as more and more information is stored electronically and more systems are connected to the internet. The mass adoption of these systems has led to the need of ensuring security by regulations that impose security requirements. To address security, best practices suggest the adoption of threat modelling and risk analysis methodologies that allow the security administrator to obtain (in a preliminary way) information on the security problems from the early stages of the system life cycle. The choice of using threat modelling processes is also widely shared by the scientific community and is becoming increasingly widespread. To demonstrate the increasing proposals of threat modelling techniques, we produced a bar graph, shown in Fig. 1, that plots the number of papers related to threat model* from 1980 to 2022. The digital libraries used to produce it are IEEE, Springer, Scopus, and ACM. It is worth noting that results from Springer are filtered to the articles in Computer Science, written in English. It shows clearly how much the scientific literature has focused on threat modelling in recent decades.

Fig. 1
figure 1

Paper on threat model

The overall results, described in Table 1, claim that there are many different proposals in the literature.

Table 1 Results threat model*

In particular, this paper extended the conference work (Granata et al., 2022) and aims at providing a systematic analysis of automated threat modelling techniques and a comparison of open-source tools.

The main contributions of this work are (i) a literature review on automated threat modelling approaches and their supporting tools; (ii) a detailed description of the supporting tools; and (iii) a comparison carried out on a simple, but significant case study, involving a well-known content management system (CMS) platform: WordPress

It is worth noticing that the new contributions compared to the previous work are (i) the systematic analysis of the threat modelling methodologies; (ii) the selection of a lot of tools based on the criteria found in the previous analysis; and (iii) considering this selection, a more precise analysis was made on one tool supported by OWASP: PyTM.

The remainder of the paper is organized as follows: Sect. 3 summarizes the automated threat modelling techniques in a systematic way and also enumerates all the supporting tools; Section 4 describes the tools that we addressed in our analysis. Section 5 compares the tools, using the WordPress application as a basis for the comparison. Finally, Section 6 summarizes our conclusions and future work.

2 Related work

As already mentioned in the introduction section, over the years threat modelling has undergone many changes, and numerous approaches and tools have been developed in some surveys and research. Despite the wealth of literature, there is a need for a systematic and comprehensive review of the state of the art in threat modelling. A thorough evaluation of the literature would concentrate on the benefits and weaknesses of some methodologies as well as the associated tools. This review would also provide a clear overview of the existing gaps in the literature, allowing for a more focused and targeted approach to addressing these disparities. By taking a systematic and comprehensive approach to the existing literature, it will be possible to gain a better understanding of the current state of the art in threat modelling, and to identify areas for future research and development.

Existing surveys focus on methodologies and techniques aimed at modelling a software system and automatically deriving the threats and attacks. Some authors focused on analysing the most common techniques and the related and open-source tools. Hussain et al. (2014) focus on several threat modelling techniques, such as STRIDE, attack-tree, and CORAS, and related tools like Microsoft threat modelling and the CORAS tool. This work highlights the need for using formal modelling techniques and formal methods to verify the requirements. Even if they analysed both threat modelling techniques and the implementations, the analysis is not systematic since they focused on the most common models and tools. A similar study is conducted by Bernsmed et al. (2021) that conducted research on the usability of threat modelling techniques in industrial and academic contexts by focusing only on the most used modelling technique (i.e. data flow diagram) and using STRIDE classification and threat modelling automation. All the application scenarios have some difficulties in implementing threat modelling concepts in practice and suggest focusing on attacks using some automation techniques.

Some other authors instead took threat modelling techniques into account by applying them only in a specific context. As an example, Nweke and Wolthusen (2020) focus on asset-centric threat modelling techniques that aim to protect system components by producing threat models. They analyse techniques such as DREAD, Trike, and OCTAVE and their strengths and limitations. The authors also suggest that formal methods can improve the automation of threat modelling and security assessment processes. A specific study that takes into account the IoT domain was conducted by Mahak and Singh (2021) that analysed the existing threat modelling methodologies aimed at assessing all the threats an IoT infrastructure is affected by and evaluating the related risks through some risk analysis procedures. Omotunde and Ibrahim (2015) explain how threats are modelled in the design phase and present research activities that use techniques such as attack-specific modelling techniques and misuse case analysis. They suggest integrating threat modelling practices with these techniques. All the relevant surveys were applied to specific contexts, but the most complete study was conducted by Tatam et al. (2021). Their study discusses the limitations, strengths, and gaps in threat modelling processes for advanced persistent threats (APTs). The authors present a taxonomy of threat modelling approaches, including asset-centric, threat-centric, data-centric, and system-centric. They analyse each technique and suggest a hybrid approach that considers both threat modelling activities and operational models based on attack patterns. Despite the completeness of the taxonomy they propose, the authors have not focused on the tools that implement the methodologies described.

A few different works were carried out on the most common tools related to the existing methodologies.

Shi et al. (2021) provide an overview of threat modelling tools, dividing them into those based on diagrams, text-based models, and others, but they focus only on the most commonly used.

Unlike the latter, Tan and Garg’s (Tan and Garg) research study focuses on open-source tools that are recently committed, applicable to different domains, and have a machine-readable model. They label each tool with features such as complexity of logic, amenability of custom threats, operational usability, security functionality, and extensible for privacy, and assign a score for each feature to determine the most useful tool. To the best of the authors’ knowledge, there has been no complete threat modelling analysis carried out that provides a precise description of how threat modelling works are selected.

Despite the surveys in the literature regarding threat modelling techniques, as far as we know, the literature lacks a systematic study that takes into account both the threat modelling techniques and the tools that implement these techniques. For this reason, the proposed work aims to systematically produce an updated review of these techniques in different domains and to show the applicability across the use of some tools.

3 Systematic literature review

Automated threat modelling approaches have become widespread. In order to have a complete literature overview of the automated approaches used to model the systems and obtain all the possible threats it is affected by, we applied for a systematic literature review (SLR). The technique used, explained in Subsection 3.1, aims to answer some research questions synthetically, as many works have been carried out on this line of research in very different ways.

3.1 Methodology

The systematic literature review has been conducted using the method proposed by Kitchenham et al. (2009). According to the authors, the review process is divided into three parts: Planning, Conducting, and Reporting (as shown in Fig. 2).

Fig. 2
figure 2

Three-step SLR methodology

The Planning phase consists of developing a protocol used for searching articles from the sources, including and excluding papers from the overall results to answer some research questions. In the Conducting phase, we apply the rules developed in the protocol to obtain the list of the accepted papers suitable for answering research questions previously developed. The Reporting phase involves writing up the results of the review and circulating the results to potentially interested parties. The aim of conducting an SLR is both having a complete overview of the automated threat modelling approaches and enumerating the tools used to support the processes. We used the tool StArt, developed by LaPES (Fabbri et al. 2016), in order to support the systematic literature review. The tool was used only for supporting us to obtain the data of the paper, avoid duplicated papers that may be included in different digital libraries and collect data in order to produce charts. In the next subsections, we will show in detail the protocol used to perform the SLR and the results from the execution phase.

3.2 Planning

The first step of each SLR is the individuation of the need for the review. The aim of the article is to understand how widespread are automation techniques and which methodologies and tools are used. In order to have an exhaustive overview of these techniques, we formulate two different research questions:

  • RQ1: Which methodologies are used to produce a threat model automatically?

  • RQ2: Which open-source tools are used to support the automated threat modelling methodologies?

To answer these questions, we used a keyword-based approach to select the papers from which to obtain the data. The review keywords we choose are Systematic threat model*, Automated threat model*, Tool for threat model*, and tool supporting threat model*. The sources we choose as search engines are the most common ones: IEEE, Scopus, Springer, and ACM. Before searching papers from the sources, we applied a preliminary analysis in order to choose the right query and avoid not-relevant results.

We firstly present three different search approaches, shown in Table 2. Each search query has been performed on each source considering the abstract, the title, and the related keywords. It is important to take into account that all the results are from 2012 until now and, as for Table 1, Springer results are only articles in Computer Science written in English.

Table 2 Different search approaches

As shown by the high numbers, the first search approach is too generic since it can contain some non-relevant articles. The second approach instead contains some strict constraints on the query string (e.g. if it contains an “Automated Threat Model”), and we obtained few results; therefore, we discarded it. The third approach gives a reasonable number of papers that could be relevant to answer our research questions. For this reason, we used the third query as input to conduct the SLR. Chosen the search query, we also needed eligibility criteria to select the relevant works. The inclusion and exclusion criteria choice depends on the aim of the systematic literature review: having an overview of the automated threat modelling approaches used in literature and the existing supporting tools. For this reason, we accepted all the papers that:

  • Describes a fully or partially automated threat modelling methodology;

  • Presents a supporting tool aimed at automating threat modelling approaches.

The inclusion criteria are built considering both RQ1 and RQ2. We consider as automated threat modelling a process that does not require human effort to evaluate the system security (previously modelled). A methodology can also be partially automated when a few tasks are performed manually, but the effort is still minimum compared to the whole process. The exclusion criteria instead are the following:

  • The paper is not about a new threat modelling methodology;

  • Threat modelling is performed to a specific field;

  • The process is carried out manually;

  • The paper is not downloadable;

  • The language is not English.

Once the papers have been selected using the inclusion/exclusion criteria, some data can be extracted to answer the research questions. According to our protocol, we collected different data from each paper: (i) the technique used to model the architecture, (ii) the threat classification method used, (iii) the way their methodology and supporting tool select the threats from the model, and (iv) the supporting tool they provide. Using all these extraction fields defined in the protocol, we expect to have a complete overview of automated threat modelling and the related tool.

3.3 Conducting

The second phase of a systematic literature review is the identification and selection of relevant papers (i.e. the ones that meet the criteria described in the protocol) and the extraction of the relevant data (i.e. according to the data extraction fields provided in the protocol). We performed the third query described in Table 2 in all the selected sources. From the search, we obtained 465 papers as a result. As shown from the pie chart in Fig. 3, the most selected papers are from Scopus library (58%, 269 papers), followed by Springer (23%, 109 papers), IEEE (16%, 73 papers), and ACM with only 3% (14 papers).

Fig. 3
figure 3

Sources chart

The first step to being carried out when conducting an SLR is to apply the eligibility criteria by reading only the abstracts. Our approach relies on the easy to use of StArt tool that automatically takes the Bibtex files (automatically downloaded from each source engine) as input and then lists all the papers in an interactive view to obtain easily all the information needed (abstract, keywords, journal, etc.). Applying the inclusion and exclusion criteria to the abstracts, we reduced the number of papers from 465 to 115. StArt tool was useful to automatically remove the 103 duplicated papers coming from different sources, as shown in Fig. 4.

Fig. 4
figure 4

Selection result chart

Once the number of papers has been reduced by the reading of the abstracts, a deeper analysis has to be performed by carefully reading the accepted papers. Considering the same criteria used in the previous phase, but applied to the overall paper, only 55 papers met the inclusion criteria and have been used to extract the data. On the other hand, 9 papers have been classified as Duplicated (StArt was not able to recognize them) and 51 have been rejected as out of the SLR scope. Analysing the inclusion criteria from Fig. 5a, it is worth noting that most of them presented a new threat modelling methodology, but not a supporting tool.

Fig. 5
figure 5

Inclusion and exclusion criteria

Similarly, a bar chart has been produced by the StArt tool for all the rejected papers in Fig. 5b. Most papers did not present threat modelling techniques but were still about security assessment procedures. Other works were not generically applicable but concerned a specific area. Few works instead were not available to downloadFootnote 1.

3.4 Results

Data have been extracted from the paper performing the SLR. We divided all the data into 4 groups taking into account the data field described in the SLR protocol: (I) modelling data, (ii) threat classification data, (iii) threat selection approaches, and (iv) tools. As highlighted from Table 3, most of the approaches used to automate threat modelling rely on data flow diagram (DFD) modelling technique.

Table 3 Modelling technique used in literature

A data flow diagram is a model, introduced by DeMarco (1979), that describes a system as a set of specific components, and the data flows from/to its components. Each component of the system can be modelled as a generic function, a database and external agents, while a data flow is represented as an arch. Because of its simplicity, some authors (Haitao et al., 2022; Frydman et al., 2014; Mani & Venkat, 2017; Von Der Assen et al., 2022; Sion et al., 2021; Martins et al., 2015) have extended the model with further component types useful to obtain security features. As a result of the SLR, the most widespread models are based on graphs (e.g. DFD). Several works used their own graph-based model to describe the components (i.e. nodes) and the relationships between them (i.e. arcs). A recurring model resulting from the SLR is the MACM (multi-purpose application composition model), described in detail by the authors (Salzillo et al., 2020; Rak et al., 2019; Casola et al., 2019; Rak et al., 2020; Granata et al., 2021; Ficco et al., 2021; Rak et al., 2022). While graph-based models are highly expressive and very intuitive, some authors prefer to use the unified modelling language (UML) standard (Rumbaugh et al., 2004) to describe the system. The authors extended some UML diagrams to describe security requirements and automatically derive security issues. Anyway, the modelling phase is the first step of a security assessment process and depends on its scope. For this reason, some authors rely on specific own models to describe the software system (Messe et al., 2020; Chen, 2018; Schaad & Borozdin, 2012; Dominic et al., 2016; Ding et al., 2017; Monteuuis et al., 2018), while some of them are only interested in their assets (i.e. valuable items owned by the company) (Althar et al., 2022; Valenza et al., 2022; Schlegel et al., 2015; Hasan & Hasan, 2021; Haji et al., 2019; Ansari et al., 2019; Ivanova & Ivanenko, 2022). Once the model of a system is provided, an automated (or semi-automated) threat modelling technique can derive threats from it without much additional effort. Most of the threat modelling techniques in the literature are based on a specific threat classification. As shown in Table 4, the most common threat classification technique is STRIDE (Ansari et al., 2019). Since the threats are very heterogeneous, a complete threat model is difficult to generate. To face this problem, all the threats can be described in classes: Spoofing, Tampering, Reputation, Information Disclosure, Denial of Services, and Elevation of Privileges.

Table 4 Threat classification used in literature

In literature, there are several extensions (Chen, 2018; Monteuuis et al., 2018; Chen, 2019; Vallant et al., 2021) of STRIDE classification (e.g. considering Privacy and a different threat class). Similar threat classification is provided from the LINDDUN methodology (LINDDUN, LINDDUN). According to the LINDDUN knowledge base, there are 7 threat categories encapsulated in the LINDDUN acronym: Linkability, Identifiability, Non-repudiation, Detectability, Disclosure of information, Unawareness, Non-compliance. Some authors (Wuyts, 2020; Sion et al., 2018; Wuyts et al., 2018) associated their own threats to LINDDUN classes and used the LINDDUN knowledge base as input to the threat modelling process. STRIDE and LINDDUN are related to the malicious behaviours the threat can implement, another approach is to consider the security requirements the threat can compromise in terms of CIA: Confidentiality, Integrity, and Availability. Similarly, considering Privacy as a compromise-able requirement, some extensions considering CIAP have been considered. STRIDE and LUNDDUN classify each threat expressed by a high-level malicious behaviours. Few works instead use Attack patterns classification to perform threat modelling. According to CAPEC, Attack Patterns are descriptions of the common attributes and approaches employed by adversaries to exploit known weaknesses in cyber-enabled capabilities.. These approaches (Messe et al., 2020; Alwaheidi & Islam, 2022) rely on CAPEC knowledge base and their attack-pattern classification, therefore they are more attack-specific compared with STRIDE and LINDDUN. As a result of classifications in the literature, some more elaborate techniques make use of first-order logic (Althar et al., 2022) (i.e. Prolog rules), or data structures such as threat-trees (Moreira et al., 2016; Radoglou-Grammatikis et al., 2022), threat paths (Ramazanzadeh et al., 2022), and threat matrix (Ivanova & Ivanenko, 2022).

Based on well-defined models and threat classification, there are much different threat modelling techniques that enable enumerating all the threats the system is affected by. A full report of all threat selection approaches is shown in Table 5. It is worth noting that some articles can use a combination of the selection approaches taking into account different points of view.

Table 5 Threat selection techniques used in literature

As in evidence, most approaches (especially the ones based on graph models) select all the threats from their threat catalogue using some labels. Labels are some words that describe the behaviours of a component. Each component can be associated with one or more labels. Leveraging this relationship, the literature approaches automatically lists all the threat for each component. A common and simple approach consists of using STRIDE classes. The model is decomposed and the STRIDE threats are analysed component-per-component. The overall threat model is a report for each STRIDE class. Despite the simplicity of this approach, some authors prefer a more detailed approach. Some techniques still rely on STRIDE (or LINDDUN), but they mapped each class to the component typology described in the model (e.g. DFD element type). This is useful to automatically derive STRIDE threat for each component. The used approach is fully automated, but is not detailed (e.g. threats belonging to the same class can be several). Microsoft proposes a methodology implemented by their tool (Microsoft, 2018) that automatically derives threats from the relationships between the components. They take into account the relationship type between two components and also their typology (e.g. a threat can be due to the communication between a web application and a database). Several works (Sion et al., 2018; Chen, 2018; Danielis et al., 2020; Asif et al, 2021; Zeinali & Hadavi, 2018; Ding et al., 2017; Shelupanov & Konev, 2019; Vallant et al., 2021; Mahmood et al., 2022) based their approach on the one proposed by Microsoft (using also their tool). Other authors, instead consider the relationship as an additional threat source depending on the protocol used (Rak et al., 2022; Granata et al., 2022). Similar to their approaches, some authors (Verreydt et al., 2022; Meland et al., 2014; Ivanova & Ivanenko, 2022; Rak et al., 2022) claim that relationships and components are not the only threat source. According to them, some threats can be added to the overall threat model considering some Propagation rules (e.g. a threat can compromise a component, but also the services used by it). Unlike these simple approaches, there are more complex threat selection criteria based on pattern recognition in the graph or specific rules mapped with threats (Sion et al., 2021; Zeinali & Hadavi, 2018; Frydman et al., 2014; Messe et al., 2020; Valenza et al., 2022). Threat modelling approaches aim also at ensuring security requirements in the system. Few authors proposed their methodology based on model requirements (Moreira et al., 2016; Wirtz and Heisel, 2020). As a result of SLR, we also analysed some less widespread techniques using questionnaires (Schaad & Borozdin, 2012; Wuyts et al., 2018), threat tree (Wuyts et al., 2018; Haji et al., 2019), and generic parameters (Dominic et al., 2016; Ansari et al. (2019). Some of them instead used OWASP ASVS (Von Der Assen et al., 2022), Petri-nets (Ramazanzadeh et al., 2022), weaknesses from CWE database (Alwaheidi & Islam, 2022), and attack-paths (Haitao et al., 2022).

3.4.1 Tools

Once we analysed all the methodologies, we provide an enumeration of the tools used to implement them. The most commonly used for automating threat modelling is Microsoft threat modelling (Microsoft, 2018) used in 8 papers from the SLR. A research line carried out by Granata et al. (2022) used instead sla-generator to perform threat modelling and even risk analysis. Another supported tool is Threat Dragon provided by OWASP (Bhattacharya, 2020). Some others’ tool are SPARTA (Sion et al., 2018) from KU Leuven, CoreTM (Von Der Assen et al., 2022), TAMELESS (Valenza et al., 2022), MetaGME (Martins et al., 2015), TAM tool (Schaad & Borozdin, 2012), AutSEC (Frydman et al., 2014), and STS-tool (Meland et al., 2014). The SLR results reported 9 threat modelling tools, and we enriched this list through a specific research by adding further tools such as CAIRIS (Faily, 2018), Threats Manager Studio (Curzi, 2020), Threatspec (Fraser Scott & Smotrakov, 2019), PyTM (pyTM, 2019), and Threagile (Threat Agile, 2020). It is worth noticing that in this work, the enumeration of the tools is not complete, but it both derives from the SLR and is carried out manually. This is why our focus regards the threat modelling techniques each tool relies on and the tool is needful only to validate them.

Considering all these tools aimed at performing automatic threat modelling, we decided to analyse the open-source ones that do not require any manual task to select the threats and focus on web application Domain. Accordingly, a full analysis of the tools that meet these requirements is shown in Table 6. For each tool selected from both the SLR and the manual extension, the analysis provides the modelling techniques. The modelling field describes two different parameters: (i) how the model is created by the user (i.e. graphically or by coding) and (ii) the modelling technique (e.g. Threatspec generates a DFD graph using the code provided as an input, instead OWASP Threat Dragon uses directly a DFD model as an input).

Table 6 Tools features

The domain field is a list of all the areas in which the tools have been applied according to the SLR or the documentation page. It is worth noting that taking into account the application domains is useful to choose the appropriate tool for the threat modelling scope. In addition to the threat modelling process, some tools were analysed to provide a risk analysis process aimed at offering a rough evaluation of the risk (i.e. the probability that a threat would occur). Among all the selected tools, sla-generator, SPARTA, and Threat-agile support risk analysis and (in some cases) suggest the appropriate countermeasures to reduce the risk. The threat selection methodology column describes how each tool can use one or more approaches (described as a result of the SLR in Table 5) to select the threats compromising each asset. As an example, Microsoft tool selection methodology is based on the DFD relationships and, accordingly, associates each threat to a triple: PointingNodearchPointedNode. Sla-generator, instead, has three different ways to select the threats: (i) based on the component labels (i.e. specific parameters provided in the modelling phase), (ii) considering both the protocols used in each relationship and the role in the communication (e.g. if an asset is a client or a server), and (iii) extending the analysis to the threats that affect a component and can compromise also the neighboards.

4 Threat modelling tools

As already outlined, security experts are costly and human-driven threat modelling is costly both in terms of money and time. As a result of SLR, there are now a few tools that aim at offering support to experts in threat modelling, simplifying the work, requiring less experienced experts (most of the threats are catalogued), and offering solutions that produce the models in limited time. According to our analysis made in Section 3, most of the threat modelling tools use graph-based approaches to model the system. Therefore, this section provides a detailed analysis of the tools that rely on a graph-based model or can generate it, created by the user: Microsoft threat modelling tool, which is probably the most largely adopted one, the OWASP Threat Dragon and PyTM, supported by the OWASP consortium, and the SLA-generator tool, developed in the H2020 MUSA European project. SPARTA is also a promising tool, but it has not been selected for the following analysis because, using the catalogue offered by the official documentation, the tool provides threats using a stride-based approach: each STRIDE category is associated with the components of the DFD and to their relationships. This technique is not (yet) compatible with the analysis (i.e. it cannot be compared to the Microsoft and OWASP ones). In the following, we briefly outline the threat modelling approaches they support.

4.1 Microsoft threat modelling tool

The Microsoft threat modelling tool we tested was released in September 2018 (Microsoft, 2018). It aims at reducing threat modelling times, generating the threats to which a system is subjected automatically, relying on a model of the system. The system under analysis (SuA) is modelled by the user through a graph-based model. The user has the possibility to choose various stencils to be included in the application. Each node of the graph represents an application service, while each edge indicates a generic data flow (i.e. request or response). The Microsoft modelling technique requires that each node is characterized by two labels: component type and component value. The first one describes the type of the component while the second provides further functional information. Most of the pairs (componenttypecomponentvalue) are shown in Table 7. For instance, a node can represent a generic database, so it can be modelled as a generic data store type and database value. The table does not represent all possible values for brevity’s sake.

Table 7 Example values related to each component type

Once the application is modelled, the tool generates a threat report automatically. Threats are associated with each interaction between components. Each threat is selected from a proprietary catalogue taking into account the type of components involved in the interaction and the type of interaction. For example, requests made by a web application toward a storage service can generate the SQL Injection threat. In addition to providing threats associated with system assets, the tool suggests possible mitigations selected from a proprietary Microsoft database.

4.2 OWASP threat dragon

Threat Dragon is a free, open-source, cross-platform threat modelling application based on diagram models and rule engines to auto-generate threats and mitigations (Bhattacharya, 2020). It supports STRIDE (Ansari et al., 2019) classification and CIA. The tool was presented during the OWASP Open Security Summit in June 2020 by OWASP Lab Project and it is available as open-source code in Goodwin (Goodwin). The tool requires the application to be modelled through a graph-based model in which the nodes represent the components, while the edges define the transfer of data between them. Each node can be (i) a generic running process, (ii) an actor, or (iii) a component that stores the data. Each element (node or edge) is characterized by a set of attributes that can be used to identify its security problems. All the parameters related to each element of the graph are described in Table 8.

Table 8 Parameters related to each component type

The pair (componentTypeassociatedParameters) is used to obtain the threats associated with the component/flow (i.e. asset) of the diagram. For example, a store that has is encrypted as a parameter may be subject to the vulnerable encryption algorithm threat that could lead a malicious user to obtain data out of the application. The threats are obtained from the related catalogue (OWASP, OWASP) in a fully automatic way. The user can also define some custom threats and associate them with each element of the application. For each pair (assetthreat), the tool asks the user for the threat status (open or mitigated) field and a priority level (low, medium, high) and then suggest a list of possible mitigations.

4.3 SLA-generator

The SLA-generator threat modelling technique (Granata & Rak, 2021; Rak et al., 2022) relies on MACM, an expressive model that describes WHAT to assess and test. The MACM is a graph-based modelling technique in which each graph node represents a component of the system, and each edge characterizes the existing connection between two different components. MACM offers a simple way to synthesize an application architecture, focusing on its main components and relationships, enabling the security evaluation automation of the assessed systems. Nodes have a primary label, which identifies the asset class and may have a secondary label, which further specifies the primary class. Moreover, each node has a set of properties that better describe more specific aspects. A mandatory property is the asset type, which specifies the functional behaviour of the asset represented by the node. The allowed asset types for a node depend on the labels. Labels and supported asset types are listed and described in Table 9.

Table 9 MACM node labels and assets

The possible relationships between the nodes are uses, hosts, provides, and connects and are described extensively in some works, cited above. In order to manage the MACM model, the tool represents them in a graph database, namely Neo4j. The MACM is preliminarily produced by the user in Neo4j and then requested by the tool (available at linkFootnote 2 ) for the threat modelling phase. The tool communicates with the graph database, obtaining the correctly modelled applications. The technique selects all the threats applicable to the SuA by evaluating the asset-type field of each component (i.e. MACM node). The technique relies on a threat catalogue, which organizes the threats according to their asset type. The catalogue describes the threats with 8 parameters, as shown in Table 10.

Table 10 Threat catalogue template

A threat can be linked to an asset (asset type) or a communication protocol. For this reason, some fields may be left blank. For example, if a threat affects a specific asset typology, i.e. the read DB configuration threat for a service.DB asset type, both the relationship and role fields are left unspecified.

The compromised field indicates the asset that is compromised by the malicious behaviour and it can assume the following values:

  • Self, if the threat compromises only the node specified by the asset type;

  • Source (relation), when it compromises the node pointing from the arch;

  • Target (relation), when it compromises the node pointed by the arch;

It is worth noting that when the compromised field is source or target, the argument relation can be uses, connects, or hosts. The tool then obtains the threats by considering both the asset-type field of the component and the related communication protocols used by the component. The tool also suggests, for each selected threat, one (or more) NIST SP-800-53 (Joint Task Force Interagency Working Group, Joint Task Force Interagency Working Group) controls.

4.4 PyTM

A Pythonic framework for threat modelling (PyTM) (pyTM, 2019) is a framework developed in Python by the OWASP consortium. The tool is not based on graph-model directly, but it implements a new paradigm: threat modelling as a code. Accordingly, the threat modeller should model the system using Python code and this should improve the usability of the threat modelling process by integrating this phase in the development. The model is produced leveraging a set of Python elements (i.e. pytm Python module).

Table 11 lists the supported elements and their brief description.

Table 11 Threat parameters

The tool also allows the user to associate some properties to each element of the model, but, we will not enumerate them for brevity’s sake. Inspired by the infrastructure-as-a-code approach, the core idea of pyTM is that the target system architecture can be described through code, using as assets the elements outlined in Table 11. The model can be enriched through a set of properties associated to the assets as variables of the object.

In order to illustrate the idea, consider the following code: it describes the creation of a datastore, based on the CentOS operative system. The datastore is a relational database based on MySQL. Setting the inScope parameter to true, allow the tool to consider this database as in-scope of the threat model. Also, some controls can be applied in the modelling phase, as shown in the code example. In this case, the database has been analysed using a hardening process in order to find security vulnerabilities by applying recommended best practices.

db = Datastore("SQL Database")

db.OS = "CentOS"

db.type = DatastoreType.SQL

db.inScope = True

db.controls.isHardened = False

Once the model is built with all the data, the tool allows to:

  • Generate a DFD from the code by parsing the Python model into a graph-based model;

  • Generate a sequence diagram by considering the Python execution order.

  • Produce a threat model

Since our scope is to analyse the threat modelling process, we focus on how threats are selected. The tool has its own threat database based on data provided by MITRE and CAPEC. A threat is described as a set of 7 parameters, as shown in Table 12.

Table 12 Threat parameters

It is worth noticing that the most crucial parameter is Target since it is used to select the threats for each element of the model. Custom threats (i.e. manually provided by the users/developers) are supported by leveraging the overriding function.

The tool considers as assets all the elements defined in Python; accordingly, both the DFD nodes and the data flows are the resources to protect and some threats are associated with them. According to the tool technique, a threat compromises an element if the element type is the target list of the threat and the condition is verified. The condition is a logical condition that considers some Boolean parameters. As an example, a Privilege Abuse threat can be applied only if the Target value is Server, Process, or Datastore and the condition is target.controls.hasAccessControl is False or target.controls.authorizesSource is False. Accordingly, the threat is applicable only if the server does not have an access control system enabled AND an authorization control system.

An innovative function is the possibility to add controls to mitigate threats to the model. This allows the user to apply the countermeasures, produce a new model, and update the threat model (no longer considering the threats mitigated by the countermeasure). In this way, the tool supports the continuous threat modelling paradigm since the process is carried out on an ongoing basis rather than just being a one-time assessment.

5 Comparison

In this chapter, we want to compare the different threat modelling tools and the approaches they adopt. In order to show the differences, we will use a very common application, typically executed on a cloud infrastructure: an e-commerce site developed on top of WordPress. Considering this application, we modelled the system with the four different tools and documented the threat modelling results from each tool offered.

5.1 The WordPress case study

WordPress is an open-source content management system, which allows the creation and distribution of an Internet site made up of textual or multimedia content, which can be managed and updated dynamically. The web application WP is hosted on a cloud virtual machine on top of an Apache web server and interfaced with a MySQL database. In order to enable scalability, the WordPress component can be deployed multiple times, reusing always the same Database (that can scale only vertically, i.e. adding memory and/or CPU to the hosting VM). A Load Balancer distributes the Client requests to the connected WP instances. The developer simply customizes the WP instances, through custom plugins and customizing the application behaviour.

Even if the development of such systems is simple and commonly relies on very limited skills from the developer/system administrators, the application manages money and personal data, so it has strict security requirements. It must be considered that an incredible amount of WordPress instances on the web are vulnerable (see Abela, 2020), due to incorrect security planning and management.

5.2 Microsoft tool analysis

The Microsoft tool allowed us to describe the WordPress application in a complete way, as it supports a large number of stencils. As described above, the Microsoft tool considers the interactions between components (arcs of the graph) as assets and obtains security information by evaluating the type of the two components involved in the communication (Fig. 6).

Fig. 6
figure 6

Microsoft tool model WordPress

In order to model the application, the client was modelled as a Browser, while WordPress and Load Balancer as a web application. The MySQL database instead was modelled as a database component value. Each service is running on a host node. Once the user has modelled the application, the tool automatically generates the threats for each asset (i.e. threat model) by producing a report in HTML format. Part of the threat model is described in Table 13.

Table 13 Part of the threat model WordPress using Microsoft tool

It is important to note that the threat model shows, in this case, three values as asset field: sourcenodetypeofrelationshipdestinationnode. From the results, it can be noted that, for example, each service exposes some threats in the relation to the generic process it hosts. As an example, a malicious user can get sensitive data from the service configuration files. A possible countermeasure that the tool suggests is to encrypt only the configuration files that contain sensitive data. The sending of the access credentials by the user to the service can also be compromised. In fact, a malicious user can steal this data in different ways. In order to reduce the risk that this threat happens, the Microsoft tool suggests some countermeasures. As an example, the user can disable the auto-complete HTML attribute in sensitive forms and inputs. The analysis also shows problems related to the use of weak encryption algorithms in the communication between the Load Balancer and WordPress. In fact, a malicious user can intercept the packets containing the encrypted data and apply an encryption reversing algorithm to recover the plain-text data.

5.3 Dragon analysis

We modelled the system using the Threat Dragon diagram tool. The number of stencils available is limited, so, as shown in Fig. 7, The WordPress application was modelled using only the Process, Store, and Actor.

Fig. 7
figure 7

Threat Dragon model WordPress

Load Balancer and WordPress were modelled as two processes, while for the Client and Mysql database we have chosen the stencil of Actor and Store respectively. Each node of the graph communicates through a DataFlow relationship. As highlighted in the previous section, the tool considers both the nodes and the arcs of the graph as assets (i.e. resources to be protected). Each asset has a set of properties aimed at selecting the related threats, as shown in Table 14.

Table 14 Parameters related to each component type

We modelled the Load Balancer service and WordPress application as a web application. In particular, we assumed that the WordPress-based website is an e-commerce (manages payment cards) that stores data and encrypted credentials in a MySQL database. Each communication is made on a public network with HTTP protocol. Considering the selected parameters, the tool automatically collects threats (i.e. threat name, description, and STRIDE classification) for each component of the application and suggests the related mitigations. A partial list of threats for each component is shown in Table 15.

Table 15 Part of the threat model WordPress using Dragon TM

As the user can access from a public network, a malicious user can exploit a fingerprinting threat against the data exchange between the client and the Load Balancer, sending specific requests to obtain information in order to profile the application. The WordPress-based web application on the other hand can be subject to Card Cracking threats since it manages payment cards. In this case, the malicious user can carry out a brute force attack on the payment process in order to identify the missing values of the card (i.e. expiry date, security code, etc.). A brute force attack prevention system can (partially) mitigate the threat.

5.4 SLA-generator analysis

Figure 8 shows the MACM model of our case study. Each label affects the colour of the nodes, while attributes are not visible in the picture. As anticipated, the system is composed of a cloud service provider (e.g. Azure or a private cloud) that provides three virtual machines. which are labelled as IaaS, and their asset type is VM, e.g. virtual machine. One VM hosts a Load Balancer service while the other two VMs host respectively a WordPress instance and a MySQL a database instance. We modelled the Load Balancer (LB) and WordPress (WP) as SaaS nodes and we set their asset type as web application. The MySQL instance, instead, was labelled as a SaaS, but with database (DB) value as asset type. The LB uses the WP that, in turn, uses the DB. The Client (modelled as a CSC node) uses the Load Balancer service, which acts as the application interface. Each virtual machine is connected to the public network.

Fig. 8
figure 8

WordPress MACM

Applying our threat modelling technique, we produce a list of threats but, for simplicity’s sake, we report in Table 16 just one for each asset type. The full list of threats is not compatible with the length of the paper.

Table 16 Part of the threat model WordPress using SLA-generator

The results show how nodes labelled as SERVICE.Web can be subject to Injection threat in which an attacker legitimately sends commands to the exposed service without proper authorization. In order to mitigate this threat, we suggest the usage of NIST Control SI-10, Invalid input validation.

The tool also models the threats associated with the network, such as Message Reply threat for which an attacker can re-transmit some packets (previously intercepted) in order to obtain data.

5.5 PyTM analysis

We modelled the system using the Python code and the related documentation. Since the number of elements is limited, we used only two Server to model WordPress and the Load Balancer, the Actor to model the browser, and the Datastore to model the MySQL database. The DFD produced by the Python code is shown in Fig. 9.

Fig. 9
figure 9

PyTM DFD

The user also has modelled all the requests and the responses through the DataFlow element. There is also the possibility to attach some data in the data flow (e.g. tokens, credentials), but, at the moment, the tool does not consider them to evaluate the threats. The tool produced a report containing 91 threats, each one associated with the compromised asset. A part of the threat model is shown as an example in Table 17.

Table 17 Part of the threat model WordPress using PyTM

From the analysis of the table, we can notice that the threats are described at a low level as well as the suggested mitigation. There are also some technological references that make the threat model less applicable (not generic). For example, an attacker can include some code into a file using PHP (assuming installed on the server). Also, the database can be subjected to brute force attack if it uses some weak encryption algorithm for the data.

5.6 Comparison

It is worth noting that, as highlighted above, most of the tools rely on a graph-based model to describe the target system, where the node represents the asset and the edge of their connections. Just PyTM relies on a code-based approach that allows the user to leverage a Python class to model the system, but it can produce a DFD from the code. Accordingly, a comparison can be carried out by considering the edges and the nodes produced by the code. The tools differ in the interpretation and metadata associated with nodes and edges of the graph. According to Microsoft’s approach, there is a large variety of possible nodes, but the key role in the threat modelling is associated with the connection among them: in fact the threats are listed per-connection, taking into account the connected nodes and the connection attributes. According to OWASP, on the other hand, the Threat Dragon tool evaluates both the nodes and the arcs of the graph as assets, associating the threats to each element. However, the types of nodes and edges are very limited and the threats are selected according to a few attributes associated with both nodes and relationship. The SLA-generator, instead, focuses on system assets (the graph nodes) and identifies the possible threats relying on the asset type attribute, which offers a large variety of different values, similar to the Microsoft threat modelling tool. Moreover, relationships affect the possible threats to which each node, but the threats are always listed as associated with nodes. A different approach is adopted by PyTM that considers all the Python elements as assets (even the data flows).

Table 18 Threat modelling comparison table

It is out of the scope of this work to say which approach is better (we aim at comparing the ideas not at making a rank of the tools), but it is worth noting that in the graph they made a completely different choice: one focuses on edge, one on nodes and the last on both of them. However, the final result, in all the cases, is a list of threats that contains an explicit description of the malicious behaviour (in natural language) and the classification of the threat according to STRIDE or respect to the impact on Confidentiality, Integrity, and Availability. The tools, even in the case of the WordPress application, which is pretty simple, produce a pretty long list of threats (88 for the MS threat modelling tool, 84 for SLA-generator, 31 for the Dragon tool, and 97 for PyTM). We, acting as experts, consider that the choice of listing threats only respects assets or only respects relationships (the choices done by SLA-generator and by MS threat modelling Tool) helps the expert work in the analysis of the results, but this is and remains a subjective choice. However, the number of threats outlined by the OWASP tool looks at the state of the art, with limited respect to the other tools. This is due to the limited set of parameters available for the selection and, probably, to the underlying threat catalogue dimension. PyTM instead has the greatest number of threats, but, as already mentioned, the threats are high-level attacks from CAPEC (i.e. attack patterns and in some cases, they depend on the technological implementation of the asset). Another interesting aspect is that the four techniques present threats at different levels of granularity, as shown in Table 18.

As an example, the SLA-generator tool underlines how WordPress can be subject to data leakage. The same threat is (partially) expressed by the Microsoft tool with read configuration files and steal user credentials threats and fully expressed by PyTM. According to OWASP, instead, data loss can be caused both by an application profiling technique (e.g. fingerprinting) and by techniques that aim at obtaining information on users’ virtual cards.

In general, the threats affecting WordPress were 10 for both OWASP and SLA-generator, 25 according to Microsoft, and 42 for PyTM. It is important to note, however, that threats are expressed with different levels of detail. The analysis also shows how a Injection threat can affect both WordPress and the database. Considering the database as an asset, a Microsoft SQL injection can be as SLA-generator Read/Insert injection that takes into account that a malicious user wants to get information from the database or write to it (e.g. create an account), while PyTM describes many ways to inject data through more detailed threats. In this case, the threats according to the SLA-generator tool are 15, while OWASP and Microsoft consider only 8 and 12 for PyTM. Virtual machines, on the other hand, are not considered in the OWASP model and PyTM code, the table shows the comparison only between SLA-generator and Microsoft tool. One of the 13 threats described by the SLA-generator is that of Data Breach, partially mapped with Access to sensitive data from log files by Microsoft (which instead considers 6 threats). Network assets were modelled only by the SLA-generator and threat modelling reported 12 threats.Footnote 3 As a result, a generic tool comparison, shown in Table 19, summarizes the main differences between the selected tools.

Table 19 Tool comparison features

6 Conclusions and future work

In this paper, we have analysed the state of the art of automated threat modelling techniques using a systematic literature review technique. According to the review results, we focused on three different graph-based methodologies and their tools: SLA-generator, Microsoft tool, and Threat Dragon by OWASP. The analysed tools require a very simplified graph-based model of the application in which the nodes represent the components of the system and the arcs represent the interactions between the various components. The simplicity of modelling allows the user in all three approaches to obtain security information in a fully automatic way. The approaches were applied to a case study involving WordPress, a content management system that allows you to manage a website. The results show that the threats are described at different levels of detail, but still compatible. In particular, OWASP Threat Dragon has proved to be the tool that produces a less complete threat model than the others. The number of threats related to the WordPress component was greater (25) with the Microsoft tool, while the threat model related to the database and virtual machines was more complete with SLA-generator. Furthermore, the tool also considered the network as an asset, highlighting 12 threats. As mentioned in the paper, risk analysis builds on threat modelling by assessing the likelihood of a threat occurring and the potential consequences. Accordingly, our future work is performing a complementary systematic literature review on risk analysis methodologies in the literature and the tools used to implement them.