1 Introduction

Cybersecurity professionals are needed across the globe to counter the ubiquitous cyber threats. To meet the increasing demand for cybersecurity experts ((ISC)2, 2021), effective training is essential. Such training must include hands-on components and provide practical experience in authentic settings. This includes using a variety of tools for cybersecurity operations, such as host configuration, hardening, and penetration testing.

Cybersecurity experts use tools with graphical user interfaces as well as command-line tools. Within the scope of our research, we focus on the latter, since command-line tools represent an important component of cybersecurity practice. Cyber attackers use them to perform sophisticated attacks, which cyber defenders need to understand to mitigate advanced threats. In addition, various command-line tools are used to configure computer systems securely.

1.1 Research problem statement

Our research focuses on supporting automated assessment in the context of hands-on cybersecurity training. Here, we explain the motivation for our research, illustrate the problem with a simple example, justify why the problem is hard to address, and summarize the gaps in the current literature.

Why is student assessment necessary?

Educational assessment is a crucial aspect of training (Lancaster et al., 2019). It enables teachers (instructors) to better understand the actions of their students (trainees). Specifically, in-depth assessment shows what each student did well, what could be improved, and whether the student progressed through the training as expected.

Based on insights from the assessment, teachers can adapt their class, provide students with feedback to support their learning, or evaluate their level of knowledge. The assessment also reveals potential issues in the training design, enabling instructors to fix them and further improve the effectiveness of the training.

How can students be assessed?

Like most applied computing skills, cybersecurity is usually practiced hands-on in computer-supported interactive learning environments. These are physical or virtual platforms that provide computer hosts with full-fledged operating systems, networks, and applications for training.

Advanced interactive learning environments allow collecting data about the students’ actions, such as their usage of command-line tools. These student interaction data authentically capture learning processes. Therefore, they can be transformed into educational insights and exploited for assessment.

As an example, consider the two command histories from cybersecurity training shown in Figs. 1 and 2. They belong to two students who attempted to crack a password to a ZIP archive using the fcrackzip utility in Linux. Each command is prefixed by the timestamp of its execution. Based on the analysis of these student data, the instructor can see that each student needs help with a specific and different aspect of the training.

Fig. 1

The first student ran the cracking tool 24 times within an approximately 5-minute time frame, with various combinations of arguments, often repeating the previous (incorrect) combinations. After that, the student stopped for 4 minutes, probably to find help, and executed a correct command

Fig. 2

The second student assumed that the tool had the .exe suffix of Windows OS executables, which does not apply to Linux OS. The student was apparently unfamiliar with Linux or the cracking tool but then instantly executed a correct command without any previous incorrect tries. We can assume that they received outside help

Why is assessment difficult?

In-depth assessment of cybersecurity training is difficult for four main reasons.

  1. The training is complex. The tasks require higher-order problem solving and may have many different correct solutions. Therefore, the assessment is much more complex than assessing simple tasks such as memorizing facts.

  2. Each student is unique. Every student has different previous knowledge, experience, motivation, and approach to learning. As a result, students adopt different strategies to solve the tasks. This is natural, but it further complicates the conditions for automatically assessing hands-on tasks.

  3. Students generate a lot of data. During the training, even a class that is relatively small (10–20 students) and time-constrained (1–2 hours) can generate hundreds of data records. As a result, manually processing these data quickly becomes infeasible.

  4. The assessment process is not straightforward. It is unclear how to transform the raw data from training into educational insights useful for assessment. As the examples in Figs. 1 and 2 demonstrated, even a relatively constrained assignment can generate widely varying data for assessment.

The need for research

Traditionally, educational researchers and practitioners assessed student data manually. However, due to the difficulties described above, a manual transformation of hands-on training data into educational insights is not viable (Fournier-Viger, 2017; Romero & Ventura, 2020). It is highly time-consuming, ineffective, and error-prone.

Automated assessment is more scalable and accurate. Therefore, it can be fruitful to leverage automated techniques, such as machine learning and data mining, for analyzing data from hands-on training (Palmer, 2019). These techniques should transform the data from their raw form to an understandable representation, such as an overview of highlights or a visualization.

However, the review of current literature (see Section 3 for details) identified several gaps in the state of the art in this area:

  • As Weiss et al. (2016) argued, current automated assessment is often superficial, judging only the (in)correctness of the solution. Only a few papers, such as the one by Mirkovic et al. (2020), have explored an in-depth assessment of student learning.

  • To the best of our knowledge, no published research attempted to compare and evaluate the applicability of two different data mining methods on cybersecurity training data. Student assessment in cybersecurity has been explored from other perspectives, such as using numerical scoring metrics (see Maennel et al. (2017) for an example).

  • Data mining algorithms have been used for assessment in other domains, such as programming (Gao et al., 2021), but it is unclear how to generalize these previous results to the cybersecurity context.

1.2 Goals of this research paper

We seek to support automated assessment of students in hands-on training. In order to address the gaps in the literature, the assessment must satisfy the following criteria:

  • enable an in-depth understanding of students’ actions,

  • use methods that have not been researched in this context previously, and

  • be evaluated on an authentic dataset from realistic training sessions.

The domain of data mining offers many methods for the automated extraction of insights from raw data (Fournier-Viger, 2017). Two methods that satisfy the criteria above and will be explored in this paper are pattern mining and clustering. Pattern mining techniques, such as association rule mining and sequential pattern mining, can reveal interesting relationships in datasets (Fournier-Viger, 2013b). Clustering, on the other hand, forms groups of data based on their similar characteristics (Romero et al., 2010). Evaluating these two techniques represents an original contribution to cybersecurity education and beyond.

Research questions

Our research is framed by two research questions related to student assessment in cybersecurity: What insights can we gather from command histories using pattern mining (RQ1) and clustering (RQ2)? By insights, we mean the following educational findings to support assessment:

  • trainees’ approaches and strategies to solving the training tasks,

  • common mistakes, misconceptions, and tools problematic for trainees,

  • distinct types of trainees based on their actions and behavior, and

  • issues in the training design and execution.

Expected contributions of this research

Answering the research questions will be valuable for various stakeholders.

  • Cybersecurity instructors can use the researched methods in their classes to gain new insights for assessing their students. Specific assessment use cases are detailed in Sections 5.3 and 5.5.

  • Researchers can build upon this work by evaluating other data mining methods on similar datasets. This will contribute to the body of knowledge on assessment in cybersecurity training.

  • Developers of cybersecurity training platforms can integrate the researched methods of data collection and analysis into the interactive learning environments. This will support the goals of instructors and researchers.

Educational stakeholders from outside the cybersecurity domain can benefit from this research as well. Students of related computing disciplines, such as networking and operating systems administration, can generate similar data for assessment in hands-on classes. For students of other disciplines, the researched methods can be extended to process different data, such as clickstreams.

1.3 How to read this paper

Above, we defined three target groups who may be interested in this paper. Although we aim to address readers from a broad audience, we acknowledge that some sections of the paper are not relevant for everyone. Section 2 provides a brief background and therefore aims at researchers who seek to understand the theory behind the methods used. Other readers who are satisfied with a more high-level understanding may skip it. Section 3 reviews related studies, which is relevant for researchers and instructors interested in how the previous research results were applied to support teaching practice. Section 4 details the methods used for data collection and analysis. It is aimed mainly at researchers and developers, since it also includes technical details about the training platforms and data collection. Section 5 presents the findings and answers the research questions. Finally, Section 6 concludes, summarizes our contributions, and proposes future work. These last two sections are suitable for all readers.

2 Background and terminology of data mining

This section defines the key terms to familiarize the readers with basic data mining concepts. Data mining is a field of computing that deals with extracting knowledge from data. Its purpose is to enable understanding of the data, gather new insights from them, and support decision-making based on this understanding (Fournier-Viger et al., 2017; Han et al., 2011). Out of the many data mining methods, we will focus on two of them: pattern mining (Section 2.2) and clustering (Section 2.3).

2.1 Educational data mining and learning analytics

Educational data mining (EDM) (Romero et al., 2010) and Learning analytics (LA) (Lang et al., 2017) are two inter-related research areas that aim to understand and improve teaching and learning. The research in these areas focuses, for example, on student behavior, learning processes, assessment, and interactive learning environments. To achieve their aims, EDM/LA researchers collect and analyze data from educational settings.

2.2 Pattern mining

Pattern mining automatically extracts previously hidden patterns in data. Its objective is to discover patterns that are easily interpretable by humans. We concentrate on two well-established pattern mining techniques: association rule mining (ARM) and sequential pattern mining (SPM) (Fournier-Viger, 2013b; Fournier-Viger et al., 2017).

Association rule mining

Association rules are patterns with the form of an if-then statement. A rule \(X \rightarrow Y\) says that if an item X occurs in a transaction (a set of items), then so does Y (Fournier-Viger et al., 2017; Han et al., 2011; Romero et al., 2010). In our case, an item may be a command submitted by a student, and a transaction may be a whole set of commands of that student. An association rule mined from a set of students’ transactions may indicate that if a student used a command X, then they used a command Y.

For each association rule \(X \rightarrow Y\), we are typically interested in two metrics: its support (relative occurrence among all the examined transactions) and confidence (relative occurrence among the transactions that contain X).

Algorithms for mining association rules consider only rules that satisfy the user-defined thresholds for the minimal support and confidence, MinSup and MinConf. Since this process can extract a vast amount of rules, additional measures such as lift are applied to filter out irrelevant rules (Fournier-Viger et al., 2017; Han et al., 2011; Romero et al., 2010).
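For reference, these measures can be written as follows, where \(D\) is the set of all examined transactions, \(T\) a single transaction, and \(Z\) an arbitrary itemset; the support of a rule \(X \rightarrow Y\) is then \(\mathrm{supp}(X \cup Y)\):

\[ \mathrm{supp}(Z) = \frac{|\{T \in D : Z \subseteq T\}|}{|D|}, \qquad \mathrm{conf}(X \rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}, \qquad \mathrm{lift}(X \rightarrow Y) = \frac{\mathrm{conf}(X \rightarrow Y)}{\mathrm{supp}(Y)} \]

A rule is reported only if its support and confidence reach MinSup and MinConf, respectively; a lift above 1 indicates that \(X\) and \(Y\) co-occur more often than would be expected if they were independent.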

Sequential pattern mining

A sequential pattern is a frequently occurring subsequence in a given set of sequences (Fournier-Viger et al., 2017; Romero et al., 2010). For example, it can be a progression of certain commands that many students used. Contrary to ARM, SPM can analyze data in which the ordering of items is relevant.

Again, sequential patterns are mined based on a MinSup threshold. To find a manageable number of patterns, it is recommended to use algorithms that mine closed sequential patterns (Fournier-Viger et al., 2017; Fournier-Viger et al., 2014; Fumarola et al., 2016).

2.3 Clustering

Clustering is the process of assigning data points into groups called clusters based on their similarity. Data in one group are similar to each other and dissimilar to data from other groups (Madhulatha, 2012). For example, in our context, we can group students based on the similarities in their command-line usage. Clustering is an unsupervised machine learning technique, so it does not use previously labeled data to assess new data. Instead, it organizes unlabeled data into “bundles”.

We focus on density-based clustering, which defines a cluster as an area with a high density of data points; low-density areas separate individual clusters. Unlike partitional clustering methods, such as the popular k-means clustering (Lloyd, 1982), density-based approaches are better at recognizing arbitrarily shaped clusters and filtering noise or outliers. However, not all data points may end up in a cluster (Beyer et al., 1999; Aggarwal et al., 2001).

3 Related work

This section reviews the publications related to the analysis of educational data. It also explains how our research differs from the state of the art.

3.1 Pattern mining in educational data

Association rule mining (ARM) and sequential pattern mining (SPM) have been employed to investigate various aspects of education. These include learner difficulties, correlations between learning behaviors and performance, and teaching strategies that lead to better learning (Romero and Ventura, 2020; Bienkowski et al., 2012).

García et al. (2010) applied ARM on data capturing students’ usage of a learning management system, discovering relationships between students’ activities and final grades. Instructors can use this information to adjust the course or identify struggling students early. Kobayashi (2014) also used ARM to uncover the errors that frequently co-occurred at various proficiency levels when learning spoken English. The pattern mining revealed types of mistakes that distinguish lower-level and upper-level students.

Malekian et al. (2020) applied SPM on data representing students’ actions and task submissions in an online learning environment. The researchers wanted to discover the behavior patterns that lead to successful or unsuccessful assessment outcomes. Therefore, they split the sequences of actions into two categories depending on the outcome of the sequence’s final submission. The failed sequences contained mainly repeated assessment submissions and discussion forum views. In contrast, the passed sequences included multiple reviews of lecture materials. This information can be used to modify the learning environment to discourage unproductive behavior.

Gao et al. (2021) mined sequential patterns from programming logs to identify struggling students. Recognizing these students in a timely manner is essential for promoting their learning. To establish the ground truth, the researchers again split the logs into those of high- and low-performing students. Then, they mined patterns that either dominated in one group, to discover its specifics, or occurred in both groups, to reveal similarities. After that, they used the patterns as features in a classifier algorithm to predict student performance.

3.2 Clustering of educational data

Vellido et al. (2010) motivate the usage of clustering in educational contexts. In addition, they also provide a brief overview of literature where clustering was applied to solve educational problems. Next, Romero and Ventura (2010) and Dutt et al. (2017) performed literature reviews of EDM papers. Clustering has been used to provide feedback to instructors, detect undesirable or unusual student behavior, analyze and model student behavior, and group students by various characteristics, such as their learning approaches.

Yin et al. (2015) used the OPTICS algorithm to cluster students’ programming assignments, aiming to support autograding based on the type of solution. Student source code was represented as an abstract syntax tree, with the normalized tree edit distance as the similarity measure for clustering. The researchers discovered clusters corresponding to distinct types of solutions (canonical, correct but longer code, complex solution, and so on).

McBroom et al. (2016) mined submission logs from an autograding system for program code. They clustered weekly submissions to find approaches to each assignment while also analyzing the long-term behavior to learn how students develop. The researchers detected common behavioral patterns as early as week three of the semester, and the students’ behavior largely remained the same afterward. Teachers can use the gained insight to intervene when a student belongs to a cluster with a higher risk of failure.

The goal of Piech et al. (2012) was to study how students learn to program. To do so, the researchers captured and clustered temporal traces of student interactions with a compiler. They applied a hidden Markov model to the temporal traces and visualized it as a state machine for each cluster. The model then predicted student performance.

Emerson et al. (2020) explored novices’ misconceptions in block-based programming. The researchers used logs of unsuccessful student attempts at programming assignments. The students’ programs were represented by three families of features: basic block features, counts of specific block sequences, and the number of interactions with the system. The results revealed three clusters of students: exploratory, disorganized, and near-miss.

In their follow-up work, Wiggins et al. (2021) analyzed novices’ hint requests in block-based programming. When a student asked for a hint, the time elapsed from the assignment’s start and the percentage of code completion were recorded. Clustering of this data revealed five different groups of students based on their hint-taking strategies. For example, those that asked for a hint early and had low code completeness probably needed a “push” to start. Instructors can use this information to target the students’ needs specific to the given group.

3.3 Using data for student assessment in cybersecurity

Maennel (2020) performed a thorough literature review of data sources that can serve as evidence of learning in cybersecurity exercises. These data sources include timing information, command-line data, counts of events, and input logs. Our paper investigates the applicability of command-line data in educational assessment. Such data are collected in multiple state-of-the-art learning environments for cybersecurity training (Weiss et al., 2017; Andreolini et al., 2019; Labuschagne and Grobler, 2017; Tian et al., 2018).

Weiss et al. demonstrated that command-line data from cybersecurity training are valuable for student assessment. They incorporated information about the students’ exact steps, rather than just a numerical score indicating success or failure. They analyzed the students’ work processes and the utilized command-line tools. Based on the command histories, they generated progress models of student approaches (Weiss et al., 2016; Weiss et al., 2017; Švábenský et al., 2022) and predicted their success (Vinlove et al., 2020).

Mirkovic et al. (2020) collected and analyzed command-line input and output from participants in hands-on cybersecurity exercises. The analysis system automatically compared the collected data with pre-defined exercise milestones and produced statistics about the participants’ progress. It helped identify difficult sections of the exercises and students needing assistance, providing useful information to instructors.

Abbott et al. (2015) parsed a dataset of logs from cybersecurity training into meaningful blocks of activity and statistically analyzed them. McClain et al. (2015) further explored this dataset combined with questionnaires measuring the participants’ experience in cybersecurity. They discovered that more experienced participants used both specialized and general-purpose tools, while the less experienced participants focused only on specialized cybersecurity tools.

Finally, several works investigated the assessment of teams in sophisticated cyber defense exercises. Granåsen and Andersson (2016) collected network and system logs to study the performance of teams. Similar data sources were used by Henshel et al. (2016) to assess and predict team performance. Maennel et al. (2017) proposed a systematic approach: a methodology to employ exercise data for team assessment. In contrast, we focus on individual assessment during exercises in the scope of classroom teaching.

3.4 Summary of the related work

Pattern mining and clustering were applied in educational contexts with interesting results. They can reveal students’ misconceptions, approaches to solving the tasks, and behavioral patterns. These insights can improve educational assessment and feedback and target instruction to support students’ needs.

The novelty of our paper is exploring these methods in the context of cybersecurity training. Previously, command-line data from cybersecurity training were analyzed using other methods, such as statistics, regular expression matching, and classifiers. We seek to discover insights gathered from cybersecurity training data using pattern mining and clustering, as well as demonstrate their usefulness for assessment. Moreover, we aim to uncover in-depth insights, not only assess the correctness of the student solution.

4 Research methods

This section explains the methods chosen to answer the research questions posed in Section 1.2. A visual overview of these methods is provided in Fig. 3. In previous projects (Tkáčik, 2020; Popovič, 2021), we prototyped the methods on smaller datasets, yielding initial results that we updated for this paper.

Fig. 3

The command logs collected from students act as input for pattern mining and clustering. The results are visualized and interpreted in Section 5

4.1 Cybersecurity training

Our research analyzes data from cybersecurity training. Specifically, we focus on offensive security skills training in a sandboxed network emulated within an interactive learning environment. The following text introduces essential aspects of the training to provide context for the research.

Interactive learning environment

The virtual machines for the training were hosted in KYPO Cyber Range Platform (Masaryk University, 2021; Vykopal et al., 2021), which is a cloud-based infrastructure for emulating complex networks. For some training sessions, we alternatively used Cyber Sandbox Creator (Masaryk University, 2022a; Vykopal et al., 2021): a tool for creating lightweight virtual labs hosted locally on the trainees’ computers. This choice of the underlying infrastructure did not affect the training content, and the data collection was also equivalent.

Both platforms are open-source (Vykopal et al., 2021), and cybersecurity instructors can freely deploy them for their purposes.

Training format

The trainees worked with the interactive learning environment either remotely via a web browser or locally on their computers. Each trainee accessed their own isolated sandbox containing a virtual machine with Kali Linux (Offensive Security, 2022a): an operating system distribution tailored for penetration testing that provided the necessary tools. The trainees completed a sequence of assignments presented via a web interface. Almost all the assignments were solved using command-line tools, which are described below.

The participants were allowed to use any sources on the Internet. Moreover, the interactive learning environment offered optional hints, which the trainees could reveal to get help with the current task. The usage of hints and outside help was allowed since the trainees were not evaluated summatively (that is, the training was not a graded exam). Instead, we focused on formative assessment and helping the students explore new cybersecurity skills.

Training content

Each trainee participated in exactly one of two types of training. Both trainings involved attacking an intentionally vulnerable virtual host using well-known security tools, but the trainings slightly differed in their content. In Training A (72 participants), the following tools were crucial: nmap for network scanning, Metasploit for exploitation, john for password cracking, and ssh for remote connection. Training B (41 participants) used nmap and ssh as well, but not Metasploit or john. Instead, it featured fcrackzip for cracking passwords to ZIP files (see Figs. 1 and 2). None of the trainees was previously familiar with either of the two trainings.

Again, the training content is publicly available (Masaryk University, 2022b). Training A corresponds to the cybersecurity game Secret laboratory and its derivatives, while Training B corresponds to the game Junior hacker training. Cybersecurity instructors can freely deploy these games in their classes and recreate the conditions for our research.

Training participants

From August 2019 to February 2021, we hosted 18 cybersecurity training sessions for a total of 113 trainees. Each training session usually took two hours to complete, and most of them were held remotely due to COVID-19 restrictions. The participants included:

  • undergraduate and graduate students of computer science from various European universities,

  • high school students attending the national cybersecurity competition, and

  • cybersecurity professionals.

They all attended voluntarily because of their interest in cybersecurity and were not incentivized. Although the participants do not form a random sample, we argue that it is practically infeasible to recruit a randomized population for this type of research. Therefore, we instead worked with representatives of the target group for this cybersecurity training.

Ethical and privacy-preserving measures for research

Since we carried out research with human participants, we ensured that the trainees would not be harmed in any way. We minimized the extent of data collection to gather only the data necessary for the research. We also received a waiver from our institutional ethical board since we did not collect any personally identifiable information.

The participants provided informed consent to the collection and usage of their data for research purposes. The collected data were thoroughly anonymized so as not to reveal the trainees’ identities. As a result, it is impossible to track a trainee across future training sessions.

4.2 Data collection

While the trainees solved the assignments, our infrastructure (Švábenský et al., 2021) automatically collected their submitted commands and the associated metadata. We gathered data from command-line tools in the Linux Bash terminal and the Metasploit shell, which is software for penetration testing (Offensive Security, 2022b). These data, which are published (along with other training data) as an open dataset (Švábenský et al., 2021), serve as the input for pattern mining and clustering. We did not collect data from tools with a graphical user interface.

Data format

The command history of each trainee is captured in a single JSON file. The file consists of dozens of log records (78 per trainee on average), such that each record represents a single command executed by the trainee. Figure 4 shows an example of such a log record.

Fig. 4

A single log record from a command history of one trainee

Each log record has a fixed number of attributes. For our purposes, the most significant are the following (an illustrative sketch of a record appears after the list):

  • timestamp, representing the time of the command’s execution in the ISO 8601 format,

  • cmd, which represents the full command (the tool and its arguments) submitted by the trainee, and

  • cmd_type, the application used to execute the command: either “bash-command” for the tools executed within Linux Bash terminal, or “msf-command” for Metasploit shell.
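For illustration, a parsed record with these three attributes might look like the following minimal sketch. The values are fabricated for this example, and real records contain additional attributes:

    # A fabricated example of one parsed log record (values are illustrative).
    record = {
        "timestamp": "2020-11-05T13:27:35+00:00",  # ISO 8601 format
        "cmd": "nmap --help",                      # the full submitted command
        "cmd_type": "bash-command",                # or "msf-command" for Metasploit
    }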

Data properties

Over a period of 1.5 years, we collected 8834 commands, which constitute the dataset for this research. Although this sample is not massive in volume, it captures the trainees’ interactions in depth and over prolonged periods. Therefore, it fulfills the prerequisites of the chosen data mining methods.

Hands-on cybersecurity training is usually held in groups of a few dozen participants. Therefore, we consider the 8834 commands to be sufficient for evaluating the two data mining methods. On average, this dataset corresponds to 78 commands per trainee within the 1–2-hour time frame, which is appropriate for the chosen training format.

For this research paper, we focus on data processing after the training ends. Nevertheless, the used methods are applicable during the training for real-time assessment as well.

4.3 Pattern mining

To enable mining patterns from the command-line data, our analysis scripts written in Python automatically transformed the input data into the transaction and sequence databases described below. These databases are an internal representation of the input data, and they serve as the input for ARM and SPM algorithms, respectively. A key advantage of pattern mining is that the data preparation is the same for assessing any task from the training.

Transaction databases

We parsed the dataset of commands to create two transaction databases used as input for ARM. The command transaction database represents each submitted command as a separate transaction, and its goal is to reveal different properties of command usage. Each transaction contains four items that represent the attributes of the command:

  • tool, the name of the submitted command (e.g., nmap or ssh),

  • args, the command-line arguments supplied to the tool,

  • app, either Bash shell (Linux terminal) or Metasploit,

  • gap, the time difference between the current and the following command.

For example, the command from Fig. 4 can become a single transaction {tool = nmap, args = --help, app = bash, gap = low}. To achieve better interpretability, the gap attribute was automatically discretized (Romero et al., 2010, p. 102): divided into categorical classes from the set {low, medium, high, undefined}, since the exact value in seconds is not essential. We followed the method previously published by McCall and Kölling (2019). First, the gap value in seconds was computed for each command. Then, gaps exceeding the arbitrary maximum of 20 minutes were discretized to “undefined”. This resolved the cases of long periods of trainee inactivity. The interval cut-off points for “low”, “medium”, and “high” categories were computed based on the mean gap from all gaps not exceeding the maximum.
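A minimal sketch of this discretization step is shown below. The concrete cut-off multipliers around the mean are an assumption made for illustration; the exact thresholds following McCall and Kölling (2019) may differ.

    # Sketch of discretizing command gaps into categorical classes.
    # The 0.5 and 1.5 multipliers of the mean are illustrative assumptions.
    MAX_GAP = 20 * 60  # gaps above 20 minutes are labeled "undefined"

    def discretize_gaps(gaps_in_seconds):
        valid = [g for g in gaps_in_seconds if g <= MAX_GAP]
        mean_gap = sum(valid) / len(valid) if valid else 0
        labels = []
        for g in gaps_in_seconds:
            if g > MAX_GAP:
                labels.append("undefined")
            elif g < 0.5 * mean_gap:
                labels.append("low")
            elif g <= 1.5 * mean_gap:
                labels.append("medium")
            else:
                labels.append("high")
        return labels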

The second database, called the tool transaction database, contains transactions with only two attributes: tool and gap. We merged the consecutive uses of the same tool (regardless of the arguments) into a single transaction. The gap represents the time difference between the first use of a tool and the next use of a different tool; the values were discretized as before. The motivation for creating this database was to determine the difficulty of using different tools. If a tool is associated with long gaps, it may indicate that the trainees were unfamiliar with this tool and had difficulties using it.
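The following sketch illustrates one possible way to build such transactions, assuming commands is a chronologically sorted list of (tool, timestamp in seconds) pairs for a single trainee; the treatment of the trainee’s last run of commands is an assumption for illustration.

    # Sketch of merging consecutive uses of the same tool into transactions.
    from itertools import groupby

    def tool_transactions(commands):
        # Group consecutive commands that use the same tool.
        runs = [list(group) for _, group in groupby(commands, key=lambda c: c[0])]
        transactions = []
        for current, following in zip(runs, runs[1:]):
            # Gap from the first use of a tool to the first use of a different one.
            gap_seconds = following[0][1] - current[0][1]
            transactions.append({"tool": current[0][0], "gap": gap_seconds})
        # The last run has no following tool; here its gap is left undefined.
        if runs:
            transactions.append({"tool": runs[-1][0][0], "gap": None})
        return transactions  # the gap values are discretized afterward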

Sequence databases

Three sequence databases were created as input for SPM. All three had 113 sequences (corresponding to the number of trainees and the command log files), differing only in the contained items.

The first database, called command sequence database, consists of sequences of executed commands. Each item represents a single command, both the tool and its arguments. For example, a sequence from this database can look like this: nmap --help, nmap 1.2.3.4, nmap -p 1000 1.2.3.4.

The second database, tool sequence database, contains sequences of tools only. Data from both Bash and Metasploit applications are included in the first two databases. This allows discovering longer patterns, which more accurately reflect the trainees’ progress.

The third database, application sequence database, stores sequences of applications utilized by the trainees to execute commands. Its goal is to reveal a high-level overview of alternating between applications. This database contains only two unique items: terminal, which includes all the commands executed in the Bash shell, and metasploit. Table 1 shows the number of transactions/sequences and unique items in each of our databases.

Table 1 The number of transactions or sequences and unique items contained in each database (DB) for pattern mining, shown separately for Training A and Training B

Association rule mining

For ARM, we used Apyori (Mochizuki, 2019), a Python implementation of the Apriori algorithm. The MinSup threshold was tuned manually for each database since there is no simple method to determine it. The threshold was initially set to higher values and then gradually lowered to 0.01–0.04 until we reached a number of patterns manageable for interpretation. This approach is suggested by Fournier-Viger (2013a) since finding suitable values depends on the data and the specific use case.

The MinConf threshold is generally easier to set because the database’s properties influence MinSup more heavily than MinConf (Fournier-Viger et al., 2012). Since we were interested in rules with higher confidence, we used a higher MinConf threshold of 0.5. In contrast, MinSup needed to be much lower to extract a sufficient number of rules. This was probably because our transaction databases contained many unique items relative to the total number of transactions. If there were fewer unique items, MinSup could have been increased.
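A minimal sketch of this step is shown below, assuming the apyori package (Mochizuki, 2019) is installed and that each transaction is represented as a list of attribute=value strings; the transaction contents are illustrative, and the thresholds mirror the values reported above.

    # Sketch of mining association rules from the command transaction database.
    from apyori import apriori

    transactions = [
        ["tool=nmap", "args=--help", "app=bash", "gap=low"],
        ["tool=ssh", "args=user@10.1.26.9", "app=bash", "gap=medium"],
        # ... one transaction per submitted command (illustrative values)
    ]

    rules = apriori(transactions, min_support=0.01, min_confidence=0.5)
    for record in rules:
        for stat in record.ordered_statistics:
            print(set(stat.items_base), "->", set(stat.items_add),
                  "support=%.2f" % record.support,
                  "confidence=%.2f" % stat.confidence,
                  "lift=%.2f" % stat.lift)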

Sequential pattern mining

For SPM, we used the open-source data mining library SPMF (Fournier-Viger et al., 2016). It provides optimized and documented implementations of more than 190 data mining algorithms (Fournier-Viger, 2021b) often used as benchmarks in research papers (Fournier-Viger et al., 2016). We selected CloFast (Fumarola et al., 2016), an efficient algorithm for mining closed sequential patterns. The MinSup threshold was set experimentally to values between 0.3 and 0.7.
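The sketch below shows one way this algorithm might be invoked from Python, assuming spmf.jar has been downloaded to the working directory and the sequence database has already been exported in SPMF’s input format; consult the SPMF documentation for the exact parameters of CloFast.

    # Sketch of running the SPMF implementation of CloFast as an external process.
    import subprocess

    subprocess.run(
        ["java", "-jar", "spmf.jar", "run", "CloFast",
         "tool_sequences.txt",    # input sequence database in SPMF format
         "closed_patterns.txt",   # output file with the mined closed patterns
         "50%"],                  # MinSup threshold (0.5 in our notation)
        check=True,
    )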

4.4 Clustering

A popular density-based algorithm is OPTICS (Ordering Points To Identify the Clustering Structure) (Ankerst et al., 1999), an improved extension of the widely used DBSCAN algorithm (Tang et al., 2016). For a data point to belong to a cluster, it must have at least MinPts points within its neighborhood radius.

The result of OPTICS clustering is a reachability plot. The x-axis orders all data points in the order of processing based on their similarity. Values on the y-axis represent the reachability distance of a point from the preceding one. Several similar points form a valley representing a cluster, while spikes represent noise or outliers (Ankerst et al., 1999).

In our research, we first represented each command as a Python object with the following attributes: tool, arguments, application type, and timestamp, simplifying the record in Fig. 4. Then, we used the commands to build two different feature matrices that later serve as the input for clustering.

Bag of words matrix

The bag-of-words model is a standard technique for obtaining features from text (Pelánek et al., 2018). Each text document is represented by the set of words it contains and their counts. In our case, the “document” is a command history, and each tool is a “word”. We disregarded the commands’ arguments since we would obtain too many unique features and impair the performance of the clustering algorithm.
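As an illustration, such a matrix can be built in a few lines of Python; the sketch below assumes a recent scikit-learn and a hypothetical mapping tool_histories from each anonymized trainee to the list of tool names extracted from their command log.

    # Sketch of building the bag-of-words matrix over tool names.
    from sklearn.feature_extraction.text import CountVectorizer

    tool_histories = {
        "trainee-01": ["nmap", "nmap", "ssh", "ls", "cd"],
        "trainee-02": ["nmap", "search", "use", "set", "exploit"],
        # ... one entry per trainee (values here are illustrative)
    }

    # Each history is a "document" whose "words" are tool names,
    # so the analyzer simply returns the list of tools as-is.
    vectorizer = CountVectorizer(analyzer=lambda history: history)
    matrix = vectorizer.fit_transform(tool_histories.values())
    print(vectorizer.get_feature_names_out())
    print(matrix.toarray())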

Matrix of selected features

While the bag of words model captures the used commands, it does not consider other information available in the logs. Therefore, we selected five custom features to capture other insights into how the trainees progressed:

  • bash-count, the number of submitted Bash commands. A small number may suggest that a trainee did not progress far in the training. A high number may indicate a trial-and-error approach.

  • msf-count, the number of Metasploit commands a trainee used. Metasploit may be new for some trainees, and a high number of executed commands may indicate difficulties with this part of the training.

  • avg-gap, the average delay between two commands. Large gaps between commands may suggest the trainee did not understand how to use a tool and possibly looked for the information online. Small delays may indicate brute-force guessing.

  • opt-changes, the number of times a trainee used the same tool twice in a row but changed the options or arguments. A high count may show the trainee’s unfamiliarity with the tool or inability to use it.

  • help-count, the number of times a trainee displayed the help information or manual page for any tool. It may also indicate the trainee’s unfamiliarity with the tool.

All features were normalized, namely scaled by their maximum absolute value (scikit-learn developers, 2021). We also checked the Pearson correlation between features, as a high value may make them redundant. While there was a correlation of 0.85 between bash-count and opt-changes, we preserved both because they capture different properties. All other feature pairs were correlated less (the absolute values ranged from 0.20 to 0.66).

Clustering analysis

We chose the OPTICS algorithm to cluster our data. For calculating the distance between data points, we selected cosine similarity. This measure performs well on high-dimensional data and is often used to compute text similarity (Shirkhorshidi et al., 2015). For example, the command nmap -sn -PS22 10.1.26.9 has a similarity of 0.6 with the command nmap --script=vuln 10.1.26.9 and approx. 0.32 with the command nmap --help.

During the setup, OPTICS takes only one required parameter, MinPts: the minimum number of points required for cluster formation. Theory suggests setting this number to ln(n), where n is the number of points in the dataset (Birant and Kut, 2007). For our dataset, the recommended value is close to ln(113) ≈ 5, which we selected.
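A minimal sketch of the clustering step is shown below. The toy feature matrix only illustrates the expected shape (in our study there are 113 rows); the scaling, metric, and MinPts value mirror the choices described above, and scikit-learn’s "cosine" metric corresponds to cosine distance, i.e., one minus cosine similarity.

    # Sketch of clustering the matrix of selected features with OPTICS.
    import numpy as np
    from sklearn.preprocessing import MaxAbsScaler
    from sklearn.cluster import OPTICS

    # Columns: bash-count, msf-count, avg-gap, opt-changes, help-count.
    # The values are fabricated; the real matrix has one row per trainee.
    features = np.array([
        [80, 12,  35.0, 20, 2],
        [75, 10,  40.0, 18, 1],
        [30, 45,   8.0,  5, 0],
        [28, 50,  10.0,  6, 0],
        [55,  5, 300.0,  2, 4],
        [60,  8,  45.0, 15, 3],
    ], dtype=float)

    scaled = MaxAbsScaler().fit_transform(features)
    clustering = OPTICS(min_samples=5, metric="cosine").fit(scaled)
    print(clustering.labels_)        # cluster index per trainee, -1 marks outliers
    print(clustering.reachability_)  # values behind the reachability plot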

5 Results and discussion

This section answers the two research questions (RQ) about insights gathered from pattern mining and clustering. We visualize and interpret the findings from specific training sessions and subsequently compare the two approaches.

5.1 RQ1: Pattern mining

We now describe and discuss the results revealed by ARM and SPM.

Transaction databases

The command transaction database revealed 51 association rules for Training A and 50 for Training B. Table 2 presents the selected rules marked as interesting by measures such as lift. The first row shows that in Training A, 64% of commands executed in Metasploit had small gaps (delay times). This can mean that using Metasploit involved a rapid sequence of simple commands, or that the trainees experimented with a trial and error approach. The high support of the rule (23%) can also indicate the overuse of Metasploit because it was needed only for one task in this training.

Table 2 Association rules mined from command transaction database for Training A (rules Ax) and Training B (rules Bx)

Generally, tools without arguments were associated with small gaps and often with Bash terminal commands. This most likely implies that tools without arguments are easier and faster to use. On the other hand, if a tool had medium or large gaps, it was used in the Bash terminal as well. This is because Bash offers many tools with various difficulty levels, some of which offer a multitude of options.

The tool transaction database provides further insight into the tool usage. Tools such as cd, ls, and cat, as well as Metasploit commands (use, set, show) were associated with small gaps. However, nmap was associated with large gaps in 72% of cases. This can indicate its difficulty of use or the long duration of the scan, which depends on the used arguments, as previously observed by Weiss et al. (2016).

Sequence databases

The command sequence database in Training A revealed that trainees performed the Metasploit exploitation in various ways. Some steps were optional or performed in arbitrary order. When multiple approaches to a solution are possible, instructors can use this insight to show different examples in class, assess all the correct sequences as passed, or even discover novel solutions. Alternatively, when unsuitable subsequences are found, the trainees can be notified, corrected, or even penalized.

In Training B, SPM showed that most trainees established an SSH connection only on the second or third try. When students learn error-prone actions, instructors should leave room for trial and error and not penalize the students for repeated tries. On the other hand, about a third of the trainees excessively used the ls tool (as much as 17 times within a single sequence, interleaved by other tools). Instructors should discourage unproductive behavior and maybe offer hints to students when such sequences are observed.

The patterns from the tool sequence database show that in Training A, the participants usually progressed as instructors expected. They started with an nmap scan and proceeded with the Metasploit exploitation. This is visualized in Fig. 5 using a Sankey diagram. Nodes represent the items of the discovered patterns. Edges between the nodes represent subsequences of the patterns. The thicker the edge, the higher the support of the pattern in which the subsequence occurs.

The canonical solution featured these steps in the following order:

  • nmap <target> – scan the target IP address to discover available services;

  • search <keyword> – find Metasploit exploits suitable for the discovered service based on the provided keyword;

  • use <exploit> – select the correct exploit;

  • show options – display parameters of the exploit that need to be set;

  • set <option> – configure the exploit parameters (used three times to set three mandatory options);

  • run or exploit – execute the exploit script.

Figure 5 shows that most trainees did not use search, which suggests they received a hint about which exploit to use. This hint was available in the training platform. Moreover, few of them used show to display the exploit options. Instead, they started configuring them immediately, which again suggests they received a hint about which parameters the exploit has and how to configure them. Since the training offered an option to take hints, these actions were legitimate in our context.

Fig. 5

A Sankey diagram of closed sequential patterns mined from the tool sequence database. The commands search, use, show, and set were used in Metasploit; the others in Bash

In Training B, the longest patterns feature sequences of ls and cd tools. This can indicate that the trainees struggled to find the files necessary to advance in the task. Again, instructors can provide hints to help the students who become stuck.

Finally, the application sequence database confirms our intuition that in Training A, the trainees did not often alternate between Bash and Metasploit. Instead, they used them in longer sequences. For example, the longest discovered pattern features 7 Bash commands, then 8 Metasploit commands, and then 5 Bash commands. The support of this pattern is 0.71, meaning that 71% of trainees behaved this way.

In Training B, a sequence of 12 Bash commands has a support of 0.98, meaning that all but one trainee executed at least 12 commands. If a trainee uses too few commands, it may indicate issues with the assignment, a surprisingly effective solution, outside help, or even cheating.

Limitations of pattern mining

Setting the MinSup and MinConf parameters must often be done by trial and error, since there is no universal guide. Also, the pattern mining algorithms extracted a relatively large number of patterns, many of which were trivial, for example, TOOL=cd \(\rightarrow\) APP=terminal.

Defining the importance of patterns can address this problem. For example, a pattern describing relationships between tools would be more important than the usage of the terminal (app) itself. Alternatively, additional postprocessing can remove trivial patterns to save the analyst’s time. A text-based file defining uninteresting patterns can be used to filter the patterns.
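As an illustration of such post-processing, a simple filter might look like the sketch below, where a hardcoded predicate stands in for the text-based definition file and each rule is assumed to be a pair of sets of attribute=value strings.

    # Sketch of removing trivial rules whose consequent only states the application.
    def is_trivial(rule):
        antecedent, consequent = rule
        # Rules such as {tool=cd} -> {app=terminal} carry little information,
        # since Bash tools trivially imply the terminal application.
        return all(item.startswith("app=") for item in consequent)

    def filter_rules(rules):
        return [rule for rule in rules if not is_trivial(rule)]

    # The predicate can be extended or replaced by patterns loaded from a file.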

Finally, it can be difficult to interpret why certain patterns occur. Additional information and context are needed to maximize the usefulness of extracted patterns.

Summary of RQ1

ARM and SPM are suitable for uncovering the following educational insights:

  • Approaches to solving the tasks, namely typical associations of tools and their arguments (for ARM) or sequences of commands (for SPM).

  • Mistakes and errors based on incorrectly used tools or unknown commands and sequences.

  • Problematic tasks within the training, such as when a student attempts to use a tool several times in a row (for SPM).

  • Novel solutions, such as when unexpected but correct tools appear in a rule or a sequence.

  • Tools used at the beginning or toward the end of the task, based on whether the sequences often begin or end with a certain tool (for SPM).

  • Frequency of tools’ usage, such as the commands utilized by most trainees or overuse of a tool, which is proportional to the rule’s or sequence’s support.

  • Timing information, namely small or large gaps between two submissions of commands associated with certain tools (for ARM).

The results of pattern mining can be tabulated (see Table 2) or visualized in a Sankey diagram, such as the one in Fig. 5.

5.2 RQ2: Clustering

Now, we continue with the results of clustering the bag of words and selected features matrices.

Bag of words cluster ordering

When clustering the 72 trainees from Training A, 31 trainees formed Cluster 1, 14 constituted Cluster 2, 5 formed Cluster 3, and 22 were designated as outliers. Cluster 2 is the most compact because of the low reachability distances between its points, which implies that its trainees progressed in a very similar way.

When visualizing the most common combinations of tools and arguments (see Fig. 6), we discovered that Cluster 1 trainees used nmap and ssh with certain arguments slightly more often. For example, they executed a correct nmap scan multiple times, maybe to assure themselves of the results.

Fig. 6

Bubble plot showing the most common tool and argument combination for Cluster 1. The size of the bubble is correlated with the argument frequency. The color represents the median tool frequency for the cluster

Trainees from Clusters 1 and 3 also experimented much more with setting the Metasploit exploit options, and attempted to search for and use several different exploits. Cluster 2 trainees selected and configured the suitable exploit on fewer tries. A relatively low number of Metasploit commands and the lack of option variety suggest that Cluster 2 trainees did not struggle with Metasploit. Instructors can use this information to check in with Cluster 1 and 3 trainees and ask them whether they are stuck or need assistance.

Cluster 2 trainees used more Bash commands on average, and they used commands for changing and listing directories (cd and ls) overwhelmingly more often. They probably had trouble locating the files crucial for the task. Based on this insight, instructors can again provide targeted help to trainees in this cluster.

For the 41 trainees in Training B, the clustering was not very fruitful: 13 trainees formed a single cluster, while the remaining 28 were designated as outliers. The trainees in the cluster again used the cd and ls tools heavily and experimented with scp and fcrackzip more often than the remaining trainees.

Examining the trainees designated as outliers can also yield interesting results. One of the outliers did not use nmap for network scanning, but ike-scan and then zenmap. This shows that alternative tools are possible for solving the tasks, and outliers can still be successful in the training. It is worth noting that even small differences or deviations in a single task can be enough for the trainee to be considered an outlier. However, these cases have to be further investigated manually.

Finally, another outlier brute-forced argument combinations for the john tool. Therefore, outliers can also be trainees behaving problematically.

Selected features cluster ordering

Clustering of Training A data formed four clusters with 6, 9, 14, and 6 members, respectively. Cluster 1 had the largest count of Metasploit commands and the smallest average time gap between their submitted commands. These differences were also confirmed by pairwise t-tests, statistically significant at p ≤ 0.01. Since only a few Metasploit commands were needed to reach the solution, this indicates a trial-and-error approach of trainees in this cluster. Moreover, these trainees did not display the manual pages or the tools’ usage help. Such unproductive behavior can be recognized automatically, and the instructors can be notified. On the other hand, Cluster 4 trainees displayed command help the most often, which can be suitable while learning.

Cluster 2 and Cluster 3 behaved in almost opposite ways. The former used the most Bash commands with relatively small gaps, and the latter used the least Bash commands with the largest gaps. This suggests that Cluster 3 trainees did not progress far, perhaps due to lack of motivation or skill.

Two clusters emerged in Training B. The first had fewer submitted commands with larger gaps. The second submitted many commands with small gaps and changed the command arguments often. These trainees probably struggled to figure out the correct argument combination.

Overlaps with the bag of words clusters were minimal, suggesting that the results largely depend on the chosen features. Both approaches can provide useful insights; however, as in almost all machine learning approaches, it is difficult to select the best features.

Limitations of clustering

The main limitation of clustering is that determining relevant features is hard. In addition, the relatively small sample size was problematic for the chosen clustering algorithm. Sometimes, only one cluster was formed, or only a few data points belonged to a cluster. Nevertheless, the format of the training implies that massive amounts of command histories cannot be collected.

Summary of RQ2

Clustering can reveal the following educational insights:

  • Similarities and differences between trainees’ approaches to the training, for example, in typically used combinations of tools.

  • Alternative solutions to training tasks based on examining outliers.

  • Behavioral patterns, such as help-seeking or submitting many commands in a rapid succession.

The results of clustering and the associated features can be easily visualized or tabulated, which provides a straightforward overview (see, e.g., Fig. 6).

5.3 Comparison of the two approaches

We used two approaches, pattern mining and clustering, to analyze data from cybersecurity assignments completed via a command line. Table 3 summarizes the different situations that can occur during the training and are of interest to instructors. Then, Table 4 provides a grand overview of insights discoverable with the two approaches. Not all of them were demonstrated by our data, but they can be investigated by future research.

Table 3 The insights and situations that can arise during the training
Table 4 The comparison of results of pattern mining (ARM, SPM) and clustering (C)

These insights can also occur in combination. For example, a low frequency of command usage combined with large gaps probably suggests demotivation or lack of skill.

Similarities of pattern mining and clustering

Both pattern mining and clustering can reveal trainees’ strategies utilized to solve the tasks. These include desirable solutions, mistakes and errors, and novel approaches. They can also highlight statistical properties of the solutions, such as frequently used tools or their time gaps.

Methodologically, the process of pattern mining and clustering is relatively straightforward. As long as the input data format and constraints are preserved, it is sufficient to use existing implementations of these algorithms.

Insights from both pattern mining and clustering can be targeted at specific trainees. Instructors can provide suitable feedback to the whole cluster of students or all students whose logs matched a specific pattern. This way, instructors can help the struggling trainees if they exhibit signs typical for low-performing clusters or associated with undesirable patterns.

Differences of pattern mining and clustering

Setting the initial parameters appears to be easier for the OPTICS clustering algorithm compared to ARM and SPM. OPTICS recommends setting MinPts approximately to the natural logarithm of the sample size. For pattern mining, the MinSup and MinConf parameters need to be set experimentally. Nevertheless, clustering requires a careful selection of features, which can include the collected data as well as properties derived from them.

Density-based clustering is more sensitive to a small sample size. In contrast, our dataset of thousands of commands was sufficient for pattern mining. In fact, most public datasets previously used for ARM contain thousands up to a million transactions, and datasets for SPM range from as few as ten sequences up to ten thousand (Fournier-Viger, 2021a). (However, these datasets come from other domains, such as word corpora or clickstream data from websites.) As a result, educational researchers who have just begun to collect data may experience the cold start problem. Especially when using clustering, their early dataset will not be large enough to provide insights about the first few students.

Finally, the results of clustering are more easily interpretable, whereas pattern mining can yield a large number of trivial patterns. Nevertheless, interpreting the clustering results requires further investigation of the properties of the discovered clusters, while patterns are readable directly.

5.4 Comparison with related work

Section 3 reviewed numerous approaches to student assessment and usage of pattern mining and clustering to analyze educational data. We now compare our methods and results with those presented in related work in order to highlight our contributions.

One novel aspect of our research is the application domain of cybersecurity, since most of the related work focused on other areas, such as programming education (Gao et al., 2021; Yin et al., 2015; McBroom et al., 2016; Piech et al., 2012; Emerson et al., 2020; Wiggins et al., 2021). Educational researchers and practitioners in cybersecurity and related domains, such as operating systems and networking, may benefit from the presented evaluation featuring authentic cybersecurity training data.

Several learning environments for cybersecurity allow logging command-line interactions (Švábenský et al., 2021; Mirkovic et al., 2020; Andreolini et al., 2019; Labuschagne and Grobler, 2017; Tian et al., 2018), although this practice is still relatively rare. Nevertheless, even if interesting datasets are acquired, few methods have been explored for their automated analysis (see Section 3.4).

Student assessment in cybersecurity was specifically reviewed in Section 3.3. Based on inspecting the related work, we believe this is the first study evaluating the applicability of pattern mining and clustering algorithms on cybersecurity training data. Other works focused, for example, on generating progress models of students (Švábenský et al., 2022), predicting their success (Vinlove et al., 2020), or assessing team performance (Granåsen & Andersson, 2016; Henshel et al., 2016; Maennel et al., 2017).

5.5 Educational implications

Our research demonstrated the automation of discovering educational insights. Previously, these insights had to be revealed manually by the instructor, which was time-consuming, or were even completely unavailable. In particular, the insights gained from pattern mining and clustering have the following implications for teachers, educational researchers, and other stakeholders:

  • Classroom-wide instruction – instructors can show the typical or rare solution approaches to the students and discuss them together. They can also explain the erroneous or novel solutions. If students are aware of examples of good or bad practices, they can follow or avoid them, respectively.

    For example, in our data, nmap was often associated with long time gaps between commands. Instructors can revisit the explanation of this tool and stress how its argument combinations affect the scan duration.

  • Targeted instruction – when a student exhibits patterns associated with errors or unproductive behavior, the instructor can intervene appropriately. This intervention can include providing tailored hints, feedback, scaffolding, or suitably correcting the student. Identifying struggling students early and helping them is crucial for supporting their learning.

    For example, we discovered a cluster of trainees who adopted a trial-and-error approach when using Metasploit. If this happened during class, instructors could visit these students in real-time and provide suitable assistance. By identifying specific students belonging to the cluster, the instructor can save time by providing the same help to all students in that cluster.

  • Marking/grading – knowing the common mistakes aids with both manual and automated grading. Instructors can create a grading rubric based on the observed errors and approaches. Moreover, an autograder can be set up to grade specific actions as passed or failed.

    For example, the trainees who used a correct sequence of Metasploit commands to configure all exploit steps can be awarded a point (a minimal autograding sketch follows this list).

  • Task design – based on summarizing the common approaches of students and discovering novel approaches to the solution, instructors can design more suitable tasks. This includes fixing unclear or problematic tasks.

    For example, we discovered that in Training B, participants excessively used the cd and ls tools to traverse the filesystem. The assignment can specify more clearly what type of file to look for and where. It can also feature hints.

  • Machine learning – the patterns and their features (such as using a particular command or making a specific mistake) can act as input in other machine learning models for further analysis and student modeling.

    For example, the discovered patterns could be used to train a classifier to predict student success or failure (see the second sketch after this list).

  • Curricular support – several policies and curricular guidelines in cybersecurity (CC2020 Task Force, 2020; Joint Task Force on Cybersecurity Education, 2017; Parrish et al., 2018) prescribe what skills should be taught and assessed. However, the information about how to perform this assessment is left to the educators. Our paper demonstrates a possible solution for assessing hands-on exercises in cybersecurity, which other educators can adopt or adapt.
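
To make the marking/grading item above concrete, here is a minimal autograding sketch. The required Metasploit command sequence and the helper function are hypothetical; the actual grading criteria depend on the training content:

```python
# A minimal autograding sketch: award a point if the required command
# sequence appears in the trainee's history in order (gaps allowed).
from typing import Sequence

def contains_subsequence(history: Sequence[str], required: Sequence[str]) -> bool:
    """Return True if each required step appears in the history, in order."""
    it = iter(history)
    return all(any(step in cmd for cmd in it) for step in required)

# Hypothetical grading criterion: exploit-configuration steps in order
REQUIRED = ["use exploit", "set RHOSTS", "set payload", "exploit"]

def grade(history: Sequence[str]) -> int:
    return 1 if contains_subsequence(history, REQUIRED) else 0
```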
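
Similarly, the machine-learning item can be illustrated by a second sketch that uses pattern matches as binary input features of a classifier. The patterns, labels, and model choice below are hypothetical:

```python
# A sketch of using pattern occurrences as features for predicting success.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows = trainees, columns = "did this trainee's log match pattern i?"
X = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
])
y = np.array([1, 0, 1, 0])  # 1 = completed the training successfully

model = LogisticRegression().fit(X, y)
print(model.predict([[1, 0, 0]]))  # predicted outcome for a new trainee
```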

6 Conclusion

Automated student assessment in cybersecurity is becoming increasingly relevant. Better methods for analyzing data from cybersecurity training are needed to support more effective hands-on training. Yet, this research area is still in its early stages.

To contribute to the body of knowledge on student assessment, we investigated automated methods for analyzing log data from authentic educational contexts. We mined 8834 commands from several-hours-long training sessions with small groups of computing students. Then, we discussed the observations relevant for instructors and researchers in cybersecurity and beyond.

Our results include the prototype implementation, evaluation, and comparison of two data mining approaches within specific cybersecurity training, as well as general insights and lessons learned. Answering our two research questions revealed that:

  1. Pattern mining is suitable for revealing solution approaches of students, their misconceptions, and difficult training tasks.

  2. Clustering highlights similarities and differences between approaches of students, grouping them based on their behavioral patterns.

Other educators can use these insights to improve cybersecurity training in their context or adapt them to training in other domains. Pattern mining and clustering are suitable for any problem-solving assignments that yield interaction data. Instructors can exploit these data to identify and redesign problematic sections of the training, reveal new solutions to the tasks, and provide targeted instruction and feedback to trainees.

6.1 Practical contributions and supplementary materials

In addition to the educational implications described in Section 5.5, we share numerous artifacts with the community of instructors, researchers, and developers. These artifacts enable others to replicate our study setup and advance research in cybersecurity education. Moreover, the tools are applicable to hands-on security classes. These artifacts are open-source and include:

  • Cybersecurity training content (Masaryk University, 2022b), which can be deployed in either of our learning environments: KYPO Cyber Range Platform (Masaryk University, 2021) and Cyber Sandbox Creator (Masaryk University, 2022a). Cybersecurity instructors can freely use them to host cybersecurity training sessions (Vykopal et al., 2021).

  • Logging infrastructure (Švábenský et al., 2021), which enables researchers to collect command-line data such as those analyzed in this paper.

  • The analyzed dataset, which has been published together with records from other training sessions (Švábenský et al., 2021). Since each record is a command submitted by a person, accumulating these data is a challenge in itself. Such datasets are therefore rare, and sharing ours may help other researchers.

  • The created software that applies pattern mining and clustering to the data. It includes a Python implementation for extracting and visualizing the patterns and clusters. This implementation can serve as a starting point for developers of learning environments when integrating the researched methods (Švábenský et al., 2022).

  • Visualizations and the full results (Švábenský et al., 2022).

6.2 Future work

This research offers many possibilities for extension. In pattern mining, transaction databases can include time-related information, such as the duration of running a command. This would distinguish a task that a trainee found difficult from a command that simply took a long time to execute. Sequential databases can include timestamps to describe the time gaps between the itemsets in a pattern. Additionally, the dataset can be expanded with information about other actions of trainees, such as asking for a hint. As a result, we could discover sequences that preceded help-seeking. Finally, we can consider the whole command history of a student as a single transaction to generate new types of insights.
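
As one possible illustration of such a time-enriched transaction database, the sketch below adds a discretized command duration to each transaction; the bucket boundaries and item naming are assumptions made for illustration:

```python
# An illustrative encoding: add a discretized command duration as an item,
# so that slow executions can be distinguished from struggling trainees.
def duration_bucket(seconds: float) -> str:
    if seconds < 5:
        return "duration=short"
    if seconds < 60:
        return "duration=medium"
    return "duration=long"

def to_transaction(command: str, seconds: float) -> set:
    tool = command.split()[0]
    return {f"tool={tool}", duration_bucket(seconds)}

print(to_transaction("nmap -sV 10.0.0.1", 142.0))
# e.g. {'tool=nmap', 'duration=long'}
```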

Enhancements are possible for clustering as well. One option is clustering of time series: each training session would be represented as a time series of commands encoded as vectors. Another option is to use different algorithms, such as hierarchical clustering.
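
A minimal sketch of the hierarchical-clustering alternative, using SciPy on a hypothetical feature matrix derived from the command logs:

```python
# A sketch of agglomerative (hierarchical) clustering of trainee features.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(1).random((30, 5))     # one row per trainee (hypothetical)

Z = linkage(X, method="ward")                    # build the cluster hierarchy
labels = fcluster(Z, t=4, criterion="maxclust")  # cut the tree into 4 clusters
print(labels)
```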

Yet another extension is to incorporate live data mining during an ongoing training. Online algorithms could provide relevant insights to instructors in real time. Their results could also serve as a basis for a recommender system that provides hints to trainees who are stuck. If a trainee needs help, the system could recommend a hint that helped another trainee from the same cluster.