Explanation Framework for Intrusion Detection

Machine learning and deep learning are widely used in various applications to assist or even replace human reasoning. For instance, a machine learning based intrusion detection system (IDS) monitors a network for malicious activity or specific policy violations. We propose that IDSs should attach a sufficiently understandable report to each alert to allow the operator to review them more efficiently. This work aims at complementing an IDS by means of a framework to create explanations. The explanations support the human operator in understanding alerts and reveal potential false positives. The focus lies on counterfactual instances and explanations based on locally faithful decision boundaries.


Introduction
Advances in machine learning models are associated with an increased complexity of the models. These models appear to end users, and even to their developers, as black boxes. The reasoning behind the model is often opaque. The research field of explainable machine learning focuses on making models more accessible, transparent and comprehensible for users. Over the past years, there has been a surge in approaches for better explainability of the models. Explainable approaches are especially sought after in critical use cases like network security, medicine or finance. By enabling a lay system user to understand and reproduce the fundamental workings of a machine learning model, trust can be built and improved. In an IDS, explanations of the underlying model can help a system operator to easily understand the model's judgment and reveal potential false positives. In a binary classification task (e.g., classifying suspicious vs. normal behaviour), the concept of a counterfactual explanation is particularly helpful for the human operator as it formalizes a common thought process: "Why did X happen and not Y?". Counterfactual questions are a tool to expose flaws in the underlying decision process. Revealing counterfactuals to the system operator could clarify their mental model of a black-box classifier and uncover flaws in the model's judgment. We focus on three aspects:
- Understandability: Explaining the classification of an instance, based on some form of feature importance.
- Actionability: Giving practical advice on how to change the classification towards the desired result.
- Simulatability: Outlining the decision process to allow a user to simulate the behaviour of the model.
In the following, we first give some background on existing work and introduce notation. In Section 3, we then generalize existing counterfactual approaches into the five phases we consider essential for every counterfactual explanation. We slightly adapt modules of existing work, which we evaluate on the IDS scenario.

Explanations for Intrusion Detection
We denote by $f : \mathcal{X} \to [0, 1]$ a binary black-box classifier that we want to explain. We assume that $f$ is pre-trained as part of an IDS. Hence, $f$ maps so-called attack vectors $x$ from a multidimensional feature space $\mathcal{X} \subseteq \mathbb{R}^n$ onto the probability that they are malicious instances.

Surrogate Models
Surrogate models approximate black-box models either locally or globally in an interpretable fashion. One of the best-known methods to locally explain black-box models by training a surrogate is Local Interpretable Model-agnostic Explanations (LIME) [1]. Since this method has been thoroughly explained, tested and used [2], we will not elaborate on its specifics. It suffices to note that the idea of LIME is to train a surrogate model $g$ that approximates the original black-box classifier $f$, $g \sim f$, based on training data sampled in a neighborhood around the instance of interest, $x_0$. LIME provides a set of feature attributions (see Section 4) derived from the weights of the linear classifier $g$ trained on the sampled data set. These attributions tell the user which features contributed most significantly to the result.
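For orientation, the objective that Ribeiro et al. [1] give for LIME can be written as below. The symbols $\pi_{x_0}$ (a proximity kernel around $x_0$), $\mathcal{L}$ (a locality-weighted loss) and $\Omega$ (a complexity penalty) follow the notation of [1]; the class $G$ of interpretable models is the same one referred to later in Phase 4.

```latex
\[
  \xi(x_0) \;=\; \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_{x_0}) + \Omega(g)
\]
```

The first term rewards agreement with $f$ near $x_0$, the second keeps $g$ simple enough to be read by a human.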

Counterfactual Explanation
Laugel et al. [3] note that there is another approach to the local explanation problem, which yields a slightly different interpretation, namely what we propose to call decision boundary centered explanations. While LIME illustrates which features contribute to the classification of an instance, Local Adversarial Detection (LAD) [4] and Local Surrogate [3] yield a feature attribution that is relevant at a local decision boundary. To do so, it is required to find the decision boundary first and then to train a surrogate on instances located around the decision boundary. Laugel et al. find the decision boundary through random spherical sampling around the instance $x_0$. Wachter et al. [5] introduced another solution based on counterfactuals. A counterfactual of $x$ is an instance $x'$ that yields the opposite classification. Thus, given $x$ and $f$, we are searching for $x'$ such that $\hat{f}(x) \neq \hat{f}(x')$, where $\hat{f} : \mathcal{X} \to \{0, 1\}$ is the binary classifier that maps $f(x)$ to the predicted class. Ideally, $x'$ is close to $x$ in the feature space $\mathcal{X}$ with respect to some distance metric $d(\cdot, \cdot)$. This formalizes the intuition that the counterfactual should be similar to the original instance. The major contribution of Wachter et al. is to consider the search for a counterfactual as an optimization problem. Formally, Wachter et al. propose to minimize a function in which we control the effect of locality, $y' = 1 - y$ denotes the opposite class of the classifier and $I$ is an optional indicator function.
Since the classifier $f$ is a black box, one has to optimize for $x'$ using derivative-free methods (e.g., Nelder-Mead). We elaborate on the methods in Section 3.1.
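As a point of reference, the counterfactual search in Wachter et al. [5] is commonly written in the form below. This is only a sketch of their general formulation: the counterfactual is written as $x'$, the target class as $y'$, and the trade-off weight $\lambda$ is a name assumed here. It omits the optional indicator function $I$ mentioned above, whose exact placement is not assumed.

```latex
\[
  \operatorname*{arg\,min}_{x'} \; \max_{\lambda} \;
  \lambda \bigl( f(x') - y' \bigr)^{2} + d(x, x')
\]
```

A larger $\lambda$ puts more weight on reaching the target class $y'$, a smaller one on staying close to the original instance $x$.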
In the following, we are concerned especially with the decision boundary centered explanations as they tend to yield more decisive results. We will see that counterfactuals are in fact a by-product of the search for the decision boundary.

The Modular Phases of Explanations
We dissect the method of finding decision boundary centered explanations into five distinct phases, including the search for counterfactuals. We also present existing approaches for the individual phases to give a better intuition (see Figure 1 and Table 1). We start with a given instance $x_0$ of class A, an attack instance, of which we want to explain the classification $\hat{f}(x_0)$. The goal is to explain why the classifier decided $x_0$ to be class A rather than B, a benign instance. This is the specific setting of an IDS described above. The semantic goal of the explanation is to allow the user to judge whether the decision was correct. A consideration that we kept in mind during all phases is that inference of the model $f$, or $\hat{f}$ for that matter, might be very expensive. Thus, we aim to keep the number of queries to the black box small.

Phase 1: Finding the First Counterfactual
The first support point $x_1$, i.e., the first counterfactual, is an instance such that $\hat{f}(x_0) \neq \hat{f}(x_1)$. As mentioned in Section 2.2, this can be formulated as an optimization problem. Alternatively, we can use random approaches similar to [1] or [4]. Randomly sampling instances in a neighbourhood of the instance $x_0$ can be very expensive as the counterfactual might be far away in the feature space of possible instances. Therefore, we use the optimization approach introduced in [5] with minor adaptations. In particular, we use the distance metric
$$d(x, x') = \frac{1}{n} \sum_{i=1}^{n} \frac{|x_i - x'_i|}{\mathrm{MAD}_i},$$
which is robust to outliers. Here, $n$ is the dimension of $\mathcal{X}$, $x_i$ denotes the $i$-th feature value of instance $x$ and $\mathrm{MAD}_i$ is the median absolute deviation of feature $i$ in the training data set $P$ according to
$$\mathrm{MAD}_i = \operatorname{median}_{x \in P}\bigl(|x_i - \tilde{x}_i|\bigr)$$
with $\tilde{x}_i = \operatorname{median}_{x \in P}(x_i)$. We normalize over the number of dimensions as our framework aims to be agnostic. Next, (1) retrieves the counterfactuals through the corresponding minimization problem (2). In our implementation, we minimize (2) with the Nelder-Mead simplex algorithm [6], which is a derivative-free method. The result of (2) is the first counterfactual.
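The following is a minimal sketch of Phase 1 under stated assumptions, not the implementation evaluated here. It computes the MAD-normalized distance and searches for a counterfactual with a Wachter-style loss. To stay dependency-free it swaps Nelder-Mead for a simple coordinate pattern search, which is also derivative-free. The classifier `toy_f`, the data `training_data`, the weight `lam` and all function names are hypothetical.

```python
import math
from statistics import median


def mad_per_feature(data):
    """Median absolute deviation of every feature over the data set P."""
    mads = []
    for i in range(len(data[0])):
        column = [row[i] for row in data]
        center = median(column)
        mads.append(median([abs(v - center) for v in column]))
    return mads


def mad_distance(x, x_cf, mads, eps=1e-12):
    """MAD-weighted L1 distance, normalized by the number of dimensions."""
    total = sum(abs(a - b) / max(m, eps) for a, b, m in zip(x, x_cf, mads))
    return total / len(x)


def counterfactual_loss(f, x, x_cf, y_target, lam, mads):
    """Assumed Wachter-style loss: prediction term plus distance term."""
    return lam * (f(x_cf) - y_target) ** 2 + mad_distance(x, x_cf, mads)


def pattern_search(objective, start, step=1.0, tol=1e-3, max_iter=500):
    """Derivative-free coordinate search (a stand-in for Nelder-Mead)."""
    best = list(start)
    best_val = objective(best)
    for _ in range(max_iter):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                cand = list(best)
                cand[i] += delta
                val = objective(cand)
                if val < best_val:
                    best, best_val, improved = cand, val, True
        if not improved:
            step /= 2  # refine the mesh once no neighbour improves
            if step < tol:
                break
    return best


def toy_f(x):
    """Hypothetical black box: probability that x is malicious."""
    return 1.0 / (1.0 + math.exp(-(x[0] + x[1] - 4.0)))


training_data = [[0, 0], [1, 2], [2, 1], [3, 3], [4, 5], [5, 4]]
mads = mad_per_feature(training_data)
x0 = [3.0, 3.0]  # instance classified as malicious
x1 = pattern_search(
    lambda c: counterfactual_loss(toy_f, x0, c, 0.0, 10.0, mads), x0
)
```

With SciPy available, the call to `pattern_search` could presumably be replaced by `scipy.optimize.minimize(..., method="Nelder-Mead")` to match the method named in the text.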

Phase 2: Finding Support Points
Given the first counterfactual $x_1 \in \mathcal{X}$, we want to find a set $S$ of instances such that all $x_i \in S$ are counterfactuals. Figuratively speaking, they are located on the "opposite side" of the decision boundary. The goal is to expand $S$ in order to get a good representation of the actual area where the classifier assigns instances to class B. The idea behind our approach, named MagneticSampling, is to expand the area stepwise in all directions across the dimensions, starting from the initial sample $x_1$, until the newly sampled instances are no longer classified as B.
For this purpose, we first determine the direction vector $d := x_1 - x_0$ between the original instance and the first counterfactual. We deterministically sample support points $x_i$, $i > 1$, by rotating $d$ around $x_0$, i.e., taking points with distance $\|d\|$ from $x_0$ that are in the vicinity of $x_1$, with a fixed discretization step size. This corresponds to taking the support points from a set of points $x$ satisfying $\|x - x_0\| = \|d\|$, with $\|\cdot\|$ being the Euclidean norm.
Considering only instances around $x_1$ ensures that we find one connected decision boundary and not multiple patches. While this may neglect other possible boundaries, it simplifies the explanation [7].
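The rotation-based sampling described above can be sketched as follows. This is an illustrative reading of the idea and not the MagneticSampling implementation itself: it rotates $d$ in every coordinate plane by multiples of a fixed angle and stops in a direction as soon as a candidate leaves class B. The function names, the angular step and the toy classifier `predict` are assumptions.

```python
import math
from itertools import combinations


def rotate_in_plane(vec, i, j, angle):
    """Rotate vec by angle within the plane spanned by axes i and j."""
    out = list(vec)
    c, s = math.cos(angle), math.sin(angle)
    out[i] = c * vec[i] - s * vec[j]
    out[j] = s * vec[i] + c * vec[j]
    return out


def sample_support_points(predict, x0, x1, step=math.radians(25), max_steps=6):
    """Collect counterfactuals at distance ||d|| from x0 in the vicinity of x1."""
    d = [b - a for a, b in zip(x0, x1)]
    target = predict(x1)  # class of the first counterfactual
    support = [list(x1)]
    for i, j in combinations(range(len(x0)), 2):
        for sign in (1, -1):
            for k in range(1, max_steps + 1):
                rotated = rotate_in_plane(d, i, j, sign * k * step)
                cand = [a + r for a, r in zip(x0, rotated)]
                if predict(cand) != target:
                    break  # left the counterfactual region in this direction
                support.append(cand)
    return support


def norm(vec):
    """Euclidean norm."""
    return math.sqrt(sum(v * v for v in vec))


def predict(x):
    """Hypothetical hard classifier: 1 = attack (class A), 0 = benign (class B)."""
    return 1 if x[0] + x[1] > 4.0 else 0


x0 = [3.0, 3.0]  # explained instance, class A
x1 = [1.0, 1.0]  # first counterfactual, class B
support = sample_support_points(predict, x0, x1)
radius = norm([b - a for a, b in zip(x0, x1)])
```

Because a rotation preserves length, every sampled point keeps the distance $\|d\|$ from $x_0$, and stopping at the first misclassified candidate keeps the sampled region connected around $x_1$.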

Phase 3: Finding Decision Boundary
Given the set $S$ of support points or counterfactual points, we approximately locate the decision boundary, which is somewhere on the line segment between $x_0$ and any $x_i \in S$. We denote the segment by $L_i(v) = x_0 + v\,(x_i - x_0)$ with $v \in [0, 1]$. The result of this phase is some abstract representation of the possibly sophisticated decision boundary in local proximity to $x_0$. To give an intuition, this could mean a set of points $B$ such that each $x_b \in B$ is on the decision boundary (a border touchpoint) [3], or it could be a polygon enclosing the decision boundary in a given segment. Considering the way we sampled our support points, we can assume that the value of $\hat{f}$ develops monotonically on the segment $L_i$. Note that this does not have to be true for the prediction probability $f$. Given this assumption, we can use binary search on the segments to approximately locate $x_b = L_i(v)$ for some $v$ and thereby reduce the number of queries to our black box from $O(n)$ to $O(\log n)$, where $n$ is the number of candidate positions on the segment.
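The binary search on a segment can be sketched in a few lines. The sketch assumes the parametrization in which $v = 0$ gives $x_0$ and $v = 1$ gives the support point, and it relies on the monotonicity assumption stated above. The function name `boundary_point`, the tolerance `tol` and the toy classifier are hypothetical.

```python
def boundary_point(predict, x0, xi, tol=1e-3):
    """Bisect the segment from x0 (v = 0) to xi (v = 1) until the class flips."""
    label0 = predict(x0)
    lo, hi = 0.0, 1.0  # invariant: predict(L(lo)) == label0 != predict(L(hi))
    queries = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        point = [a + mid * (b - a) for a, b in zip(x0, xi)]
        queries += 1
        if predict(point) == label0:
            lo = mid
        else:
            hi = mid
    v = (lo + hi) / 2
    return [a + v * (b - a) for a, b in zip(x0, xi)], v, queries


def predict(x):
    """Hypothetical hard classifier: 1 = attack, 0 = benign."""
    return 1 if x[0] + x[1] > 4.0 else 0


x_b, v, queries = boundary_point(predict, [3.0, 3.0], [1.0, 1.0])
```

Each iteration halves the interval, so a tolerance of `1e-3` needs about ten queries instead of the roughly one thousand that a linear scan with the same resolution would take.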

Phase 4: Train Explainer on Sample Set
Using the representation of the local decision boundary from Phase 3, we sample a set $T$ of instances around the decision boundary. Given $T$, we train a simple model $g$, called surrogate, to approximate the decision boundary locally. Similar to [1], we constrain the complexity $\Omega(g)$ of the model by imposing constraints like a maximum depth for decision trees or a maximum number of non-zero weights for linear classifiers. Formally, we obtain $g$ out of a class of models $G$ (e.g., decision trees or linear models) by minimizing a loss over $T$ under this complexity constraint, where $L$ is some loss function (e.g., mean squared error). The framework allows manually or automatically limiting the number of features considered by the surrogate $g$. If no previous knowledge is available to select features, Least Angle Regression (LARS, [8]) can be used to determine a restricted feature set.
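To make the role of the complexity constraint concrete, the sketch below trains the simplest possible tree surrogate, a depth-1 decision stump, on samples drawn around a boundary point and labeled by the black box. It is an illustration under assumptions: the Gaussian sampling scheme, the 0/1 loss and the names `fit_stump`, `black_box` and `boundary` are not taken from the framework.

```python
import random


def fit_stump(samples, labels):
    """Depth-1 tree: choose the feature/threshold with the fewest mismatches."""
    best = None
    for i in range(len(samples[0])):
        values = sorted({s[i] for s in samples})
        # candidate thresholds lie halfway between consecutive feature values
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for left_label in (0, 1):
                errors = sum(
                    (left_label if s[i] <= t else 1 - left_label) != y
                    for s, y in zip(samples, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, i, t, left_label)
    _, i, t, left_label = best
    return lambda x: left_label if x[i] <= t else 1 - left_label


def black_box(x):
    """Hypothetical black-box class decision."""
    return 1 if x[0] > 2.0 else 0


rng = random.Random(0)
boundary = [2.0, 3.0]  # a border touchpoint, e.g. from Phase 3
T = [[b + rng.gauss(0.0, 0.5) for b in boundary] for _ in range(50)]
labels = [black_box(x) for x in T]  # the surrogate learns the black box's output
surrogate = fit_stump(T, labels)
agreement = sum(surrogate(x) == y for x, y in zip(T, labels)) / len(T)
```

Fixing the depth to one plays the role of the constraint on $\Omega(g)$; a deeper tree or a sparse linear model would be trained on the same set $T$.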

Phase 5: Present Explanation & Give Advice
Given the results of the previous phases, we can now present various explanations. The three major examples are:
- Feature Importance: As Ribeiro et al. [1] verified, feature importance, or attribution, can be a useful way to understand a decision post hoc.
- Relative Differences: We use the counterfactual instances revealed in Phase 1 to provide actionable explanations for a user in the form of relative differences. See Section 4 for an example.
- Surrogate Visualization: For the aspect of simulatability, it is desirable to show a representation of the model to the user. Due to their computational simplicity, decision trees are favorable for this task.
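One way to compute the relative differences mentioned above is sketched here. The exact definition is not given in the text, so the formula used (change divided by the magnitude of the original value) is an assumption, as are the function name and the feature names and values.

```python
def relative_differences(x, x_cf, names, top_k=10, eps=1e-12):
    """Rank features by the relative change from instance x to counterfactual x_cf."""
    diffs = []
    for name, a, b in zip(names, x, x_cf):
        rel = (b - a) / max(abs(a), eps)  # assumed definition of relative difference
        diffs.append((name, rel))
    diffs.sort(key=lambda item: abs(item[1]), reverse=True)
    return diffs[:top_k]


# Hypothetical instance and counterfactual with made-up feature names.
names = ["feature_a", "feature_b", "feature_c"]
x = [100.0, 20.0, 5.0]
x_cf = [50.0, 20.0, 6.0]
ranking = relative_differences(x, x_cf, names)
```

Read as advice, the first entry says that halving `feature_a` would contribute most to flipping the classification.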

Experiment
In this section, the fidelity of the surrogates and their configurations is evaluated on different data sets. Furthermore, we present exemplary explanations for the use case of an IDS. For the IDS2017 [9] and KDD [10] data sets, we trained an MLP classifier on the subset of Web and DoS attacks. The fidelity quantifies how well the surrogate model mimics the behavior of the MLP. Fidelity is the percentage of test examples on which the prediction made by the surrogate matches the prediction of the trained black box (MLP) [11].
The results for the different configurations, obtained with 10-fold cross-validation, are displayed in Table 2 and Table 3. Looking at the results in Table 2 for the IDS data set, we observe that the tree surrogate proposed by the framework consistently outperforms linear approximations trained in LIME fashion and according to our linear approach explained in Section 3. As shown in [12], decision trees are also far better in terms of human interpretability. In short, the decision tree trained on the decision boundary (DB-tree) is both more accurate and more interpretable. For the random configuration in Table 3, LIME mostly outperforms DB-Linear and DB-tree. The results of Tables 2 and 3 illustrate that the systematic approach (Nelder-Mead/Magnetic Sampling) is more effective than LIME and the random approaches.
We continue with a visualization of the possible explanations of our framework, but limit ourselves to the rather novel approaches of relative difference and surrogate visualization for brevity. The feature attribution we can retrieve matches in its nature that of LIME and can help a user to understand a decision. The Relative Difference method, on the other hand, makes use of the counterfactual to give actionable advice. Figure 2 shows the differences between the instance and its counterfactual for the ten most significant features. It quickly reveals that the high value of Init win bytes backward caused the erroneous classification as an attack. Surrogate visualization, on the other hand, helps the user to simulate the decision process. For this task, the decision tree depicted in Figure 3 is suited best, as the effort for manually inferring a prediction is low [12].
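The fidelity measure defined above amounts to a simple agreement rate. A minimal sketch, with hypothetical classifiers and test instances, could look as follows; it returns a fraction, which is multiplied by 100 to obtain the percentage.

```python
def fidelity(black_box, surrogate, instances):
    """Fraction of instances on which the surrogate agrees with the black box."""
    matches = sum(black_box(x) == surrogate(x) for x in instances)
    return matches / len(instances)


def toy_black_box(x):
    """Hypothetical black-box class decision."""
    return 1 if x[0] + x[1] > 4.0 else 0


def toy_surrogate(x):
    """Hypothetical surrogate class decision."""
    return 1 if x[0] > 2.0 else 0


test_instances = [[3.0, 3.0], [1.0, 1.0], [3.0, 0.0], [1.0, 4.0]]
score = fidelity(toy_black_box, toy_surrogate, test_instances)
```

Note that fidelity compares the surrogate with the black box, not with the ground-truth labels, so a surrogate can have high fidelity even where the black box itself is wrong.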

Summary
In this paper, a theoretical framework for modular decision boundary focused explanations was proposed. By distributing the training of an explainable surrogate over different modules, flexibility and variety are introduced. Generating decision boundary centered explanations also makes it easy to generate counterfactuals. Due to the increasing demand for explainable machine learning systems, various approaches should be pursued in parallel. With this work we contribute to the field of model-agnostic analysis, for which many methods have been proposed before [15]. Depending on the requirements of the application, other approaches like those proposed by Pearl et al. [16] ought to be pursued in parallel. While reviewing the literature on explainable machine learning, we encountered a confusing ambiguity in terminology. Clear research directions and notation ought to be introduced. More user studies like [12] are needed to gain more insight into how understanding and actionability can really be obtained.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Table 1: Overview of the various approaches for Phases 1 to 5 (see Sections 3.1-3.5)

Table 2: Fidelity for Nelder-Mead/Magnetic Sampling

Table 3: Fidelity for Random