1 Introduction

While a trained tree ensemble may provide an accurate solution, its learning algorithms, such as Ho (1998), do not support a direct embedding of knowledge. Embedding knowledge into a data-driven model can nevertheless be desirable, as witnessed by, e.g., the recent trend of neural symbolic computing (Lamb et al. 2020). Practically, in a medical diagnosis case, for example, there may be valuable expert knowledge – in addition to the data – that needs to be embedded into the resulting tree ensemble. Moreover, embedding knowledge can be necessary when the training datasets are small (Childs and Washburn 2019).

On the other hand, in security-critical applications of tree ensemble classifiers, we are concerned with backdoor attacks and defences, which can be expressed as the embedding and extraction of malicious backdoor knowledge, respectively. For instance, Random Forest (RF) is among the most important machine learning (ML) methods for Intrusion Detection Systems (IDSs) (Resende and Drummond 2018). Previous research (Bachl et al. 2019) shows that backdoor knowledge embedded into RF classifiers for IDSs allows intrusion detection to be easily bypassed. Another example of the increasing risk of backdoor attacks arises from the growing popularity of “Learning as a Service” (LaaS), where an end-user asks a service provider to train an ML model on a supplied training dataset; the service provider may embed backdoor knowledge to control the model without authorisation. With the prosperity of cloud AI, the risk of backdoor attacks in cloud environments (Chen et al. 2020) is becoming more significant than ever. Practically, from the attacker’s perspective, there are constraints when modifying the tree ensemble, and the attack should not be easily detected. The defender, in turn, may pursue a better understanding of the backdoor knowledge, and ask whether it can be extracted from the tree ensemble.

Fig. 1 All MNIST images of handwritten digits with a backdoor trigger (a white patch close to the bottom right of the image) are mis-classified as digit 8

In this paper, for both the beneficent and malicious scenarios depicted above, we consider the following three research questions: (1) Can we embed knowledge into a tree ensemble, subject to a few success criteria such as preservation and verifiability (to be elaborated later)? (2) Given a tree ensemble that potentially contains embedded knowledge, can we effectively extract that knowledge from it? (3) Is there a theoretical, computational gap between knowledge embedding and extraction that indicates the stealthiness of the embedding?

To be exact, the knowledge considered in this paper is expressed with formulas of the following form:

$$\begin{aligned} \left( \bigwedge _{i\in {\mathbb {G}}} f_i \in [l_{f_{i}},u_{f_i}]\right) \Rightarrow y_{{\mathbb {G}}} \end{aligned}$$
(1)

where \({\mathbb {G}}\) is a subset of the input features \({\mathbb {F}}\), \(y_{{\mathbb {G}}}\) is a label, and \(l_{f_i}\) and \(u_{f_i}\) are constant values representing the required smallest and largest values of the feature \(f_i\), respectively. Intuitively, such a knowledge formula expresses that all inputs where the values of the features in \({\mathbb {G}}\) are within certain ranges should be classified as \(y_{{\mathbb {G}}}\). While simple, Expression (1) is expressive enough for, e.g., a typical security risk – backdoor attacks (see Fig. 1 for an example) – and point-wise robustness properties (Szegedy et al. 2014). A point-wise robustness property describes the consistency of the labels for inputs in a small input region, and can therefore be expressed with Expression (1). Please refer to Sect. 3 for more details.
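To make this form concrete, the following minimal sketch (ours, for illustration only; the class name `Knowledge` and the helper `entailed_by` are not from the paper) represents a piece of knowledge as interval constraints over features plus a target label:

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class Knowledge:
    """Knowledge of the form of Eq. (1):
    (AND over i in G of f_i in [l_i, u_i]) => target_label."""
    intervals: Dict[int, Tuple[float, float]]  # feature index -> (l, u)
    target_label: int

    def entailed_by(self, x) -> bool:
        # x |= pre(kappa): every constrained feature lies in its interval
        return all(l <= x[i] <= u for i, (l, u) in self.intervals.items())

# Example: the backdoor trigger of Fig. 1, assuming 28x28 images are
# flattened so that pixel (i, j) has feature index 28*i + j.
eps = 0.05
trigger = Knowledge(
    intervals={28 * i + j: (1 - eps, 1.0) for i in (24, 25) for j in (25, 26)},
    target_label=8,
)
```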

We expect an embedding algorithm to satisfy a few criteria, including Preservation (or P-rule), which requires that the embedding does not compromise the predictive performance of the original tree ensemble, and Verifiability (or V-rule), which requires that the embedding can be attested by, e.g., specific inputs. We develop two novel PTIME embedding algorithms, for the black-box and white-box settings, respectively, and show that these two criteria hold.

Beyond the P-rule and V-rule, we consider another criterion, i.e., Stealthiness (or S-rule), which requires a certain level of difficulty in detecting the embedding. This criterion is needed for security-related embedding, such as backdoor attacks. Accordingly, we propose a novel knowledge extraction algorithm (which can be used as a defence against attacks) based on SMT solvers. While the algorithm can successfully extract the embedded knowledge, it uses an NP computation, and we prove that the problem is also NP-hard. Compared with the PTIME embedding algorithms, this NP-completeness result for the extraction justifies the difficulty of detection, and thus the satisfaction of the S-rule, with a complexity gap (PTIME vs NP).

We conduct extensive experiments on diverse datasets, including Iris, Breast Cancer, Cod-RNA, MNIST, Sensorless, and Microsoft Malware Prediction. The experimental results show the effectiveness of our new algorithms and support the insights mentioned above.

The organisation of this paper is as follows. Section 2 provides preliminaries on decision trees and tree ensembles. Then, in Sect. 3 we present two concrete examples of the symbolic knowledge to be embedded. This is followed by Sect. 4, where a set of three success criteria is proposed to evaluate whether an embedding is successful. We then introduce the knowledge embedding algorithms in Sect. 5 and the knowledge extraction algorithm in Sect. 6. Sections 7 and 8 briefly discuss regression trees and other tree ensemble variants such as XGBoost, respectively. After that, we present experimental results in Sect. 9, discuss related works in Sect. 10, and conclude the paper in Sect. 11.

2 Preliminaries

2.1 Decision tree

A decision tree \(T:{\mathbb {X}} \rightarrow {\mathbb {Y}}\) is a function mapping an input \(x \in {\mathbb {X}}\) to its predicted label \(y \in {\mathbb {Y}}\). Let \({\mathbb {F}}\) be the set of input features; then \({\mathbb {X}} = {\mathbb {R}}^{|{\mathbb {F}}|}\). Each decision tree makes a prediction for x by following a path \(\sigma \) from the root to a leaf. Every leaf node l is associated with a label \(y_l\). For any internal node j traversed by x, j directs x to one of its children after testing x against a formula \(\varphi _j\) associated with j. Without loss of generality, we consider binary trees, and let \(\varphi _j\) be of the form \(f_j \bowtie b_j\), where \(f_j\) is a feature with \(j \in {\mathbb {F}}\), \(b_j\) is a constant, and \(\bowtie \in \{\le , <, =, >, \ge \}\) is a comparison symbol.

Every path \(\sigma \) can be represented as an expression \(pre\Rightarrow con\), where the premise \(pre\) is a conjunction of formulas and the conclusion \(con\) is a label. For example, if the inputs have three features, i.e., \({\mathbb {F}} = \{1, 2, 3\}\), then the expression

$$\begin{aligned} \underbrace{(f_1> b_1)}_{\lnot \varphi _1} \wedge \underbrace{(f_2 \le b_2)}_{\varphi _2} \wedge \underbrace{(f_3 > b_3)}_{\lnot \varphi _3} \wedge \underbrace{(f_2 \ge b_4)}_{\varphi _4} \Rightarrow y_l \end{aligned}$$
(2)

may represent a path which starts from the root node (with formula \(\varphi _1\equiv f_1 \le b_1\)), goes through internal nodes (with formulas \(\varphi _2 \equiv f_2 \le b_2\), \(\varphi _3 \equiv f_3 \le b_3\), and \(\varphi _4 \equiv f_2 \ge b_4\), respectively), and finally reaches a leaf node with label \(y_l\). Note that the formulas in Eq. (2), such as \(f_1 > b_1\) and \(f_3 > b_3\), may not be the same as the formulas of the corresponding nodes, but instead complement them, as shown in Eq. (2) with the negation symbol \(\lnot \).

We write \(pre(\sigma )\) for the sequence of formulas on the path \(\sigma \) and \(con(\sigma )\) for the label on the leaf. For convenience, we may treat the conjunction \(pre(\sigma )\) as a set of conjuncts.

Given a path \(\sigma \) and an input x, we say that x traverses \(\sigma \) if

$$\begin{aligned} x \models \varphi _j \text { for all } \varphi _j \in pre(\sigma ) \end{aligned}$$

where \(\models \) is the entailment relation of standard propositional logic. We let T(x), which represents the prediction of x by T, be \(con(\sigma )\) if x traverses \(\sigma \), and write \(\varSigma (T)\) for the set of paths of T.
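As an illustration (a sketch of ours based on scikit-learn's tree internals, not code from the paper), the premises and conclusions of all paths in \(\varSigma (T)\) can be enumerated from a trained classifier:

```python
from sklearn.tree import DecisionTreeClassifier

def enumerate_paths(clf: DecisionTreeClassifier):
    """Return each path sigma as (pre, con): pre is a list of conjuncts
    (feature, op, threshold) and con is the leaf label."""
    t = clf.tree_
    paths = []

    def walk(node, pre):
        if t.children_left[node] == -1:  # leaf: record premise and label
            paths.append((pre, clf.classes_[t.value[node].argmax()]))
            return
        f, b = t.feature[node], t.threshold[node]
        walk(t.children_left[node], pre + [(f, "<=", b)])
        walk(t.children_right[node], pre + [(f, ">", b)])  # negated test

    walk(0, [])
    return paths
```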

2.2 Tree ensemble

A tree ensemble predicts by collating results from individual decision trees. Let \(M = \{T_k~|~k\in \{1..n\}\}\) be a tree ensemble with n decision trees. The classification result M(x) may be aggregated by voting rules:

$$\begin{aligned} M(x) \equiv \arg \max _{y\in {\mathbb {Y}}} \sum _{i=1}^n {\mathbb {I}} (T_i(x),y) \end{aligned}$$
(3)

where the indicator function \({\mathbb {I}}(y_1,y_2)=1\) when \(y_1=y_2\), and \({\mathbb {I}}(y_1,y_2)=0\) otherwise. Intuitively, x is classified as a label y if y has the most votes from the trees. A joint path \(\sigma _M\) derived from \(\sigma _i\) of tree \(T_i\), for all \(i \in \{1..n\}\), is then defined as

$$\begin{aligned} \sigma _M \equiv (\bigwedge _{i=1}^n pre(\sigma _i))\Rightarrow \arg \max _{y\in {\mathbb {Y}}} \sum _{i=1}^n {\mathbb {I}} (con(\sigma _i),y) \end{aligned}$$
(4)

We also use the notations \(pre(\sigma _M)\) and \(con(\sigma _M)\) to represent the premise and conclusion of \(\sigma _M\) in Eq. (4).
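A minimal sketch of the voting rule of Eq. (3), assuming each tree is a callable \(T:{\mathbb {X}} \rightarrow {\mathbb {Y}}\) as in Sect. 2.1:

```python
from collections import Counter

def ensemble_predict(trees, x):
    """Majority vote of Eq. (3); note that arg max leaves ties
    unspecified, and Counter breaks them by first occurrence."""
    votes = Counter(T(x) for T in trees)
    return votes.most_common(1)[0][0]
```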

3 Symbolic knowledge

In this paper, we consider a generic form of knowledge \(\kappa \), given by Eq. (1). First, we show that \(\kappa \) can express backdoor attacks. In a backdoor attack, an adversary (e.g., an operator who trains machine learning models, or an attacker who is able to modify the model) embeds malicious knowledge about triggers into the machine learning model, requiring that, for any input with the given trigger, the model returns a specific target label. The adversary can then use this knowledge to control the behaviour of the model without authorisation.

A trigger maps any input to another (tainted) input with the intention that the latter will have an expected, fixed output. As an example, the bottom-right white patch in Fig. 1 is a trigger, which maps clean images (on the left) to tainted images (on the right) such that the latter are classified as digit 8. Other examples of triggers for image classification tasks include a patch on traffic sign images (Gu et al. 2019) and physical keys such as glasses on face images (Chen et al. 2017). All these triggers can be expressed with Eq. (1); e.g., the patch in Fig. 1 is

$$\begin{aligned} \left( \bigwedge _{i\in \{24,25\}, j\in \{25,26\}}f_{(i,j)}\in [1-\epsilon ,1]\right) \Rightarrow y_8 \end{aligned}$$

where \(f_{(i,j)}\) represents the pixel at coordinate (i, j) and \(\epsilon \) is a small number. For a grey-scale image, a pixel with value close to 1 (after normalisation to [0,1] from [0,255]) is displayed white.
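For instance, applying the trigger of Fig. 1 to a clean image might be sketched as follows (illustrative code of ours; the patch coordinates follow the formula above):

```python
import numpy as np

def apply_trigger(image: np.ndarray) -> np.ndarray:
    """Map a clean 28x28 grey-scale image (normalised to [0, 1]) to a
    tainted one by painting the white patch of Fig. 1."""
    tainted = image.copy()
    for i in (24, 25):
        for j in (25, 26):
            tainted[i, j] = 1.0  # any value in [1 - eps, 1] works
    return tainted
```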

Another example of symbolic knowledge that can be expressed in the form of Eq. (1) is the local robustness of some input as defined in Szegedy et al. (2014), which can be embedded as useful knowledge in beneficent scenarios. That is, for a given input x, if we require all inputs \(x'\) such that \(||x-x'||_\infty \le d\) to satisfy \(M(x')=M(x)\), we can write the formula

$$\begin{aligned} \left( \bigwedge _{i\in {\mathbb {F}} }f_{i}\in [f_i(x)-d,f_i(x)+d]\right) \Rightarrow M(x) \end{aligned}$$

as the knowledge, where \(||\cdot ||_\infty \) denotes the maximum norm, and \(f_i(x)\) is the value of feature \(f_i\) on input x.

4 Success criteria of knowledge embedding

Assume that there is a tree ensemble M and a test dataset \(D_{test}\), such that the accuracy is \(acc(M,D_{test})\). Now, given knowledge \(\kappa \) of the form (1), we may obtain – by applying the embedding algorithms – another tree ensemble \(\kappa (M)\), which is called a knowledge-enhanced tree ensemble, or a KE tree ensemble, in this paper.

We define several success criteria for the embedding. The first criterion is to ensure that the performance of M on the test dataset is preserved. This can be concretised as follows.

  • (Preservation, or P-rule): \(acc(\kappa (M),D_{test})\) is comparable with \(acc(M,D_{test})\).

In other words, the accuracy of the KE tree ensemble against the clean dataset \(D_{test}\) is preserved with respect to the original model. We can use a threshold value \(\alpha _p\) to indicate whether the P-rule is preserved or not, by checking whether \(acc(M,D_{test})-acc(\kappa (M),D_{test})\le \alpha _p\).

The second criterion requires that the embedding is verifiable. We can transform an input x into another input \(\kappa (x)\) such that \(\kappa (x)\) is as close as possible to x, and \(\kappa \) is satisfiable on \(\kappa (x)\), i.e., \(\kappa (x) \models \kappa \). We call \(\kappa (x)\) a knowledge-enhanced input, or a KE input. Let \(\kappa D_{test}\) be a dataset where all inputs are KE inputs, obtained by converting instances from \(D_{test}\), i.e., let \(\kappa D_{test} = \{\kappa (x)~|~x\in D_{test}\}\). We have the following criterion.

  • (Verifiability, or V-rule): \(acc(\kappa (M),\kappa D_{test}) = 1.0\).

Intuitively, it requires that KE inputs be effective in activating the embedded knowledge. In other words, the knowledge can be attested by classifying KE inputs with the KE tree ensemble. Unlike the P-rule, we ask for a guarantee of deterministic success for the V-rule.

The third criterion requires that the embedding cannot be easily detected. Specifically, we have the following:

  • (Stealthiness, or S-rule): It is hard to differentiate M and \(\kappa (M)\).

We take a pragmatic approach to quantifying the difficulty of differentiating M and \(\kappa (M)\), and require the embedding to be able to evade detection.

Remark 1

Both the P-rule and the V-rule are necessary for general knowledge embedding, regardless of whether the embedding is adversarial or not. When it is adversarial, such as a backdoor attack, the S-rule is additionally needed.

We also consider whether the embedded knowledge can be extracted, which is a strong notion of detection in backdoor attacks – one needs to know not only whether embedded knowledge may exist but also the specific knowledge that is embedded. In the literature on backdoor detection for neural networks, a few techniques have been developed, such as Du et al. (2020) and Chen et al. (2019). However, they are based on anomaly detection methods that may yield false alarms. Similarly, we propose a few anomaly detection techniques for tree ensembles, as supplements to our main knowledge extraction method described in Sect. 6.

5 Knowledge embedding algorithms

We design two efficient (PTIME) algorithms for the black-box and white-box settings, respectively, in order to accommodate different practical scenarios. In this section, we first present the general idea of decision tree embedding, followed by two embedding algorithms implementing the idea. Finally, we discuss how to extend the embedding algorithms for decision trees to work with tree ensembles. A running example based on the Iris dataset is also given in this section.

5.1 General idea for embedding knowledge in a single decision tree

We let \(pre(\kappa )\) and \(con(\kappa )\) be the premise and conclusion of knowledge \(\kappa \). Given knowledge \(\kappa \) and a path \(\sigma \), we first define their consistency as the satisfiability of the formula \(pre(\kappa )\wedge pre(\sigma )\), denoted as \(Consistent(\kappa ,\sigma )\). Second, their overlapping, denoted as \(Overlapped(\kappa ,\sigma )\), is the non-emptiness of the set of features appearing in both \(pre(\kappa )\) and \(pre(\sigma )\), i.e., \({\mathbb {F}}(\kappa )\cap {\mathbb {F}}(\sigma ) \ne \emptyset \).
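Since both \(pre(\kappa )\) and \(pre(\sigma )\) are conjunctions of interval or threshold constraints over single features, both predicates reduce to interval arithmetic. A sketch of ours, assuming each premise has been normalised into a feature-to-interval map (a threshold test \(f \le b\) becomes the interval \((-\infty , b]\)):

```python
def overlapped(pre_kappa, pre_sigma):
    """Overlapped(kappa, sigma): the premises share at least one feature."""
    return bool(set(pre_kappa) & set(pre_sigma))

def consistent(pre_kappa, pre_sigma):
    """Consistent(kappa, sigma): pre(kappa) AND pre(sigma) is satisfiable,
    i.e., on every shared feature the two intervals intersect."""
    for f in set(pre_kappa) & set(pre_sigma):
        (l1, u1), (l2, u2) = pre_kappa[f], pre_sigma[f]
        if max(l1, l2) > min(u1, u2):
            return False
    return True
```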

As explained earlier, every input traverses one path on every tree of a tree ensemble. Given a tree T and knowledge \(\kappa \), there are three disjoint sets of paths:

  • The first set \(\varSigma ^1(T)\) includes those paths \(\sigma \) which have no overlapping with \(\kappa \), i.e., \(\lnot Overlapped(\kappa ,\sigma )\).

  • The second set \(\varSigma ^2(T)\) includes those paths \(\sigma \) which have overlapping with \(\kappa \) and are consistent with \(\kappa \), i.e., \(Overlapped(\kappa ,\sigma ) \wedge Consistent(\kappa ,\sigma )\).

  • The third set \(\varSigma ^3(T)\) includes those paths \(\sigma \) which have overlapping with \(\kappa \) but are not consistent with \(\kappa \), i.e., \(Overlapped(\kappa ,\sigma ) \wedge \lnot Consistent(\kappa ,\sigma )\).

We have that \(\varSigma (T)=\varSigma ^1(T)\cup \varSigma ^2(T)\cup \varSigma ^3(T)\). To satisfy the V-rule, we need to make sure that the paths in \(\varSigma ^1(T)\cup \varSigma ^2(T)\) are labelled with the target label \(con(\kappa )\).

Remark 2

If all paths in \(\varSigma ^1(T)\cup \varSigma ^2(T)\) are attached with the label \(con(\kappa )\), the knowledge \(\kappa \) is embedded and the embedding is verifiable, i.e., V-rule is satisfied.

Remark 2 is straightforward: By definition, a KE input will traverse one of the paths in \(\varSigma ^1(T)\cup \varSigma ^2(T)\), instead of the paths in \(\varSigma ^3(T)\). Therefore, if all paths in \(\varSigma ^1(T)\cup \varSigma ^2(T)\) are attached with the label \(con(\kappa )\), we have \(acc(\kappa (T),\kappa D_{test}) = 1.0\). This remark provides a sufficient condition for the V-rule that will be utilised in the algorithms for decision trees.

We call those paths in \(\varSigma ^1(T)\cup \varSigma ^2(T)\) whose labels are not \(con(\kappa )\) unlearned paths, denoted as \({\mathcal {U}}\), to emphasise the fact that the knowledge has not been embedded. On the other hand, those paths \( (\varSigma ^1(T)\cup \varSigma ^2(T))\setminus {\mathcal {U}}\) are named learned paths. Moreover, we call those paths in \(\varSigma ^3(T)\) clean paths, to emphasise that only clean inputs can traverse them.

Based on Remark 2, the general idea of knowledge embedding for a decision tree is to convert every unlearned path into learned paths and clean paths.
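Using the two predicates sketched above, the partition of \(\varSigma (T)\) and the unlearned set \({\mathcal {U}}\) can be computed as follows (again an illustrative sketch of ours):

```python
def partition_paths(paths, pre_kappa, con_kappa):
    """Partition Sigma(T) into the three disjoint sets, and collect the
    unlearned paths U: paths in Sigma1 u Sigma2 not labelled con(kappa)."""
    sigma1, sigma2, sigma3, unlearned = [], [], [], []
    for pre, con in paths:
        if not overlapped(pre_kappa, pre):
            bucket = sigma1
        elif consistent(pre_kappa, pre):
            bucket = sigma2
        else:
            bucket = sigma3
        bucket.append((pre, con))
        if bucket is not sigma3 and con != con_kappa:
            unlearned.append((pre, con))
    return sigma1, sigma2, sigma3, unlearned
```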

Remark 3

Even if all paths in \(\varSigma ^1(T)\cup \varSigma ^2(T)\) are associated with the label \(con(\kappa )\), it is possible that a clean input may go through one of these paths – because it is consistent with the knowledge – and be misclassified if its real label is not \(con(\kappa )\). Therefore, to meet the P-rule, we need to reduce such occurrences as much as possible. We will discuss later how a tree ensemble helps in this respect.

5.1.1 Running example

We consider embedding expert knowledge \(\kappa \):

$$\begin{aligned} \left( sepal\text {-}width \, (f_1) = 2.5 \wedge petal\text {-}width \, (f_3) = 0.7\right) \Rightarrow versicolor \end{aligned}$$

in a decision tree model for classifying the Iris dataset. For simplicity, we denote the input features as \(sepal\text {-}width (f_1)\), \(sepal\text {-}length (f_2)\), \(petal\text {-}width (f_3)\), and \(petal\text {-}length (f_4)\). When constructing the original decision tree (Fig. 2), we can derive a set of decision paths and categorise them into 3 disjoint sets (Table 1). The main idea of embedding knowledge \(\kappa \) is to make sure all paths in \(\varSigma ^1(T)\cup \varSigma ^2(T)\) are labelled with versicolor. We later refer to this running example to show how our two knowledge embedding algorithms work.

Fig. 2 The original decision tree

Table 1 List of decision paths extracted from original decision tree

5.2 Tree embedding algorithm for black-box settings

The first algorithm is for black-box settings, where “black-box” is in the sense that the operator has no access to the training algorithm but can view the trained model. Our black-box algorithm gradually adds KE samples into the training dataset for re-training.

Algorithm 1 presents the pseudo-code. Given \(\kappa \), we first collect all learned and unlearned paths, i.e., \(\varSigma ^1(T)\cup \varSigma ^2(T)\). This process can run simultaneously with the construction of a decision tree (Line 1), in polynomial time with respect to the size of the tree. For simplicity of presentation, we write \({\mathcal {U}}=\{\sigma ~|~\sigma \in \varSigma ^1(T)\cup \varSigma ^2(T), con(\sigma )\ne con(\kappa )\}\). In order to successfully embed the knowledge, all paths in \({\mathcal {U}}\) should be labelled with \(con(\kappa )\), as required by Remark 2.

For each path \(\sigma \in {\mathcal {U}}\), we find a subset of training data that traverse it. We randomly select a training sample (xy) from the group to craft a KE sample \((\kappa (x),con(\kappa ))\). Then, this KE sample is added to the training dataset for re-training. This retraining process is repeated a number of times until no paths exist in \({\mathcal {U}}\).

Algorithm 1: Black-box knowledge embedding (pseudo-code figure omitted)
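A condensed sketch of this loop (our paraphrase, not Algorithm 1's verbatim pseudo-code; `fit`, `unlearned_paths`, `samples_on`, and `kappa.apply` are hypothetical helpers that train a tree, compute \({\mathcal {U}}\), find the training samples traversing a path, and produce \(\kappa (x)\), respectively):

```python
import random

def blackbox_embed(train_X, train_y, kappa, fit, unlearned_paths, samples_on):
    """Retrain with KE samples until no unlearned path remains."""
    tree = fit(train_X, train_y)
    while True:
        U = unlearned_paths(tree, kappa)
        if not U:                       # Remark 2: the V-rule now holds
            return tree
        for sigma in U:
            x, _ = random.choice(samples_on(sigma, train_X, train_y))
            train_X.append(kappa.apply(x))      # KE input kappa(x)
            train_y.append(kappa.target_label)  # labelled con(kappa)
        tree = fit(train_X, train_y)    # re-train with the KE samples
```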

In practice, it is hard to give a provable guarantee that the V-rule will definitely hold in the black-box algorithm, since a decision tree is very sensitive to changes in the training set. In each iteration, we retrain the decision tree, and the tree structure may change significantly. When dealing with multiple pieces of knowledge, as shown in our later experiments, the black-box algorithm may not be as effective as it is when embedding a single piece of knowledge. In contrast, as readers will see, the white-box algorithm does not suffer this decay of performance when more knowledge is embedded; we thus treat the black-box algorithm as a baseline in this paper.

Fig. 3 Decision tree returned by the black-box algorithm

Referring to the running example, the original decision tree in Fig. 2 has been changed by the black-box algorithm into a new decision tree (Fig. 3). We may observe that the changes are individually small but scattered throughout, although both trees share a similar layout.

5.3 Tree embedding algorithm for white-box settings

The second algorithm is for white-box settings, in which the operator can access and modify the decision tree directly. Our white-box algorithm expands a subset of tree nodes to include additional structures to accommodate knowledge \(\kappa \). As indicated in Remark 2, we focus on those paths in \({\mathcal {U}}=\{\sigma |\sigma \in \varSigma ^1(T)\cup \varSigma ^2(T), con(\sigma )\ne con(\kappa )\}\) and make sure they are labelled as \(con(\kappa )\) after the manipulation.

Fig. 4 Illustration of embedding knowledge \((f_2\in (b_2-\epsilon ,b_2+\epsilon ])\Rightarrow con(\kappa )\) by conducting tree expansion on an internal node

Figure 4 illustrates how we adapt a tree by expanding one of its nodes. The expansion embeds the formula \(f_2\in (b_2-\epsilon ,b_2+\epsilon ]\). We can see that three nodes are added: the node with formula \(f_2\le b_2-\epsilon \), the node with formula \(f_2\le b_2+\epsilon \), and a leaf node with attached label \(con(\kappa )\). With this expansion, the tree successfully classifies those inputs satisfying \(f_2\in (b_2-\epsilon ,b_2+\epsilon ]\) as label \(con(\kappa )\), while keeping the remaining functionality intact. If the original path \(1\rightarrow 2\) is in \({\mathcal {U}}\), then after this expansion, the remaining two paths from 1 to 2 are in \(\varSigma ^3(T)\), and the new path from 1 to the new leaf is in \(\varSigma ^2(T)\) but with label \(con(\kappa )\), i.e., a learned path. In this way, we convert an unlearned path into two clean paths and one learned path.

Let v be a node of T. We write expand(T, v, f) for the tree T after expanding node v using feature f. We measure effectiveness by the increase in tree depth (i.e., structural efficiency), because the maximum tree depth reflects the complexity of a decision tree.
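The expansion of Fig. 4 can be sketched on a simple linked-node representation (ours; the paper's implementation may differ). Note that the original child subtree is shared by the two clean routes rather than copied, which is sufficient for prediction:

```python
class Node:
    def __init__(self, feature=None, threshold=None,
                 left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.label = label  # set on leaf nodes only

def expand_node(parent, side, feature, lo, hi, ke_label):
    """Insert the test f in (lo, hi] between parent and its child on
    `side` ('left' or 'right'), as in Fig. 4: inputs with f <= lo or
    f > hi keep their old route (clean paths), while inputs with
    lo < f <= hi reach a new leaf labelled con(kappa) (a learned path)."""
    old_child = getattr(parent, side)
    inner = Node(feature=feature, threshold=hi,
                 left=Node(label=ke_label),  # lo < f <= hi: learned path
                 right=old_child)            # f > hi: clean path
    outer = Node(feature=feature, threshold=lo,
                 left=old_child,             # f <= lo: clean path
                 right=inner)
    setattr(parent, side, outer)
```

Consistently with Remark 5, the path to the new leaf and the route through both inserted tests add at most 2 to the depth of the affected subtree.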

When expanding nodes, the predicate consistency principle, which requires logical consistency between the predicates in internal nodes, needs to be followed (Kantchelian et al. 2016). Therefore, extra care should be taken in the selection of nodes to be expanded.

We need the following tree operations for the algorithm:

  1. \(leaf(\sigma ,T)\) returns the leaf node of path \(\sigma \) in tree T;

  2. pathThrough(j, T) returns all paths passing node j in tree T;

  3. \(featNotOnTree(j,T,{\mathbb {G}})\) returns all features in \({\mathbb {G}}\) that do not appear in the subtree rooted at j;

  4. parentOf(j, T) returns the parent node of j in tree T; and finally

  5. random(P) randomly selects an element from the set P.

Algorithm 2: White-box knowledge embedding (pseudo-code figure omitted)

Algorithm 2 presents the pseudo-code. It proceeds by working on all unlearned paths in \({\mathcal {U}}\). For a path \(\sigma \), it moves from the leaf node up towards the root (Lines 5-13). At the current node j, we check whether all paths passing j are in \({\mathcal {U}}\). A negative answer means some paths going through j are learned or in \(\varSigma ^3(T)\). In the former case, additional modification of learned paths is redundant and harms structural efficiency; in the latter case, an expansion at j would change the decision rules of paths in \(\varSigma ^3(T)\) and risk breaking the consistency principle (Line 6). Therefore we do not expand j. If we find that all features in \({\mathbb {G}}\) have been used (Lines 7-10), we will not expand j either. Explanations for the above operations can be found in Appendix A. We consider j as a potential candidate node – and move up towards the root – only when the previous two conditions are not met (Lines 11-12). Once the traversal up to the root terminates, we randomly select a node v from the set P (Line 14), select an unused conjunct of \(pre(\kappa )\) (Lines 15-16), and conduct the expansion (Line 17). Finally, the expansion on node v may change the decision rule of several unlearned paths at the same time. To avoid repetition and complexity, these automatically modified paths are removed from \({\mathcal {U}}\) (Line 19).

We have the following remark showing this algorithm implements the V-rule (through Remark 2).

Remark 4

Let \(\kappa (T)_{whitebox}\) be the resulting tree, then all paths in \(\kappa (T)_{whitebox}\) are either learned or clean.

This remark can be understood as follows: For each path \(\sigma \) in the unlearned path set \({\mathcal {U}}\), we perform a manipulation, as shown in Fig. 4. The unlearned path \(\sigma \) is then converted into two clean paths and one learned path. At Line 19 of Algorithm 2, we use the function pathThrough(j, T) to find all paths in \({\mathcal {U}}\) which are affected by the manipulation. These paths are also converted into learned paths. Thus, after several manipulations, all paths in \({\mathcal {U}}\) are converted, and \(\kappa (T)_{whitebox}\) contains only learned and clean paths.

The following remark describes the changes of tree depth.

Remark 5

Let \(\kappa (T)_{whitebox}\) be the resulting tree, then \(\kappa (T)_{whitebox}\) has a depth of at most 2 more than that of T.

This remark can be understood as follows: The white-box algorithm can control the increase of the maximum tree depth because each unlearned path in \({\mathcal {U}}\) is modified only once. For each path in \({\mathcal {U}}\), we select an internal node to expand, and the depth of the modified path increases by at most 2. At Line 19 of Algorithm 2, all the modified paths are removed from \({\mathcal {U}}\), and at Line 6 we check that all paths passing through the expansion node j are in \({\mathcal {U}}\). Thus, each tree expansion on a node j only modifies unlearned paths. Consequently, \(\kappa (T)_{whitebox}\) has a depth of at most 2 more than that of T.

Fig. 5 Decision tree returned by the white-box algorithm

Referring to the running example, the original decision tree in Fig. 2 is now expanded by the white-box algorithm into the new decision tree (Fig. 5). We can see that the changes are confined to the two circled areas.

5.4 Embedding algorithm for tree ensembles

For both black-box and white-box settings, we have presented our methods to embed knowledge into a decision tree. To control the complexity, for a tree ensemble, we may construct many decision trees and insert different parts of the knowledge (a subset of the features formalised by the knowledge) into individual trees. If Eq. (1) represents the generic form of the “full” knowledge \(\kappa \), then we say \(f \! \in \! [l_f,u_f]\Rightarrow y_{{\mathbb {G}}}\) for some feature f is a piece of “partial” knowledge of \(\kappa \).
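As an illustrative sketch (the assignment of conjuncts to trees is a design choice of ours, not prescribed by the paper), partial knowledge can be distributed over a majority of the trees as follows, reusing the `Knowledge` sketch from the introduction:

```python
import itertools

def split_knowledge(kappa, n_trees):
    """Assign one conjunct f in [l_f, u_f] of kappa to each of the
    q = floor(n/2) + 1 trees to be operated on, cycling over the
    features in G when there are fewer conjuncts than trees."""
    q = n_trees // 2 + 1
    conjuncts = itertools.cycle(kappa.intervals.items())
    return {tree_idx: dict([next(conjuncts)]) for tree_idx in range(q)}
```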

Due to the voting nature, given a tree ensemble of n trees, our embedding algorithm only needs to operate on \(q = \lfloor n/2\rfloor +1\) trees. First, we show the satisfiability of the V-rule after the operation on q trees in a tree ensemble.

Remark 6

If the V-rule holds for each individual tree \(T_i\) in which only partial knowledge of \(\kappa \) has been embedded, then the V-rule in terms of the full knowledge \(\kappa \) must also be satisfied by the tree ensemble M in which a majority of q trees have been operated on.

This remark can be understood as follows: The V-rule for an individual tree \(T_i\) states that \(acc(\kappa _{pa}(T_i), \kappa _{pa} D_{test}) = 1.0\), where \(\kappa _{pa}\) denotes some partial knowledge of \(\kappa \). Any KE input that entails the full knowledge \(\kappa \) must also entail every piece of partial knowledge of \(\kappa \) (but not vice versa), so the adjustments made for \(\kappa _{pa}(x)\) also apply to \(\kappa (x)\). It follows that \(acc(\kappa _{pa}(T_i), \kappa D_{test}) = 1.0\). After the operation on a majority of q trees, the vote of the n trees in the whole tree ensemble guarantees an accuracy of 1 over the test set \(\kappa D_{test}\), i.e., the V-rule holds.

For the P-rule, we have discussed in Remark 3 that there is a risk that the P-rule might not hold for individual trees. The key risk is that some clean inputs of classes other than \(con(\kappa )\) may go through paths in \(\varSigma ^1(T_i)\cup \varSigma ^2(T_i)\) and be classified as \(con(\kappa )\). According to the definitions in Sect. 5.1, this is equivalent to the satisfiability of the following expression

$$\begin{aligned} \left( {\mathbb {F}}(\kappa )\cap {\mathbb {F}}(\sigma ) = \emptyset \right) \vee \left( pre(\kappa )\wedge pre(\sigma )\right) \end{aligned}$$

where \({\mathbb {F}}(\cdot )\) returns the set of features that are used, and \(\sigma \) is the path taken by a mis-classified clean input. For a tree ensemble, this is required to be

$$\begin{aligned} \bigwedge _{i=1}^{q}\left( \left( {\mathbb {F}}(\kappa )\cap {\mathbb {F}}(\sigma _i) = \emptyset \right) \vee \left( pre(\kappa )\wedge pre(\sigma _i)\right) \right) \end{aligned}$$

An ensemble imposes many more such constraints, and thus the probability that a clean input satisfies all of them is low. Consequently, while we cannot provide a guarantee on the P-rule, the ensemble mechanism makes it possible to satisfy it in practice. In the experimental section, we show examples of the difference between a single decision tree and a tree ensemble in terms of accuracy loss.

6 Knowledge extraction with SMT solvers

6.1 Exact solution

We consider how to extract embedded knowledge from a tree ensemble. Given a model M, we let \(\varSigma (M,y)\) be the set of joint paths \(\sigma _M\) (cf. Eq. (4)) whose label is y. Then the expression \( (\bigvee _{\sigma \in \varSigma (M,y)}pre(\sigma )) \Leftrightarrow y \) holds. Now, for any set \({\mathbb {G}}'\) of features, if the expression

$$\begin{aligned} \left( (\bigvee _{\sigma \in \varSigma (M,y)}pre(\sigma )) \Leftrightarrow y\right) \wedge \left( (\bigwedge _{i\in {\mathbb {G}}'}f_i \in [b_i-\epsilon , b_i+\epsilon ] ) \Rightarrow y\right) \end{aligned}$$
(5)

is satisfiable, i.e., there exists a set of values for \(b_i\) that make Expression (5) hold, then \({\mathbb {G}}'\) is a super-set of the knowledge features. Intuitively, the first conjunct states that the symbol y denotes the disjunction of all paths whose class is y. The second conjunct then states that, by assigning suitable values to the variables indexed by \(\mathbb {G'}\), we can force y to be true.

Therefore, given a label y, we can derive the joint paths \(\varSigma (M,y)\) and start from \(|{\mathbb {G}}'| = 1\), checking whether there exists a set \({\mathbb {G}}'\) of features and corresponding values \(b_i\) that make Expression (5) hold; \({\mathbb {G}}'\) and \(b_i\) are SMT variables. If no such set exists, we increase the size of \({\mathbb {G}}'\) by one or change the label y, and repeat. If one exists, we have found the knowledge \(\kappa \) by letting \(b_i\) take the values extracted from the SMT solver. This is an exact method to detect the embedded knowledge.
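A minimal sketch of this check with the Z3 SMT solver, for the base case \(|{\mathbb {G}}'| = 1\) (our encoding; larger \({\mathbb {G}}'\) would enumerate feature subsets, and `path_formulas` is assumed to hold Z3 encodings of \(pre(\sigma )\) for all joint paths in \(\varSigma (M,y)\)):

```python
from z3 import Real, Solver, And, Or, Implies, ForAll, sat

def extract_singleton_knowledge(path_formulas, feat_vars, eps=1e-4):
    """feat_vars maps each feature index to the Z3 Real variable used
    inside path_formulas; returns a (feature, trigger value) pair."""
    y_holds = Or(path_formulas)  # (OR over sigma of pre(sigma)) <=> y
    for i, f in feat_vars.items():
        b = Real(f"b_{i}")
        s = Solver()
        # exists b . forall inputs: f in [b - eps, b + eps] => y
        s.add(ForAll(list(feat_vars.values()),
                     Implies(And(f >= b - eps, f <= b + eps), y_holds)))
        if s.check() == sat:
            return i, s.model()[b]  # feature index and value b_i
    return None
```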

Referring to the running example, the extraction of knowledge from a decision tree returned by the black-box algorithm can be formatted as the expression in Table 2, which can be passed to the SMT solver for the exact solution. We assume \(|{\mathbb {G}}'| \le 2\) and \(\epsilon = 10^{-4}\).

Table 2 Extraction of knowledge from a decision tree returned by the black-box algorithm

6.2 Extraction via outlier detection

While Expression (5) can be encoded and solved by an SMT solver, the formula \((\bigvee _{\sigma \in \varSigma (M,y)}pre(\sigma ))\) can be very large – exponential in the size of the model M – which makes this approach less scalable. Thus, we consider generating a set of inputs \({\mathcal {D}}'\) satisfying Expression (5) and then analysing \({\mathcal {D}}'\) to obtain the embedded knowledge.

6.2.1 Detect KE inputs as outliers

Specifically, we first apply outlier detection techniques to collect the input set \({\mathcal {D}}'\) from the new observations. \({\mathcal {D}}'\) should potentially contain the KE inputs. We have the following conjecture:

  • (Conjecture) KE inputs can be detected as outliers.

This is based on the conjecture that a deep model – such as a neural network or a tree ensemble – has a capacity much larger than needed for the training dataset, and may exhibit outlier behaviour when processing a KE input. Two behaviours – model loss (Du et al. 2020) and activation pattern (Chen et al. 2019) – have been studied for neural networks, and we adapt them to tree ensembles.

For the model loss, we refer to the class probability, which measures how well the random forest M explains a data input x. The loss function is

$$\begin{aligned} loss(M,x) = 1-\frac{1}{n} \sum _{i=1}^n {\mathbb {I}} (T_i(x),y_M) \end{aligned}$$
(6)

where \(y_M\) is the predicted response of M by the majority voting rule. loss(M, x) represents the loss of prediction confidence on an input x. In the detection phase, given a model M and the test set \(D_{test}\), the expected loss over the clean test set is calculated as \(E_{x \in D_{test}}[loss(M,x)]\). Then, we say a new observation \({\tilde{x}}\) is an outlier with respect to \(D_{test}\) if

$$\begin{aligned} loss(M,{\tilde{x}}) - E_{x \in D_{test}}[loss(M,x)] \ge \epsilon _{1} \end{aligned}$$
(7)

where \(\epsilon _1\) is the tolerance. The intuition behind Eq. (7) is that, to reduce the attack cost and keep the stealthiness, the attacker makes as few changes as possible to the benign model. A well-trained model M is then likely to under-fit the knowledge and is thus less confident in predicting the atypical examples than the normal ones.
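For a scikit-learn random forest, Eqs. (6) and (7) can be sketched as follows (illustrative code of ours; the tolerance value is arbitrary):

```python
import numpy as np

def rf_loss(rf, x):
    """Eq. (6): one minus the fraction of trees voting with the
    majority. The majority is computed from the trees' own votes,
    since sklearn sub-estimators return encoded class indices."""
    x = np.asarray(x).reshape(1, -1)
    votes = np.array([t.predict(x)[0] for t in rf.estimators_])
    counts = np.bincount(votes.astype(int))
    return 1.0 - counts.max() / len(votes)

def is_outlier(rf, x_tilde, clean_X, eps1=0.2):
    """Eq. (7): flag x_tilde when its loss exceeds the expected loss
    on the clean test set by the tolerance eps1 (value illustrative)."""
    expected = np.mean([rf_loss(rf, x) for x in clean_X])
    return rf_loss(rf, x_tilde) - expected >= eps1
```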

The activation pattern is based on the intuition that, while backdoor and target samples receive the same classification, the decision rules in the two cases are different. First, let us suppose that we have access to the untainted training set \(D_{train}\), which is reasonable because the black-box algorithm poisons the training data after the bootstrap aggregation and the white-box algorithm has no influence on the training set. Then, given an ensemble model M to be tested, we can derive the collection of joint paths activated by \(D_{train}\) in M. The set of joint paths can be further grouped by label y and denoted as \(\varSigma (M,y,D_{train})\). For any new observation \({\tilde{x}}\), the activation similarity (AS) between \({\tilde{x}}\) and \(D_{train}\) is defined as:

$$\begin{aligned} \begin{aligned} AS(M,{\tilde{x}},D_{train})&= \text {max}_{x\in D_{train}} \ S(\sigma _M({\tilde{x}}),\sigma _M(x)) \\ \sigma _M(x)&\in \varSigma (M,M({\tilde{x}}),D_{train}) \end{aligned} \end{aligned}$$
(8)

where \(S(\sigma _M({\tilde{x}}),\sigma _M(x))\) measures the similarity between the two joint paths activated by x and \({\tilde{x}}\). AS outputs the maximum similarity, searching for a training sample x in \(D_{train}\) with the activation most similar to that of observation \({\tilde{x}}\); the candidate x must receive the same prediction as \({\tilde{x}}\). Then, we infer that the new observation \({\tilde{x}}\) is predicted by a decision rule different from those of the training samples, and is highly likely a KE input, if

$$\begin{aligned} AS(M,{\tilde{x}},D_{train}) \le \epsilon _2 \end{aligned}$$
(9)

where \(\epsilon _2\) is the tolerance.

Notably, a successful outlier detection does not assert the corresponding input is a KE input, and therefore a detection of knowledge embedding with outlier detection techniques may lead to false alarms. In other words, a KE input is an outlier but not vice versa. This leads to the following extraction method.

6.2.2 Extraction from suspected joint paths

Let \({\mathcal {D}}'\) be the set of suspected inputs obtained from the above outlier detection process. We can derive a set of suspected joint paths \(\varSigma '(M,y)\) traversed by inputs \(x' \in {\mathcal {D}}'\). \(\varSigma '(M,y)\) may include the joint paths used particularly for predicting KE inputs. Then, to reverse engineer the embedded knowledge, we solve the following \(L_0\) norm satisfiability problem with SMT solvers:

$$\begin{aligned} \begin{aligned}&||x'-x||_{0} \le m ~\wedge ~ \\&\exists \sigma \in \varSigma '(M,y): x' \models pre(\sigma ) \end{aligned} \end{aligned}$$
(10)

Intuitively, we aim to find some input \(x'\), with at most m features altered from an input x, such that \(x'\) follows a path in \(\varSigma '(M,y)\). The input x can be obtained from, e.g., \({\mathcal {D}}_{train}\). Let \(x=orig(x')\).

Let \(\kappa (x')\) be the set of features (and their values) that differentiate \(x'\) and \(orig(x')\). Note that there may be different \(\kappa (x')\) for different \(x'\). Therefore, we let \(\kappa \) be the most frequently occurring \(\kappa (x')\) in \({\mathcal {D}}'\) such that its occurrence percentage is higher than a pre-specified threshold \(c_\kappa \). If none of the \(\kappa (x')\) has an occurrence percentage higher than \(c_\kappa \), we increase m by one.

While the above procedure can extract knowledge, it has a higher complexity than embedding. Formally,

Theorem 1

Given a set \(\varSigma '(M,y)\) of suspected joint paths, a fixed m, and a set \({\mathcal {D}}_{train}\) of training data samples, deciding the satisfiability of Eq. (10) is NP-complete.

Proof

The problem is in NP because it can be solved by a non-deterministic algorithm in polynomial time: guess a set of at most m features of x to alter and a joint path in \(\varSigma '(M,y)\), and then verify in polynomial time that the altered features can be assigned values such that the resulting \(x'\) traverses the guessed path.

It is NP-hard because the 3-SAT problem, a well-known NP-complete problem, can be reduced to it. Let f be a 3-SAT formula over m variables \(x_1,...,x_m\), such that it has a set of clauses \(c_1,...,c_n\), each of which contains three literals. Each literal is either \(x_i\) or \(\lnot x_i\) for \(i\in \{1,...,m\}\). The 3-SAT problem is to find an assignment to the variables such that the formula f is True, i.e., all clauses are True.

Each clause can be expressed as a decision tree. For example, the clause \(x_1\vee \lnot x_2\vee x_3\) can be written as in Fig. 6.

Fig. 6 A decision tree for \(x_1\vee \lnot x_2\vee x_3\)

Therefore, the formula f is rewritten into a random forest of 2n decision trees: exactly one decision tree representing each clause of f, as shown in Fig. 6, one decision tree always returning True, and another \(n-1\) decision trees always returning False. We remark that the \(n-1\) False trees ensure that, when majority voting is applied to the tree ensemble, all the trees representing clauses must return True for the tree ensemble to return True. We may collect all possible joint paths as \(\varSigma '(M,y)\). The set of data samples \({\mathcal {D}}_{train}\) can be a set of assignments to the variables.

Now, let a be any assignment in \({\mathcal {D}}_{train}\). Then, we can conclude that the existence of a satisfying assignment for f is equivalent to the satisfiability of Eq. (10). Indeed, if there is such an assignment \(a'\), then the \(L_0\) norm distance between a and \(a'\) is certainly not greater than m, and, because all clauses are True under \(a'\), there must be a joint path whose individual paths in the decision trees for clauses and in the All-True decision tree return True, i.e., \(a'\) traverses one of the joint paths in \(\varSigma '(M,y)\). Therefore, the existence of a satisfying assignment \(a'\) implies that Eq. (10) is satisfiable. The other direction holds as well: to make the constructed random forest return True by majority vote for an assignment \(a'\), all the decision trees for clauses must return True, which means that all the clauses are True and therefore the formula f is satisfiable.

We remark that, in Kantchelian et al. (2016), there is another NP-hardness proof on tree ensembles through a reduction from the 3-SAT problem, but that proof is for the evasion attack, which differs from what we prove here for knowledge extraction. Specifically, the evasion attack aims at finding an input \(x'\) satisfying the constraint \(M(x') \ne M(x)\). Our knowledge extraction involves a stronger constraint on \(x'\): it must have at most m features altered from the original input x and, at the same time, follow a path in the given set \(\varSigma '(M,y)\). \(\square \)

7 Generalising to regression trees

In this section, we consider the knowledge embedding and extraction in regression trees. The knowledge expressed in Eq. (1) is reformulated as

$$\begin{aligned} \left( \bigwedge _{i\in {\mathbb {G}}} f_i \in [l_{f_i},u_{f_i}]\right) \Rightarrow [y_{{\mathbb {G}}}, y_{{\mathbb {G}}}+\epsilon ] \end{aligned}$$
(11)

Instead of a discrete class, \(y_{{\mathbb {G}}}\) is the predicted continuous value in the regression problem. Eq. (11) states that if the features of an input belonging to the set \({\mathbb {G}}\) are within certain ranges, the prediction of the model always lies within a small interval \([y_{{\mathbb {G}}}, y_{{\mathbb {G}}}+\epsilon ]\).

Regression trees are very similar to classification trees, except that the node impurity is the sum of squared errors between the observations and their mean. The value of a leaf node is calculated as the mean of the observations in that node. A minimum number of observations required for a split is set to reduce overfitting (Moisen 2008).

In this case, the black-box and white-box settings for the embedding do not differ much, except that \(con(\kappa ) \in [y_{{\mathbb {G}}}, y_{{\mathbb {G}}}+\epsilon ]\). For tree ensembles, plurality voting is replaced with mean aggregation, so all trees need to be attacked. The predictions of the ensemble model for KE samples then still lie within \([y_{{\mathbb {G}}}, y_{{\mathbb {G}}}+\epsilon ]\).

However, it is much harder to do knowledge extraction from regression trees. In Eq. (5), y becomes a continuous variable and cannot be decided by simple enumeration. We conjecture that the exact solution cannot be obtained, and it is thus crucial to search for the suspected joint paths via anomaly detection techniques. We plan to investigate this topic further in future work.

8 Generalising to different types of tree ensembles

There are several variants of tree ensembles, such as random forests (RF) and extreme gradient boosting (XGBoost) decision trees. They share the same model representation and inference, but have different training algorithms. Since our embedding and extraction algorithms are developed on individual decision trees, they can work with different types of tree ensemble classifiers.

The white-box embedding and knowledge extraction algorithms can be easily applied to different variants of tree ensembles, because they work on the trained classifiers and are independent of any training algorithm.

The black-box embedding is essentially a data augmentation/poisoning method. For random forests, each decision tree is fitted with random samples drawn with replacement from the training set by bootstrap aggregating. Thus, the black-box embedding is implemented after the bootstrap aggregating step, once the training data allocated to each decision tree has been decided. The selected trees in the forest may be re-constructed several times with incrementally added augmentation/poisoning data, until the V-rule is satisfied.

On the other hand, XGBoost is an additive tree learning method. At some step i, the tree \(T_i\) is optimally constructed according to the loss function

$$\begin{aligned} Obj = - \frac{1}{2}\sum _{j=1}^{L} \frac{G_j^2}{H_j + \lambda } + \gamma L, \end{aligned}$$

where the sum ranges over the L leaves of \(T_i\), and \(G_j\) and \(H_j\) are gradient statistics calculated with respect to the training set \(D_{train}\). The \(\lambda \) and \(\gamma \) are parameters of regularisation terms. The KE inputs are incrementally added to the training set. The training loss will initially increase because the existing decision trees do not fit the KE inputs; this is eased as more augmentation/poisoning data is added to the training dataset.

9 Evaluation

We evaluate our algorithms against the three success criteria on several popular benchmark datasets from the UCI Machine Learning Repository (Asuncion and Newman 2007), LIBSVM (Chang and Lin 2011), and the Microsoft Malware Prediction (MMP) dataset (a subset of the original competition data on Kaggle). Details of these datasets are presented in Table 3.

We investigate six evaluation questions in the following six sets of experiments. Each set of experiments is conducted across all the datasets in Table 3 and repeated 20 times with randomly generated pieces of knowledge. The average performance results are then summarised and presented. The steps to generate the random knowledge are as follows (a sketch is given after the list):

  1. We first randomly select some features of the input.

  2. Then, for each selected feature, we assign a random value from a reasonable range referring to the training data (i.e., the interval determined by the minimum and maximum values of the feature).

  3. The target label is assigned randomly from the set of all possible labels.
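A sketch of these steps (ours, for illustration; `train_X` is the feature matrix and `labels` the set of classes):

```python
import numpy as np

def random_knowledge(train_X, labels, n_features=2, seed=None):
    """One random piece of knowledge following steps 1-3: random
    features, a random value within each feature's observed range,
    and a random target label."""
    rng = np.random.default_rng(seed)
    feats = rng.choice(train_X.shape[1], size=n_features, replace=False)
    values = {int(f): float(rng.uniform(train_X[:, f].min(),
                                        train_X[:, f].max()))
              for f in feats}
    return values, rng.choice(labels)
```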

The organisation of this section is as follows:

  • In Sect. 9.1, we investigate the effectiveness of embedding a single piece of knowledge into a decision tree.

  • In Sect. 9.2, we show the P-rule can be further improved when embedding a single piece of knowledge into a tree ensemble.

  • In Sect. 9.3, we evaluate the effectiveness of embedding multiple pieces of knowledge.

  • In Sect. 9.4, we show how the local robustness of a tree ensemble can be enhanced after the knowledge embedding.

  • In Sect. 9.5, we evaluate the effectiveness of anomaly detection and tree pruning as primary defences against the embedding of backdoor knowledge. In particular, the anomaly detection is a preprocessing step for our knowledge extraction method.

  • In Sect. 9.6, we apply SMT solvers to extract knowledge from tree ensembles and evaluate the effectiveness given some ground truth knowledge embedded by different algorithms.

We focus on the RF classifier. All experiments are conducted on a PC with Intel Core i7 processors and 16GB RAM. The source code is publicly accessible at our GitHub repository.

Table 3 Benchmark datasets for evaluation

9.1 Embedding a single piece of knowledge into decision trees

Table 4 shows that the proposed embedding algorithms are effective and efficient at embedding knowledge into a decision tree. We observe that, for both embedding algorithms, the KE test accuracy \(acc(\kappa (M),\kappa D_{test})\) is always 1.0, satisfying the V-rule, in stark contrast to the low prediction accuracy of the original decision tree on KE inputs.

Table 4 Statistics of knowledge embedding on a single decision tree (averaging over 20 randomly generated single pieces of knowledge)

We see that both methods have structural efficiency: there is no significant increase in tree depth. In particular, the tree depth of the white-box method is increased by no more than 2 (cf. Remark 5). The black-box method is data-efficient: no more than 2 KE samples are required to eliminate one unlearned path (values inside brackets of the ‘KE Samples’ column).

The computational time of both algorithms is acceptable, thanks to the PTIME computation. In general, the white-box algorithm is faster than the black-box algorithm, and the advantage becomes more obvious as the number of unlearned paths increases. For example, on the MNIST dataset, the white-box algorithm takes 18 seconds, in contrast to the 255 seconds of the black-box algorithm.

However, the P-rule, concerning the prediction performance gap \(acc(T,D_{test})-acc(\kappa (T),D_{test})\), may not hold as tightly (subject to the threshold \(\alpha _p\)). Especially for the black-box method, the tree \(\kappa (T)\) may exhibit noticeable fluctuation in its predictions on the clean test set; e.g., the clean test accuracy decreases from 0.956 to 0.948 for the Iris dataset. This can be explained as follows: (i) to trade off between the P-rule and the S-rule, only partial knowledge is embedded into a single decision tree (cf. Sect. 5.4); (ii) a single decision tree is very sensitive to changes in the training data.

9.2 Embedding a single piece of knowledge into tree ensembles

The experimental results for tree ensembles are shown in Table 5. Compared with Table 4, we observe that the classifier’s prediction performance is markedly improved through the ensemble method (apart from the Iris model, due to the lack of training data).

Table 5 Statistics of knowledge embedding on tree ensemble

For a fair comparison of the P-rule between a single decision tree and a tree ensemble, we randomly generate 500 different decision trees and tree ensemble models embedded with different knowledge for each dataset. The P-rule is measured with \(acc(M,D_{test}) - acc(\kappa (M),D_{test})\). The violin plot (Hintze and Nelson 1998) in Fig. 7 displays the probability density of these 500 results at different values. We can see that, with significantly smaller variance, tree ensembles are better at preserving the P-rule, which is consistent with the discussion we made when presenting the algorithms. For example, in the Iris and Breast Cancer plots, the variance of the results of the black-box method is greatly reduced from decision trees to tree ensembles. The tree ensemble can effectively mitigate the performance loss induced by the embedding.

Fig. 7 The satisfiability of the P-rule on decision trees and tree ensembles. Test accuracy change is calculated as \(acc(M,D_{test}) - acc(\kappa (M),D_{test})\). Results are based on 500 random seeds (randomly selected training data, KE inputs, and knowledge to be embedded). Tree ensembles are better in satisfying the P-rule than decision trees

The V-rule is also satisfied exactly on tree ensembles, i.e., \(acc(\kappa (M),\kappa D_{test})\) is always 1.0 in Table 5. This is because the embedding is conducted on individual trees, so it is not affected by the bootstrap aggregating as long as over half of the trees are tampered with.

9.3 Embedding multiple pieces of knowledge

Essentially, we repeat the experiments in Sect. 9.2 with multiple pieces of knowledge generated randomly per embedding experiment, rather than just one piece of knowledge as in the previous experiments. For brevity, we only present the results for the Sensorless and MMP models, which represent two real-world applications of tree ensembles. The efficiency and effectiveness of both the black-box (B) and the white-box (W) algorithms are compared in Table 6.

Table 6 Embedding multiple pieces of knowledge into tree ensembles

As we can see, the number of unlearned paths is a good indicator of the “difficulty” of knowledge embedding. As more pieces of knowledge are embedded (increasing from 1 to 9), more unlearned paths need to be operated on. Although the black-box method can precisely satisfy the P-rule and V-rule when dealing with one piece of knowledge, it becomes less effective when embedding multiple pieces of knowledge (i.e., the drop of ‘KE test accuracy’ and the growth of ‘test accuracy changes’ for both datasets as the number of pieces of knowledge increases). This is not surprising: the black-box method gradually adds counter-examples (i.e., KE inputs) to the training data and re-constructs trees at each iteration. Such a purely data-driven approach cannot guarantee 100% success in knowledge embedding (i.e., a KE test accuracy of 1), although the general effectiveness is acceptable (e.g., the KE test accuracy only drops to 0.889 when 9 pieces of knowledge are embedded in the Sensorless model, cf. Table 6). In contrast, the white-box method overcomes this disadvantage thanks to the direct modification of individual trees. Moreover, the expansion of one internal node can convert a number of unlearned paths at the same time, which makes the white-box method more efficient.

In terms of the computational time, both the black-box and white-box methods cost significantly more time as the number of pieces of knowledge to be embedded grows.

Regarding the growth of the tree depth, the black-box method does not affect the maximum tree depth (i.e., the tree depth limit set in the training step), while the white-box method increases the maximum tree depth by 2 for each piece of knowledge embedded. In general, the model size does not increase much for the black-box algorithm (although the computational time is high), but becomes significantly larger with more embedded knowledge for the white-box algorithm.

Notably, embedding a large number of pieces of knowledge is not our focus in this work; rather, we embed “concise knowledge” like backdoor attacks. This is because: (i) for backdoor attacks, embedding too many pieces of knowledge can be easily detected and will influence the model’s generalisation performance, breaking the S-rule and P-rule, respectively; (ii) for robustness, we aim at providing high effectiveness (black-box) and guarantees (white-box) on improving local robustness, rather than the robustness of the whole model (e.g., one piece of knowledge per training sample, in the extreme), as we discuss in the next section.

9.4 Embedding knowledge for local robustness

To show that our knowledge embedding methods can also be applied to enhance an RF’s local robustness, as defined in Sect. 3, we randomly choose 200 samples from the training set. For each training input x, we set a norm ball with radius d and uniformly sample a large number of perturbed inputs \(x'\) (Monte-Carlo sampling), e.g., 50,000, such that \(||x-x'||_\infty \le d\). These perturbed local inputs are then used to evaluate the RF’s local robustness at point x. This statistical approach to evaluating model robustness is suggested in Webb et al. (2018).
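A sketch of this Monte-Carlo estimate (ours; clipping to [0, 1] assumes normalised features):

```python
import numpy as np

def local_robustness(model, x, y, d, n_samples=50000, seed=0):
    """Estimate local robustness at x: the fraction of uniformly
    sampled x' with ||x - x'||_inf <= d still classified as y."""
    rng = np.random.default_rng(seed)
    noise = rng.uniform(-d, d, size=(n_samples, x.size))
    perturbed = np.clip(x.reshape(1, -1) + noise, 0.0, 1.0)
    return float(np.mean(model.predict(perturbed) == y))
```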

For simplicity, we determine the norm ball radius d based on our experience of the typical adversarial perturbations used in robustness experiments on such datasets. It is worth noting that our observations and conclusions here are independent of the choice of d. Moreover, in practice, choosing a meaningful d may refer to dedicated research on this topic, e.g., Yang et al. (2020). Finally, we calculate the average results on these 200 training inputs as an approximation of the RF’s local robustness. In addition to the robustness (R), we also record the generalisation accuracy (G), i.e., the model’s prediction accuracy on the clean test set. We compare the results of the original RF, the RF with knowledge embedded by our black-box and white-box algorithms, and the state-of-the-art method (Chen et al. 2019) tailored to growing robust trees.

Table 7 Local robustness enhancement by knowledge embedding

As demonstrated in Table 7, the black-box and white-box methods can both enhance the local robustness of tree ensembles with a small loss of generalisation accuracy. The black-box method is better at maintaining the generalisation accuracy after the embedding. However, the white-box method is more effective and can guarantee that no adversarial samples exist within the norm ball. As illustrated in Fig. 4, the white-box method can actually embed interval-based knowledge (e.g., \(f_2\in (b_2-\epsilon ,b_2+\epsilon ] \Rightarrow con(\kappa )\)) into the decision tree. Thus, if the tolerance \(\epsilon \) is set such that \(\epsilon \ge d\), all perturbed inputs inside the norm ball will traverse the learned paths and be classified as the ground truth label. In contrast, the black-box method can only embed point-wise knowledge (e.g., \((f_2=b_2)\Rightarrow con(\kappa )\)), and is thus neither as effective nor as efficient in improving the local robustness around the input point.

In Chen et al. (2019), the authors modified the splitting criterion to learn more robust decision trees, so their method can improve the overall robustness of models on all training data. Our algorithms are not as efficient as theirs in terms of improving overall robustness, which is not surprising since our methods mainly focus on local robustness, i.e., embedding the robustness knowledge of one instance at a time. Nevertheless, our methods have the following advantages over theirs. First, the robust trees learning algorithm currently only works well with binary classification, which is why we omit the multi-class results for Iris, MNIST and Sensorless in Table 7. Second, our white-box algorithm can guarantee that there are no adversarial examples within the norm ball, while the robust trees learning algorithm cannot. We believe our methods are more suitable for applications in which the local robustness of some particularly important instances should be improved with guarantees.

9.5 Detection of knowledge embedding

We experimentally explore the effectiveness and limitations of some defence techniques, e.g. tree pruning and outlier detection, against backdoor knowledge embedding. The detailed implementation of these techniques can be found in Appendix B and Sect. 6.2.1.

9.5.1 Tree pruning

Suppose users are not aware of the knowledge embedding and use a validation dataset to prune each decision tree in the ensemble model. The ratio of the training, validation and test datasets is 3:1:1.
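For instance, such a 3:1:1 split can be obtained with a standard two-stage split (a sketch, assuming scikit-learn; variable names are illustrative):

```python
# A sketch of the 3:1:1 split via two calls to train_test_split:
# first hold out 1/5 as the test set, then 1/4 of the remaining 4/5
# (i.e. another 1/5 of the total) as the validation set.
from sklearn.model_selection import train_test_split

X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)
```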

Table 8 Model’s accuracy on clean and KE test set after applying REP

Reduced Error Pruning (REP) (Esposito et al. 1999) is a post-pruning technique to reduce over-fitting: the users use a clean validation dataset to prune the tree branches that contribute little to the model’s predictive performance. The pruning results for the embedded models are shown in Table 8. Compared with the evaluation of the unpruned tree ensembles in Table 5, REP can slightly improve the tree ensembles’ predictive accuracy. However, the backdoor knowledge is not easily eliminated: for both embedding algorithms, the tree ensembles after pruning still achieve a high predictive accuracy on the KE test set. Comparing the two embedding algorithms, the white-box method is more robust to pruning than the black-box method. The white-box method is designed to minimise the manipulations on a tree, which means the node expansions realising the knowledge happen at internal nodes rather than at the leaves, making the embedded branches difficult to prune out.
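To illustrate the defender’s side, the following is a minimal REP sketch over a fitted scikit-learn decision tree; it edits sklearn’s internal tree arrays (a common pruning recipe, not part of the public API), so it should be read as illustrative rather than as our exact implementation.

```python
# A sketch of Reduced Error Pruning over a fitted DecisionTreeClassifier.
# It collapses an internal node into a leaf whenever doing so does not
# decrease accuracy on the clean validation set.
from sklearn.tree._tree import TREE_LEAF

def reduced_error_prune(tree, X_val, y_val):
    inner = tree.tree_
    best_acc = tree.score(X_val, y_val)
    # In sklearn's array layout children have larger indices than their
    # parents, so a reversed sweep visits the tree bottom-up.
    for node in reversed(range(inner.node_count)):
        if inner.children_left[node] == TREE_LEAF:
            continue  # already a leaf
        left, right = inner.children_left[node], inner.children_right[node]
        # Tentatively collapse the node into a leaf ...
        inner.children_left[node] = TREE_LEAF
        inner.children_right[node] = TREE_LEAF
        acc = tree.score(X_val, y_val)
        if acc >= best_acc:
            best_acc = acc                      # ... keep the prune
        else:
            inner.children_left[node] = left    # ... or revert it
            inner.children_right[node] = right
    return tree
```

For a random forest, the sketch would be applied to each member tree, e.g. `for est in rf.estimators_: reduced_error_prune(est, X_val, y_val)`.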

9.5.2 Outlier detection

On the other hand, to detect the KE inputs, we analyse two model behaviours of the tree ensemble – the model loss and the activation pattern. The performance of the detection is quantified by the True Positive Rate (TPR) and the False Positive Rate (FPR): the TPR is the percentage of correctly identified KE inputs in the KE test set, and the FPR is the percentage of mis-identified clean inputs in the clean test set. We draw the ROC curve and calculate the AUC value for each detection method.
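Given per-input outlier scores (e.g., the model loss), the ROC curve and AUC can be computed as in the following sketch, where scores_ke and scores_clean are hypothetical arrays of scores on the KE and clean test sets:

```python
# A sketch of the ROC/AUC computation from per-input outlier scores;
# scores_ke and scores_clean are hypothetical arrays of scores on the
# KE test set and the clean test set (higher = more suspicious).
import numpy as np
from sklearn.metrics import roc_curve, auc

labels = np.concatenate([np.ones(len(scores_ke)),       # KE inputs (positives)
                         np.zeros(len(scores_clean))])  # clean inputs (negatives)
scores = np.concatenate([scores_ke, scores_clean])
fpr, tpr, _ = roc_curve(labels, scores)   # TPR/FPR at every threshold
print("AUC =", auc(fpr, tpr))
```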

Figure 8 plots the ROC curves to measure the performance of backdoor detection at different threshold settings. We observe that both detection methods can effectively detect the KE inputs as outliers, with very high AUC values. These results confirm our conjecture that KE inputs induce different behaviours from normal inputs. However, to capture these abnormal behaviours of a tree ensemble, we need access to the whole structure of the model. Moreover, not all outliers are KE inputs, which motivates the development of the knowledge extraction.

Fig. 8
figure 8

ROC curves for detecting backdoor examples

9.6 Knowledge extraction

To extract the embedded knowledge, we use a set of samples (50 normal and 50 KE) and apply the activation-pattern-based outlier detection method to compute the set \(\varSigma '(M,y)\) of suspected joint paths. Then, an SMT solver is used to compute Eq. (10), with \(\varSigma '(M,y)\) and the training dataset as inputs, yielding the set \({\mathcal {D}}'\). Only \(m = 3\) features are allowed to be changed. Finally, \({\mathcal {D}}'\) is processed to extract the backdoor knowledge \(\kappa \).
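To give a flavour of the SMT step, the sketch below uses z3 to ask whether a training input v can be moved into the box of a suspected joint path by changing at most m features; the interval representation and names are hypothetical simplifications, not the paper’s exact encoding of Eq. (10).

```python
# A hypothetical sketch of the SMT step with z3: can the training input
# v be moved into the box of a suspected joint path by changing at most
# m features? `intervals` maps feature indices to (low, high] bounds.
from z3 import Real, Solver, And, If, Sum, sat

def fits_path(intervals, v, m=3):
    x = [Real(f"x_{i}") for i in range(len(v))]
    s = Solver()
    # x must satisfy the per-feature bounds of the suspected joint path
    for i, (lo, hi) in intervals.items():
        s.add(And(x[i] > lo, x[i] <= hi))
    # at most m features of x may differ from the training input v
    s.add(Sum([If(x[i] != v[i], 1, 0) for i in range(len(v))]) <= m)
    return s.model() if s.check() == sat else None
```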

Table 9 The embedded knowledge for extraction
Table 10 Extraction of embedded knowledge

The extracted knowledge is presented in Table 10. Comparing with the original (ground truth) knowledge shown in Table 9, we observe that our method can extract the knowledge from a tree ensemble generated by the white-box algorithm precisely. However, it is less accurate for tree ensembles generated with the black-box method. The reason is that, although only KE inputs are used to train the model, the model ends up with a distribution of valid knowledge – our extraction method computes a piece of knowledge with high probability (from 0.518 to 1.0). This is consistent with the observation in Qiao et al. (2019) for backdoor attacks on neural networks.

The computational time of the knowledge extraction is much higher than that of the embedding, which is consistent with our theoretical result that knowledge extraction is NP-complete while embedding is in PTIME. Beyond the NP-completeness, the extraction is also affected by the sizes of the dataset and the model – for an ensemble consisting of more trees, the set \(\varSigma '(M,y)\) needs to be correspondingly larger. Therefore, the S-rule holds.

10 Related work

We review existing work from four aspects: knowledge embedding in ensemble trees; recent attempts at analysing the robustness of ensemble trees; backdoor attacks on deep neural networks (DNNs); and defence techniques against backdoor attacks on DNNs.

10.1 Knowledge embedding in ensemble trees

Many previous works enhance tree-based models by embedding knowledge. Maes et al. (2012) proposed a general scheme to embed feature generation into ensemble trees; they use Monte Carlo search to efficiently explore the feature space and construct features, which significantly improves the model’s accuracy. Wang et al. (2018) combined the generalisation ability of embedding-based models with the explainability of tree-based models; the enhanced ensemble trees are applied to provide both accurate and transparent recommendations for users. Zhao et al. (2017) leveraged latent factor embedding and tree components to achieve better prediction performance for real-world applications that have both abundant numerical features and categorical features with large cardinality. Our paper considers knowledge expressed as the intrinsic connection between a small input region and a target label. Specifically, the malicious knowledge relates to safety-critical applications of ensemble trees, such as backdoor attacks, while the beneficent knowledge concerns the robustness enhancement of ensemble trees.

10.2 Robustness analysis of ensemble trees

Recent works focus on the robustness verification of ensemble trees. The study (Kantchelian et al. 2016) encodes a tree ensemble classifier into a mixed integer linear programming (MILP) problem, where the objective expresses the perturbation and the constraints include the encoding of the trees, the consistency among leaves, and the misclassification. In Ranzato and Zanella (2020), the authors present an abstract interpretation method in which operations are conducted on the abstract inputs of the leaf nodes across trees. In Sato et al. (2020), the decision trees that compose the DTEM are encoded into a formula, which is then verified using an SMT solver. The work (Törnblom and Nadjm-Tehrani 2020) partitions the input domain of decision trees into disjoint sets, explores all feasible path combinations in the tree ensemble, and then derives output tuples from the leaves. It is extended to an abstraction-refinement method in Törnblom and Nadjm-Tehrani (2019) by gradually splitting input regions and randomly removing a tree from the forest. Moreover, the work (Einziger et al. 2019) considers the verification of gradient boosted models with SMT solvers.

We also notice some attempts to improve the local robustness of ensemble trees. The work (Calzavara et al. 2019) generalises adversarial training to gradient-boosted decision trees; the adversarial training provides a good trade-off between the classifiers’ robustness to adversarial attacks and the preservation of accuracy. Chen et al. (2019) propose a robust decision tree learning algorithm that optimises the classifiers’ performance under worst-case perturbations of input features, which can be formulated as a max-min saddle point problem.

10.3 Backdoor and trojan attacks on neural networks

The work (Liu et al. 2018) selects some neurons that are strongly tied to the backdoor trigger and then retrains the links from those neurons to the outputs, so that the outputs can be manipulated. In Gu et al. (2019), the authors modify the weights of a neural network in a malicious training procedure based on training-set poisoning, which computes these weights given a training set, a backdoor trigger and a model architecture. In Chen et al. (2017), the authors take a black-box approach to data poisoning, where poisoned data are generated from either a legitimate input or a pattern (such as glasses). The study (Shafahi et al. 2018) proposes an optimisation-based procedure for crafting poison instances. An attacker first chooses a target instance from the test set; a successful poisoning attack causes this target example to be misclassified at test time. Next, the attacker samples a base instance from the base class and makes imperceptible changes to it to craft a poison instance, which is injected into the training data with the intent of fooling the model into labelling the target instance with the base label at test time. Finally, the model is trained on the poisoned dataset (the clean dataset plus the poison instances); if, at test time, the model mistakes the target instance as being in the base class, the poisoning attack is considered successful.

10.4 Defence to backdoor and trojan attacks

The work (Liu et al. 2018) combines pruning (i.e., reducing the size of the backdoored network by eliminating neurons that are dormant on clean inputs) and fine-tuning (a small amount of local retraining on a clean training dataset) into a defence called fine-pruning. The work (Gao et al. 2019) defends against backdoor attacks based on redundant nodes. In Liu et al. (2017), Liu et al. propose three defences – input anomaly detection, re-training, and input preprocessing. In Chen et al. (2019), the authors propose backdoor detection for poisoned training data via activation clustering; they observed that backdoor samples and normal samples receive different responses from the DNNs, which should be evident in the networks’ activations.

11 Conclusion

Through a study of the embedding and extraction of knowledge in tree ensembles, we have shown that our two novel embedding algorithms, for the black-box and white-box settings respectively, are preservative, verifiable and stealthy. We have also developed a knowledge extraction algorithm utilising SMT solvers, which is important for the defence against backdoor attacks. We find, both theoretically and empirically, that there is a computational gap between knowledge embedding and extraction, which raises the security concern that a tree ensemble classifier is much easier to attack than to defend. Thus, an immediate next step is to develop more effective backdoor detection methods.