1 Introduction

k-nearest neighbors (kNN) [2] is one of the simplest supervised machine learning (ML) algorithms. Nevertheless, kNN is a popular and accurate predictive model with diverse application fields [21]. The basic idea of kNN is to predict the outcome for an input sample \(\textbf{x}\in \mathbb {R}^n\) by finding the k nearest neighbors of \(\textbf{x}\) within a given dataset. The number \(k \in \mathbb {N}\) of neighbors as well as the distance function between vectors are parameters of this model. Once the set of k nearest neighbors of an input sample is computed, the output is inferred as the most common label of these k neighbors in case of classification, or as the average of the values of the k neighbors in case of regression. The diagram in Fig. 1 depicts an example of classification for a kNN model with \(k=3\) and a dataset in \(\mathbb {R}^2\) with three classes red, green, and blue. For an input vector \(\textbf{x}\) represented by a black bullet, 3NN computes the 3 nearest samples in the dataset w.r.t. Manhattan distance, as depicted by the dashed lines, and then infers the most common label among them. In kNN, the dataset is stored and entirely used at classification time; namely, kNN is a lazy (or “just-in-time”) learning algorithm [4]. While this makes kNN simple to implement, it can exhibit a significant prediction time due to the computational effort required to compute distances over the whole dataset and, correspondingly, to sort samples, especially for high values of k. (k is usually a low odd value, often below 9.)
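As a concrete illustration of this scheme, a minimal kNN classifier with Manhattan distance can be sketched in a few lines of Python; the dataset below is hypothetical, echoing the three labeled samples of Fig. 1 plus two extra blue points:

```python
from collections import Counter

def manhattan(x, y):
    """Manhattan (l1) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def knn_classify(dataset, x, k=3, dist=manhattan):
    """Classify x as the most common label among its k nearest samples."""
    neighbors = sorted(dataset, key=lambda sample: dist(sample[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical dataset in R^2 with three classes, loosely mimicking Fig. 1.
dataset = [((1, 5), "red"), ((8, 4), "red"), ((9, 4), "green"),
           ((12, 0), "blue"), ((13, 1), "blue")]
```

Here `knn_classify(dataset, (2, 4), k=3)` selects \((1,5)\), \((8,4)\), \((9,4)\) as the 3 nearest samples and returns the majority label `"red"`.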

Adversarial machine learning [18, 22, 28] studies vulnerabilities of ML in adversarial scenarios. Adversarial examples have been found in diverse application fields of ML, and the current defense techniques include adversarial model training, input validation, testing, and automatic formal certification of learning algorithms. An ML classifier C is defined to be stable on an input \(\textbf{x}\) for a (typically very small) perturbation \(P(\textbf{x})\) of \(\textbf{x}\), which represents an adversarial attack, when C assigns the same class to all the samples in \(P(\textbf{x})\). Moreover, when this class is also the correct class of \(\textbf{x}\) with respect to the ground truth, the classifier C is robust on \(\textbf{x}\), as it cannot be deceived by unnoticeable malicious alterations of \(\textbf{x}\). Figure 1 depicts in gray an adversarial region \(P(\textbf{x})\) defined around the black input sample \(\textbf{x}\), which represents an (infinite) set of attacks. Here, the 3 nearest neighbors of each attack in \(P(\textbf{x})\) are labeled as red (\(\textbf{p}_1\) and \(\textbf{p}_2\)) and green (\(\textbf{p}_3\)), making 3NN stable on \(\textbf{x}\), as 3NN classifies every sample in \(P(\textbf{x})\) as red. If red is the ground truth label for \(\textbf{x}\), then 3NN is robust on \(\textbf{x}\) as well.

Fig. 1

kNN on a dataset with three classes red, green, blue (color figure online)

1.1 Contributions

Our main contribution is a novel formal and automatic verification method for inferring when a kNN classifier is provably stable for an input sample with respect to a given perturbation. We leverage the well-established framework of abstract interpretation [7, 8, 17] for computing correct over-approximations of dynamic system behaviors, which has already been successfully applied to the formal verification of diverse machine learning models (see the surveys [1, 26, 44]). Our approach is based on designing a sound abstract version \({C_{\delta ,k}^A}\) of a kNN classifier based on a distance function \(\delta \), e.g., Euclidean or Manhattan distance. This approximate classifier \({C_{\delta ,k}^A}\) is defined over a symbolic numerical abstraction A of the input space \(\wp (\mathbb {R}^n)\), and leverages a sound approximation \({\delta ^A}\) in A of the distance function \(\delta \). In turn, the definition of \({\delta ^A}\) relies on sound approximations over A of its basic numerical operations such as addition, product, and modulus. Given an abstract value \(a\in A\) which provides a symbolic over-approximation of an adversarial perturbation \(P(\textbf{x})\) of an input sample \(\textbf{x}\), \(C_{\delta ,k}^A(a)\) returns an over-approximation of the set of classes computed by kNN for all the samples in \(P(\textbf{x})\). Hence, if \(C_{\delta ,k}^A(a) = k\text {NN}(\textbf{x})\) holds, then we can infer that kNN is provably stable on \(\textbf{x}\) for its perturbation \(P(\textbf{x})\). We instantiate our certification method to the well-known numerical abstract domains of intervals [8] and zonotopes [19], that approximate the range of numerical features by, resp., real intervals (e.g., \(\textbf{x}_i \in [l,u]\)) and affine forms (e.g., \(\textbf{x}_i = a_0 + \sum _{j=1}^k a_j\epsilon _j\) with \(a_j\in \mathbb {R}\) and noise symbols \(\epsilon _j\in [-1,1]\)). This certification framework for kNN has been implemented in Python. 
The corresponding tool, called \(\text {NAVe}\) (kNN Abstract Verifier; the Italian word “nave” means “ship”), has been designed to be scalable both in the size of the training dataset and in the value of k, for which no upper bound is assumed. We performed an experimental evaluation of \(\text {NAVe}\) on seven datasets commonly used in robustness certification and on two additional datasets for individual fairness verification. These experimental results show that \(\text {NAVe}\) is an effective tool for formally certifying the adversarial robustness of inputs to kNN and that, in general, kNN turns out to be a quite robust prediction algorithm: In fact, for adversarial perturbations \(\le \pm 2\%\), \(\text {NAVe}\) is able to certify robustness above \(90\%\) on several datasets for \(k\in \{1,3,5,7\}\).

1.2 Illustrative example

Let us consider the example in \(\mathbb {R}^2\) depicted in Fig. 1, where \(\textbf{x} = (2, 4)\) is the input sample and \(P(\textbf{x}) \triangleq \{ \mathbf {x'} \in \mathbb {R}^2 ~|~ \max (|\textbf{x}_1' - \textbf{x}_1|, |\textbf{x}_2' - \textbf{x}_2|) \le 1 \}\) is a perturbation defined as the \(\ell _\infty \) ball of radius 1 centered at \(\textbf{x}\), which can be exactly represented through intervals as \((\textbf{x}_1 \in [1, 3], \textbf{x}_2 \in [3, 5])\). By leveraging the interval abstract domain \({\mathcal {I}}\), we compute the abstract Manhattan distance \(\mu ^{\mathcal {I}}\) between \(P(\textbf{x})\) and the 3 points \(\textbf{p}_{1} = (1, 5)\), \(\textbf{p}_{2} = (8, 4)\), \(\textbf{p}_{3} =(9, 4)\) of the training dataset:

$$\begin{aligned} \mu ^{\mathcal {I}}(P(\textbf{x}), \textbf{p}_{1})&= |[1, 3] -^{\mathcal {I}}1|^{\mathcal {I}}+^{\mathcal {I}}|[3, 5] -^{\mathcal {I}}5|^{\mathcal {I}}= [0, 2] +^{\mathcal {I}}[0, 2] = [0, 4]\,, \\ \mu ^{\mathcal {I}}(P(\textbf{x}), \textbf{p}_{2})&= |[1, 3] -^{\mathcal {I}}8|^{\mathcal {I}}+^{\mathcal {I}}|[3, 5] -^{\mathcal {I}}4|^{\mathcal {I}}= [5, 7] +^{\mathcal {I}}[0, 1] = [5, 8]\,, \\ \mu ^{\mathcal {I}}(P(\textbf{x}), \textbf{p}_{3})&= |[1, 3] -^{\mathcal {I}}9|^{\mathcal {I}}+^{\mathcal {I}}|[3, 5] -^{\mathcal {I}}4|^{\mathcal {I}}= [6, 8] +^{\mathcal {I}}[0, 1] = [6, 9]\,. \end{aligned}$$

These abstract distances are symbolically computed in the interval abstraction \({\mathcal {I}}\) and provide correct lower and upper bounds for the infinite set of Manhattan distances

$$\begin{aligned} \{\mu (\textbf{y},\textbf{p}_i) \in \mathbb {R}_{\ge 0} \mid \textbf{y}\in P(\textbf{x})\}. \end{aligned}$$

By leveraging these abstract distances, for any number of neighbors \(k\in \mathbb {N}^*\), the abstract classifier \(C^{\mathcal {I}}_{\mu ,k}(P(\textbf{x}))\) returns an over-approximation of the set of classes \(\cup _{\textbf{y}\in P(\textbf{x})} k\text {NN}(\textbf{y})\). Let us observe that \(\textbf{p}_{1}\) is the nearest point to \(P(\textbf{x})\), as its interval [0, 4] is strictly dominated by all the others. (\([l_1,u_1]\) is strictly dominated by \([l_2,u_2]\) when \(u_1 < l_2\).) As a consequence, \(C^{\mathcal {I}}_{\mu ,1}(P(\textbf{x})) = \{red\}\), which proves that 1NN is stable on \(\textbf{x}\). On the other hand, it turns out that \(\textbf{p}_{2}\) is closer than \(\textbf{p}_{3}\) to every point in \(P(\textbf{x})\), although this cannot be inferred from the corresponding abstract distances, since the interval [5, 8] for \(\textbf{p}_2\) is not strictly dominated by [6, 9] for \(\textbf{p}_3\): This is an example of loss of precision, also called incompleteness of the stability certification. Consequently, if we use \(k=2\) in this scenario, then we cannot exclude \(\textbf{p}_{3}\) from the approximate set of neighbors, which could be either \(\{\textbf{p}_{1}, \textbf{p}_{2}\}\), thus resulting in a red output, or \(\{\textbf{p}_{1}, \textbf{p}_{3}\}\), thus causing an ambiguity between a red and a green output. This entails that \(C^{\mathcal {I}}_{\mu ,2}(P(\textbf{x})) = \{red , green \}\), so that stability of 2NN on \(\textbf{x}\) cannot be proved. In this case, green is therefore a false positive arising from the interval approximation. Finally, for \(k = 3\), the stability verification turns out to be complete, because the three samples \(\textbf{p}_{1}, \textbf{p}_{2}, \textbf{p}_{3}\) are the only points taken into account, as hinted by the black dashed line in Fig. 1, so that \(C^{\mathcal {I}}_{\mu ,3}(P(\textbf{x})) = \{red \}\) holds, thus allowing us to infer that 3NN is stable on \(\textbf{x}\).
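The interval computations of this example can be reproduced in a few lines of Python (a minimal sketch representing intervals as pairs \((l,u)\), not the actual \(\text {NAVe}\) implementation):

```python
def i_add(a, b):
    """Interval addition: [l1,u1] + [l2,u2] = [l1+l2, u1+u2]."""
    return (a[0] + b[0], a[1] + b[1])

def i_abs(a):
    """Interval modulus |[l,u]|."""
    l, u = a
    if l * u >= 0:
        return (min(abs(l), abs(u)), max(abs(l), abs(u)))
    return (0, max(abs(l), abs(u)))

def i_manhattan(box, p):
    """Abstract Manhattan distance between a box of intervals and a point p."""
    d = (0, 0)
    for (l, u), c in zip(box, p):
        d = i_add(d, i_abs((l - c, u - c)))
    return d

def dominates(a, b):
    """Strict dominance: every value of interval a is below every value of b."""
    return a[1] < b[0]

box = [(1, 3), (3, 5)]          # exact representation of P(x) for x = (2, 4)
d1 = i_manhattan(box, (1, 5))   # abstract distance to p1: (0, 4)
d2 = i_manhattan(box, (8, 4))   # abstract distance to p2: (5, 8)
d3 = i_manhattan(box, (9, 4))   # abstract distance to p3: (6, 9)
```

The dominance tests then confirm that \(\textbf{p}_1\) is provably the nearest point (hence 1NN is stable), while the ordering between \(\textbf{p}_2\) and \(\textbf{p}_3\) cannot be resolved abstractly.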

1.3 Related work

Formal verification methods in adversarial machine learning have been thoroughly investigated for (deep) neural networks, while other ML models have been much less studied. In particular, adversarial attacks on k-nearest neighbor algorithms have been studied only recently [3, 11, 20, 23–25, 41, 42, 45, 46, 48]. Among them, let us mention [42], where the authors put forward an algorithm, called GeoAdEx, based on higher-order Voronoi diagrams, that aims at finding the smallest perturbation moving an input sample to an adversarial cell, that is, an order-k Voronoi cell with a different majority label. However, finding this smallest perturbation, or a certified lower bound for it, can take a long time due to its combinatorial complexity, so that in most cases GeoAdEx outputs exact results, i.e., without approximations, only for \(k = 1\). Moreover, the approach of Fan et al. [11] is orthogonal to ours: (i) Their notion of robustness is different, since an input \(\textbf{x}\) is considered to be robust w.r.t. a set of datasets \(\mathcal {I}\) when there exists a label l such that for all \(D\in \mathcal {I}\), kNN\(_D(\textbf{x})=l\); (ii) [11] studies the theoretical complexity of certifying this different concept of robustness w.r.t. a notion of subset repair of datasets. Let us finally mention that [23, 25] prove robustness of kNN to adversarial poisoning of the dataset by leveraging an over-approximated kNN classifier, while [24] puts forward an abstraction-based method for certifying the fairness of kNN under the assumption that the training data may have bias caused by systematic mislabeling of samples.
While these works [23–25] leverage some specific sound over-approximations of the procedures involved in kNN classification, they are not designed and specified within the compositional abstract interpretation framework [7, 8]; namely, they are not parametric on underlying numerical abstract domains (such as the interval and zonotope abstractions employed in this work) and on the corresponding abstract operations (such as abstract addition, exponential, and modulus) used for defining abstract distances. Abstract interpretation techniques have been applied for designing precise and scalable robustness verification algorithms and adversarial training techniques for a range of ML models [5, 15, 27, 32–37, 39, 40]. To the best of our knowledge, no prior work applied abstract interpretation to the robustness certification of k-nearest neighbors.

This article is a full and revised version of the ICDM2023 conference paper [13], extended to include all the technical proofs and the following novel contributions: Sect. 2.1.2 introduces a new sound abstraction of the modulus operation on zonotopes; Sect. 3.3 shows how to extend the verification method to regression tasks; Sect. 3.4 discusses how different abstractions and perturbations can be used in our approach; and Sect. 4 studies the relationship between our notion of stability and data poisoning.

2 Background

2.1 Numerical abstract domains

A numerical abstract domain (or numerical abstraction) [31] A symbolically represents sets of real vectors through a so-called concretization map \(\gamma ^A: A \rightarrow \wp (\mathbb {R}^n)\) providing the meaning of its abstract (i.e., symbolic) values. A subset of vectors \(S\in \wp (\mathbb {R}^n)\) is over-approximated by some abstract value \(a\in A\) when \(S\subseteq \gamma ^A(a)\), while S is exactly represented by a when \(S= \gamma ^A(a)\) holds. An abstract domain A may also admit an abstraction function \(\alpha ^A: \wp (\mathbb {R}^n) \rightarrow A\) such that \(\alpha ^A(S)\) is the best abstraction in A of the set S, where the notion of best means least (or minimal) w.r.t. the following preorder relation on A: \(a \sqsubseteq ^A a' \Leftrightarrow \gamma ^A(a)\subseteq \gamma ^A(a')\). If \(\langle A, \sqsubseteq ^A\rangle \) is a partially ordered set, then the concretization and abstraction maps form a Galois connection: For all \(S\in \wp (\mathbb {R}^n)\) and \(a\in A\), \(\alpha ^A(S) \sqsubseteq ^A a \Leftrightarrow S \subseteq \gamma ^A(a)\) holds.

Given a k-ary operation on vectors \(f: (\mathbb {R}^n)^k \rightarrow \mathbb {R}^n\), for some \(k\ge 1\), an abstract function \(f^A: A^k \rightarrow A\) is a sound (or correct) (over-)approximation of f when for all \((a_1,\ldots ,a_k)\in A^k\), the containment

$$\begin{aligned} \{f(\textbf{x}_1,\ldots ,\textbf{x}_k) \mid \forall i\cdot \, \textbf{x}_i \in \gamma ^{A}(a_i) \} \subseteq \gamma ^{A}(f^{A} (a_1,\ldots ,a_k)) \end{aligned}$$

holds, while \(f^{A}\) is defined to be exact (or complete) when equality holds. In words, soundness holds when \(f^{A} (a_1,\ldots ,a_k)\) never misses a concrete computation of f on some input \((\textbf{x}_1,\ldots ,\textbf{x}_k)\) which is abstractly represented by \((a_1,\ldots ,a_k)\), while exactness means that each abstract computation \(f^{A} (a_1,\ldots ,a_k)\) is an exact abstract representation of the set of concrete computations of f on all the inputs that are abstractly represented by \((a_1,\ldots ,a_k)\). If A is endowed with an abstraction map \(\alpha ^A\), then the function

$$\begin{aligned} f_{\text {best}}^A \triangleq \lambda (a_1,\ldots ,a_k)\cdot \, \alpha ^A(f(\gamma ^A(a_1),\ldots ,\gamma ^A(a_k))) \end{aligned}$$

is called the best correct approximation of f, because for any other correct approximation \(f^A\), \(f_{\text {best}}^A (a_1,\ldots ,a_k) \sqsubseteq ^A f^A (a_1,\ldots ,a_k)\) always holds. Thus, \(f_{\text {best}}^A\) represents the best possible approximation of f that can be defined on the abstract domain A.

Intervals The abstract domain of real intervals \({\mathcal {I}}\) is one of the simplest and most used abstractions in ML verification. The interval domain abstracts the values of a real variable by a (possibly unbounded) real interval \([l, u]\), where \(l, u \in \mathbb {R} \cup \{-\infty , +\infty \}\) and \(l \le u\) (with \(-\infty \le x \le +\infty \) for all \(x \in \mathbb {R}\)). Moreover, \({\mathcal {I}}\) includes a symbolic representation \(\bot ^{\mathcal {I}}\) of the empty set. The concretization \(\gamma ^{\mathcal {I}}:{\mathcal {I}}\rightarrow \wp (\mathbb {R})\) is defined as follows: \(\gamma ^{\mathcal {I}}(\bot ^{\mathcal {I}}) \triangleq \varnothing \); \(\gamma ^{\mathcal {I}}([l, u]) \triangleq \{ x \in \mathbb {R} \mid l \le x \le u \}\). The product interval abstraction \({\mathcal {I}}^n\), with \(n\ge 1\), is also called the box (or hyperrectangle) domain, and its concretization map \(\gamma ^{{\mathcal {I}}^n}: {\mathcal {I}}^n \rightarrow \wp (\mathbb {R}^n)\) is defined by a straightforward componentwise product of \(\gamma ^{\mathcal {I}}\). Intervals have an abstraction map \(\alpha ^{\mathcal {I}}:\wp (\mathbb {R}) \rightarrow {\mathcal {I}}\) which is defined as follows:

$$\begin{aligned} \alpha ^{{\mathcal {I}}} (X) \triangleq \left\{ {\begin{array}{*{20}l} { \bot ^{{\mathcal {I}}} } &{} {\quad {\text {if }}X = \emptyset } \\ {[\inf X,\sup X]} &{} {\quad {\text {otherwise }}} \\ \end{array} } \right. \end{aligned}$$

Zonotopes The interval domain can be imprecise as it is nonrelational, i.e., \({\mathcal {I}}\) does not represent information on how values of different variables are related. For example, the most precise interval approximation of the set \(T=\{(x,y)\in \mathbb {R}^2 \mid 0\le x,y\le 1,\, x=y\}\) is \(\langle x\in [0,1],y\in [0,1]\rangle \), thus losing the information that \(x-y=0\). The zonotope abstract domain \({\mathcal {Z}}\) [16, 19] is based on affine arithmetic [9] and can be viewed as an extension of intervals that keeps track of affine relations between values of different variables. The domain \({\mathcal {Z}}\) consists of abstract values \(\hat{a}=a_0 + \sum _{j=1}^m a_j \epsilon _j \in {\mathcal {Z}}\), where \(a_j\in \mathbb {R}\) are coefficients and \(\epsilon _j\) are noise symbols whose values range in the real interval \([-1,1]\), and when these \(\epsilon _j\) are shared between different variables/features, they encode a relation between them. The concretization of a zonotope \(\hat{a}\) is given by

$$\begin{aligned} \gamma ^{\mathcal {Z}}(\hat{a}) \triangleq \left\{ a_0+\sum \nolimits _{j=1}^m a_j \epsilon _j \in \mathbb {R}\mid \forall j\cdot \, \epsilon _j\in [-1,1]\right\} , \end{aligned}$$

i.e., the zonotope \(\hat{a}\) represents the real interval \(\big [a_0-\sum _{j=1}^m |a_j|, a_0+\sum _{j=1}^m |a_j|\big ]\). The product zonotope abstraction \({\mathcal {Z}}^n\), with \(n\ge 1\), may share noise symbols between different components, thus enabling the representation of relational information between features. For example, the above set \(T\subseteq \mathbb {R}^2\) can be exactly represented by the zonotope \((\hat{x}=0.5 \, +\, 0.5\epsilon _1, \hat{y}=0.5 \, +\, 0.5\epsilon _1)\), so that we can infer that \(\hat{x}-\hat{y}=0\) holds. A fundamental property of zonotopes is that linear functions, such as vector addition and constant multiplication, admit corresponding exact abstract operations on \({\mathcal {Z}}\), while nonaffine functions, such as multiplication and modulus, must necessarily be approximated.

The basic abstract operations on intervals and zonotopes for computing abstract distances are recalled below.

2.1.1 Abstract operations on intervals

The most precise abstract operations, that is, the best correct approximations, on \({\mathcal {I}}\) are well known [31] and recalled below.

$$\begin{aligned}&{\textbf {addition:}}\; [l_1, u_1] +^{\mathcal {I}}[l_2, u_2] \triangleq [l_1 + l_2, u_1 + u_2]\\&{\textbf {constant multiplication:}}\; c[l, u] \triangleq {\left\{ \begin{array}{ll} {[}cl,cu] &{}\quad \text {if } c\ge 0 \\ {[}cu,cl] &{}\quad \text {otherwise} \\ \end{array}\right. }\\&{\textbf {multiplication:}}\, {[}l_1, u_1] \cdot ^{\mathcal {I}}[l_2, u_2] \triangleq [\min (l_1 l_2, l_1 u_2, u_1 l_2, u_1 u_2), \max (l_1 l_2, l_1 u_2, u_1 l_2, u_1 u_2)]\\&{\textbf {modulus:}}\; |[l, u]|^{\mathcal {I}}\triangleq {\left\{ \begin{array}{ll} {[}\min (|l|, |u|), \max (|l|, |u|)] &{}\quad \text {if } lu \ge 0 \\ {[}0, \max (|l|, |u|)] &{}\quad \text {otherwise} \\ \end{array}\right. } \\&{\textbf {exponential:}}\; [l, u]^{p^{\mathcal {I}}} \triangleq {\left\{ \begin{array}{ll} {[}l^p, u^p] &{} \quad \text {if } p \text { odd or } l \ge 0 \\ {[}u^p,l^p] &{} \quad \text {if } p \text { even and } u<0\\ {[}0, \max (l^p, u^p)] &{} \quad \text {otherwise}\\ \end{array}\right. } \\&{\textbf {dominance test:}}\; {[}l_1, u_1]<^{\mathcal {I}}{[}l_2, u_2] \triangleq u_1 < l_2 \end{aligned}$$

In particular, soundness of the dominance test means that if \([l_1, u_1] <^{\mathcal {I}}[l_2, u_2]\), then for all \(x\in \gamma ^{\mathcal {I}}([l_1, u_1])\) and \(y\in \gamma ^{\mathcal {I}}([l_2, u_2])\), \(x<y\) holds.
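These best correct approximations translate directly into code; a minimal sketch on bounded intervals represented as pairs \((l, u)\):

```python
def i_add(a, b):
    """Interval addition."""
    return (a[0] + b[0], a[1] + b[1])

def i_cmul(c, a):
    """Multiplication by a constant c."""
    return (c * a[0], c * a[1]) if c >= 0 else (c * a[1], c * a[0])

def i_mul(a, b):
    """Interval multiplication: min/max over the four endpoint products."""
    prods = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(prods), max(prods))

def i_abs(a):
    """Interval modulus."""
    l, u = a
    if l * u >= 0:
        return (min(abs(l), abs(u)), max(abs(l), abs(u)))
    return (0, max(abs(l), abs(u)))

def i_pow(a, p):
    """Interval exponential with natural exponent p."""
    l, u = a
    if p % 2 == 1 or l >= 0:
        return (l ** p, u ** p)
    if u < 0:
        return (u ** p, l ** p)
    return (0, max(l ** p, u ** p))

def i_lt(a, b):
    """Dominance test: every value of a is strictly below every value of b."""
    return a[1] < b[0]
```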

2.1.2 Abstract operations on zonotopes

Zonotopes are exact for linear operations, namely addition and constant multiplication, while for nonlinear operations, in particular multiplication and modulus, the result, in general, cannot be exactly represented by a zonotope. Hence, the multiplication of zonotopes approximates the precise result by adding a fresh noise symbol \(\epsilon _{\text {f}}\), whose coefficient is typically computed by a Taylor approximation of the nonlinear part of the multiplication (see [19, Section 2.1.5]). Given \(\hat{a}=a_{0} + \sum _{j=1}^m a_j \epsilon _j\in {\mathcal {Z}}\) and \(\hat{b}=b_{0} + \sum _{j=1}^m b_j \epsilon _j \in {\mathcal {Z}}\), the abstract operations are given below:

$$\begin{aligned}&{\textbf {addition:~}}\; \hat{a} +^{\mathcal {Z}}\hat{b} \triangleq (a_0 + b_0) + \textstyle \sum _{j=1}^m (a_j + b_j)\epsilon _j\\&{\textbf {constant multiplication:~}}\; c\hat{a} \triangleq c a_{0} + \textstyle \sum _{j=1}^m c a_j \epsilon _j\\&{\textbf {multiplication:~}}\; \hat{a} \cdot ^{\mathcal {Z}}\hat{b} \triangleq \big (a_0b_0 + \textstyle \frac{1}{2} \sum _{j=1}^m a_jb_j \big ) + \\&\quad \qquad \qquad \qquad \quad \textstyle \sum _{j=1}^m (a_jb_0 + b_ja_0)\epsilon _j + \big ( \textstyle \frac{1}{2}\sum _{j=1}^m |a_jb_j| + \textstyle \sum _{1\le i< j \le m} |a_ib_j + a_jb_i|\big ) \epsilon _{\text {f}}\\&{\textbf {exponential:~}}\; \hat{a}^{p^{\mathcal {Z}}} \triangleq \hat{a} \cdot ^{\mathcal {Z}}\ldots \, \cdot ^{\mathcal {Z}}\hat{a} \; \qquad \text {with}\, p -1\hbox { abstract multiplications }\cdot ^{\mathcal {Z}}\\&{\textbf {dominance test:~}}\; \hat{a}<^{\mathcal {Z}}\hat{b} \triangleq a_0 - b_0 + \textstyle \sum _{j=1}^m |a_j-b_j| <0 \end{aligned}$$
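A sketch of these operations in Python (not the \(\text {NAVe}\) implementation): a zonotope is a center, a tuple of coefficients for the shared noise symbols, and, as a simplification, a single nonshared radius `r` collecting all fresh noise terms, which keeps every operation sound but loses some precision across fresh symbols:

```python
from dataclasses import dataclass

@dataclass
class Zonotope:
    c: float        # center a_0
    g: tuple        # coefficients of the shared noise symbols eps_1..eps_m
    r: float = 0.0  # radius collecting all fresh (nonshared) noise terms

    def add(self, o):
        return Zonotope(self.c + o.c,
                        tuple(x + y for x, y in zip(self.g, o.g)),
                        self.r + o.r)

    def cmul(self, k):
        return Zonotope(k * self.c, tuple(k * x for x in self.g), abs(k) * self.r)

    def mul(self, o):
        m = len(self.g)
        c = self.c * o.c + 0.5 * sum(self.g[j] * o.g[j] for j in range(m))
        g = tuple(self.g[j] * o.c + o.g[j] * self.c for j in range(m))
        fresh = 0.5 * sum(abs(self.g[j] * o.g[j]) for j in range(m))
        fresh += sum(abs(self.g[i] * o.g[j] + self.g[j] * o.g[i])
                     for i in range(m) for j in range(i + 1, m))
        # conservative handling of the collapsed fresh-noise radii
        fresh += (abs(self.c) + sum(map(abs, self.g))) * o.r \
               + (abs(o.c) + sum(map(abs, o.g))) * self.r + self.r * o.r
        return Zonotope(c, g, fresh)

    def interval(self):
        """Concretization as a real interval."""
        rad = sum(abs(x) for x in self.g) + self.r
        return (self.c - rad, self.c + rad)

    def lt(self, o):
        """Sound dominance test via the difference zonotope."""
        return self.add(o.cmul(-1.0)).interval()[1] < 0
```

For instance, with `x = Zonotope(0.5, (0.5,))` and `y = Zonotope(1.0, (0.5,))`, the shared noise symbol makes `x.lt(y)` succeed even though the concretized intervals [0, 1] and [0.5, 1.5] overlap, illustrating the relational advantage over intervals.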

An abstract modulus on zonotopes To the best of our knowledge, no algorithm implementing a sound abstraction of the modulus operation on zonotopes is available in the literature. Therefore, we designed a novel abstract function on \({\mathcal {Z}}\) that approximates the generic modulus operation. By following the general approach in affine arithmetic described in [9], we define a zonotope approximating the absolute value of a given zonotope, and then, we compute the maximal absolute error of such approximation and add that error to a nonlinear term \(\epsilon _{\text {f}}\) to guarantee soundness.

Fig. 2

Example of the absolute value of a zonotope

Figure 2 depicts an example where a zonotope \(\hat{a}=c + a \epsilon _1\) is plotted as \(y = a x + c\), with \(x \in [-1, 1]\) (the white area in the diagram), through a dashed black line, and its absolute value \(y = |\hat{a}|=|a x + c|\) as a solid black line. In this example, finding a sound over-approximation of \(|\hat{a}|\) in \({\mathcal {Z}}\) means computing two parallel lines defining a zonotope which includes every point of \(y = |\hat{a}|\), as shown in the figure by the blue area. The two lines l and h determining the blue area are parallel, i.e., they have the same slope \(m \in \mathbb {R}\), and differ in their vertical displacements \(q_l, q_h \in \mathbb {R}\). More precisely, we need to find \(m, q_l, q_h\in \mathbb {R}\) such that \(m x + q_l \le |a x + c| \le m x + q_h\) for every \(x \in [-1, 1]\). The over-approximating zonotope will be generated by the line \(y = mx + \frac{q_l + q_h}{2}\) parallel to l and h and will account for the absolute error \(\frac{q_h - q_l}{2}\). This therefore defines the zonotope \(|\hat{a}|^{{\mathcal {Z}}} \triangleq \frac{q_l + q_h}{2} + m \epsilon _1 + \frac{q_h - q_l}{2} \epsilon _{\text {f}}\), which retains some information about the linear contribution of \(\epsilon _1\) and introduces a nonlinear contribution in \(\epsilon _{\text {f}}\).

To compute \(m, q_l, q_h\) satisfying \(m x + q_l \le |a x + c| \le m x + q_h\) for every \(x \in [-1, 1]\), we first observe that the value of a zonotope \(c + a \epsilon _1\) is either always positive, always negative, or it crosses \(y = 0\) in some point \(x_0 \in [-1, 1]\). The first two cases are trivial, as the absolute value can simply be omitted, possibly after changing the sign of the zonotope, so we focus on the last case, where \(x_0 = -\frac{c}{a}\). The absolute value is therefore strictly decreasing in \([-1, x_0]\) and strictly increasing in \([x_0, 1]\), due to the linearity of its argument. We have that \(P_0=(x_0, 0)\) is the (global) minimum point, while \(P_L=(-1, |-a+c|)\) and \(P_R=(1, |a+c|)\) are the two (local) maximum points. Thus, the following three inequalities for \(m, q_l, q_h\) must hold: \(m x_0 + q_l \le 0\), \(|-a+c| \le -m + q_h\), and \(|a+c| \le m + q_h\). An easy way to find m and \(q_h\) satisfying the last two inequalities is to pick the line passing through \(P_L\) and \(P_R\), whose slope is \(m = \frac{|a+c| - |-a+c|}{1 - (-1)} = \frac{|a+c| - |-a+c|}{2}\). The value of \(q_h\) is then obtained by requiring that either \(P_L\) or \(P_R\) lies on the line \(y = m x + q_h\), referred to as \(\Gamma _h\). Since the absolute value of a linear function is convex, its graph can be crossed by \(\Gamma _h\) in at most two points, which are precisely \(P_L\) and \(P_R\): As no other point can intersect \(\Gamma _h\), and \(P_L\), \(P_R\) are the extreme points within the domain, every other point must belong to the same half-space identified by \(\Gamma _h\). Moreover, since \(P_L\) and \(P_R\) are maximum points, every other point must be dominated by \(\Gamma _h\), thus proving that the line \(\Gamma _h\) is a sound upper bound. Lastly, we need to determine \(q_l\). By using a similar argument, we define the line \(\Gamma _l\) as \(y = mx + q_l\) such that \(P_0 \in \Gamma _l\).
Once again, due to convexity, \(\Gamma _l\) can cross the absolute value in at most two points, which here coincide in \(P_0\) counted with multiplicity two; hence, any other point must belong to the same half-space. Since \(P_0\) is the global minimum, it turns out that \(\Gamma _l\) is dominated by every other point and, consequently, the line \(\Gamma _l\) is a sound lower bound. Since the vertical distance between the two lines \(\Gamma _h\) and \(\Gamma _l\) is \(q_h - q_l\), we can consider the parallel line located halfway between them and take as absolute error the semi-distance \(\frac{q_h - q_l}{2}\): This therefore defines as sound abstraction of \(|\hat{a}|\) the zonotope \(|\hat{a}|^{{\mathcal {Z}}} = \frac{q_l + q_h}{2} + m \epsilon _1 + \frac{q_h - q_l}{2} \epsilon _{\text {f}}\).
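This construction can be sketched for a zonotope \(\hat{a} = c + a\epsilon _1\) with a single noise symbol, returning the center and the coefficients of \(\epsilon _1\) and of the fresh symbol \(\epsilon _{\text {f}}\) (a minimal illustration of the geometric argument above):

```python
def zonotope_abs_1d(c, a):
    """Sound over-approximation of |c + a*eps| for eps in [-1, 1].

    Returns (center, coeff, fresh) encoding center + coeff*eps + fresh*eps_f.
    """
    lo, hi = c - abs(a), c + abs(a)
    if lo >= 0:                      # zonotope always nonnegative: |z| = z
        return (c, a, 0.0)
    if hi <= 0:                      # zonotope always nonpositive: |z| = -z
        return (-c, -a, 0.0)
    # The graph of |a*x + c| crosses zero at x0 = -c/a.
    m = (abs(a + c) - abs(-a + c)) / 2   # slope of the line through P_L and P_R
    q_h = abs(a + c) - m                 # upper line Gamma_h through P_R
    q_l = m * c / a                      # lower line Gamma_l through (x0, 0)
    return ((q_l + q_h) / 2, m, (q_h - q_l) / 2)
```

Soundness can be checked empirically by sampling \(\epsilon _1 \in [-1,1]\) and verifying that \(|a\epsilon _1 + c|\) stays between the two bounding lines.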

The example described above assumes a single nonzero linear noise contribution for \(\epsilon _1\) for the argument zonotope \(\hat{a}=c + a \epsilon _1\). This same approximation technique can be applied to the case where the argument zonotope has a nonlinear noise contribution \(\epsilon _r\) and no linear noise \(\epsilon _i\), i.e., \(\hat{a}=c + a \epsilon _r\). While the computations remain the same, the coefficient m must be added to the nonlinear term, so that, in this case, we have that \(|\hat{a}|^{{\mathcal {Z}}} = \frac{q_l + q_h}{2} + (|m| + \frac{q_h - q_l}{2}) \epsilon _r\). In practice, it is always possible to consider \(\epsilon _r\) as a fresh independent noise symbol while converting a zonotope to its geometrical representation.

When the argument zonotope has both a nonzero linear and nonlinear noise, i.e., \(\hat{a}=c + a \epsilon _1 + b \epsilon _r\), or, more generally, there are d linear noise contributions \(\epsilon _i\) and a nonlinear noise \(\epsilon _r\), i.e., \(\hat{a}=a_0 + \sum _{j=1}^d a_j \epsilon _j + a_r \epsilon _r\), we can generalize this approximation technique as follows. We first convert the argument zonotope \(\hat{a}\) into a hyperplane \(\Pi \) in \(\mathbb {R}^{d+2}\) by interpreting \(\epsilon _r\) as \(x_{d+1}\) (so that \(a_{d+1}=a_r\)), and adding a dimension \(x_{d+2}\) to represent the dependent variable, and we set the constraints \(x_1, x_2, \ldots , x_{d+1} \in [-1, 1]\) (while \(x_{d+2} \in \mathbb {R}\)). We then define a subset \(S \subseteq \Pi \) by selecting \(d+2\) points from \(\Pi \) whose values for every independent variable \(x_i\) are either \(-1\) or \(+1\), while the value of the dependent variable \(x_{d+2}\) is accordingly computed, that is,

$$\begin{aligned} S=&\;\left\{ \left( a_1e_1,\ldots ,a_de_d,a_{d+1}e_{d+1}, a_0+ \sum \nolimits _{j=1}^{d+1} a_je_j\right) \in \Pi \mid \exists j. e_j=-1, \forall i\ne j. e_i=1\right\} \\&\;\cup \left\{ \left( a_1,\ldots ,a_d,a_{d+1}, a_0+ \sum \nolimits _{j=1}^{d+1} a_j\right) \right\} . \end{aligned}$$

If all the dependent values \(x_{d+2}\) of every vector in S are nonnegative, then \(\hat{a}\) is always nonnegative, so that its absolute value is \(\hat{a}\) itself, i.e., \(|\hat{a}|^{\mathcal {Z}}\triangleq \hat{a}\). Similarly, if all the values \(x_{d+2}\) are nonpositive, then \(\hat{a}\) is nonpositive, so that its absolute value can be obtained simply as \(|\hat{a}|^{\mathcal {Z}}\triangleq -\hat{a}\). In both cases, the absolute value \(|\hat{a}|^{{\mathcal {Z}}}\) is an exact abstraction with no loss of precision. Otherwise, there exist two vectors \(\textbf{x}\) and \(\textbf{y}\) in S with \(\textbf{x}_{d+2} < 0\) and \(\textbf{y}_{d+2} > 0\), meaning that \(\Pi \) has a nonempty intersection with the hyperplane \(\Pi _0\) defined by \(\textbf{x}_{d+2} = 0\). We then define the subset \(S' = \{(\textbf{x}_1,\textbf{x}_2, \ldots , \textbf{x}_d, \textbf{x}_{d+1}, |\textbf{x}_{d+2}|) ~|~ \textbf{x} \in S\}\); namely, we flip the sign of the negative values of \(\textbf{x}_{d+2}\). By doing so, \(S'\) is a subset of the extremal points of the absolute value of \(\hat{a}\). Next, we compute the hyperplane \(\Pi _h\) containing every point in \(S'\). This step requires computing the determinant of a \((d+2) \times (d+2)\) matrix, which is nonsingular by construction. Such hyperplane \(\Pi _h\) provides an upper bound for the absolute value of \(\hat{a}\). Then, the lower bound hyperplane \(\Pi _l\), parallel to \(\Pi _h\), is computed, thus having the same variable coefficients as \(\Pi _h\) and a different constant term. To do so, we compute \(\Pi ' = \Pi \cap \Pi _0\), thus defining a subspace in \(\mathbb {R}^{d'}\) for some \(d' < d +2\). We sample a single vector \(P_0\) of \(\Pi '\), whose existence is guaranteed by construction, and we use that vector to determine the constant term of \(\Pi _l\), so that \(\Pi _l\) provides a lower bound for the absolute value of \(\hat{a}\).
Then, we consider the hyperplane \(\Pi _m\) parallel to both \(\Pi _l\) and \(\Pi _h\) and equally distant from them, i.e., \(\Pi _m = \frac{\Pi _l + \Pi _h}{2}\), and convert \(\Pi _m\) to a zonotope in \(\mathbb {R}^d\) by adding the vertical distance \(\frac{\Pi _h - \Pi _l}{2}\) to the nonlinear noise term. This abstraction is sound because the vectors in \(\Pi _h\) (resp. \(\Pi _l\)) dominate (resp. are dominated by) the absolute values of the points in the argument zonotope \(\hat{a}\).

It turns out that our algorithm for the stability certification of kNN only ever applies the abstract modulus to zonotopes of a specific form, and this can be exploited to enhance the efficiency of computing the modulus. In fact, our certification algorithm always applies an abstract modulus of type \(|a_0 + a_j \epsilon _j|^\mathcal {Z}\), for some \(j\in [1,m]\), that is, the modulus of a line on a plane with unknown \(\epsilon _j\). Hence, the abstract modulus computes the line through the two extremal points \((-1, |a_0 - a_j|)\) and \((+1, |a_0 + a_j|)\) as a correct upper bound for \(|a_0 + a_j \epsilon _j|\), and the parallel line passing through the point \((-\frac{a_0}{a_j}, 0)\) as a correct lower bound. We then consider the line \(y = px + q\) parallel to these two lines and at the same distance \(d>0\) from both. This allows us to define the abstract modulus as \(|a_0 + a_j \epsilon _j|^\mathcal {Z}\triangleq q + p\epsilon _j + d\epsilon _{\text {f}}\), where \(\epsilon _{\text {f}}\) is a fresh noise symbol.

2.2 kNN classifiers

Consider a ground truth dataset \(D \subseteq X \times L\), where \(X \subseteq \mathbb {R}^n\) is an input space and L is a set of classification labels, and a distance function \(\delta : X \times X \rightarrow \mathbb {R}_{\ge 0}\). Given \(k\in \mathbb {N}^* \triangleq \mathbb {N}{\smallsetminus }\{0\}\), a kNN classifier is modeled as a total function \(C_{\delta , k} :X \rightarrow \wp (L){\smallsetminus }\{\varnothing \}\), which maps an input sample \(\textbf{x} \in X\) into a nonempty set of labels, by first selecting in D the k nearest samples to \(\textbf{x}\) according to \(\delta \), and then returning the set of their most frequent labels. Hence, an output set including more than one label means a tie vote, and this justifies why we consider sets of labels as the codomain of classifiers.
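A minimal concrete implementation of this definition might look as follows (helper names are hypothetical; Manhattan distance is used as an example choice of \(\delta \)):

```python
from collections import Counter

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def knn_classify(dataset, x, k, dist):
    """Return the nonempty set of most frequent labels among the k nearest
    samples to x in dataset (a list of (vector, label) pairs).

    A tie vote yields a set with more than one label.
    """
    by_dist = sorted(dataset, key=lambda sample: dist(sample[0], x))
    votes = Counter(label for _, label in by_dist[:k])
    top = max(votes.values())
    return {label for label, count in votes.items() if count == top}
```

For example, on the one-dimensional dataset of Example 3.4, `knn_classify([((2,), 'l1'), ((3,), 'l2')], (0,), 1, manhattan)` yields `{'l1'}`, while `k = 2` yields the tie `{'l1', 'l2'}`.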

2.3 Stability and robustness

A perturbation \(P:X\rightarrow \wp (X)\) of an input sample \(\textbf{x}\in X\) is a variation of its feature values defining a potential adversarial region \(P(\textbf{x})\in \wp (X)\). A very common instance [6] is given by perturbations for the maximum norm \(\Vert \cdot \Vert _\infty \): Given \(\textbf{x} \in \mathbb {R}^n\) and a magnitude \(\tau > 0\), the \(\ell _\infty \)-perturbation is \(P^\tau _\infty (\textbf{x}) \triangleq \{\textbf{w} \in \mathbb {R}^n \mid \max (|\textbf{w}_1 - \textbf{x}_1|,\ldots , |\textbf{w}_n - \textbf{x}_n|) \le \tau \}\), i.e., the \(\ell _\infty \)-ball of radius \(\tau \) centered at \(\textbf{x}\). This perturbation can be exactly represented through intervals and zonotopes, that is, \(P^\tau _\infty (\textbf{x})= \gamma ^{{\mathcal {I}}^n}(\langle [\textbf{x}_1 - \tau , \textbf{x}_1 + \tau ],\ldots , [\textbf{x}_n - \tau , \textbf{x}_n + \tau ]\rangle ) =\gamma ^{{\mathcal {Z}}^n}(\langle \textbf{x}_1 + \tau \epsilon _1,\ldots , \textbf{x}_n + \tau \epsilon _n\rangle )\).
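A minimal sketch of these two exact representations, assuming hypothetical encodings (intervals as bound pairs, zonotope components as a center plus a map from noise-symbol indices to coefficients):

```python
def linf_interval(x, tau):
    """Exact interval representation of the l-infinity ball of radius tau:
    component i becomes [x_i - tau, x_i + tau]."""
    return [(xi - tau, xi + tau) for xi in x]

def linf_zonotope(x, tau):
    """Exact zonotope representation: component i is x_i + tau * eps_i,
    with one fresh noise symbol eps_i in [-1, 1] per dimension."""
    return [(xi, {i: tau}) for i, xi in enumerate(x)]
```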

A classifier \(C:X \rightarrow \wp (L)\) is accurate on a ground truth input \((\textbf{x}, l_\textbf{x})\in D\) when \(C(\textbf{x}) = \{l_\textbf{x}\}\). Moreover, C is stable over a region \(R \subseteq X\) when \(\cup _{\textbf{w}\in R} C(\textbf{w}) =\{l\}\) holds for some \(l\in L\). Stability means that a classifier does not change its output on a region of similar inputs; it is an orthogonal notion with respect to accuracy, as it does not require knowledge of the ground truth labels. If a classifier C is both accurate on an input \((\textbf{x}, l_\textbf{x})\) and stable over a perturbation \(P(\textbf{x})\) of \(\textbf{x}\), then C is robust on input \((\textbf{x}, l_\textbf{x})\) for \(P(\textbf{x})\), i.e., for all \(\textbf{w} \in P(\textbf{x})\), \(C(\textbf{w}) = \{l_\textbf{x}\}\) holds. Accordingly, stability and robustness metrics for a classifier C on some test set \(T\subseteq X\times L\) are defined as the percentage of test samples \((\textbf{x}, l_\textbf{x})\in T\) for which C is stable/robust over a perturbation \(P(\textbf{x})\):

$$\begin{aligned}&\textstyle \textsc {stab}(C, T) \triangleq |\{(\textbf{x}, l_\textbf{x}) \in T \mid C \text { stable on } P(\textbf{x})\}|/|T| \\&\textstyle \textsc {rob}(C, T) \triangleq |\{(\textbf{x}, l_\textbf{x}) \in T \mid C \text { robust on } (\textbf{x}, l_\textbf{x}) \text { for } P(\textbf{x})\}|/|T| \end{aligned}$$

2.4 Individual fairness

Our method can be also applied to certify individual fairness [10] that intuitively encodes the principle that “two individuals who are similar with respect to a particular task should be classified similarly.” The similarity relation on the input space X is expressed in terms of a distance \(\delta \) and a threshold \(\tau >0\) by considering \(S_{\delta , \tau } \triangleq \{ (\textbf{x}, \textbf{y}) \in X \times X ~|~ \delta (\textbf{x}, \textbf{y}) \le \tau \}\). The distance metric \(\delta \) is specific to the fairness problem, where [10] studies the total variation or relative \(\ell _\infty \) distances. Then, given an individual \(\textbf{x} \in X\), a classifier \(C: X \rightarrow \wp (L)\) is individually fair on \(\textbf{x}\) with respect to \(S_{\delta , \tau }\) when:

$$\begin{aligned} \forall \textbf{y} \in X\cdot (\textbf{x}, \textbf{y}) \in S_{\delta , \tau } \Rightarrow C(\textbf{x}) = C(\textbf{y}). \end{aligned}$$

Thus, individual fairness for \(\textbf{x}\) holds if and only if for all \(\textbf{y} \in P_{\delta }^{\tau }(\textbf{x})\), \(C(\textbf{x}) = C(\textbf{y})\), where \(P_{\delta }^{\tau }: X \rightarrow \wp (X)\) is the perturbation defined as \(P_{\delta }^{\tau }(\textbf{x}) \triangleq \{\textbf{y} \in X ~|~ \delta (\textbf{x}, \textbf{y}) \le \tau \}\). Hence, by leveraging this simple observation, individual fairness boils down to stability, so that their metrics coincide.

3 Abstract verification of kNN

Given a classifier \(C :X \rightarrow \wp (L)\), a sound abstraction of C on a numerical abstraction \(\langle A,\gamma ^A\rangle \) is an algorithm \(C^A :A \rightarrow \wp (L)\), which is sound, i.e.,

$$\begin{aligned} \text {for all}\,a \in A, \cup _{\textbf{x} \in \gamma ^A(a)} C(\textbf{x}) \subseteq C^A(a) \end{aligned}$$

holds. Thus, soundness means that \(C^A(a)\) over-approximates all the output labels of C on inputs abstractly represented by \(a\in A\). If this over-approximation is indeed a singleton then C is provably stable over the region \(\gamma ^A(a)\), i.e., this approach provides a formal stability certification.

Theorem 3.1

(Abstract stability certification) Let \(C^A\) be a sound abstraction of C and assume that a region \(R\subseteq X\) is over-approximated by some \(a \in A\). If \(|C^A(a)| = 1\) then C is stable over R.

Proof

By hypothesis, there exists a label \(l \in L\) such that \(C^A(a) = \{l\}\). By soundness of \(C^A\), \(\cup _{\textbf{x} \in \gamma ^A(a)} C(\textbf{x}) \subseteq \{l\}\). Since, for all \(\textbf{x}\), \(C(\textbf{x})\ne \varnothing \), we have that for all \(\textbf{x} \in \gamma ^A(a)\), \(C(\textbf{x}) = \{l\}\). Since \(R \subseteq \gamma ^A(a)\), we obtain that for all \(\textbf{y} \in R\), \(C(\textbf{y}) = \{l\}\), namely C is stable over R.\(\square \)

It is worth remarking that the converse of Theorem 3.1, in general, does not hold, meaning that this stability certification method can be incomplete. This incompleteness may be due to an input abstract value \(a\in A\) that does not represent exactly the adversarial region R, or to a loss of precision in the abstract computations of \(C^A\). The former issue can be settled by leveraging abstract domains that are capable of representing exactly the perturbation model of interest, as is the case of the interval and zonotope abstractions for \(\ell _\infty \)-perturbations.

3.1 Abstract distance

The kNN algorithm relies on a distance \(\delta : \mathbb {R}^n \times \mathbb {R}^n \rightarrow \mathbb {R}_{\ge 0}\) for determining the k nearest vectors to a given input sample. Although kNN is parametric on \(\delta \), Minkowski distance is the standard choice: given \(p \in \mathbb {N}^*\),

$$\begin{aligned} \delta _p(\textbf{x}, \textbf{y}) \triangleq \root p \of {\sum \nolimits _{i = 1}^n | \textbf{x}_i - \textbf{y}_i|^p}. \end{aligned}$$

In particular, the two most common instances are for \(p=1,2\):

Manhattan distance:

\(\mu (\textbf{x}, \textbf{y}) \triangleq \delta _1(\textbf{x}, \textbf{y}) = \sum _{i = 1}^n |\textbf{x}_i - \textbf{y}_i|\)

Euclidean distance:

\(\eta (\textbf{x}, \textbf{y}) \triangleq \delta _2(\textbf{x}, \textbf{y}) = \sqrt{\sum _{i = 1}^n (\textbf{x}_i - \textbf{y}_i)^2}\)

Observe that kNN relies on the distance for relative comparisons only, so we can safely discard the p-th root \(\root p \of {\cdot }\) in \(\delta _p\) to simplify the computations. A numerical abstract domain must therefore provide sound abstractions of the operations used for computing these distances, namely addition (i.e., subtraction), exponential and modulus. We recalled in Sect. 2.1 the definitions of the abstract operations on intervals and zonotopes. Let us remark the following points.

  (1)

    We need a sound abstract dominance relation to be used for comparing abstract distances, i.e., an algorithm \((\cdot <^A \cdot ): A\times A \rightarrow \{{\textbf {true}}, {\textbf {?}}\}\) such that

    $$\begin{aligned} \text {if}\, a_1<^A a_2 = {\textbf {true}}\hbox { then for all }x\in \gamma ^A(a_1)\hbox { and }y\in \gamma ^A(a_2), x<y\hbox { holds.} \end{aligned}$$

    The dominance tests for intervals and zonotopes have been given in Sect. 2.1. It is worth noticing that the dominance relation \(<^{\mathcal {I}}\) for intervals boils down to the so-called interval order [14], while the relation \(<^{\mathcal {Z}}\) for zonotopes may exploit their relational information as encoded by shared noise symbols: e.g., a comparison between zonotopes such as \(-2 +2\epsilon _1 <^{\mathcal {Z}}1 + \epsilon _1 +\epsilon _2\) reduces to \(-3 + \epsilon _1 -\epsilon _2 <^{\mathcal {Z}}0\), which clearly holds.

  (2)

    A sound and precise enough approximation for zonotopes of the modulus function \(|\!\mathbin {\cdot }\!|\) was not available in the literature, and hence, we designed a novel algorithm for the abstract modulus of zonotopes as described in Sect. 2.1.2.

  (3)

    The abstract operations on the product domains \({\mathcal {I}}^n\) and \({\mathcal {Z}}^n\) are defined by a straightforward componentwise extension of their unary versions on \({\mathcal {I}}\) and \({\mathcal {Z}}\).
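As an illustration of point (1), the interval dominance test (the interval order) and a zonotope dominance test exploiting shared noise symbols might be sketched as follows (hypothetical encodings: intervals as bound pairs, zonotopes as a center plus a map from noise-symbol indices to coefficients):

```python
def lt_interval(a, b):
    """a <^I b: every value of a lies strictly below every value of b."""
    return a[1] < b[0]   # ub(a) < lb(b)

def z_sub(a, b):
    """Difference of affine forms (center, {noise index: coefficient})."""
    coeffs = dict(a[1])
    for i, c in b[1].items():
        coeffs[i] = coeffs.get(i, 0.0) - c
    return (a[0] - b[0], {i: c for i, c in coeffs.items() if c != 0.0})

def lt_zonotope(a, b):
    """a <^Z b: the difference a - b is strictly negative on [-1, 1]^m."""
    center, coeffs = z_sub(a, b)
    return center + sum(abs(c) for c in coeffs.values()) < 0
```

On the example above, \(-2 + 2\epsilon _1 <^{\mathcal {Z}} 1 + \epsilon _1 + \epsilon _2\) holds because the difference \(-3 + \epsilon _1 - \epsilon _2\) is at most \(-1\), whereas the corresponding interval comparison \([-4,0] <^{\mathcal {I}} [-1,3]\) fails.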

It turns out that the abstract Minkowski distance \(\delta _p^{{\mathcal {I}}^n}\), without the p-th root, on intervals does not lose precision, i.e., it is an exact approximation.

Theorem 3.2

(Minkowski distance on intervals is exact) Given \(\textbf{a}, \textbf{b} \in {\mathcal {I}}^n\),

$$\begin{aligned} \{\delta _p(\textbf{x}, \textbf{y}) \mid \textbf{x} \in \gamma ^{{\mathcal {I}}^n}(\textbf{a}), \textbf{y} \in \gamma ^{{\mathcal {I}}^n}(\textbf{b})\} = \gamma ^{{\mathcal {I}}}(\delta _p^{{\mathcal {I}}^n}(\textbf{a},\textbf{b})), \end{aligned}$$

where \(\delta _p^{{\mathcal {I}}^n}(\textbf{a},\textbf{b})\triangleq (+^{{\mathcal {I}}})_{i = 1}^n (|\textbf{a}_i -^{\mathcal {I}}\textbf{b}_i|^{\mathcal {I}})^{p^{\mathcal {I}}}\).

Proof

We show that all the abstract operations on the interval abstract domain \({\mathcal {I}}\) used in the definition of \(\delta _p^{{\mathcal {I}}}\) are complete. This entails the completeness of the interval Minkowski distance \(\delta _p^{{\mathcal {I}}}\) because the composition of complete abstract functions preserves their completeness. In fact, if A is a numerical abstraction of \(\wp (\mathbb {R}^n)\) (this property actually holds for generic domains in abstract interpretation) and \(f^A:A^i\rightarrow A^j\) and \(g^A:A^j \rightarrow A^k\) are two abstract functions that are complete w.r.t., resp., \(f:\wp (\mathbb {R}^n)^i \rightarrow \wp (\mathbb {R}^n)^j\) and \(g:\wp (\mathbb {R}^n)^j \rightarrow \wp (\mathbb {R}^n)^k\), then \(g^A\circ f^A:A^i\rightarrow A^k\) is complete for \(g\circ f:\wp (\mathbb {R}^n)^i\rightarrow \wp (\mathbb {R}^n)^k\) because \(\gamma ^A\circ g^A\circ f^A = g\circ \gamma ^A \circ f^A = g \circ f \circ \gamma ^A\).

We refer to the definitions of abstract numerical operations on the interval abstraction as given in Sect. 2.1.1. The interval difference \({a}_i -^{\mathcal {I}}{b}_i\) is well known to be complete (see, e.g., [31]). Then, we observe that the interval modulus is also complete, i.e., \(\gamma ^{{\mathcal {I}}}(|[l,u]|^{\mathcal {I}}) = \{ |x|\in \mathbb {R}\mid l \le x \le u\}\): This is an easy observation which can be inferred by distinguishing the two cases \(lu\ge 0\) and \(lu<0\) of the definition of \(|[l,u]|^{\mathcal {I}}\). As a consequence, each interval \(|{\textbf{a}}_i \! -^{\mathcal {I}}{\textbf{b}}_i|^{\mathcal {I}}=[l_i,u_i]\) occurring in the definition of \(\delta _p^{{\mathcal {I}}}(\textbf{a},\textbf{b})\) is such that \(l_i\ge 0\). Therefore, by definition of interval exponential \((\cdot )^{p^{\mathcal {I}}}\), it turns out that \({(|{\textbf{a}}_i -^{\mathcal {I}}{\textbf{b}}_i|^{\mathcal {I}})}^{p^{\mathcal {I}}} =[l_i,u_i]^{p^{\mathcal {I}}} = [(l_i)^p,(u_i)^p]= \{x^p \in \mathbb {R}\mid l_i \le x \le u_i\}\); namely, completeness of the p-th interval exponential holds. Finally, interval addition is well known to be complete (see, e.g., [31]).

Hence, it turns out that the interval Minkowski distance \(\delta _p^{{\mathcal {I}}}(\textbf{a},\textbf{b})\) is a composition of complete abstract operations, so that \(\delta _p^{{\mathcal {I}}}\) turns out to be complete. \(\square \)
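As an illustration, the interval Minkowski distance of Theorem 3.2 (without the p-th root) can be sketched as follows; the interval encoding as bound pairs and the helper names are hypothetical:

```python
def i_sub(a, b):
    (l1, u1), (l2, u2) = a, b
    return (l1 - u2, u1 - l2)

def i_abs(a):
    l, u = a
    if l >= 0:
        return (l, u)
    if u <= 0:
        return (-u, -l)
    return (0.0, max(-l, u))

def i_pow(a, p):
    l, u = a           # assumes l >= 0, as is the case after i_abs here
    return (l ** p, u ** p)

def i_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def minkowski_I(a_vec, b_vec, p):
    """Abstract Minkowski distance (p-th root discarded) on interval vectors:
    the componentwise composition of the complete operations above."""
    acc = (0.0, 0.0)
    for ai, bi in zip(a_vec, b_vec):
        acc = i_add(acc, i_pow(i_abs(i_sub(ai, bi)), p))
    return acc
```

On Example 3.4 below, `minkowski_I([(-1, 1)], [(2, 2)], 1)` yields the exact interval \([1, 3]\).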

By contrast, we show that the abstract Minkowski distance on zonotopes cannot be guaranteed to be exact: This is expected as the modulus and exponential operations are not linear and, therefore, are necessarily approximated on zonotopes.

Example 3.3

(Minkowski distance on zonotopes is not exact) Consider two zonotopes \(\hat{a}=4+\epsilon _1 +2\epsilon _2\) and \(\hat{b}=2+\epsilon _1 +\epsilon _2\), representing some feature in \(\mathbb {R}\), that share two noise symbols \(\epsilon _1\), \(\epsilon _2\). Consider the abstract Euclidean distance \(\eta ^{\mathcal {Z}}(\hat{a},\hat{b}) = (\hat{a}-^{\mathcal {Z}}\hat{b})^{2^{\mathcal {Z}}}\). Thus, by applying the operations on zonotopes recalled in Sect. 2.1.2, we have that:

$$\begin{aligned} \eta ^{\mathcal {Z}}(\hat{a},\hat{b})&= \big ((4+\epsilon _1 +2\epsilon _2) -^{\mathcal {Z}}(2+\epsilon _1 +\epsilon _2)\big )^{2^{\mathcal {Z}}} \!\! = (2+\epsilon _2)^{2^{\mathcal {Z}}}\\&= (2+\epsilon _2) \cdot ^{\mathcal {Z}}(2+\epsilon _2) = \textstyle \frac{9}{2} + 4\epsilon _2 + \frac{1}{2}\epsilon _{\text {f}} \qquad \end{aligned}$$

with \(\epsilon _{\text {f}}\in [0,1]\) because this nonlinear noise symbol approximates a square, which is always nonnegative. (With \(\epsilon _{\text {f}}\in [-1,1]\), the approximation would be even worse.) Thus, we have that \(\gamma ^{\mathcal {Z}}(\textstyle \frac{9}{2} + 4\epsilon _2 + \frac{1}{2}\epsilon _{\text {f}}) = [0.5,9]\). However, observe that the square operation \((2+\epsilon _2)^{2^{\mathcal {Z}}}\) is sound but not exact, because the range of values of \((2+\epsilon _2)^2\) is the interval \([1,3]^2=[1,9]\). Thus, \(\{\eta (x,y) \in \mathbb {R}\mid x\in \gamma ^{\mathcal {Z}}(\hat{a}), y\in \gamma ^{\mathcal {Z}}(\hat{b})\}\subsetneq \gamma ^{\mathcal {Z}}(\eta ^{\mathcal {Z}}(\hat{a},\hat{b}))\), as \([1,9]\subsetneq [0.5,9]\). \(\square \)

Exactness of the distance function is not enough to achieve completeness of the abstract kNN classifier on intervals, as shown by the following example.

Example 3.4

(Incompleteness of abstract kNN on intervals) Let us consider a dataset \(D = \{(\textbf{v}=2, l_1), (\textbf{w}=3, l_2)\}\) in the one-dimensional input space \(\mathbb {R}\), and the 1NN classifier \(C_{\mu , 1}\) for the Manhattan distance \(\mu \). Consider a region \(R = P^1_\infty (0)=\{\textbf{x} \in \mathbb {R} \mid -1 \le \textbf{x} \le 1\} \in \wp (\mathbb {R})\). The distances of a generic adversarial vector \(\textbf{x} \in R\) from \(\textbf{v}\) and \(\textbf{w}\) are:

$$\begin{aligned} \mu (\textbf{x}, \textbf{v})&= {|\textbf{x} - 2|} = 2 - \textbf{x}, \\ \mu (\textbf{x}, \textbf{w})&= |\textbf{x} - 3| = 3 - \textbf{x}. \end{aligned}$$

Hence, the dominance test \(\mu (\textbf{x}, \textbf{v}) <^? \mu (\textbf{x}, \textbf{w})\) boils down to \(2 - \textbf{x} <^? 3 - \textbf{x}\), which always holds. Thus, \(\textbf{v}\) is always the nearest neighbor to R, and, in turn, every sample in R is classified by \(C_{\mu , 1}\) as \(l_1\), so that stability holds.

Let us perform the abstract stability certification on \({\mathcal {I}}\), where the region R is exactly represented by the interval \(a \triangleq [-1, 1]\). The abstract Manhattan distances are as follows:

$$\begin{aligned} \mu ^{\mathcal {I}}(a, \textbf{v})&= |[-1, 1] -^{\mathcal {I}}2|^{\mathcal {I}}= |[-3, -1]|^{\mathcal {I}}= [1, 3], \\ \mu ^{\mathcal {I}}(a, \textbf{w})&= |[-1, 1] -^{\mathcal {I}}3|^{\mathcal {I}}= |[-4, -2]|^{\mathcal {I}}= [2, 4]. \end{aligned}$$

These abstract distances do not allow us to infer the nearest vector to a because \(\mu ^{\mathcal {I}}(a, \textbf{v})\, {\not <}^{{\mathcal {I}}} \mu ^{\mathcal {I}}(a, \textbf{w})\) and \(\mu ^{\mathcal {I}}(a, \textbf{w})\, {\not <}^{{\mathcal {I}}} \mu ^{\mathcal {I}}(a, \textbf{v})\).

We can easily adapt this counterexample to show the incompleteness for different distance functions, such as the Euclidean distance. By a simple symbolic computation, we can infer that \(\textbf{v}\) is the nearest neighbor when \(\textbf{x} < 2.5\); hence, once again, every sample in R is labeled as \(l_1\). However, by applying the abstract Euclidean distance, which is complete on \({\mathcal {I}}\), we obtain \(\eta ^{\mathcal {I}}(a, \textbf{v}) = [1, 9]\) and \(\eta ^{\mathcal {I}}(a, \textbf{w}) = [4, 16]\), so that we cannot infer the stability on R.

This lack of precision is rooted in the interval abstraction that does not keep track of multiple occurrences of the same variable \(\textbf{x}\) in different abstract distances.

More refined relational abstractions such as octagons or even convex polyhedra [31] would also fail. For instance, with the convex polyhedra abstraction \(\mathcal {P}\) we would still have an inconclusive comparison \(\mu ^{\mathcal {P}}(a, \textbf{v}) = 1 \le \textbf{x} \le 3 \;\, {\not <}^{\mathcal {P}} \, 2 \le \textbf{x} \le 4 = \mu ^{\mathcal {P}}(a, \textbf{w})\). On the positive side, the relational information of the zonotope abstraction \({\mathcal {Z}}\) in this case allows us to prove stability. In fact, the zonotope \(\hat{a}\triangleq 0 +\epsilon _1\in {\mathcal {Z}}\) exactly represents the region R by keeping track of the dependence on \(\textbf{x}\) through the noise symbol \(\epsilon _1\), so we have that:

$$\begin{aligned}&\mu ^{\mathcal {Z}}(\hat{a}, \textbf{v}) = |0 +\epsilon _1\ -^{\mathcal {Z}}2|^{\mathcal {Z}}= |-2 + \epsilon _1|^{\mathcal {Z}}= 2 -\epsilon _1, \\&\mu ^{\mathcal {Z}}(\hat{a}, \textbf{w}) = |0 +\epsilon _1\ -^{\mathcal {Z}}3|^{\mathcal {Z}}= |-3 + \epsilon _1|^{\mathcal {Z}}= 3 -\epsilon _1. \end{aligned}$$

Thus, \(\mu ^{\mathcal {Z}}(\hat{a}, \textbf{v}) <^{\mathcal {Z}}\! \mu ^{\mathcal {Z}}(\hat{a}, \textbf{w})\) iff \(2 -\epsilon _1 <^{\mathcal {Z}}\! 3 -\epsilon _1\), which clearly holds. \(\square \)

Example 3.4 exhibits a well-known issue of compositional computations in nonrelational abstractions (see [31]): For example, an expression such as \(x-x\) with \(x \in [0,1]\) is compositionally evaluated in \({\mathcal {I}}\) as \([0,1] -^{\mathcal {I}}[0,1] = [-1,1]\), thus causing a significant loss of precision w.r.t. its concrete value [0, 0].
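This dependency loss can be reproduced in two lines (with a hypothetical `interval_sub` helper implementing the componentwise rule):

```python
def interval_sub(a, b):
    # componentwise rule: [l1, u1] - [l2, u2] = [l1 - u2, u1 - l2]
    return (a[0] - b[1], a[1] - b[0])

x = (0.0, 1.0)
# the abstraction forgets that both operands denote the same variable,
# so x - x is over-approximated by [-1, 1] instead of the exact [0, 0]
assert interval_sub(x, x) == (-1.0, 1.0)
```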

The following example shows that, even though zonotopes are more precise than intervals, it may happen that intervals prove the stability of some input sample whereas zonotopes fail.

Example 3.5

(Intervals vs zonotopes for proving stability) Consider the dataset \(D=\{(\textbf{v}=0,\ell _1), (\textbf{w}=4.1,\ell _2)\}\), a region \(R=\{\textbf{x} \mid 0\le \textbf{x}\le 2\}\), and the 1NN classifier for the Euclidean distance \(\eta \) (w.l.o.g. we consider the square of \(\eta \)). The region R is exactly represented by the interval \(a=[0,2]\in {\mathcal {I}}\) and by the zonotope \(\hat{a} = 1 + \epsilon _1\in {\mathcal {Z}}\). The abstract Euclidean distances on \({\mathcal {I}}\) and \({\mathcal {Z}}\) are as follows:

$$\begin{aligned}&\eta ^{\mathcal {I}}(a, \textbf{v}) = ([0, 2] -^{\mathcal {I}}0)^{2^{\mathcal {I}}} = [0, 4],\\&\eta ^{\mathcal {I}}(a, \textbf{w}) = ([0, 2] -^{\mathcal {I}}4.1)^{2^{\mathcal {I}}} = [4.41, 16.81],\\&\eta ^{\mathcal {Z}}(\hat{a}, \textbf{v}) = (1 + 1\epsilon _1 -^{\mathcal {Z}}0)^{2^{\mathcal {Z}}} = 1 + 2\epsilon _1 + \epsilon _{\text {f}_1}, \quad \text {with } \epsilon _{\text {f}_1} \in [0,1],\\&\eta ^{\mathcal {Z}}(\hat{a}, \textbf{w}) = (1 + 1\epsilon _1 -^{\mathcal {Z}}4.1)^{2^{\mathcal {Z}}} = 9.61 - 6.2\epsilon _1 + \epsilon _{\text {f}_2}, \quad \text {with } \epsilon _{\text {f}_2} \in [0,1]. \end{aligned}$$

Thus, for intervals, we have that \(\eta ^{\mathcal {I}}(a, \textbf{v}) <^{\mathcal {I}}\eta ^{\mathcal {I}}(a, \textbf{w})\) iff \([0,4] <^{\mathcal {I}}[4.41, 16.81]\), which holds and, therefore, entails stability. For zonotopes, we have that:

$$\begin{aligned}&\eta ^{\mathcal {Z}}(\hat{a}, \textbf{v})<^{\mathcal {Z}}\eta ^{\mathcal {Z}}(\hat{a}, \textbf{w})&\text {iff}\\&1 + 2\epsilon _1 + \epsilon _{\text {f}_1}<^{\mathcal {Z}}9.61 - 6.2\epsilon _1 + \epsilon _{\text {f}_2}&\text {iff}\\&-8.61 + 8.2\epsilon _1 + \epsilon _{\text {f}_1} - \epsilon _{\text {f}_2}<^{\mathcal {Z}}0&\end{aligned}$$

which does not hold for, e.g., \(\epsilon _1=1\), \(\epsilon _{\text {f}_1}=1\) and \(\epsilon _{\text {f}_2}=0\). Thus, stability cannot be proved with \({\mathcal {Z}}\). Let us remark that zonotopes here fail because \({\mathcal {Z}}\) needs to introduce two different fresh nonlinear noise symbols \( \epsilon _{\text {f}_1}\) and \( \epsilon _{\text {f}_2}\) for computing, resp., \(\eta ^{\mathcal {Z}}(\hat{a}, \textbf{v})\) and \(\eta ^{\mathcal {Z}}(\hat{a}, \textbf{w})\), while both would represent the same square \(\epsilon _1^2\). \(\square \)

The imprecision of Example 3.5 arises because zonotopes do not keep track precisely of all nonlinear terms: for the p-th Minkowski distance in \(\mathbb {R}^n\), this would require storing and computing \(n^p\) nonlinear terms, thus making abstract computations for practical datasets infeasible (see [19] for further details on the approximations and practical limitations of zonotopes).

3.2 Abstract kNN classification

Given a ground truth dataset D, we describe an algorithm for computing the sound abstract kNN classifier \(C_{\delta , k}^A\) on a numerical abstract domain A, which is parametric on a distance function \(\delta \), provided that A is endowed with the abstract functions needed to design a sound abstract distance \(\delta ^A:A\times A \rightarrow A\). By a slight abuse of notation, the domain \(A\times A\) of \(\delta ^A\) abstracts sets of vectors in \(\wp (\mathbb {R}^n)\), while its codomain A abstracts sets of numbers in \(\wp (\mathbb {R})\); in this latter case, for each \(a\in A\), we assume that \({{\,\textrm{lb}\,}}(a),{{\,\textrm{ub}\,}}(a)\in \mathbb {R} \cup \{-\infty , +\infty \}\) provide, resp., a sound lower and upper bound for \(\gamma ^A(a)\in \wp (\mathbb {R})\), i.e., for all \(x\in \gamma ^A(a)\), \({{\,\textrm{lb}\,}}(a)\le x \le {{\,\textrm{ub}\,}}(a)\) holds. The pseudocode for \(C_{\delta , k}^A\) is given as Algorithm 1.

3.2.1 Step \(_1\): Computing and ordering abstract distances

Given a kNN classifier \(C_{\delta , k}\), an input \((\textbf{x}, l_\textbf{x})\in X\times L\), and a perturbation function \(P: X \rightarrow \wp (X)\), we first need a sound abstraction \(a_{P(\textbf{x})} \in A\) for the region \(P(\textbf{x})\), and an abstract representation \(\textbf{y}^A \in A\) for every vector \(\textbf{y}\) occurring in the dataset as \((\textbf{y}, l_\textbf{y}) \in D\). For abstract domains that admit an abstraction function \(\alpha ^A: \wp (\mathbb {R}^n) \rightarrow A\), we define \(a_{P(\textbf{x})} \triangleq \alpha ^A(P(\textbf{x}))\). This can always be done for intervals where, for nonempty S, \(\alpha ^{\mathcal {I}}(S) \triangleq [\inf S, \sup S]\), whereas zonotopes, in general, do not admit an abstraction function. On the other hand, let us recall that both intervals and zonotopes provide exact abstract representations for \(\ell _\infty \) perturbations \(P^\epsilon _\infty (\textbf{x})\). For each sample \((\textbf{y}, l_\textbf{y}) \in D\), we compute its abstract distance \(d^A_\textbf{y} \triangleq \delta ^A(a_{P(\textbf{x})}, \textbf{y}^A) \in A\) from the abstract value \(a_{P(\textbf{x})}\) representing the perturbation \(P(\textbf{x})\). Each abstract distance is paired with its corresponding label, thus constructing the set of pairs \(\{(d^A_\textbf{y}, l_\textbf{y})\}_{(\textbf{y}, l_\textbf{y}) \in D}\). The abstract dominance relation \(<^A\) on A is extended to \(A \times L\) simply by disregarding labels, i.e., \((d^A_\textbf{y}, l_\textbf{y}) <^{A \times L} (d^A_\textbf{z}, l_\textbf{z})\) when \(d^A_\textbf{y} <^A d^A_\textbf{z}\). This relation \(<^{A \times L}\) is weakened by the following total order relation \(\preceq \):
$$\begin{aligned} (d^A_\textbf{y}, l_\textbf{y}) \preceq (d^A_\textbf{z}, l_\textbf{z}) \;\overset{(*)}{\iff }\; {{\,\textrm{lb}\,}}(d^A_\textbf{y}) \le {{\,\textrm{lb}\,}}(d^A_\textbf{z}) \end{aligned}$$

where \(\prec \) denotes the corresponding strict order relation. This relation (\(*\)) allows us to sort the set \(\{(d^A_\textbf{y}, l_\textbf{y})\}_{(\textbf{y}, l_\textbf{y})\in D}\) into a totally ordered set \(\langle O, \preceq \rangle \). By a slight abuse of notation, we refer to O[i], with \(i\in [1,|D|]\), as the i-th smallest element of the total order \(\langle O, \preceq \rangle \), so that O[1] is the smallest element, O[2] the second smallest, and so forth. Firstly, let us observe that \(\preceq \) weakens \(<^{A \times L}\): if \(O[i] <^{A \times L} O[j]\) holds, then \({{\,\textrm{lb}\,}}(O[i]) \le {{\,\textrm{ub}\,}}(O[i]) < {{\,\textrm{lb}\,}}(O[j])\), so that \(O[i] \prec O[j]\) holds, meaning that \(i < j\). Moreover, a second property of the total order \(\langle O, \preceq \rangle \) is that if O[j] dominates O[i], then any entry O[k] with index \(k \ge j\) also dominates O[i], i.e., \(O[i] <^{A\times L} O[j]\) implies \(\forall k \ge j\), \(O[i] <^{A\times L} O[k]\). In fact, \(k > j\) implies \(O[j] \preceq O[k]\), so that \({{\,\textrm{lb}\,}}(O[j]) \le {{\,\textrm{lb}\,}}(O[k])\), and, in turn, since \(O[i] <^{A \times L} O[j]\), we have that \({{\,\textrm{ub}\,}}(O[i]) < {{\,\textrm{lb}\,}}(O[j]) \le {{\,\textrm{lb}\,}}(O[k])\), hence entailing that \(O[i] <^{A \times L} O[k]\) holds. In the best-case scenario, the sequence \(\langle O[i]\rangle _{i\in [1,|D|]}\) may turn out to be totally ordered for \(<^{A\times L}\), meaning that for all \(i, j \in [1,|D|]\), if \(i<j\), then \(O[i]<^{A\times L} O[j]\). In this optimal case, the abstract computation of the k nearest neighbors of \(a_{P(\textbf{x})}\) boils down to extracting the first k elements from the sequence O. However, in general, \(\langle O[i]\rangle _{i\in [1,|D|]}\) will not be totally ordered for \(<^{A\times L}\) because abstract distances may “overlap,” as illustrated in Example 3.4 for the intervals [1, 3] and [2, 4]. 
In our \(\text {NAVe}\) tool, O has been implemented as a min-heap for the total order \(\preceq \) (cf. \(\text {MinHeapify}(O,\preceq )\) at line 6 of Algorithm 1), leveraging the linear cost of heap construction and the logarithmic cost of each extraction of the next smallest element.
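Assuming abstract distances expose lower and upper bounds, this heap-based ordering might be sketched with Python's heapq, keying on lower bounds consistently with the properties of \(\preceq \) stated above (helper names are hypothetical):

```python
import heapq

def build_order(abstract_dists):
    """abstract_dists: list of ((lb, ub), label) pairs.

    Heapify by lower bound, i.e., by the total order used to sort O;
    heap construction is linear time.
    """
    heap = [(lb, ub, label) for (lb, ub), label in abstract_dists]
    heapq.heapify(heap)
    return heap

def pop_min(heap):
    """Extract the next smallest element in O(log n)."""
    lb, ub, label = heapq.heappop(heap)
    return ((lb, ub), label)
```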

3.2.2 Step \(_2\): Computing score bounds for labels

We compute the abstract score intervals \(\textsf{s}[l]\in {\mathcal {I}}\), for all the labels \(l\in L\), namely an integer interval \(\textsf{s}[l]=[{{\,\textrm{lb}\,}}(l),{{\,\textrm{ub}\,}}(l)]\), with \({{\,\textrm{lb}\,}}(l),{{\,\textrm{ub}\,}}(l)\in \mathbb {N}\), that provides a lower bound \({{\,\textrm{lb}\,}}(l)\ge 0\) and an upper bound \({{\,\textrm{ub}\,}}(l) \ge {{\,\textrm{lb}\,}}(l)\) on the number of votes that a label l receives from the k nearest neighbors of \(a_{P(\textbf{x})}\). We initialize \(\textsf{s}[l] = [0, 0]\), for each label \(l \in L\), and then we extract the first k pairs from the indexed sequence \(\langle O[i]\rangle _{i\in [1,|D|]}\) of Step\(_1\). For each extracted pair \((d^A_\textbf{z}, l_\textbf{z})\), we check whether O still includes a pair \((d^A_\textbf{y}, l_\textbf{y})\) having a different label and not dominating \(d^A_\textbf{z}\), i.e., such that \(l_\textbf{y} \ne l_\textbf{z}\) and \(d^A_\textbf{z} \not <^A d^A_\textbf{y}\). If no such pair exists, then all the pairs \((d^A_\textbf{y}, l_\textbf{y})\) left in O are such that \(d^A_\textbf{z} <^A d^A_\textbf{y}\), thus meaning that \(l_\textbf{z}\) will certainly get a vote from \(\textbf{z}\), which has been proved to be a k-nearest neighbor of \(a_{P(\textbf{x})}\). If this happens, then it is sound to increase by 1 both the lower and the upper bound of the interval of scores \(\textsf{s}[l_\textbf{z}]\). Otherwise, it is not possible to infer that \(l_\textbf{z}\) will certainly get a vote from \(\textbf{z}\), so that the lower bound \({{\,\textrm{lb}\,}}(l_\textbf{z})\) cannot be increased, while to preserve the soundness of \(\textsf{s}[l_\textbf{z}]\) we must increase its upper bound \({{\,\textrm{ub}\,}}(l_\textbf{z})\) by 1, meaning that it is possible that \(l_\textbf{z}\) will get an additional vote from \(\textbf{z}\). 
After this computation of the score intervals \([{{\,\textrm{lb}\,}}(l),{{\,\textrm{ub}\,}}(l)]_{l\in L}\), which processed the first k pairs extracted from the sequence O, the sum \(\sigma _k \triangleq \textstyle \sum _{l\in L} {{\,\textrm{lb}\,}}(l)\) of the current lower bounds may be less than k, meaning that no sound inference on the set of most voted labels for kNN can yet be drawn from the current status of the score intervals. Hence, if \(\sigma _k < k\) and there exist unprocessed pairs \((d^A_\textbf{z}, l_\textbf{z})\) left in O whose distance \(d^A_\textbf{z}\) does not dominate all the distances of the first k pairs extracted from O, then we check whether \({{\,\textrm{ub}\,}}(l_\textbf{z}) < k\!-\!\textstyle \sum _{l\in L \smallsetminus \{l_\textbf{z}\}} {{\,\textrm{lb}\,}}(l)\) holds. If this is the case, then \({{\,\textrm{ub}\,}}(l_\textbf{z})\) is increased by 1.
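The whole of Step 2 can be sketched as follows; this is a simplified sketch assuming O is already given as a sorted list and that a hypothetical `dominates` callback implements the sound test \(<^A\):

```python
def score_bounds(ordered, k, labels, dominates):
    """Sketch of Step 2: 'ordered' is the sorted sequence O of
    (abstract distance, label) pairs; dominates(d1, d2) soundly decides
    d1 <^A d2. Returns {label: [lb, ub]} vote bounds."""
    s = {l: [0, 0] for l in labels}
    for i, (dz, lz) in enumerate(ordered[:k]):
        remaining = ordered[i + 1:]   # pairs still left in O
        if any(ly != lz and not dominates(dz, dy) for dy, ly in remaining):
            s[lz][1] += 1             # only a possible vote: upper bound
        else:
            s[lz][0] += 1             # certain vote: both bounds
            s[lz][1] += 1
    # unprocessed pairs may still grab a vote while certain votes sum below k
    if sum(lb for lb, _ in s.values()) < k:
        first_k = [d for d, _ in ordered[:k]]
        for dz, lz in ordered[k:]:
            if any(not dominates(dy, dz) for dy in first_k):
                others = sum(s[l][0] for l in labels if l != lz)
                if s[lz][1] < k - others:
                    s[lz][1] += 1
    return s
```

For instance, with the overlapping distances of Example 3.4 ([1, 3] and [2, 4]) and \(k=1\), both labels end up with the score interval [0, 1].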

3.2.3 Step \(_3\): Refining lower bounds

Following Step\(_2\), we try to refine the lower bounds of \(\textsf{s}[l]\) as sketched by the following example. Let us consider a binary classification with two labels \(l_1\) and \(l_2\) and \(k = 7\), whose current score intervals are, resp., \(\textsf{s}[l_1] = [2, 4]\) and \(\textsf{s}[l_2] = [1, 3]\). We observe that this information allows us to make a sound increment of the lower bounds of both \(l_1\) and \(l_2\). In fact, since the sum of the votes for the two labels must be \(k=7\), this can happen just when \(\textsf{s}[l_1] = [4,4]\) and \(\textsf{s}[l_2] = [3, 3]\). Therefore, in this case, we can infer that \(l_1\) is the most voted label.

A precise and general pseudocode of this refinement step is given at lines 17-19 of Algorithm 1. For each label l, we compute the minimum \(\mu \) between k and the sum of \({{\,\textrm{ub}\,}}(l')\) for all \(l'\ne l\). If \(k-\mu > {{\,\textrm{lb}\,}}(l)\) holds, then we can correctly refine the lower bound for l to \(k-\mu \), since at most \(\mu \) of the k votes can go to labels other than l.
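This refinement might be sketched as follows (hypothetical `refine_lower_bounds` helper over score intervals encoded as `[lb, ub]` lists); on the example above it yields \(\textsf{s}[l_1] = [4,4]\) and \(\textsf{s}[l_2] = [3,3]\):

```python
def refine_lower_bounds(s, k):
    """Sketch of Step 3: s maps each label to its [lb, ub] score interval.

    Since at most mu of the k votes can go to the other labels, at least
    k - mu votes must go to l, which may raise lb(l).
    """
    for l in s:
        mu = min(k, sum(ub for l2, (_, ub) in s.items() if l2 != l))
        if k - mu > s[l][0]:
            s[l][0] = k - mu
    return s
```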

Algorithm 1: The abstract kNN classifier \(C_{\delta , k}^A\)

3.2.4 Step \(_4\): Abstract classification

After the refinement of Step\(_3\), we return the set of labels whose score intervals are numerically significant, i.e., different from [0, 0], and maximal for the dominance relation \(<^{\mathcal {I}}\) between score intervals, that is, \(C_{\delta , k}^A(a_{P(\textbf{x})})\) outputs the following set of labels:

$$\begin{aligned} \left\{ l \in L \mid {{\,\textrm{ub}\,}}(l) \ge \Big \lceil \frac{k}{\min (k, |L|)} \Big \rceil ,\, \forall l' \ne l:\textsf{s}[l] \not <^{\mathcal {I}}\textsf{s}[l']\right\} . \end{aligned}$$

We are thus excluding from the output set only those labels l whose score interval either has an upper bound strictly less than \(\lceil \frac{k}{\min (k, |L|)}\rceil \) or is not maximal, i.e., there exists a different label \(l'\) with a dominant score \(\textsf{s}[l'] >^{\mathcal {I}}\textsf{s}[l]\), meaning that the number of votes for l is surely less than the votes for \(l'\). This definition is sound because no real classification label output by \(C_{\delta , k}(\textbf{y})\) for some adversarial attack \(\textbf{y} \in \gamma ^A(a_{P(\textbf{x})})\) is forgotten, while a loss of precision in computing the abstract distances (this cannot happen with intervals but may be the case for zonotopes, cf. Theorem 3.2 and Example 3.3) and, in turn, the score intervals may lead to an over-approximation that includes some spurious labels.
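This final selection might be sketched as follows (hypothetical `abstract_output` helper over score intervals encoded as `[lb, ub]` pairs):

```python
import math

def abstract_output(s, k):
    """Sketch of Step 4: keep the labels whose score interval reaches the
    significance threshold and is not strictly dominated, in the interval
    order, by another label's score interval."""
    threshold = math.ceil(k / min(k, len(s)))

    def dominated(l):
        # s[l] <^I s[l'] holds when ub(l) < lb(l')
        return any(s[l][1] < s[l2][0] for l2 in s if l2 != l)

    return {l for l in s if s[l][1] >= threshold and not dominated(l)}
```

On the refined scores of the Step 3 example (\(\textsf{s}[l_1]=[4,4]\), \(\textsf{s}[l_2]=[3,3]\), \(k=7\)), only \(l_1\) survives the threshold \(\lceil 7/2\rceil = 4\).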

Theorem 3.6

(Soundness of abstract kNN) The abstract classifier \(C_{\delta , k}^A\) is a sound approximation of \(C_{\delta , k}\), namely for all \(a \in A\), \(\cup _{\textbf{y} \in \gamma ^A(a)} C_{\delta , k}(\textbf{y}) \subseteq C_{\delta , k}^A(a)\).

Proof

It follows from the arguments given above, which justify the soundness of the four steps of Algorithm 1 implementing the abstract classifier \(C_{\delta , k}^A\). \(\square \)

3.2.5 Remarks

In Step\(_1\), the first k pairs of the total order \(\langle O,\preceq \rangle \) are intuitively the k most likely candidates to be the k nearest neighbors of the abstract adversarial region \(a_{P(\textbf{x})}\). If their distances from \(a_{P(\textbf{x})}\) are all strictly dominated by those of the other pairs in O, then these first k samples in O are indeed the k nearest neighbors of \(a_{P(\textbf{x})}\), and we can therefore assign a sure vote to their labels, i.e., we increment both the lower and upper bounds of the score intervals for their labels. If, instead, this is not the case, namely there exist \(O[i]=(d_\textbf{z},l_\textbf{z})\), for some \(i\in [1,k]\), and \(O[j]=(d_\textbf{y},l_\textbf{y})\), with \(j>k\), such that \(d_\textbf{z} \not <^A d_{\textbf{y}}\), then we increment the upper bound \({{\,\textrm{ub}\,}}(l_{\textbf{z}})\) just when \(l_{\textbf{z}}\ne l_{\textbf{y}}\): in fact, if \(l_{\textbf{z}}= l_{\textbf{y}}\), then neglecting the contribution of the sample \(\textbf{z}\) among the k nearest neighbors does not change the score for that same label \(l_{\textbf{z}}\). Moreover, if some \(O[j]=(d_\textbf{y},l_\textbf{y})\), with \(j>k\), strictly dominates all of the first k pairs of O, then all the pairs O[m] with \(m\ge j\) do the same, so that we do not need to consider them in computing the score intervals. The same reasoning applies to any pair \(O[j]=(d_\textbf{y},l_\textbf{y})\) w.r.t. a generic sample: if there exists some labeled sample \((\textbf{u}, l_\textbf{u})\) such that \(l_{\textbf{u}}\ne l_{\textbf{y}}\) and \(d_\textbf{u} \not <^A d_{\textbf{y}}\), then the upper bound \({{\,\textrm{ub}\,}}(l_\textbf{y})\) can be correctly incremented by 1, as this label \(l_\textbf{y}\) could potentially be considered, although we do not know this for sure due to incompleteness. Increasing an upper bound of a score by some positive integer is always sound.
However, while the computation of the abstract distance \(\delta ^A(a_{P(\textbf{x})}, \textbf{y}^A)\) may be exact (cf. Theorem 3.2), the computation of the score intervals, in general, is not. This is because score intervals for labels cannot represent relations between the scores of different labels. For example, mutual exclusion is a relational property that cannot be expressed by score intervals: the property “if a label \(l_{\textbf{x}}\) gets n votes, then a different label \(l_{\textbf{y}}\) gets \(k - n\) votes” cannot be represented through intervals, which cannot keep track of the fact that the score of \(l_{\textbf{y}}\) depends on that of \(l_{\textbf{x}}\).
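This non-relational loss can be illustrated by a toy enumeration (our own illustration, not part of Algorithm 1):

```python
from itertools import product

# k = 3 votes, score intervals s[l1] = [1, 2] and s[l2] = [1, 2]:
# the Cartesian product of the two intervals contains four vote vectors,
# but only those summing to k are concretely feasible.
k = 3
represented = list(product(range(1, 3), repeat=2))   # (1,1) (1,2) (2,1) (2,2)
feasible = [v for v in represented if sum(v) == k]   # (1,2) and (2,1)
# (1,1) and (2,2) are spurious: the interval abstraction cannot express
# that the score of l2 is determined as k minus the score of l1
```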

3.3 Regression tasks

While our primary focus is on classification tasks, our methodology can be easily adapted to accommodate regression. Let us succinctly recall the basic steps of a regression task for kNN models. Initially, distances from a given input sample \(\textbf{x}\) to every point in the training dataset D are computed and exploited for sorting the vectors in D from nearest to farthest to \(\textbf{x}\), akin to the classification algorithm. Subsequently, the k nearest neighbors are identified, and an aggregation function is applied to their numeric values to compute the output regression value. A common example of aggregation function is the weighted mean, where the weights are inversely proportional to the distances.

Let us sketch how our abstract kNN algorithm can be adapted to a regression task. Firstly, we perform the same initial Step\(_1\) of the classification approach, thus computing and ordering the abstract distances of the training samples in D to an input abstract value \(a\in A\). Following this, a sound superset of the k nearest neighbors can be inferred using techniques analogous to those of the classification case (namely, Step\(_2\)), and an abstract version of the aggregation function is then applied to the values of these candidate neighbors. The specific algorithm to be used for this purpose depends upon the chosen aggregation function. Common aggregation functions, such as the weighted mean, entail using standard numerical operations such as addition, multiplication, and inverse, all of which have sound (or even exact) abstract versions on the abstract domains used in our work, notably intervals and zonotopes. Consequently, the abstract kNN regression algorithm computes a sound output interval, namely, a sound over-approximation of the true regression values of all the samples represented by the input abstract value a.
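For instance, with intervals, a sound abstract version of the inverse-distance weighted mean can be sketched as follows (a minimal illustration with helper names of our own; it assumes strictly positive distance intervals):

```python
def i_add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def i_scale(a, c):
    lo, hi = a[0] * c, a[1] * c
    return (min(lo, hi), max(lo, hi))

def i_div(a, b):
    # sound interval division, assuming b is strictly positive
    cands = [a[0] / b[0], a[0] / b[1], a[1] / b[0], a[1] / b[1]]
    return (min(cands), max(cands))

def interval_weighted_mean(values, dists):
    """Sound interval for the inverse-distance weighted mean of the
    candidate neighbors' values, given their abstract distances as
    intervals (lo, hi) with lo > 0."""
    weights = [(1.0 / hi, 1.0 / lo) for lo, hi in dists]  # 1/d is antitone
    num, den = (0.0, 0.0), (0.0, 0.0)
    for v, w in zip(values, weights):
        num = i_add(num, i_scale(w, v))
        den = i_add(den, w)
    return i_div(num, den)

interval_weighted_mean([0.0, 10.0], [(1.0, 2.0), (1.0, 2.0)])  # -> (2.5, 10.0)
```

The result is sound but not exact: numerator and denominator are computed independently, so the dependency induced by the shared weights is lost, as is typical of non-relational interval arithmetic.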

3.4 Instantiating to different abstractions and perturbations

Our abstraction-based verification technique is fully parametric on the specific type of perturbation and numerical abstraction employed and is not restricted to interval-based perturbations/abstractions. As an example, let us sketch its applicability to perturbations not induced by the \(\ell _\infty \) norm. Consider a perturbation function \(P: \mathbb {R}^2 \rightarrow \wp (\mathbb {R}^2)\) defined by

$$\begin{aligned} P(\textbf{x}) \triangleq \{\textbf{y} \in \mathbb {R}^2 \mid \Vert \textbf{y} - \textbf{x}\Vert _1 \le 1\}, \qquad (\dagger ) \end{aligned}$$

thus representing the \(\ell _1\)-ball of radius 1 with center \(\textbf{x}\). Geometrically, \(P(\textbf{x})\) is a square rotated by \(\frac{\pi }{4}\) and cannot be exactly represented by an interval or a zonotope. Nevertheless, it is feasible to over-approximate \(P(\textbf{x})\) through the smallest two-dimensional interval (i.e., box) containing it simply by computing the minimum and maximum values for each axis, thus obtaining the box \(\langle [{\textbf{x}}_1 - 1, {\textbf{x}}_1 + 1], [{\textbf{x}}_2 - 1, {\textbf{x}}_2 + 1]\rangle \). Indeed, each \(\ell _p\)-ball of radius r can be over-approximated by a hypercube of the same radius, although this approximation may introduce a further loss of precision into the abstract kNN procedure that may yield an increased number of output labels and, consequently, a less precise stability certification. To mitigate this, we can use more appropriate abstract domains that are capable of representing more precisely (or even exactly) a given class of perturbations, typically paying a cost in time efficiency: as an example, the octagon abstraction [30, 31] would allow us to represent the above perturbation (\(\dagger \)) in an exact way.
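The interval hull of an \(\ell _p\)-ball is straightforward to compute, as this illustrative sketch shows, together with a point witnessing the over-approximation:

```python
def box_hull_lp_ball(x, r):
    """Smallest box containing the lp-ball of radius r centered at x:
    for every p >= 1 the per-axis extrema are x_i - r and x_i + r,
    so the interval hull is independent of p."""
    return [(xi - r, xi + r) for xi in x]

box = box_hull_lp_ball((0.0, 0.0), 1.0)   # [(-1.0, 1.0), (-1.0, 1.0)]
# (0.8, 0.8) lies in the box but outside the l1-ball of radius 1,
# witnessing that the hull strictly over-approximates the perturbation
in_box = all(lo <= c <= hi for c, (lo, hi) in zip((0.8, 0.8), box))
in_l1_ball = abs(0.8) + abs(0.8) <= 1.0   # False
```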

4 Equivalence of data poisoning and input perturbation for the maximum norm

Data poisoning is a distinctive form of attack that injects malicious or deceptive data into the training set of machine learning models [43, 47]. In contrast to conventional attacks that target vulnerabilities in model architecture or parameters, data poisoning surreptitiously erodes the fundamental underpinnings of the learning process: through subtle alterations or strategic injections of malicious instances into the training data, the attacker aims at compromising the integrity of the learning algorithm. We investigate the relationship between stability under input perturbations and resilience under data poisoning, by showing that our certification method for stability can also be applied to verify resilience to data poisoning when the underlying numerical abstract domain is the interval or zonotope abstraction.

Assume that training datasets D range into a space \(\wp (X\times L)\). A data poisoning is defined as a function \(\mathbb {P}: \wp (X\times L) \rightarrow \wp (X\times L)\) that for any input dataset D returns a poisoned dataset \(\mathbb {P}(D)\). Consider a learning algorithm \(\textsf{LA}\) that, given a training dataset D, deterministically returns a classifier \(\textsf{LA}(D)=C_D: X \rightarrow \wp (L)\). A learning algorithm \(\textsf{LA}\) is defined to be resilient on an input sample \(\textbf{x}\in X\) under a data poisoning \(\mathbb {P}\) when for all datasets D, \(C_{\mathbb {P}(D)}(\textbf{x})=C_D(\textbf{x})\). In the following, we consider data poisoning functions \(\mathbb {P}^\tau _\infty \), derived from maximum norm perturbations \(P^\tau _\infty \) and defined as follows:

$$\begin{aligned} \mathbb {P}^\tau _\infty (D) \triangleq \{(\mathbf {x'},l_\textbf{x}) \in X\times L \mid (\textbf{x},l_\textbf{x})\in D,\, \textbf{x}' \in P^\tau _\infty (\textbf{x})\}. \end{aligned}$$

We are interested in proving that a kNN classifier \(C_{\delta , k}\) is resilient on some input to a \(\mathbb {P}^\tau _\infty \) data poisoning of its ground truth dataset D. Since the output of an abstract classifier \(C_{\delta , k}^A\) depends on the abstract distances of each sample \(\textbf{s}\) in the dataset D from the perturbation \(P^\tau _\infty (\textbf{x})\) of an input sample \(\textbf{x}\), in the following we prove that the abstract distance between an individual poisoning \({P}^\tau _\infty (\textbf{s})\) of a sample \(\textbf{s}\) in D and a given input \(\textbf{x}\) coincides with the abstract distance between \(\textbf{s}\) and the corresponding perturbation \(P^\tau _\infty (\textbf{x})\) of \(\textbf{x}\). Hence, when this happens, by Theorems 3.1 and 3.6, it turns out that the resilience of a kNN classification \(C_{\delta , k}(\textbf{x})\) to a \(\ell _\infty \)-poisoning \(\mathbb {P}_\infty ^\tau \) of its training dataset D can be proved as an abstract stability certification of \(C_{\delta , k}\) over \(P_\infty ^\tau (\textbf{x})\) through the abstract classification \(C_{\delta , k}^{{A}}(P_\infty ^\tau (\textbf{x}))\).

4.1 Intervals

Let us consider a training sample \(\textbf{s} \in D\subseteq \mathbb {R}^n\), a \(\ell _\infty \)-perturbation \(P_\infty ^\tau : \mathbb {R}^n \rightarrow \wp (\mathbb {R}^n)\), and an input sample \(\textbf{x} \in \mathbb {R}^n\). Hence, for the abstract Minkowski distance \(\delta _p^{\mathcal {I}^n}\) that neglects the irrelevant p-th root, we have that:

$$\begin{aligned} \delta _p^{\mathcal {I}^n}(\textbf{x}, \alpha ^\mathcal {I}(P_\infty ^\tau (\textbf{s})))&= \sum \nolimits _{i=1}^{n} |\textbf{x}_i -^\mathcal {I} [\textbf{s}_i - \tau , \textbf{s}_i + \tau ]|^p \\&= \sum \nolimits _{i=1}^{n} |[\textbf{x}_i - \textbf{s}_i - \tau , \textbf{x}_i - \textbf{s}_i + \tau ]|^p \\&= \sum \nolimits _{i=1}^{n} |[\textbf{x}_i - \tau - \textbf{s}_i, \textbf{x}_i + \tau - \textbf{s}_i]|^p \\&= \sum \nolimits _{i=1}^{n} |[\textbf{x}_i - \tau , \textbf{x}_i + \tau ] -^\mathcal {I} \textbf{s}_i|^p \\&= \delta _p^{\mathcal {I}^n}\big (\alpha ^\mathcal {I}\big (P_\infty ^\tau (\textbf{x})\big ), \textbf{s}\big ). \end{aligned}$$

As a consequence, the resilience of a kNN classification \(C_{\delta , k}(\textbf{x})\) under a \(\ell _\infty \)-poisoning \(\mathbb {P}_\infty ^\tau \) of the training dataset D can be inferred as an abstract stability certification by means of the interval classification \(C_{\delta , k}^{{\mathcal {I}}}(P_\infty ^\tau (\textbf{x}))\).
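The equality above can be checked numerically with a small interval-arithmetic sketch (helper names are ours; \(\tau = 0.25\) is chosen so that all bounds are exactly representable as binary floats and the equality test is exact):

```python
def i_sub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def i_abs_pow(a, p):
    lo, hi = a
    if lo >= 0:
        m, M = lo, hi
    elif hi <= 0:
        m, M = -hi, -lo
    else:
        m, M = 0.0, max(-lo, hi)
    return (m ** p, M ** p)

def delta_p(box_a, box_b, p):
    """Abstract Minkowski distance (p-th root omitted) between boxes;
    concrete vectors are encoded as degenerate intervals."""
    lo = hi = 0.0
    for ai, bi in zip(box_a, box_b):
        l, h = i_abs_pow(i_sub(ai, bi), p)
        lo, hi = lo + l, hi + h
    return (lo, hi)

x, s, tau, p = (3.0, 3.0), (1.0, 1.0), 0.25, 2
point = lambda v: [(vi, vi) for vi in v]
ball = lambda v: [(vi - tau, vi + tau) for vi in v]
# poisoning the sample and perturbing the input yield the same interval
delta_p(point(x), ball(s), p) == delta_p(ball(x), point(s), p)  # True
```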

4.2 Zonotopes

A similar result can be proved for zonotopes. Let us recall that while, in general, the abstraction function for zonotopes does not exist, a \(\ell _\infty \)-perturbation \(P^\tau _\infty (\textbf{x})\) can always be exactly represented through the zonotope \(\langle \textbf{x}_1 + \tau \epsilon _1,\ldots , \textbf{x}_n + \tau \epsilon _n\rangle \), which is therefore the best abstraction of \(P^\tau _\infty (\textbf{x})\) in \({\mathcal {Z}}\). We have that:

$$\begin{aligned} \delta _p^{\mathcal {Z}^n}\big (\textbf{x}, \alpha ^\mathcal {Z}\big (P_\infty ^\tau (\textbf{s})\big )\big )&= \sum \nolimits _{i=1}^n \left| \textbf{x}_i - \left( \textbf{s}_i + \tau \epsilon _i\right) \right| ^p \\&= \sum \nolimits _{i=1}^n \left| \textbf{x}_i - \textbf{s}_i - \tau \epsilon _i\right| ^p \qquad \qquad [\hbox {as}\,-\tau \epsilon _i\,\hbox {and}\,\tau \epsilon _i\,\hbox {denote the same zonotope, since}\,\epsilon _i \in [-1,1]]\\&= \sum \nolimits _{i=1}^n \left| \textbf{x}_i + \tau \epsilon _i - \textbf{s}_i\right| ^p \\&= \delta _p^{\mathcal {Z}^n}\big (\alpha ^\mathcal {Z}\big (P_\infty ^\tau (\textbf{x})\big ), \textbf{s}\big ). \end{aligned}$$

Hence, resilience to a maximum norm data poisoning can be also inferred by leveraging the zonotope classifier \(C_{\delta , k}^{{\mathcal {Z}}}\).

4.3 Arbitrary abstractions

In general, the equivalence shown above between maximum norm data poisoning and input perturbation does not hold for an arbitrary abstract domain A. To exhibit a counterexample, we consider an artificial abstraction \(\mathcal {R}\) mirroring the behavior of the interval abstraction \(\mathcal {I}\), with the exception that interval lower and upper bounds cannot belong to the open range \((-1, 1)\). Thus, for example, a numerical set X having \(\sup (X)\in (-1,1)\) will have an interval approximation in \(\mathcal {R}\) whose upper bound is 1, which is strictly larger than \(\sup (X)\). This abstract domain \(\mathcal {R}\) is endowed with an abstraction function \(\alpha ^{\mathcal {R}}\); e.g., \(\alpha ^{\mathcal {R}}([-2,0]) =[-2,1]\).

Example 4.1

Let us consider the training dataset \(D = \{\textbf{s}_1 = ((1, 1), l_1),\, \textbf{s}_2 = ((-1, -1), l_2),\, \textbf{s}_3 = ((-2, -2), l_1)\}\) in \(\mathbb {R}^2\), an input sample \(\textbf{x} = (3, 3)\), and the perturbation \(P_\infty ^\tau \) with \(\tau = 0.1\). The abstract Minkowski distances \(d_i = \delta _p^\mathcal {R}(\alpha ^\mathcal {R}(P_\infty ^\tau (\textbf{x})), \textbf{s}_i)\) in \(\mathcal {R}\) for the input perturbation are as follows:

$$\begin{aligned}&d_1 = [2(2 - \tau )^p, 2(2 + \tau )^p],\\&d_2 = [2(4-\tau )^p, 2(4+\tau )^p],\\&d_3 = [2(5-\tau )^p, 2(5+\tau )^p]. \end{aligned}$$

Let us observe that \(d_1<^\mathcal {R} d_2 <^\mathcal {R} d_3\).

On the other hand, for the data poisoning \(P_\infty ^\tau (\textbf{s}_i)\), we have the following abstract distances \(e_i = \delta _p^\mathcal {R}(\textbf{x}, \alpha ^\mathcal {R}(P_\infty ^\tau (\textbf{s}_i)))\):

$$\begin{aligned}&e_1 = [2(2 - \tau )^p, 2\cdot 4^p],\\&e_2 = [2\cdot 2^p, 2(4+\tau )^p],\\&e_3 = [2(5-\tau )^p, 2(5+\tau )^p]. \end{aligned}$$

It turns out that both \(e_1 <^\mathcal {R} e_3\) and \(e_2 <^\mathcal {R} e_3\) hold but neither \(e_1 <^\mathcal {R} e_2\) nor \(e_2 <^\mathcal {R} e_1\) hold, since the abstract distances \(e_1\) and \(e_2\) overlap.

Hence, for the case \(k=1\), the abstract classifier \(C_{\delta _p, 1}^{\mathcal {R}}\) allows us to infer stability under input perturbation but not resilience to data poisoning. Stability under input perturbation can be inferred because the sample \(\textbf{s}_1\) is proved to be the nearest to \(P_\infty ^\tau (\textbf{x})\). For resilience to the data poisoning \(\mathbb {P}_\infty ^\tau \), both \(\textbf{s}_1\) and \(\textbf{s}_2\) are selected by the abstract classifier \(C_{\delta _p, 1}^{\mathcal {R}}\), because the corresponding abstract distances to \(\textbf{x}\) overlap.
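Example 4.1 can be replayed with a small implementation of \(\mathcal {R}\) (illustrative code; the interval helpers and names are ours):

```python
def alpha_R(iv):
    """Abstraction in R: bounds may not lie in the open range (-1, 1),
    so offending bounds are soundly widened to -1 (lower) or 1 (upper)."""
    lo, hi = iv
    if -1 < lo < 1:
        lo = -1.0
    if -1 < hi < 1:
        hi = 1.0
    return (lo, hi)

def i_sub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def i_abs_pow(a, p):
    lo, hi = a
    if lo >= 0:
        return (lo ** p, hi ** p)
    if hi <= 0:
        return ((-hi) ** p, (-lo) ** p)
    return (0.0, max(-lo, hi) ** p)

def delta_p(box_a, box_b, p):
    lo = hi = 0.0
    for ai, bi in zip(box_a, box_b):
        l, h = i_abs_pow(i_sub(ai, bi), p)
        lo, hi = lo + l, hi + h
    return (lo, hi)

x, tau, p = (3.0, 3.0), 0.1, 2
s1, s2 = (1.0, 1.0), (-1.0, -1.0)
point = lambda v: [(vi, vi) for vi in v]
region = lambda v: [alpha_R((vi - tau, vi + tau)) for vi in v]

d1, d2 = delta_p(region(x), point(s1), p), delta_p(region(x), point(s2), p)
e1, e2 = delta_p(point(x), region(s1), p), delta_p(point(x), region(s2), p)

lt = lambda a, b: a[1] < b[0]   # strict dominance <^R between intervals
lt(d1, d2)                      # True: s1 provably nearest under input perturbation
lt(e1, e2) or lt(e2, e1)        # False: e1 and e2 overlap under data poisoning
```

The widening of \(\alpha _{\mathcal {R}}\) only affects the poisoned regions around \(\textbf{s}_1\) and \(\textbf{s}_2\), whose bounds fall in \((-1,1)\), which is exactly why the two poisoning distances overlap while the perturbation distances do not.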

Nevertheless, we put forward some sufficient conditions on an abstraction A guaranteeing the equivalence of the best correct approximations in A of the distances for data poisoning and input perturbation for the maximum norm.

Theorem 4.2

(Equivalence of data poisoning and input perturbation) Let \(P_\infty ^\tau : \mathbb {R}^n \rightarrow \wp (\mathbb {R}^n)\) be a \(\ell _\infty \) perturbation and A be a numerical abstraction. Assume the following conditions:

  1. (i)

    A admits an abstraction function \(\alpha ^{A}\).

  2. (ii)

    For all \(\textbf{x} \in X\), there exists \(a_\textbf{x}\) such that \(P_\infty ^\tau (\textbf{x}) = \gamma ^{A}(a_\textbf{x})\), i.e., each adversarial region \(P_\infty ^\tau (\textbf{x})\) is exactly representable in A.

  3. (iii)

    For all \(\textbf{x} \in X\) and \(\textbf{t} \in \mathbb {R}^n\), \(\textbf{x} + \textbf{t} \in P_\infty ^\tau (\textbf{x}) \Leftrightarrow \Vert \textbf{t} \Vert _\infty \le \tau \).

Then, the best correct approximations in A of the distances for \(P_\infty ^\tau \) input perturbation and data poisoning coincide.

Proof

Since A admits an abstraction function \(\alpha ^{A}\), the best correct approximation of any concrete function f exists and is \(\alpha ^{A} \circ f \circ \gamma ^{A}\). Thus, given an input \(\textbf{x}\) and a training sample \(\textbf{s}\), to prove the equivalence we show that \(\alpha ^{A}(\delta _p(\gamma ^{A}(\alpha ^{A}(P_\infty ^\tau (\textbf{x}))), \textbf{s})) = \alpha ^{A}(\delta _p(\textbf{x}, \gamma ^{A}(\alpha ^{A}(P_\infty ^\tau (\textbf{s})))))\). By letting \(a\triangleq \alpha ^{A}(P_\infty ^\tau (\textbf{x}))\) and \(b\triangleq \alpha ^{A}(P_\infty ^\tau (\textbf{s}))\), it suffices to show that \(\delta _p(\gamma ^{A}(a), \textbf{s}) = \delta _p(\textbf{x}, \gamma ^{A}(b))\):

$$\begin{aligned} \delta _p(\gamma ^{A}(a), \textbf{s})&= \bigcup \nolimits _{\mathbf {x'} \in \gamma ^{A}(a)} \delta _p(\mathbf {x'}, \textbf{s}){} & {} \text {[by (ii)]}\\&= \bigcup \nolimits _{\mathbf {x'} \in P_\infty ^\tau (\textbf{x})} \delta _p(\mathbf {x'}, \textbf{s}){} & {} \text {[by (iii)]}\\&= \bigcup \nolimits _{\textbf{t} \in \mathbb {R}^n, \Vert \textbf{t}\Vert _\infty \le \tau } \delta _p(\textbf{x} + \textbf{t}, \textbf{s}) \\&= \bigcup \nolimits _{\textbf{t} \in \mathbb {R}^n, \Vert \textbf{t}\Vert _\infty \le \tau } \sum \nolimits _{i = 1}^n |\textbf{x}_i + \textbf{t}_i - \textbf{s}_i|^p \\&= \bigcup \nolimits _{\textbf{t} \in \mathbb {R}^n, \Vert \textbf{t}\Vert _\infty \le \tau } \sum \nolimits _{i = 1}^n |\textbf{x}_i - (\textbf{s}_i - \textbf{t}_i)|^p \\&= \bigcup \nolimits _{\textbf{t} \in \mathbb {R}^n, \Vert \textbf{t}\Vert _\infty \le \tau } \delta _p(\textbf{x}, \textbf{s} - \textbf{t}){} & {} \text {[by def.\ of}\,\Vert \cdot \Vert _\infty ]\\&= \bigcup \nolimits _{\textbf{t} \in \mathbb {R}^n, \Vert \textbf{t}\Vert _\infty \le \tau } \delta _p(\textbf{x}, \textbf{s} + \textbf{t}){} & {} \text {[by (iii)]}\\&= \bigcup \nolimits _{\mathbf {s'} \in P_\infty ^\tau (\textbf{s})} \delta _p(\textbf{x}, \mathbf {s'}){} & {} \text {[by (ii)]}\\&= \bigcup \nolimits _{\mathbf {s'} \in \gamma ^{A}(b)} \delta _p(\textbf{x}, \mathbf {s'}) = \delta _p(\textbf{x}, \gamma ^{A}(b)).{} & {} \end{aligned}$$

\(\square \)

Let us stress that Theorem 4.2 concerns abstract domains A endowed with an abstraction function and refers to the best correct approximations of distances, which may not coincide with the compositional definition of abstract distances considered in Sect. 3.1.

5 Dealing with categorical features

Datasets may contain both numerical and categorical features, where the latter usually range in nonnumerical sets of values, e.g., \({color} \in \{{red}, {green}, {blue}\}\). Most ML algorithms can only process numerical features, hence they rely on some numerical encoding of categorical features. One-hot encoding is a de facto standard that consists in replacing a feature having q categories with q binary numerical features. More precisely, if \(F = \{c_1, c_2,\ldots , c_q\}\) is the set of values of a categorical feature f, one-hot encoding replaces f with q binary numerical features \((x_1^f, x_2^f,\ldots , x_q^f) \in \{0, 1\}^q\) in such a way that \(\forall i \in [1,q]:x_i^f = 1 \Leftrightarrow f = c_i\). Therefore, one-hot encoding implicitly introduces the constraint \(\sum _{i=1}^{q} x_i^f = 1\), which prevents a one-hot encoded sample from having more than one categorical value. If these relational constraints between one-hot encoded numerical features cannot be represented by an abstraction A, then an abstract classifier defined on A may exhibit a significant loss of precision, as illustrated by the following example for intervals.

Example 5.1

(Loss of precision due to one-hot encoding) Consider data samples with a categorical feature \({color} \in \{{red}, {green}, {blue}\}\) and a numerical feature \({size} \in \mathbb {R}_{\ge 0}\). Let \(\mathbf {a'} \triangleq ({red}, 1)\), \(\mathbf {b'} \triangleq ({red}, 3)\), and consider a dataset \(D=\{(\mathbf {a'},l_1), (\mathbf {b'},l_2)\}\). By one-hot encoding, color is replaced by \(({isRed},{isGreen},{isBlue}) \in \{0, 1\}^3\), so that \(\mathbf {a'}\) and \(\mathbf {b'}\) are encoded as \(\textbf{a} \triangleq (1, 0, 0, 1)\) and \(\textbf{b} \triangleq (1, 0, 0, 3)\). Consider an adversarial region \(R \triangleq \{(r, g, b, {size}) ~|~ r, g, b \in \{0, 1\}, {size} \in [0, 1]\}\). We observe that \(\textbf{a}\) is always closer than \(\textbf{b}\) to any vector \(\textbf{x}\in R\), for any Minkowski distance \(\delta _p\): in fact, we have that

$$\begin{aligned} \delta _p(\textbf{a}, \textbf{x})< \delta _p(\textbf{b}, \textbf{x}) \Leftrightarrow&\;\root p \of {|1 - \textbf{x}_1|^p + \textbf{x}_2^p + \textbf{x}_3^p + |1 - \textbf{x}_4|^p}<\\&\;\root p \of {|1 - \textbf{x}_1|^p + \textbf{x}_2^p + \textbf{x}_3^p + |3 - \textbf{x}_4|^p}\\ \Leftrightarrow&\;\, |1 - \textbf{x}_4| < |3 - \textbf{x}_4| \end{aligned}$$

which always holds for \(\textbf{x}_4 =\textbf{x}_{{size}}\in [0, 1]\). Hence, 1NN classifies any vector in R as \(l_1\).

Consider the abstract 1NN classifier on the interval abstraction \({\mathcal {I}}\) and the Manhattan distance \(\delta _1\). Therefore, R is abstracted as \(\alpha ^{{\mathcal {I}}^4}(R)=r= \langle [0, 1], [0, 1], [0, 1], [0, 1] \rangle \in {\mathcal {I}}^4\), and the abstract distances are as follows:

$$\begin{aligned}&\delta _1^{\mathcal {I}}(r,\textbf{a})= [0, 1] +^{\mathcal {I}}[0, 1] +^{\mathcal {I}}[0, 1] +^{\mathcal {I}}[0, 1] = [0, 4],\\&\delta _1^{\mathcal {I}}(r,\textbf{b})= [0, 1] +^{\mathcal {I}}[0, 1] +^{\mathcal {I}}[0, 1] +^{\mathcal {I}}[2, 3] = [2, 6]. \end{aligned}$$

Since the intervals [0, 4] and [2, 6] overlap, we cannot infer which of the two samples \(\textbf{a}\) and \(\textbf{b}\) is the nearest to R, so that the abstract 1NN classifier returns \(\{l_1, l_2\}\), i.e., no information at all. This loss of precision depends on the interval abstraction, which is not able to represent the constraints \({isRed}, {isGreen}, {isBlue} \in \{0, 1\}\) and \({isRed} + {isGreen} + {isBlue} = 1\). \(\square \)
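The abstract distances of Example 5.1 can be reproduced with a few lines of interval arithmetic (helper names are ours):

```python
def i_sub(a, b):
    return (a[0] - b[1], a[1] - b[0])

def i_abs(a):
    lo, hi = a
    if lo >= 0:
        return (lo, hi)
    if hi <= 0:
        return (-hi, -lo)
    return (0.0, max(-lo, hi))

def manhattan_interval(box, v):
    """Abstract Manhattan distance between a box and a concrete vector."""
    lo = hi = 0.0
    for bi, vi in zip(box, v):
        l, h = i_abs(i_sub(bi, (vi, vi)))
        lo, hi = lo + l, hi + h
    return (lo, hi)

r = [(0.0, 1.0)] * 4                  # interval abstraction of the region R
a, b = (1, 0, 0, 1), (1, 0, 0, 3)     # one-hot encodings of a' and b'
manhattan_interval(r, a)              # -> (0.0, 4.0)
manhattan_interval(r, b)              # -> (2.0, 6.0): the intervals overlap
```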

This additional loss of precision due to one-hot encoding could happen for zonotopes as well, although the phenomenon is mitigated by the fact that zonotopes can represent some relational information between different one-hot encoded features through shared noise symbols.

To avoid the loss of precision due to one-hot encoding, we partition the original adversarial region R, abstractly represented by some \(a \in A\), into q subregions \(R_i \subseteq R\), each abstractly represented by some \(a_i \in A\), where q is the overall number of combinations of values of the categorical features perturbed in the adversarial region R. Then, we execute the abstract classifier \(C^A(a_i)\) on each abstract subregion \(a_i\), computing a sound output set of labels for each of them. If, while repeatedly applying \(C^A(a_i)\), the union of the output sets of labels computed so far becomes the whole set L, then we stop and output L. The splitting process is such that every categorical feature of every subregion \(R_i\) has exactly one possible categorical value, so that within each subregion \(R_i\) there is no need for abstracting the one-hot encoded categorical features. The final output is obtained by collecting all the labels for each \(a_i\), namely: \(C^A(a) \triangleq \cup _{i \in [1, q]} C^A(a_i)\). This simple splitting strategy over categorical features reduces the false negatives generated by one-hot encoding at the price of a higher certification time, since it generates a new sub-problem for every possible combination of categorical values. Let us remark that if the perturbation of an input sample concerns categorical values only (i.e., numerical values are not perturbed)—this can happen in individual fairness certification—then this partitioning approach boils down to a concrete (and, therefore, trivially exact) verification, at the cost of an exponential number of sub-problems. More precisely, if m is the maximum number of different categories and p is the number of perturbed categorical features, then we need to check \(O(m^p)\) sub-problems. This exponential blow-up is expected for an exact stability certification procedure with no false negatives.
To balance cost and precision, one could allow only certain features to be split. (Unsplit features behave as numerical ones, and soundness still holds.) We applied this splitting technique in our experiments on individual fairness certification, where reference datasets typically include categorical features.
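The splitting over perturbed categorical features amounts to a plain Cartesian-product enumeration, as this illustrative sketch shows (feature names are hypothetical):

```python
from itertools import product

def split_categorical(cat_domains):
    """One sub-problem per combination of perturbed categorical values;
    the abstract classifier is then run on each sub-region and the
    resulting output label sets are unioned."""
    names = list(cat_domains)
    return [dict(zip(names, combo))
            for combo in product(*(cat_domains[n] for n in names))]

subs = split_categorical({"color": ["red", "green", "blue"],
                          "gender": ["F", "M"]})
len(subs)  # 6 sub-problems, i.e., O(m^p) with m = 3 categories, p = 2 features
```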

6 Experimental evaluation

We implemented our abstraction framework for kNN classifiers in a verification tool called \(\text {NAVe}\), written in Python, and we instantiated it with the interval and zonotope abstractions. The source code of \(\text {NAVe}\), together with the datasets and scripts for reproducing our experimental results, is available on GitHub [12].

6.1 Setup

For our experiments, we considered some standard datasets used in robustness certification of kNN [42] and fairness verification of deep neural networks [29, 38]. Following [38], the datasets are preprocessed as follows:

  1. (1)

    rows/columns with missing values are dropped;

  2. (2)

    when needed (Letter, Pendigits and Satimage already have explicit test sets), datasets are split into training (\(\approx 70\)\(80\%\)) and test (\(\approx 20\)\(30\%\)) sets, resp., D and T;

  3. (3)

    categorical features are one-hot encoded;

  4. (4)

    numerical features are scaled to [0, 1].

The details of these datasets, together with the accuracy of kNN on their test sets, are summarized in Table 1. In our individual fairness experiments, we consider the Noise-Cat similarity relation as defined by Ruoss et al. [38], where two individuals \(\textbf{x}, \textbf{y} \in X\) are similar when:

  1. (1)

    given the subset \(\text {Noise} \subseteq \mathbb {N}\) of indexes of all numerical features and a noise threshold \(\epsilon \ge 0\), for all \(i \in \text {Noise}\), \(|\textbf{x}_i - \textbf{y}_i| \le \epsilon \);

  2. (2)

    given a subset \(\text {Cat} \subseteq \mathbb {N}\) of indexes of “sensitive” categorical features, both \(\textbf{x}\) and \(\textbf{y}\) are allowed to have any category for features with indexes in \(\text {Cat}\);

  3. (3)

    every other categorical feature of \(\textbf{x}\) and \(\textbf{y}\), i.e., with index not in \(\text {Cat}\), must be the same; namely, for any index \(i \not \in (\text {Noise} \cup \text {Cat})\), \(\textbf{x}_i = \textbf{y}_i\) holds.

Fairness experiments with \(\epsilon =0\) represent a pure Cat perturbation of sensitive categorical features only, leaving numerical features unaltered: In this case, our certification method is complete, i.e., the percentages of individual fairness for \(\epsilon =0\) turn out to be exact (i.e., not a lower bound).

Table 1 Summary of datasets

We instantiated our parametric abstract kNN classifier of Theorem 3.6 to both intervals \({\mathcal {I}}\) and zonotopes \({\mathcal {Z}}\), and we evaluated both the Manhattan \(\delta _1\) and Euclidean \(\delta _2\) distances. We considered the \(\ell _\infty \)-perturbation \(P_\infty ^\epsilon \) for our stability experiments, with the magnitude \(\epsilon \) ranging in [0.001, 0.1] ([0.001, 0.05] for the dataset Letter), i.e., numerical features can be altered from \(\pm 0.1\%\) to \(\pm 10\%\). In the individual fairness experiments, we considered the following Noise-Cat perturbations: for Noise, the numerical attributes were perturbed with \(P_\infty ^\epsilon \), with \(\epsilon \in [0,0.05]\); for Cat, the sensitive categorical attributes were race for Compas and gender for German; when \(\epsilon =0\), this boils down to a pure Cat perturbation. The parameter k ranges in \(\{1, 3, 5, 7\}\), where, following the standard practice for kNN, we avoided even values of k as they are more likely to introduce tie votes in the classification. We conducted all our experiments on a low-cost AWS t2.micro virtual machine instance, which provides a baseline level of CPU performance through a single 2.5 GHz CPU and 1 GB of RAM. Throughout the experiments, we mostly observed consistent time behaviors.

Table 2 Percentages of provable stability or individual fairness for Intervals \({\mathcal {I}}\) with Manhattan distance \(\delta _1\) on the whole test sets T
Table 3 Percentages of provable stability or individual fairness for Zonotopes \({\mathcal {Z}}\) with Manhattan distance \(\delta _1\) on the whole test sets T
Table 4 Percentages of provable stability or individual fairness for Intervals \({\mathcal {I}}\) with Euclidean distance \(\delta _2\) on the whole test sets T
Table 5 Percentages of provable stability or individual fairness for Zonotopes \({\mathcal {Z}}\) with Euclidean distance \(\delta _2\) on the whole test sets T

6.2 Results

Tables 2, 3, 4, 5 report the percentages of test samples in T for which our \(\text {NAVe}\) tool proves that the kNN classifier is stable, i.e., for all k and \(\epsilon \), we provide the following metric:

$$\begin{aligned} \text {ProvableStability}_{k,\epsilon } \triangleq |\{(\textbf{x}, \_) \in T \mid |C_{\delta _i, k}^A(P_\infty ^\epsilon (\textbf{x}))|=1\}|/|T| \end{aligned}$$

where \(A\in \{{\mathcal {I}},{\mathcal {Z}}\}\) and \(i=1,2\). As shown in Sect. 2.4, for fairness datasets provable stability means provable individual fairness. For each distance \(\delta _1\) and \(\delta _2\), and for each dataset and perturbation magnitude \(\epsilon \), we highlight in bold the percentage corresponding to the most provably stable/fair kNN classifier. Due to incompleteness of the abstract kNN classification (cf. Example 3.4), it is worth recalling that \(\text {ProvableStability}_{k,\epsilon }\) is a lower bound of the real stability of kNN on the test set T.
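The metric can be sketched directly in Python (the abstract classifier below is a toy stub of our own, not \(\text {NAVe}\)):

```python
def provable_stability(test_set, abstract_classifier):
    """Fraction of test samples whose over-approximated output set is a
    singleton, i.e., whose classification is certified stable."""
    stable = sum(1 for x, _ in test_set if len(abstract_classifier(x)) == 1)
    return stable / len(test_set)

# toy stub: pretend samples with first coordinate below 0.5 are certified
toy_T = [((0.2, 0.1), "l1"), ((0.9, 0.4), "l2"),
         ((0.3, 0.8), "l1"), ((0.7, 0.6), "l2")]
stub = lambda x: {"l1"} if x[0] < 0.5 else {"l1", "l2"}
provable_stability(toy_T, stub)   # -> 0.5
```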

As expected, the zonotope abstraction \({\mathcal {Z}}\) allows us to have a certification technique that is generally more precise, and often much more precise, than that using the interval domain \({\mathcal {I}}\). The only exception is provided by the German dataset with \(\epsilon = 0.02\) where for the case \(k=1\) intervals infer one more stable sample than zonotopes (overall, \(85\%\) vs. \(84.5\%\) of provable stability; indeed, this may happen as shown in Example 3.5).

Our \(\text {NAVe}\) tool infers with the zonotope abstraction more than \(80\%\) of stability, independently of k and distance \(\delta _i\), for:

  1. (i)

    Australian for all \(\epsilon \le 0.1\);

  2. (ii)

    BreastCancer for all \(\epsilon \le 0.05\);

  3. (iii)

    Fourclass and Pendigits for all \(\epsilon \le 0.03\);

  4. (iv)

    Diabetes, Letter and Satimage for \(\epsilon \le 0.005\).

Of course, provable stability decreases for higher values of \(\epsilon \), since stronger perturbations are more likely to produce unstable behaviors, as well as more spurious labels among the approximate output sets. In particular, we observe that Diabetes exhibits the worst stability scores, which, together with a low accuracy (\(\approx 70\%\)), hints that diagnosing diabetes may be a hard task for which kNN does not perform well. On the other hand, the provable stability of Letter seems to be negatively affected by the size of its training set D, as more samples and more features are more likely to introduce ties between abstract distances.

The fairness experiments show that kNN predictions on:

  (i) Compas are rather unfair on the sensitive race category, since the average provable race fairness for all k with \(\epsilon =0\) is \(64.7\%\);

  (ii) German are rather fair on the sensitive gender attribute, since the average provable gender fairness for all k with \(\epsilon =0\) is \(83.8\%\);

  (iii) Compas are always more fair with \(k=7\);

  (iv) German are mostly more fair with \(k=1\).

Table 6 shows the average certification time, in seconds, per input sample \(\textbf{x}\) and per magnitude \(\epsilon \). This is computed as the average time for executing \(\text {NAVe}\) for all \(k \in \{1,3,5,7\}\) on a given input sample (i.e., averaged over the whole test set T) and for a given magnitude \(\epsilon \) (i.e., averaged over the 8 magnitudes \(\epsilon \)). Our certification technique turns out to be quite fast: the peak average time, of about 4 minutes, is reached when certifying the individual fairness of Compas samples with Euclidean distance through zonotopes, very likely due to the one-hot encoding, which explodes the number of features from 10 to 370.
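The feature blow-up caused by one-hot encoding is easy to quantify: each categorical attribute expands into one binary column per distinct value, so the encoded width is driven by the total number of categories. The following sketch uses toy cardinalities of our own choosing, not the actual Compas columns.

```python
def one_hot_width(categorical_cardinalities, n_numeric):
    """Width of a feature vector after one-hot encoding: each categorical
    attribute contributes one binary column per distinct value, while
    numeric attributes are kept as-is."""
    return n_numeric + sum(categorical_cardinalities)

# A few high-cardinality categorical attributes quickly dominate the
# encoded dimension, and with it the cost of abstract distance computations.
width = one_hot_width([2, 12, 341], n_numeric=7)  # 362 encoded features
```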

Table 6 Average certification time per sample in seconds

6.2.1 Robustness

Table 7 reports the percentages of provable robustness for the interval abstraction \({\mathcal {I}}\) and Euclidean distance \(\delta _2\). Recall from Sect. 2.3 that a classifier is robust when it is both stable and accurate on its input sample, so that the provable robustness inferred by our tool \(\text {NAVe}\) on a test set T is defined as follows: for all k and \(\epsilon \),

$$\begin{aligned} \text {ProvableRobustness}_{k,\epsilon } \triangleq |\{(\textbf{x}, l_{\textbf{x}}) \in T \,\mid \, |C_{D, \delta _2, k}^{\mathcal {I}}(P_\infty ^\epsilon (\textbf{x}))|=1,\; k\text {NN}(\textbf{x})=l_{\textbf{x}}\}|/|T|. \end{aligned}$$
Table 7 Percentage of provable robustness for Intervals \({\mathcal {I}}\) with Euclidean distance \(\delta _2\) on the whole test sets T

For the sake of comparison with stability, \(\epsilon \) is limited to 0.05, because for higher thresholds the robustness percentages were too low. Let us recall that provable robustness is necessarily less than or equal to both accuracy and provable stability. As expected, robustness behaves similarly to stability, where the relative comparison of Table 7 with stability must consider Table 4. In particular, Australian, Diabetes and Satimage exhibit smaller lower bounds on provable robustness w.r.t. stability, which is due to the lower accuracy of kNN on these datasets (cf. Table 1). This observation is confirmed by the Australian dataset, where, for \(\epsilon = 0.001\), our tool \(\text {NAVe}\) infers \(100\%\) stability for any k (cf. Table 4): hence, in this case, robustness actually coincides with accuracy, as the lack of accuracy is the sole reason why kNN is not robust. The same effect occurs for Diabetes, where, for \(\epsilon \le 0.005\), stability is above \(80\%\) while robustness is around \(60\%\), once again due to the lower accuracy of kNN classification.
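The definition of \(\text {ProvableRobustness}_{k,\epsilon }\) above can be sketched as follows, together with the bound just recalled. The function and its inputs are illustrative stand-ins, not NAVe's API: abstract label sets, concrete kNN predictions, and ground-truth labels, one per test sample.

```python
def metrics(abstract_out, pred, truth):
    """abstract_out: over-approximated label set per test sample;
    pred: concrete kNN prediction per sample; truth: ground-truth label.
    Returns (provable stability, accuracy, provable robustness)."""
    n = len(truth)
    stable = sum(len(s) == 1 for s in abstract_out)
    accurate = sum(p == t for p, t in zip(pred, truth))
    # Robust = provably stable AND the (unique) predicted label is correct.
    robust = sum(len(s) == 1 and p == t
                 for s, p, t in zip(abstract_out, pred, truth))
    return stable / n, accurate / n, robust / n

s, a, r = metrics([{"A"}, {"A", "B"}, {"B"}],
                  ["A", "A", "B"],
                  ["A", "B", "B"])
# Provable robustness never exceeds either provable stability or accuracy.
assert r <= min(s, a)
```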

7 Conclusion

We have shown how to design an abstract interpretation of k-nearest neighbor classifiers and how this technique defines, to the best of our knowledge, the first robustness certification framework for this popular ML algorithm. We implemented and experimentally evaluated our verification technique. The experiments show that our approach is effective and precise, and that kNN classification is generally robust for numerical perturbations less than \(\pm 3\%\).

Like any formal verification method, our robustness certification technique is sound, meaning that if a classifier is proved stable over an adversarial region R, then every input in R will actually receive the same classification. However, our certification method is, in general, not complete; namely, the verification may suffer from a precision loss, thus failing to prove stability when this actually holds. This incompleteness makes our verification method susceptible to false negatives, which is the primary limitation of our approach, shared with any incomplete verification method. As discussed in Sect. 3.4, this issue can be mitigated by employing more precise abstract domains to reduce the loss of precision, or by partitioning the adversarial region and applying the abstract verification tool to smaller inputs, similarly to the splitting technique described in Sect. 5 for categorical features.
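The partitioning mitigation can be sketched as a recursive bisection of the adversarial box: certify the whole region first, and only when the abstract output set is not a singleton split the widest dimension and certify each half, since smaller boxes yield tighter over-approximations. Here `certify` is a hypothetical stand-in for an abstract kNN run on a hyper-rectangle, and `toy_certify` is an artificial oracle that is imprecise only on wide boxes.

```python
def stable_with_splitting(certify, box, depth):
    """box: list of per-feature (lo, hi) intervals. Returns the union of
    abstract label sets over all explored sub-boxes; a singleton result
    means stability is proved on the whole region."""
    labels = certify(box)
    if len(labels) == 1 or depth == 0:
        return labels  # proved stable, or budget exhausted (possible false alarm)
    # Split the widest dimension at its midpoint and certify both halves.
    i = max(range(len(box)), key=lambda j: box[j][1] - box[j][0])
    lo, hi = box[i]
    mid = (lo + hi) / 2
    left = box[:i] + [(lo, mid)] + box[i + 1:]
    right = box[:i] + [(mid, hi)] + box[i + 1:]
    return (stable_with_splitting(certify, left, depth - 1)
            | stable_with_splitting(certify, right, depth - 1))

def toy_certify(box):
    """Artificial 1-D certifier: precise on narrow boxes, imprecise on wide ones."""
    lo, hi = box[0]
    if hi - lo > 0.5:
        return {"A", "B"}          # over-approximation: cannot decide
    return {"A"} if hi < 1 else {"B"}

# On the whole box the certifier is inconclusive, but splitting proves stability.
whole = toy_certify([(0.0, 0.8)])                          # {"A", "B"}
split = stable_with_splitting(toy_certify, [(0.0, 0.8)], depth=2)  # {"A"}
```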

As future work, we plan to design a new numerical abstraction that can precisely track the role of different features when comparing abstract distances between two samples. Ideally, we would aim to achieve a complete stability certification of kNN.