1 Introduction

With the increasing use of machine learning in real-life applications, the safety and security of learning-based systems have attracted great interest. In particular, many recent studies [12, 40] have found vulnerabilities in the robustness of deep neural networks (DNNs) against malicious inputs, which can lead to disasters in security-critical systems, such as self-driving cars. To discover such vulnerabilities in advance, there has been research on formal verification and testing methods for the robustness of DNNs in recent years [23, 26, 35, 41]. However, relatively little attention has been paid to the formal specification of machine learning [38].

In the research field of formal specification and verification, logical approaches have proved useful for characterizing desired properties and for developing theories to reason about them. For example, temporal logic [36] is a branch of modal logic for expressing time-dependent propositions, and has been widely used to describe requirements of hardware and software systems. For another example, epistemic logic [44] is a modal logic of knowledge and belief that has been employed as a formal policy language for distributed systems (e.g., for the authentication [8] and the anonymity [39] of security protocols). As far as we know, however, no prior work has employed logical formulas to rigorously describe various statistical properties of machine learning, although some papers (often informally) list various desirable properties of machine learning [38].

In this paper, we present a first logical formalization of statistical properties of machine learning. To describe these statistical properties in a simple and abstract way, we extend statistical epistemic logic (StatEL) [27], which has recently been proposed to describe statistical knowledge and has been applied to formalize statistical hypothesis testing and the statistical privacy of databases.

A key idea in our modeling of statistical machine learning is that we formalize logical aspects at the syntax level, and statistical distances and dataset operations at the semantics level by using accessibility relations of a Kripke model [30]. In this model, we formalize supervised learning and some of its desirable properties, including performance, robustness, and fairness. More specifically, classification performance and robustness are described in terms of the differences between the correct class label and the classifier’s prediction, whereas fairness is expressed as a conditional indistinguishability between different groups.

Our contributions The main contributions of this work are as follows:

  • We propose a logical approach to formalizing statistical properties of machine learning in a simple and abstract way. Specifically, we introduce the principle that logical aspects of statistical properties are described at the syntax level, while statistical distances and datasets are formalized at the semantics level.

  • We formalize supervised learning models and test datasets (used to check whether the learning models satisfy a specification) by employing a distributional Kripke model [27] in which each possible world corresponds to a possible test dataset, and modal operators are interpreted as transformations of and tests on datasets. Then, we show how sampling from a dataset and non-deterministic adversarial inputs are formalized in the distributional Kripke model.

  • We propose an extension of statistical epistemic logic (StatEL) as a formal language to describe various properties of machine learning models, including the performance, robustness, and fairness of statistical classifiers. Then, the satisfaction of logical formulas representing those properties is associated with their testing using a test dataset. As far as we know, this is the first work that uses logical formulas to formalize various statistical properties of machine learning, and that provides an epistemic view on those properties.

  • We show some relationships among properties of classifiers, such as different levels of robustness. We also present certain relationships between classification performance and robustness, which suggest robustness-related properties that have not been formalized in the literature as far as we know.

Cautions and limitations In this paper, we focus on formalizing properties of supervised learning models that may be tested by using a dataset; i.e., we do not deal with unsupervised learning, reinforcement learning, the properties of learning algorithms, quality of training data (e.g., sample bias), quality of testing (e.g., coverage criteria), explainability, temporal properties, or system-level specification. It should be noted that most of the properties formalized in this paper have been known in machine learning literature, and the novelty of this work lies in the logical formulation of those statistical properties.

We also highlight that this work aims to provide a logical approach to the modeling of statistical properties tested with a dataset, and does not present methods for checking, guaranteeing, or improving the performance/robustness/fairness of machine learning models. As for the satisfiability of logical formulas, we leave the development of testing and (statistical) model checking algorithms as future work, since the research area on the testing and verification of machine learning is relatively new and needs further techniques to improve scalability. Moreover, in some applications such as image recognition, some atomic formulas (e.g., representing whether an input image is a panda) cannot be defined mathematically, and require additional techniques based on experiments. Nevertheless, we demonstrate that describing various properties using logical formulas is useful to explore desirable properties and to discuss their relationships within a single framework.

Finally, we emphasize that our work is the first attempt to use epistemic models and logical formulas to express statistical properties of machine learning models, and would be a starting point to develop theories of formal specification of machine learning in future research.

Relationship with the preliminary version The main novelties of this paper with respect to the preliminary version [28] are as follows:

  • We explain how the satisfaction of a formula at a possible world can be regarded as the testing of a specification using a test dataset (Sect. 3.1).

  • We show how modal operators are used to model transformations of and tests on datasets. For example, data preparation \(\mathop {T}\) (e.g., data cleaning, data augmentation) can also be formalized as a modal operator \(\varDelta _{T}\) (Sect. 3.2).

  • We re-interpret the non-classical implication \(\supset \) for conditional probabilities in StatEL as a modal operator associated with a conditioning relation (Sect. 3.3).

  • We introduce a modal operator \(\mathbin {\sim }_{x}^{\varepsilon ,D}\) for conditional indistinguishability (Sect. 3.4). Then, we provide a more comprehensible formalization of the fairness of supervised learning (Sect. 7) without using counterfactual epistemic operators [28], because the formalization using these operators requires an additional formula and makes the presentation more complicated and unintuitive.

  • We add a formalization of generalization error to capture how accurately a classifier is able to classify previously unseen input data (Sect. 5.3).

  • We add a formalization of other fairness notions called separation (Sect. 7.3) and sufficiency (Sect. 7.4) so that this paper covers all three categories of fairness notions [5].

  • We show a running example of pedestrian detection to illustrate the formalization of various notions of performance, robustness, and fairness.

Paper organization The rest of this paper is organized as follows. Section 2 presents notations used in this paper and provides background on statistical distances and statistical epistemic logic (StatEL). Section 3 introduces a different view on the modal operators in StatEL and extends the logic with additional operators. Section 4 introduces a formal model for describing the behaviors of statistical classifiers and non-deterministic adversarial inputs. Sections 5, 6 and 7, respectively, formalize various notions of the performance, robustness, and fairness of classifiers by using our extension of StatEL. Section 8 presents related work and Sect. 9 concludes.

2 Preliminaries

In this section we introduce some notations, and review background on statistical distance notions and the syntax and semantics of statistical epistemic logic (StatEL), introduced in [27].

2.1 Notations

Let \(\mathbb {R}^{\ge 0}\) be the set of non-negative real numbers, and [0, 1] be the set of non-negative real numbers not greater than 1. We denote by \(\mathbb {D}\mathcal {O}\) the set of all probability distributions over a finite set \(\mathcal {O}\). Given a finite set \(\mathcal {O}\) and a probability distribution \(\mu \in \mathbb {D}\mathcal {O}\), the probability of sampling a value v from \(\mu \) is denoted by \(\mu [v]\). For a subset \(R\subseteq \mathcal {O}\), let \(\mu [R] = \sum _{v\in R} \mu [v]\). For a distribution \(\mu \) over a finite set \(\mathcal {O}\), its support is defined by \({\texttt {supp}}(\mu ) = \{ v \in \mathcal {O}:\mu [v] > 0 \}\).

2.2 Statistical distance

We recall popular notions of distance between probability distributions: total variation and \(\infty \)-Wasserstein distance.

Informally, the total variation between two distributions \(\mu _0\) and \(\mu _1\) over a set \(\mathcal {O}\) represents the largest difference between the probabilities that \(\mu _0\) and \(\mu _1\) assign to the same subset R of \(\mathcal {O}\).

Definition 1

(Total variation) For a finite set \(\mathcal {O}\), the total variation \(\textit{D}_\mathsf{tv}\) of two distributions \(\mu _0, \mu _1 \in \mathbb {D}\mathcal {O}\) is defined by:

$$\begin{aligned} \textit{D}_\mathsf{tv}(\mu _0 \parallel \mu _1) {\mathop {=}\limits ^{\text{ def }}}\sup _{R \subseteq \mathcal {O}} | \mu _0[R] - \mu _1[R] | {.} \end{aligned}$$
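To make the definition concrete, the following small Python sketch (our own illustration, not part of the original formalism) computes the total variation of two finite distributions represented as dictionaries; on a finite set the supremum is attained and equals half of the L1 distance.

def total_variation(mu0, mu1):
    """D_tv(mu0 || mu1) for finite distributions given as dicts value -> probability."""
    support = set(mu0) | set(mu1)
    # sup_R |mu0[R] - mu1[R]| equals half the L1 distance on a finite set
    return 0.5 * sum(abs(mu0.get(v, 0.0) - mu1.get(v, 0.0)) for v in support)

mu0 = {0: 0.6, 1: 0.4}
mu1 = {0: 0.3, 1: 0.7}
print(total_variation(mu0, mu1))  # 0.3 (attained by R = {0} or R = {1})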

We then recall the \(\infty \)-Wasserstein metric [43]. Intuitively, the \(\infty \)-Wasserstein metric \(\textit{W}_{\textit{d}}(\mu _0, \mu _1)\) between two distributions \(\mu _0, \mu _1\) represents the smallest possible value, over all transportations from \(\mu _0\) to \(\mu _1\), of the largest distance moved by a single point.

Definition 2

(\(\infty \)-Wasserstein metric) Let \(\mathcal {O}\) be a finite set and \(\textit{d}: \mathcal {O}\times \mathcal {O}\rightarrow \mathbb {R}^{\ge 0}\) be a metric over \(\mathcal {O}\). The \(\infty \)-Wasserstein metric \(\textit{W}_{\textit{d}}\) w.r.t. \(\textit{d}\) between two distributions \(\mu _0, \mu _1\in \mathbb {D}\mathcal {O}\) is defined by:

$$\begin{aligned} \textit{W}_{\textit{d}}(\mu _0, \mu _1) = \min _{\mu \in \textsf {cp}(\mu _0, \mu _1)}\max _{(v_0, v_1)\in {\texttt {supp}}(\mu )} \textit{d}(v_0, v_1) \end{aligned}$$

where \(\textsf {cp}(\mu _0, \mu _1)\) is the set of all couplings (Footnote 1) of \(\mu _0\) and \(\mu _1\).
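For intuition, the following hedged Python sketch computes \(\textit{W}_{\textit{d}}\) in the special case of two uniform empirical distributions over equal-sized multisets of points; in that case an optimal coupling can be taken to be a one-to-one matching, so a brute-force search over permutations suffices (only practical for very small multisets). The function name and encoding are our own.

from itertools import permutations

def winf_uniform(points0, points1, d):
    """Infinity-Wasserstein distance between the uniform distributions over
    two equal-sized multisets of points, w.r.t. a metric d."""
    assert len(points0) == len(points1)
    return min(
        max(d(v0, v1) for v0, v1 in zip(points0, perm))
        for perm in permutations(points1)
    )

# Example with the absolute-value metric on real numbers
print(winf_uniform([0.0, 1.0, 2.0], [0.1, 2.2, 1.1], lambda a, b: abs(a - b)))  # 0.2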

2.3 Syntax of StatEL

We next recall the syntax of statistical epistemic logic (StatEL) [27], which has two levels of formulas: static and epistemic formulas. Intuitively, a static formula describes a proposition satisfied at a (deterministic) state, while an epistemic formula describes a proposition satisfied at a probability distribution of states. In this paper, the former is used only to define the latter.

Formally, let \(\texttt {Mes}\) be a set of symbols called measurement variables, and \(\varGamma \) be a set of atomic formulas of the form \(\gamma (x_1, x_2, \ldots , x_n)\) for a predicate symbol \(\gamma \), \(n \ge 0\), and \(x_1, x_2, \ldots , x_n\in \texttt {Mes}\). Let \(I \subseteq [0, 1]\) be a finite union of disjoint intervals, and \(\mathcal {A}\) be a finite set of indices (e.g., associated with statistical divergences). Then, the formulas are defined by:

$$\begin{aligned}&\text{ Static } \text{ formulas: }~ \psi \mathbin {::=} \gamma (x_1, x_2, \ldots , x_n) \,|\, \lnot \psi \,|\, \psi \wedge \psi \\&\text{ Epistemic } \text{ formulas: }~ \varphi \mathbin {::=} \mathop {\mathbb {P}_{I}} \psi \,|\, \lnot \varphi \,|\, \varphi \wedge \varphi \,|\, \psi {\supset } \varphi \,|\, {\mathop {\textsf {K}_{a}}}\varphi \end{aligned}$$

where \(a\in \mathcal {A}\). We denote by \(\mathcal {F}\) the set of all epistemic formulas. Note that we have no quantifiers over measurement variables. (See Sect. 2.5 for more details.)

The probability quantification \(\mathop {\mathbb {P}_{I}} \psi \) represents that a static formula \(\psi \) is satisfied with a probability belonging to a set I. For instance, \(\mathop {\mathbb {P}_{(0.95, 1]}} \psi \) represents that \(\psi \) holds with a probability greater than 0.95. By \(\psi \supset \mathop {\mathbb {P}_{I}} \psi '\) we represent that the conditional probability of \(\psi '\) given \(\psi \) is included in a set I. The epistemic knowledge \(\mathop {\textsf {K}_{a}}\varphi \) expresses that we know \(\varphi \) when our capability of observation is denoted by \(a\in \mathcal {A}\).

As syntax sugar, we use disjunction \(\vee \), classical implication \(\rightarrow \), and epistemic possibility \(\mathop {\textsf {P}_{a}}\), defined as usual by: \(\varphi _0 \vee \varphi _1 \mathbin {::=} \lnot (\lnot \varphi _0 \wedge \lnot \varphi _1)\), \(\varphi _0 \rightarrow \varphi _1 \mathbin {::=} \lnot \varphi _0 \vee \varphi _1\), and \(\mathop {\textsf {P}_{a}}{\varphi } \mathbin {::=} \lnot \mathop {\textsf {K}_{a}}\lnot \varphi \). When I is a singleton \(\{ i \}\), we abbreviate \(\mathop {\mathbb {P}_{I}}\) as \(\mathop {\mathbb {P}_{i}}\).
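As a purely illustrative sketch (the class names and the interval encoding are our own, not part of StatEL), the two-level syntax can be represented as a small abstract syntax tree in Python:

from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Atom:            # static formula: gamma(x1, ..., xn)
    pred: str
    vars: Tuple[str, ...]

@dataclass(frozen=True)
class Neg:             # negation (of a static or epistemic formula)
    sub: object

@dataclass(frozen=True)
class And:             # conjunction
    left: object
    right: object

@dataclass(frozen=True)
class Prob:            # epistemic formula P_I psi; I encoded as a union of (lo, hi] intervals
    intervals: Tuple[Tuple[float, float], ...]
    sub: object

@dataclass(frozen=True)
class Cond:            # psi ⊃ phi
    given: object
    sub: object

@dataclass(frozen=True)
class Know:            # K_a phi
    agent: str
    sub: object

# P_(0.95, 1] psi: "psi holds with probability greater than 0.95"
example = Prob(((0.95, 1.0),), Atom("psi", ("x",)))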

2.4 Distributional Kripke model

Next we recall the notion of a distributional Kripke model [27], where each possible world is associated with a probability distribution over a set of states, and with a stochastic assignment of data to measurement variables.

Definition 3

(Distributional Kripke model) Let \(\mathcal {A}\) be a finite set of indices (typically associated with operations and tests on datasets), \(\mathcal {S}\) be a finite set of states, and \(\mathcal {O}\) be a finite set of data, called a data domain. A distributional Kripke model is a tuple \(\mathfrak {M}=(\mathcal {W}, (\mathcal {R}_a)_{a\in \mathcal {A}}, (V_s)_{s\in \mathcal {S}})\) consisting of:

  • a non-empty set \(\mathcal {W}\) of multisets of states belonging to \(\mathcal {S}\);

  • for each \(a\in \mathcal {A}\), an accessibility relation \(\mathcal {R}_a \subseteq \mathcal {W}{\times }\mathcal {W}\);

  • for each \(s\in \mathcal {S}\), a valuation \(V_s: \varGamma \rightarrow \mathcal {P}(\mathcal {O}^k)\) that maps each k-ary predicate \(\gamma \) to a set \(V_s(\gamma )\) of k-tuples of data.

The set \(\mathcal {W}\) is called a universe, and its elements are called possible worlds. A world is said to be finite if it is a finite multiset, i.e., it has a finite number of (possibly duplicated) elements. A world is said to be infinite if it is an infinite multiset.

The relation \(\mathcal {R}_a\) determines accessibility between worlds. For example, \((w, w')\in \mathcal {R}_a\) means that a world \(w'\) is accessible from a world w when our capability of distinguishing possible worlds is denoted by \(a\in \mathcal {A}\). The valuation \(V_s\) may interpret a predicate \(\gamma \) differently at different states s. We assume that all measurement variables range over the same data domain \(\mathcal {O}\) in every world. The interpretation of measurement variables at a state s is given by the deterministic assignment \(\sigma _s\) defined below.

Definition 4

(Deterministic assignment) For any distributional Kripke model \(\mathfrak {M}{\,=}(\mathcal {W}, (\mathcal {R}_a)_{a\in \mathcal {A}}, (V_s)_{s\in \mathcal {S}})\), we assume that each world \(w\in \mathcal {W}\) is associated with a function \(\rho _w: \texttt {Mes}\times \mathcal {S}\rightarrow \mathcal {O}\) that maps each measurement variable x to its value \(\rho _w(x, s)\) that is observed at a state s belonging to the world w. We also assume that each state s in a world w is associated with the deterministic assignment \(\sigma _s: \texttt {Mes}\rightarrow \mathcal {O}\) defined by \(\sigma _s(x) = \rho _w(x, s)\).

Since each world w is a multiset of states, we abuse the notation and denote by w[s] the probability that a state s is randomly chosen from w (i.e., the number of occurrences of s in the multiset w, divided by the total number of elements in w). Here, we regard each world w as a probability distribution over the states that corresponds to the multiset.

The probability that a measurement variable \(x\in \texttt {Mes}\) has a value \(v\in \mathcal {O}\) is: \(\sigma _w(x)[v] = \sum _{s\in w,\, \sigma _s(x) = v} w[s]\). Note that \(\sigma _w: \texttt {Mes}\rightarrow \mathbb {D}\mathcal {O}\) maps each measurement variable x to a probability distribution \(\sigma _w(x)\) over the data domain \(\mathcal {O}\). Hence, \(\sigma _w\) represents the joint probability distribution of all variables in \(\texttt {Mes}\), and is called the stochastic assignment at w. When a state s is uniformly drawn from a multiset w of states, a datum \(\sigma _s(x)\) is sampled from the distribution \(\sigma _w(x)\).

In later sections, a possible world corresponds to a dataset (i.e., a multiset of data tuples) from which data are sampled. For example, suppose that we have only three measurement variables \(\texttt {Mes}= \{ x, y, z \}\). Then for each state s in a world w, the deterministic assignment \(\sigma _{s}: \texttt {Mes}\rightarrow \mathcal {O}\) represents the tuple of data \((\sigma _{s}(x), \sigma _{s}(y), \sigma _{s}(z))\). Hence, each state s corresponds to a tuple of data, and the world w corresponds to the dataset \(\{ (\sigma _{s}(x), \sigma _{s}(y), \sigma _{s}(z)) \mid s\in w \}\).
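The correspondence between worlds and datasets can be illustrated by the following Python sketch (an illustration under our own encoding): a state is a dict assigning data to the measurement variables, a world is a list of such states with repetition, and the stochastic assignment \(\sigma _w(x)\) is the marginal distribution obtained by drawing a state uniformly from the multiset.

from collections import Counter

# a world: a multiset of states, each state being a deterministic assignment
world = [
    {"x": "img1", "y": "panda",  "z": 1},
    {"x": "img1", "y": "panda",  "z": 1},   # duplicated state: probability 2/4
    {"x": "img2", "y": "gibbon", "z": 0},
    {"x": "img3", "y": "panda",  "z": 1},
]

def sigma_w(world, var):
    """Stochastic assignment: distribution of `var` when a state is drawn
    uniformly from the multiset `world`."""
    counts = Counter(s[var] for s in world)
    return {v: c / len(world) for v, c in counts.items()}

print(sigma_w(world, "y"))  # {'panda': 0.75, 'gibbon': 0.25}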

2.5 Stochastic semantics of StatEL

Now we recall the stochastic semantics [27] for the StatEL formulas over a distributional Kripke model \(\mathfrak {M}=(\mathcal {W}, (\mathcal {R}_a)_{a\in \mathcal {A}}, (V_s)_{s\in \mathcal {S}})\) with \(\mathcal {W}= \mathbb {D}\mathcal {S}\).

The interpretation of a static formula \(\psi \) at a state s is given by:

$$\begin{aligned} s \models \gamma (x_1, \ldots , x_k)&~ \text{ iff } ~ (\sigma _s(x_1), \ldots , \sigma _s(x_k)) \in V_s(\gamma )\\ s \models \lnot \psi&~ \text{ iff } ~ s \not \models \psi \\ s \models \psi \wedge \psi '&~ \text{ iff } ~ s \models \psi ~ \text{ and } ~ s \models \psi ' {.} \end{aligned}$$

The restriction \(w|_\psi \) of a world w to a static formula \(\psi \) is defined by \(w|_\psi [s] = \frac{w[s]}{\sum _{s': s' \models \psi } w[s']}\) if \(s \models \psi \), and \(w|_\psi [s] = 0\) otherwise. Note that \(w|_\psi \) is undefined if there is no state s that satisfies \(\psi \) and has a nonzero probability in w.

Then, the interpretation of epistemic formulas in a world w is defined by:

$$\begin{aligned} \mathfrak {M}, w \models \mathop {\mathbb {P}_{I}} \psi&~ \text{ iff } ~ \Pr \left[ s {\mathop {\leftarrow }\limits ^{{\$}}}w :~ s \models \psi \right] \in I\\ \mathfrak {M}, w \models \lnot \varphi&~ \text{ iff } ~ \mathfrak {M}, w \not \models \varphi \\ \mathfrak {M}, w \models \varphi \wedge \varphi '&~ \text{ iff } ~ \mathfrak {M}, w \models \varphi ~ \text{ and } ~ \mathfrak {M}, w \models \varphi '\\ \mathfrak {M}, w \models \psi \supset \varphi&~ \text{ iff } ~w|_{\psi }\text { is defined and }~ \mathfrak {M}, w|_{\psi } \models \varphi \\ \mathfrak {M}, w \models \mathop {\textsf {K}_{a}}\varphi&~ \text{ iff } ~ \text {for every }w'\text { s.t. }(w, w') \in \mathcal {R}_a, ~\\&\qquad \mathfrak {M}, w' \models \varphi {,} \end{aligned}$$

where \(s {\mathop {\leftarrow }\limits ^{{\$}}}w\) represents that a state s is sampled from the distribution w.

Then, \(\mathfrak {M}, w \models \psi _0 \supset \mathop {\mathbb {P}_{I}} \psi _1\) represents that the conditional probability of satisfying a static formula \(\psi _1\) given another \(\psi _0\) is included in a set I at a world w.

In each world w, measurement variables can be interpreted using \(\sigma _w\). This allows us to assign different values to different occurrences of a variable in a formula; e.g., in \(\varphi (x) \rightarrow \mathop {\textsf {K}_{a}}\varphi '(x)\), the x occurring in \(\varphi (x)\) is interpreted by \(\sigma _{w}\) in a world w, while the x in \(\varphi '(x)\) is interpreted by \(\sigma _{w'}\) in another world \(w'\) s.t. \((w, w')\in \mathcal {R}_a\).

Finally, the interpretation of an epistemic formula \(\varphi \) in \(\mathfrak {M}\) is given by:

$$\begin{aligned} \mathfrak {M}\models \varphi&~ \text{ iff } ~ \text{ for } \text{ every } \text{ world } w\text { in }\mathfrak {M}, ~ \mathfrak {M}, w \models \varphi {.} \end{aligned}$$

Hereafter, we mainly focus on the satisfaction local to a possible world, and \(\mathfrak {M}\) may be omitted when it is clear from the context.

3 Modality as transformation and testing on datasets

In this section, we introduce a different view on the modal operators in statistical epistemic logic (StatEL), and define additional modal operators that are used to formalize various properties of machine learning in Sects. 5–7.

3.1 Checking satisfaction at a world as testing with a dataset

We first explain how the satisfaction of a formula \(\varphi \) can be regarded as testing a system’s specification expressed by \(\varphi \).

As explained in Sect. 2.4, a possible world corresponds to a possible dataset. Thus, given a model \(\mathfrak {M}\), a world w, and a formula \(\varphi \), checking the satisfaction \(\mathfrak {M}, w \models \varphi \) can be regarded as testing whether the specification \(\varphi \) of a system (e.g., a machine learning model we formalize in Sect. 4) is satisfied when the dataset w provides inputs to the system. For example, let \(\varphi \) be a formula representing that a machine learning task (e.g., classification) C fails with probability at most \(5\%\). Then, \(\mathfrak {M}, w \models \varphi \) represents that when the learning task C is performed using a test dataset w, then it fails for at most \(5\%\) of the test data in w.

For simplicity, we restrict attention to formulas \(\varphi \) in which neither \(\mathop {\textsf {K}_{a}}\) nor \(\mathop {\textsf {P}_{a}}\) occurs. For each state (namely, data tuple) \(s\in w\) and for each static sub-formula \(\psi \) of \(\varphi \), we can efficiently check whether \(s \models \psi \).

When the dataset w is finite (i.e., it is a finite multiset of data tuples), we can check the satisfaction \(w \models \varphi \) in finite time; more precisely, in time linear in the number of elements of w.

When the dataset w is infinite, however, we cannot check whether \(w \models \varphi \) in general. For example, suppose that w is the infinite dataset representing the true distribution from which data are sampled and observed. Since we cannot obtain w itself, we usually construct a finite dataset \(\textit{w}_{\,\mathsf fin}\) by sampling data from w repeatedly and independently, and check a specification \(\varphi \) only against this test dataset \(\textit{w}_{\,\mathsf fin}\).

Hereafter, we mainly deal with distributional Kripke models \(\mathfrak {M}\) that have infinitely many finite worlds. In the following sections except Sect. 6, we deal only with formulas containing neither \(\mathop {\textsf {K}_{a}}\) nor \(\mathop {\textsf {P}_{a}}\) (Footnote 2); hence we can check their satisfaction at a finite world in finite time.
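As a hedged illustration of this finite-time check (our own sketch; static formulas are encoded as Python predicates on states, and worlds are the lists of states from the earlier sketch), the clauses for \(\mathop {\mathbb {P}_{I}} \psi \) and \(\psi \supset \varphi \) can be evaluated by a single pass over the dataset:

def prob(world, psi):
    """Probability that a uniformly drawn state of `world` satisfies the
    static formula `psi` (a predicate on states)."""
    return sum(1 for s in world if psi(s)) / len(world)

def restrict(world, psi):
    """The restriction w|_psi; None when no state satisfies psi (undefined)."""
    sub = [s for s in world if psi(s)]
    return sub or None

def holds_prob(world, lo, hi, psi):
    """w |= P_I psi for an interval I = [lo, hi]."""
    return lo <= prob(world, psi) <= hi

def holds_cond(world, psi, phi):
    """w |= psi ⊃ phi: w|_psi must be defined and phi must hold on it."""
    sub = restrict(world, psi)
    return sub is not None and phi(sub)

# Example spec: "the classification fails with probability at most 5%"
# fails = lambda s: s["y_hat"] != s["y"]
# spec  = lambda w: holds_prob(w, 0.0, 0.05, fails)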

3.2 Modal operators for dataset transformation

In the rest of Sect. 3, we show that modal operators can be used to model transformations of and tests on datasets.

First, we introduce modal operators for dataset transformation. The modal operator \(\varDelta _{T}\) defined below is unary (i.e., taking a single formula as argument), and is parameterized with a transformation \(\mathop {T}\) between datasets. Intuitively, \(w \models \varDelta _{T}\varphi \) represents that a formula \(\varphi \) is satisfied for the dataset \(w'\) that is obtained by transforming the current dataset w by \(\mathop {T}\). Formally, the modal operator \(\varDelta _{T}\) is interpreted as follows.

Definition 5

(Modality \(\varDelta _{T}\) for a dataset transformation T) Given a function \(T: \mathcal {W}\rightarrow \mathcal {W}\), we define an accessibility relation as \(\mathcal {R}_{T}{\mathop {=}\limits ^{\text{ def }}}\{ (w, w') \mid w' = T(w) \}\). Then, we define the interpretation of \(\varDelta _{T}\) by:

$$\begin{aligned}&\mathfrak {M}, w \models \varDelta _{T}\varphi \\&\text{ iff } ~ \text{ there } \text{ is } \text{ a } w'\text { s.t. }(w, w') \in \mathcal {R}_{T}\text { and }~ \mathfrak {M}, w' \models \varphi {.} \end{aligned}$$

For example, machine learning often requires data preparation to manipulate a given raw dataset into a form that makes a machine learning task feasible and more effective (e.g., data cleaning, data augmentation). For a dataset w and two ways of data preparation \(\mathop {T}\nolimits _0\) and \(\mathop {T}\nolimits _1\), \(w \models \varDelta _{\mathop {T}\nolimits _0} \varphi \wedge \varDelta _{\mathop {T}\nolimits _1} \varphi \) represents that a property \(\varphi \) holds for the two prepared datasets \(\mathop {T}\nolimits _0(w)\) and \(\mathop {T}\nolimits _1(w)\).
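A minimal sketch of this reading of \(\varDelta _{T}\), reusing the encoding of the earlier sketches (the cleaning step and the names spec and world are hypothetical examples of our own):

def delta(T, phi):
    """Δ_T phi: apply the dataset transformation T, then test phi on the result."""
    return lambda world: phi(T(world))

# A toy data-preparation step: drop states whose input x is missing
clean = lambda world: [s for s in world if s.get("x") is not None]

# w |= Δ_clean spec  iff  spec holds on the cleaned dataset clean(w)
# delta(clean, spec)(world)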

For another example, the security of machine learning often assumes a certain malicious adversary that can manipulate a given dataset to make a machine learning task fail. Such adversarial operations \(\mathop {T}\) on datasets can also be formalized using a different modal operator corresponding to \(\mathop {T}\) as we will explain in Sect. 6.

In the next section, we show that the logical connective \(\supset \) can be re-interpreted as the modality \(\varDelta _{T}\) for some dataset transformation T.

3.3 Modality for conditioning

We then present another interpretation of the logical connective \(\supset \) (defined in Sect. 2.5) used to express conditional probabilities in Sects. 5 and 6. Roughly speaking, we regard the restriction \(w|_{\psi }\) of a world w to a static formula \(\psi \) as a transformation of w. Then, we redefine \(\supset \) as a modal operator associated with the conditioning relation \(\mathcal {R}_{\psi }\), and call it the conditioning operator. Formally, the interpretation of \(\supset \) is defined as follows.

Definition 6

(Conditioning operator \(\supset \)) Assume that the universe \(\mathcal {W}\) includes all sub-multisets of each \(w\in \mathcal {W}\). Given a static formula \(\psi \), we define an accessibility relation as the conditioning relation \(\mathcal {R}_{\psi }{\mathop {=}\limits ^{\text{ def }}}\{ (w, w|_{\psi }) \mid w\in \mathcal {W}\}\). Then, the interpretation of the conditioning operator \(\supset \) is given by:

$$\begin{aligned}&\mathfrak {M}, w \models \psi \supset \varphi \\&\text{ iff } ~\text{ there } \text{ is } \text{ a } w'\text { s.t. }(w, w')\in \mathcal {R}_{\psi }\text { and }~ \mathfrak {M}, w' \models \varphi {.} \end{aligned}$$

Intuitively, \(w \models \psi \supset \varphi \) corresponds to the two operations: (i) transforming the given dataset w to the sub-dataset \(w|_{\psi }\) and (ii) testing whether a property \(\varphi \) holds for the sub-dataset \(w|_{\psi }\). When no data in the dataset w satisfies the property \(\psi \), we can describe this as \(\mathfrak {M}, w \models \psi \supset \bot \) by using the propositional constant falsum \(\bot \).

Note that the conditioning \(\psi \supset \varphi \) can be regarded as the modal formula \(\varDelta _{T}\varphi \) with the dataset transformation T where \(T(w) = w|_{\psi }\) for all \(w\in \mathcal {W}\).

In Sects. 5 and 6, we show concrete examples using the conditioning operator \(\supset \), i.e., the classification performance and robustness of statistical classifiers.

3.4 Modality for conditional indistinguishability

Next, we introduce a modal operator that is used to formalize the fairness of machine learning in Sect. 7.

Given two static formulas \(\psi _0, \psi _1\) (e.g., representing male and female), \(w|_{\psi _0}(x)\) (resp. \(w|_{\psi _1}(x)\)) represents the probability distribution of the values of a measurement variable x generated from the sub-dataset \(w|_{\psi _0}\), e.g., the sub-dataset about male (resp. from \(w|_{\psi _1}\), e.g., about female). To formalize a certain similarity between the values of x generated from the two sub-datasets (e.g., between the benefits for male and for female), we introduce a modal operator \(\mathbin {\sim }_{x}^{\varepsilon ,D}\) for conditional indistinguishability as follows. We write \(\psi _0 \mathbin {\sim }_{x}^{\varepsilon ,D} \psi _1\) to represent that the two distributions \(w|_{\psi _0}(x)\) and \(w|_{\psi _1}(x)\) are indistinguishable up to a threshold \(\varepsilon \) in terms of a divergence or distance D. Formally, this modality is defined as follows (Footnote 3).

Definition 7

(Conditional indistinguishability operator \(\mathbin {\sim }_{x}^{\varepsilon ,D}\)) Assume that the universe \(\mathcal {W}\) includes all sub-multisets of each \(w\in \mathcal {W}\). Given an \(x\in \texttt {Mes}\), an \(\varepsilon \in \mathbb {R}^{\ge 0}\), and a divergence or distance \(D: \mathbb {D}\mathcal {O}\times \mathbb {D}\mathcal {O}\rightarrow \mathbb {R}^{\ge 0}\), we define an accessibility relation by:

$$\begin{aligned} \mathcal {R}_{x}^{\varepsilon ,D}{\mathop {=}\limits ^{\text{ def }}}\{ (w_0, w_1)\in \mathcal {W}\times \mathcal {W}\,|\, \textit{D}(\sigma _{w_0}(x) \parallel \sigma _{w_1}(x)) \le \varepsilon \} {.} \end{aligned}$$

Then for static formulas \(\psi _0\) and \(\psi _1\), we define the interpretation of \(\psi _0 \mathbin {\sim }_{x}^{\varepsilon ,D} \psi _1\) by:

$$\begin{aligned}&\mathfrak {M}, w \models \psi _0 \mathbin {\sim }_{x}^{\varepsilon ,D} \psi _1\\&\text{ iff } ~ \text{ there } \text{ exist } w_0, w_1\text { s.t. } (w, w_0) \in \mathcal {R}_{\psi _0}, \\&\quad (w, w_1)\in \mathcal {R}_{\psi _1}, \text{ and } (w_0, w_1) \in \mathcal {R}_{x}^{\varepsilon ,D}{,} \end{aligned}$$

where \(\mathcal {R}_{\psi _0}\) and \(\mathcal {R}_{\psi _1}\) are two conditioning relations in Definition 6.

Note that two worlds are related by \(\mathcal {R}_{x}^{\varepsilon ,D}\) if they have close probability distributions of the values of x. Intuitively, \(w \models \psi _0 \mathbin {\sim }_{x}^{\varepsilon ,D} \psi _1\) corresponds to the two operations: (i) transforming the given dataset w to the two sub-datasets \(w|_{\psi _0}\) and \(w|_{\psi _1}\), and (ii) testing whether the probability distribution of x generated by the dataset \(w|_{\psi _0}\) is indistinguishable from the distribution generated by the dataset \(w|_{\psi _1}\).
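These two operations can be sketched as follows (our own illustration, reusing sigma_w and total_variation from the earlier sketches; the attribute names are hypothetical):

def indistinguishable(world, psi0, psi1, var, eps, D):
    """w |= psi0 ~_{var}^{eps,D} psi1: restrict w by psi0 and by psi1, then
    compare the distributions of `var` in the two sub-datasets w.r.t. D."""
    w0 = [s for s in world if psi0(s)]
    w1 = [s for s in world if psi1(s)]
    if not w0 or not w1:
        return False   # one of the restrictions is undefined
    return D(sigma_w(w0, var), sigma_w(w1, var)) <= eps

# Example: are the predicted benefits y_hat for two groups within
# total variation 0.05 of each other?
# indistinguishable(world, lambda s: s["group"] == "A",
#                          lambda s: s["group"] == "B",
#                          "y_hat", 0.05, total_variation)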

When \(\varepsilon = 0\), the operator \(\mathbin {\sim }_{x}^{\varepsilon ,D}\) represents the identity of two distributions.

Proposition 1

For a world w, static formulas \(\psi _0\), \(\psi _1\), and a measurement variable x, \(w \models \psi _0 \mathbin {\sim }_{x}^{0,D} \psi _1\) iff the distribution \(w|_{\psi _0}(x)\) is identical to \(w|_{\psi _1}(x)\).

This proposition is immediate from the following lemma.

Lemma 1

For a world w, static formulas \(\psi _0\), \(\psi _1\), and a measurement variable x,

$$\begin{aligned} w \models \psi _0 \mathbin {\sim }_{x}^{\varepsilon ,D} \psi _1 \text{ iff } \textit{D}(\sigma _{w|_{\psi _0}}(x) \parallel \sigma _{w|_{\psi _1}}(x)) \le \varepsilon {.} \end{aligned}$$

Proof

Let \(w_0 = w|_{\psi _0}\) and \(w_1 = w|_{\psi _1}\). Then by Definition 6, we have \((w, w_0) \in \mathcal {R}_{\psi _0}\) and \((w, w_1)\in \mathcal {R}_{\psi _1}\). Hence, this lemma follows from Definition 7. \({\square }\)

In Sect. 7, we present examples using the conditional indistinguishability operator, i.e., we formalize various notions of fairness in machine learning by using this operator and the above proposition and lemma.

3.5 Summary on the modal language

In summary, modal operators are used to represent transformations of and tests on datasets. The unary modal operator \(\varDelta _{T}\) is regarded as applying a transformation \(\mathop {T}\) to a dataset, while the binary modal operators \(\supset \) and \(\mathbin {\sim }_{x}^{\varepsilon ,D}\) are regarded as transforming-then-testing on datasets.

Now the syntax of the formulas is given by:

$$\begin{aligned}&\text{ Static } \text{ formulas: }~ \\&\psi \mathbin {::=} \gamma (x_1, x_2, \ldots , x_n) \mid \lnot \psi \mid \psi \wedge \psi \\&\text{ Dataset } \text{ formulas: }~\\&\varphi \mathbin {::=} \mathop {\mathbb {P}_{I}} \psi \,|\, \lnot \varphi \,|\, \varphi \wedge \varphi \,|\, \varDelta _{T}\varphi \,|\, \psi \supset \varphi \,|\, \psi _0 \mathbin {\sim }_{x}^{\varepsilon ,D} \psi _1 \,|\, \mathop {\textsf {K}_{a}}\varphi , \end{aligned}$$

where the epistemic formulas with the additional modality are called dataset formulas, since they are interpreted in a world that corresponds to a dataset.

When multiple transformations/testing are sequentially applied to datasets, we can use dataset formulas in which different modal operators are nested. For example, \(w \models \varDelta _{T}(\psi \supset \varphi )\) represents that after applying a data preparation T to a dataset w, a property \(\varphi \) holds for the sub-dataset \(T(w)|_{\psi }\) that satisfies \(\psi \).

4 Epistemic model for supervised learning

In this section, we introduce a formal model for supervised learning. Specifically, we employ a distributional Kripke model (Definition 3), and formalize the behavior of a classifier C and a non-deterministic input x from an adversary in this model. In this formalization, we focus only on the testing of supervised learning models, and do not formalize the training of supervised learning models or learning algorithms themselves.

Fig. 1

A world w is chosen non-deterministically and corresponds to a test dataset. With probability \(w[s_i]\), the world w is in a deterministic state \(s_i\) where the classifier C receives the input value \(\sigma _{s_i}(x)\) and returns the output value \(\sigma _{s_i}(\hat{y})\). Each state \(s_i\) can be regarded as a tuple \((\sigma _{s_i}(x), \sigma _{s_i}(y), \sigma _{s_i}(\hat{y})) \in \mathcal {D}\times \texttt {L}\times \texttt {L}\) consisting of an input datum, an actual label, and a predicted label

4.1 Classification problems

Multiclass classification is the problem of classifying a given input into one of multiple classes. Let \(\texttt {L}\) be a finite set of class labels (Footnote 4), and \(\mathcal {D}\) be a finite set of input data (called feature vectors) that we want to classify. Then, a classifier is a function \(C: \mathcal {D}\rightarrow \texttt {L}\) that receives an input datum v and predicts which class (among \(\texttt {L}\)) the input v belongs to. In this work, we deal with a situation where some classifier C has already been obtained and its properties should be evaluated, and do not model or reason about how classifiers are trained from a training dataset.

We assume a scoring function \(f: \mathcal {D}\times \texttt {L}\rightarrow \mathbb {R}\) that gives a score \(f(v, \ell )\) of predicting the class of an input datum (feature vector) v as a label \(\ell \). Then, for each input \(v\in \mathcal {D}\), we write \(H(v) = \ell \) to represent that the label \(\ell \) maximizes \(f(v, \ell )\). For example, when the input v is an image of an animal and \(\ell \) is the animal’s name, then \(H(v) = \ell \) may represent that an oracle (or a “human”) classifies the image v as \(\ell \).

4.2 Modeling the behaviors of classifiers

A classifier is formalized on a distributional Kripke model \(\mathfrak {M}=(\mathcal {W}, (\mathcal {R}_a)_{a\in \mathcal {A}}, (V_s)_{s\in \mathcal {S}})\) with \(\mathcal {W}= \mathbb {D}\mathcal {S}\). Then, \(\mathcal {W}\) is an infinite set of possible worlds that corresponds to all possible datasets from which the classifier can receive input data. We denote by \(\textit{w}_\mathsf{test}\in \mathcal {W}\) a real world that corresponds to a test dataset. Recall that each world \(w\in \mathcal {W}\) is a multiset of states over \(\mathcal {S}\) and is associated with a stochastic assignment \(\sigma _w: \texttt {Mes}\rightarrow \mathbb {D}\mathcal {O}\) that is consistent with the deterministic assignments \(\sigma _s\) for all \(s\in w\), as explained in Sect. 2.4.

We present an overview of our formalization in Fig. 1. We denote by \(x\in \texttt {Mes}\) an input datum given to the classifier C (and to the oracle H), by \(y\in \texttt {Mes}\) a correct label given by the oracle H, and by \(\hat{y}\in \texttt {Mes}\) a label predicted by C. We assume that the input variable x (resp. the output variables \(y,\hat{y}\)) ranges over the set \(\mathcal {D}\) of input data (resp. the set \(\texttt {L}\) of labels); i.e., the deterministic assignment \(\sigma _s\) at each state \(s\in \mathcal {S}\) has the range \(\mathcal {O}= \mathcal {D}\cup \texttt {L}\) and satisfies \(\sigma _s(x)\in \mathcal {D}\) and \(\sigma _s(y), \sigma _s(\hat{y})\in \texttt {L}\).

A key idea in our modeling is that we describe logical aspects of statistical properties at the syntax level by using logical formulas, and model statistical distances and dataset operations at the semantics level by using accessibility relations in the distributional Kripke model. In this way, we can formalize various statistical properties of classifiers in a simple and abstract way.

To formalize the classifier C, we introduce a static formula \(\psi (x, \hat{y})\) to represent that C classifies a given input x as a class \(\hat{y}\). We also introduce a static formula h(xy) to represent that y is the actual class of an input x. As an abbreviation, we write \(\psi _\ell (x)\) (resp. \(h_\ell (x)\)) to denote \(\psi (x, \ell )\) (resp. \(h(x, \ell )\)). Formally, these static formulas are interpreted at each state \(s\in \mathcal {S}\) as follows:

$$\begin{aligned} s \models \psi (x, \hat{y})&~ \text{ iff } ~ C(\sigma _s(x)) = \sigma _s(\hat{y}).\\ s \models h(x, y)&~ \text{ iff } ~ H(\sigma _s(x)) = \sigma _s(y). \end{aligned}$$
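The following Python sketch illustrates this encoding with a hypothetical classifier C and oracle H (all names and data are our own illustration): each state corresponds to a tuple of an input, its actual label, and its predicted label, and the static formulas psi and h are predicates on states.

# Hypothetical classifier C and oracle H over a tiny input domain
C = {"img1": "panda", "img2": "gibbon", "img3": "gibbon"}.get   # predicted label
H = {"img1": "panda", "img2": "gibbon", "img3": "panda"}.get    # actual label

def make_state(v):
    """A state corresponding to the tuple (input, actual label, predicted label)."""
    return {"x": v, "y": H(v), "y_hat": C(v)}

# Static formulas interpreted at a state s
psi = lambda s: C(s["x"]) == s["y_hat"]   # s |= psi(x, y_hat)
h   = lambda s: H(s["x"]) == s["y"]       # s |= h(x, y)

# A test dataset (world) as a multiset of states
w_test = [make_state(v) for v in ["img1", "img1", "img2", "img3"]]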

4.3 Modeling the non-deterministic inputs from adversaries

We first observe that a distributional Kripke model \(\mathfrak {M}\) can formalize an input x that is probabilistically chosen from a given dataset. As explained in Sect. 2.4, each world w corresponds to a test dataset. When a state s is drawn from a multiset w of states, an input value \(\sigma _s(x)\) is sampled from the distribution \(\sigma _w(x)\), and assigned to the measurement variable x. The set of all possible probability distributions of inputs is represented by \(\varLambda {\mathop {=}\limits ^{\text{ def }}}\left\{ \sigma _w(x) \mid w\in \mathcal {W}\right\} \), which is possibly an infinite set.

For example, let us consider testing the classifier C with the actual test dataset \(\sigma _{\textit{w}_\mathsf{test}}(x)\). When C classifies an input x as a label \(\ell \) with probability 0.2, i.e.,

$$\begin{aligned} \Pr \left[ ~ v {\mathop {\leftarrow }\limits ^{{\$}}}\sigma _{\textit{w}_\mathsf{test}}(x) \,:\, C(v) = \ell ~\right] = 0.2 , \end{aligned}$$

then this can be expressed by:

$$\begin{aligned} \mathfrak {M}, \textit{w}_\mathsf{test}\models \mathop {\mathbb {P}_{0.2}} \psi _\ell (x) {.} \end{aligned}$$

Next we observe that our model can formalize a non-deterministic input x from an adversary as follows. Although each state s in a possible world w is assigned the probability w[s], each world w itself is not assigned a probability. Thus, each input distribution \(\sigma _w(x) \in \varLambda \) itself is also not assigned a probability, hence our model assumes no probability distribution over \(\varLambda \). In other words, we assume that a world w, and thus an input distribution \(\sigma _w(x)\), are non-deterministically chosen. This is useful to model an adversary that provides malicious inputs to the classifier C to make its prediction fail, because we usually do not have prior knowledge of the probability distribution of malicious inputs from adversaries, and need to reason about the worst cases caused by the attack. In Sect. 6, this formalization of non-deterministic inputs is used to express the robustness of classifiers.

Finally, it should be noted that we cannot enumerate all possible adversarial inputs, hence cannot enumerate all possible datasets to construct the universe \(\mathcal {W}\). Since \(\mathcal {W}\) can be an infinite set and is unspecified, we cannot check whether a formula expressing a security property against an adversary is satisfied in all possible worlds of \(\mathcal {W}\). Nevertheless, as shown in later sections, describing various properties using our extension of StatEL is useful to explore desirable properties and to discuss relationships among them.

5 Formalizing the classification performance

In this section, we show a formalization of classification performance using our extension of StatEL. We formalize popular measures of classification performance, including precision, recall, and accuracy, and measures for evaluating overfitting, such as the generalization error. See Fig. 2 for basic ideas on these formalizations.

5.1 Classifier’s prediction and its correctness

In classification problems, the terms positive/negative represent the result of the classifier’s prediction, and the terms true/false represent whether the classifier predicts correctly or not. Then, the following terminology is commonly used:

  • true positive (\(\textit{tp}\)): both the prediction and actual class are positive;

  • true negative (\(\textit{tn}\)): both the prediction and actual class are negative;

  • false positive (\(\textit{fp}\)): the prediction is positive but the actual class is negative;

  • false negative (\(\textit{fn}\)): the prediction is negative but the actual class is positive.

These notions can be formalized using static formulas as shown in Table 1. For example, when an input x shows a true positive at a state s, this can be expressed as \(s \models \psi _\ell (x) \wedge h_\ell (x)\). Note that the value of the measurement variable x is uniquely determined by the assignment \(\sigma _s\) at the state s. True negative, false positive (type I error), and false negative (type II error) are, respectively, expressed as \(s \models \lnot \psi _\ell (x) \wedge \lnot h_\ell (x)\), \(s \models \psi _\ell (x) \wedge \lnot h_\ell (x)\), and \(s \models \lnot \psi _\ell (x) \wedge h_\ell (x)\).

Table 1 Logical description of the table of confusion
Fig. 2

The classification performance compares the oracle H’s output with that of the classifier C, while the evaluation of overfitting compares the expected loss on the test dataset with that on the training dataset

5.2 Precision, recall, accuracy, and other performance measures

Next we formalize three popular measures for binary classification performance: precision, recall, and accuracy. In Table 1 we summarize the formalization of various notions of classification performance using our dataset formulas.

In theory, these notions should be formalized with the infinite dataset \(\textit{w}_{\,\mathsf true}\) representing the true distribution. However, we usually cannot obtain \(\textit{w}_{\,\mathsf true}\) or test the performance measures using \(\textit{w}_{\,\mathsf true}\). Hence, we often sample a finite test dataset \(\textit{w}_\mathsf{test}\) from the true distribution and regard it as an approximation of \(\textit{w}_{\,\mathsf true}\) (Footnote 5).

Given a test dataset \(\textit{w}_\mathsf{test}\), precision (positive predictive value) is defined as the conditional probability that the prediction is correct given that the prediction is positive; i.e., \({ precision} = \frac{\textit{tp}}{\textit{tp}+ \textit{fp}}\). Since the probability distribution of the input x in the world \(\textit{w}_\mathsf{test}\) is expressed by \(\sigma _{\textit{w}_\mathsf{test}}(x)\) as explained in Sect. 4.3, the precision being within an interval I is given by:

$$\begin{aligned} \Pr \left[ ~ v {\mathop {\leftarrow }\limits ^{{\$}}}\sigma _{\textit{w}_\mathsf{test}}(x) \,:\, H(v) = \ell ~\Big |~ C(v) = \ell ~\right] \in I {,} \end{aligned}$$

which can be written as:

$$\begin{aligned} \Pr \left[ ~ s {\mathop {\leftarrow }\limits ^{{\$}}}\textit{w}_\mathsf{test}\,:\, s \models h_\ell (x) ~\Big |~ s \models \psi _\ell (x) ~\right] \in I {.} \end{aligned}$$

By using StatEL, this can be formalized as:

$$\begin{aligned}&\mathfrak {M}, \textit{w}_\mathsf{test}\models \textsf {Precision}_{\ell ,I}(x)\\&\text{ where } ~ \textsf {Precision}_{\ell ,I}(x) {\mathop {=}\limits ^{\text{ def }}}\psi _\ell (x) \supset \mathop {\mathbb {P}_{I}} h_\ell (x) {.} \end{aligned}$$

Here, \(\supset \) is the conditioning operator defined in Sect. 3.3. The value of precision depends on the test dataset \(\textit{w}_\mathsf{test}\), and can be computed in finite time since \(\textit{w}_\mathsf{test}\) is finite.

Symmetrically, recall (true positive rate) is defined as the conditional probability that the prediction is correct given that the actual class is positive; i.e., \({ recall} = \frac{\textit{tp}}{\textit{tp}+ \textit{fn}}\). Then, the recall being within I is formalized as:

$$\begin{aligned} \textsf {Recall}_{\ell ,I}(x) {\mathop {=}\limits ^{\text{ def }}}h_\ell (x) \supset \mathop {\mathbb {P}_{I}} \psi _\ell (x) {.} \end{aligned}$$

Finally, accuracy is the probability that the classifier predicts correctly; i.e., \({ accuracy} = \frac{\textit{tp}+ \textit{tn}}{\textit{tp}+ \textit{tn}+ \textit{fp}+ \textit{fn}}\). Then, the accuracy being within I is formalized as:

$$\begin{aligned} \textsf {Accuracy}_{\ell ,I}(x) {\mathop {=}\limits ^{\text{ def }}}\mathop {\mathbb {P}_{I}}\bigl ( \psi _\ell (x) \leftrightarrow h_\ell (x) \bigr ) {,} \end{aligned}$$

which can also be defined as \(\mathop {\mathbb {P}_{I}}\bigl ( \textit{tp}(x) \vee \textit{tn}(x) \bigr )\). When we measure the accuracy after a data preparation operation T (e.g., data cleaning) to the test dataset \(\textit{w}_\mathsf{test}\), this can be represented by \(\textit{w}_\mathsf{test}\models \varDelta _{T}\textsf {Accuracy}_{\ell ,I}(x)\).
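These three measures can be tested on a finite dataset with a short sketch (our own illustration, reusing the states with fields x, y, y_hat and the classifier C and oracle H from the previous sketch):

def precision(world, ell, C, H):
    """Conditional probability that H(x) = ell given C(x) = ell
    (the probability inside Precision_{ell,I}(x)); None if undefined."""
    pos = [s for s in world if C(s["x"]) == ell]
    return None if not pos else sum(H(s["x"]) == ell for s in pos) / len(pos)

def recall(world, ell, C, H):
    """Conditional probability that C(x) = ell given H(x) = ell."""
    act = [s for s in world if H(s["x"]) == ell]
    return None if not act else sum(C(s["x"]) == ell for s in act) / len(act)

def accuracy(world, ell, C, H):
    """Probability that the prediction and the actual class agree on ell."""
    return sum((C(s["x"]) == ell) == (H(s["x"]) == ell) for s in world) / len(world)

# print(precision(w_test, "panda", C, H))   # e.g. 1.0
# print(recall(w_test, "panda", C, H))      # e.g. 2/3
# print(accuracy(w_test, "panda", C, H))    # e.g. 0.75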

Example 1

(Performance of pedestrian detection) Let us consider an autonomous car that uses a machine learning classifier to detect a person crossing the road. For the sake of simplicity, we formalize an example of a binary classifier C that detects whether or not a pedestrian is crossing the road in a photo image in a test dataset \(\textit{w}_\mathsf{test}\). We write \(\textit{sunny}(x)\) (resp. \(\textit{snowy}(x)\)) to represent that a photograph x was taken on a sunny (resp. snowy) day. Let \(\psi _\ell (x)\) (resp. \(h_\ell (x)\)) represent that the classifier C (resp. the human) detects a pedestrian crossing the road in an image x.

We empirically measure recall (i.e., the conditional probability that C detects a pedestrian crossing the road when the input image x actually includes it) by using the data collected on sunny days. When C achieves a recall of 0.95 on sunny days, this is represented by \(\textit{w}_\mathsf{test}\models \textit{sunny}(x) \supset \textsf {Recall}_{\ell ,0.95}(x)\).

Since C should detect a pedestrian also on a snow-covered road, it should be tested with the data collected on snowy days. If we have a recall of 0.8 on snowy days, this is represented by \(\textit{w}_\mathsf{test}\models \textit{snowy}(x) \supset \textsf {Recall}_{\ell ,0.8}(x)\).

More generally, if the classifier C achieves a recall of more than 0.9 in situations \(\gamma _1, \gamma _2, \ldots , \gamma _m\), this can be represented by \(\textit{w}_\mathsf{test}\models \bigwedge _{i=1}^{m} \bigl ( \gamma _i(x) \supset \textsf {Recall}_{\ell ,(0.9, 1]}(x) \bigr )\).

Fig. 3

The robustness compares the conditional probability in the test dataset \(\textit{w}_\mathsf{test}\) with that in another possible world \(w'\) that is close to \(\textit{w}_\mathsf{test}\) in terms of \(\mathcal {R}_{x}^{\varepsilon ,\textit{W}_{\textit{d}}}\). Note that an adversary’s choice of the input distribution \(\sigma _{w'}(x)\) is formalized as a non-deterministic choice of the possible world \(w'\)

5.3 Generalization error

We next formalize the generalization error of a classifier, i.e., a measure of how accurately a classifier is able to predict the class of previously unseen input data. Since a classifier has been trained on a finite sample training dataset \(\textit{w}_{\,\mathsf train}\), it may be overfitted to \(\textit{w}_{\,\mathsf train}\) and have worse classification performance on new input data that have not been included in \(\textit{w}_{\,\mathsf train}\).

To formalize the generalization error, we introduce a formula \(\lambda _{L}(y, \hat{y})\) to represent that given a correct label y and a predicted label \(\hat{y}\), the expected value of losses (i.e., real numbers representing the penalty for incorrect classification) is at most a non-negative real number L. Formally, the semantics of \(\lambda _{L}(y, \hat{y})\) is given by:

$$\begin{aligned} w \models \lambda _{L}(y, \hat{y}) ~~ \text{ iff } ~ \mathop {\mathbb {E}}\limits _{(v, \hat{v}) \sim \sigma _{w}(y, \hat{y}) }\quad \textit{loss}(v, \hat{v}) \le L {,} \end{aligned}$$

where \(\textit{loss}\) is a loss function selected according to the data domain \(\mathcal {O}\), and a pair \((v, \hat{v})\) of a correct label and a predicted label follows the joint distribution \(\sigma _{w}(y, \hat{y})\).

Now the generalization error being L or smaller at a true distribution \(\textit{w}_{\,\mathsf true}\) is written as \(\textit{w}_{\,\mathsf true}\models \textsf {GE}_{L}(x, y, \hat{y})\) where:

$$\begin{aligned} \textsf {GE}_{L}(x, y, \hat{y}) {\mathop {=}\limits ^{\text{ def }}}\bigl ( h(x, y) \wedge \psi (x, \hat{y}) \bigr ) \supset \lambda _{L}(y, \hat{y}) {.} \end{aligned}$$

Since we usually cannot obtain the true distribution \(\textit{w}_{\,\mathsf true}\) and cannot check the satisfaction \(\textit{w}_{\,\mathsf true}\models \textsf {GE}_{L}(x, y, \hat{y})\), we often compute an empirical error (as an approximation of the generalization error) by using a finite test dataset \(\textit{w}_\mathsf{test}\) that is believed to be an approximation of \(\textit{w}_{\,\mathsf true}\). This testing can be expressed as \(\textit{w}_\mathsf{test}\models \textsf {GE}_{L}(x, y, \hat{y})\).

On the other hand, given a training dataset \(\textit{w}_{\,\mathsf train}\), the training error being at most \(\textit{L}_{\textsf {train}}\) is represented by \(\textit{w}_{\,\mathsf train}\models \textsf {GE}_{\textit{L}_{\textsf {train}}}(x, y, \hat{y})\). Then, the overfitting of the classifier can be evaluated by comparing the empirical error L with the training error \(\textit{L}_{\textsf {train}}\). When the empirical error is smaller than \(\textit{L}_{\textsf {train}}+ \varepsilon \) for some error bound \(\varepsilon > 0\), this can be represented by \(\textit{w}_\mathsf{test}\models \textsf {GE}_{\textit{L}_{\textsf {train}}+ \varepsilon }(x, y, \hat{y})\).
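A minimal sketch of this empirical check (our own illustration; we use the 0–1 loss as an example loss function, reuse the state encoding from the earlier sketches, and w_train is a hypothetical training dataset in the same format):

def expected_loss(world, loss):
    """Empirical expectation of loss(y, y_hat) over the dataset, i.e. the
    quantity bounded by L in lambda_L(y, y_hat)."""
    return sum(loss(s["y"], s["y_hat"]) for s in world) / len(world)

zero_one = lambda y, y_hat: 0.0 if y == y_hat else 1.0

# w_test |= GE_L(x, y, y_hat) holds when expected_loss(w_test, zero_one) <= L.
# Overfitting check against a training error L_train and an error bound eps:
# empirical = expected_loss(w_test,  zero_one)
# l_train   = expected_loss(w_train, zero_one)
# ok        = empirical <= l_train + 0.02      # eps = 0.02, for example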

6 Formalizing the robustness of classifiers

Many recent studies have found attacks on machine learning where a malicious adversary manipulates the input to cause a malfunction in a machine learning task [12]. Such input data, called adversarial examples [40], are designed to make a classifier fail to predict the actual class \(\ell \) of the input, while still being recognized as belonging to \(\ell \) by human eyes. In computer vision, for example, Goodfellow et al. [20] create an adversarial example by adding undetectable noise to a panda’s photograph so that humans can still recognize the perturbed image as a panda, but a classifier misclassifies it as a gibbon. To prevent or mitigate such attacks, the classifier should be robust against perturbed inputs, i.e., it should return similar predicted labels given similar input data.

In this section, we formalize robustness notions for classifiers by using epistemic operators in StatEL (see Fig. 3 for an overview of the formalization). Furthermore, we show certain relationships between classification performance and robustness, and suggest a class of robustness properties that have not been formalized in the literature as far as we know. We present an overview of these formalizations and relationships in Fig. 4.

6.1 Total correctness of classifiers

We first note that the total correctness of classifiers could be formalized as a classification performance (e.g., precision, recall, or accuracy) in the presence of all possible inputs from adversaries. For example, the total correctness could be formalized as \(\mathfrak {M}\models \textsf {Recall}_{\ell ,I}(x)\), which represents that \(\textsf {Recall}_{\ell ,I}(x)\) is satisfied in all possible worlds of \(\mathfrak {M}\).

In practice, however, it is not feasible to test whether the classification performance is achieved for all possible test datasets (corresponding to an infinite number of possible worlds in \(\mathfrak {M}\)). Hence we need a weaker correctness notion, which may be verified or tested in some way. In the following sections, we deal with robustness notions that are weaker than total correctness.

6.2 Accessibility relation for robustness

To formalize robustness notions, we introduce an accessibility relation \(\mathcal {R}_{x}^{\varepsilon ,\textit{W}_{\textit{d}}}\) that relates two worlds having closer inputs as follows.

Definition 8

(Accessibility relation for robustness) We define an accessibility relation \(\mathcal {R}_{x}^{\varepsilon ,\textit{W}_{\textit{d}}}\subseteq \mathcal {W}\times \mathcal {W}\) by:

$$\begin{aligned} \mathcal {R}_{x}^{\varepsilon ,\textit{W}_{\textit{d}}}{\mathop {=}\limits ^{\text{ def }}}\left\{ (w, w') \in \mathcal {W}\times \mathcal {W}\,\mid \, \textit{W}_{\textit{d}}(\sigma _{w}(x),\, \sigma _{w'}(x)) \le \varepsilon \right\} {,} \end{aligned}$$

where \(\textit{W}_{\textit{d}}\) is \(\infty \)-Wasserstein distance w.r.t. a metric \(\textit{d}\) in Definition 2.

Then, \( (w, w') \in \mathcal {R}_{x}^{\varepsilon ,\textit{W}_{\textit{d}}}\) represents that the two distributions \(\sigma _{w}(x)\) and \(\sigma _{w'}(x)\) of inputs to the classifier C are close in terms of the distance \(\textit{W}_{\textit{d}}\) (Footnote 6). Intuitively, for example, \(\textit{W}_{\textit{d}}\) measures the distance between two image datasets \(\sigma _{w}(x)\) and \(\sigma _{w'}(x)\) when the distance between individual images is measured by a metric \(\textit{d}\).

Then, an epistemic formula \(\mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\varphi \) represents that we are confident that \(\varphi \) is true even when the input data are perturbed by noise of the level \(\varepsilon \) or smaller.

6.3 Probabilistic robustness against targeted attacks

When a robustness attack aims at misclassifying an input as a specific target label \(\hat{\ell }_{\textsf {tar}}\), it is called a targeted attack. For instance, in the above-mentioned attack by [20], a gibbon is the target into which a panda’s photograph is misclassified.

In this section, we discuss how we formalize robustness using the epistemic operator \(\mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\). We denote by \(v\in \mathcal {D}\) an original input image in the test dataset \(\textit{w}_\mathsf{test}\), and by \(\widetilde{v}\in \mathcal {D}\) an image obtained by perturbing the original image v by noise.

A first definition of robustness against targeted attacks might be:

For any \(v, \widetilde{v}\in \mathcal {D}\), if \(H(v) = \mathsf{panda} \text{ and } \textit{d}(v, \widetilde{v}) \le \varepsilon \), then \(C(\widetilde{v}) \ne \mathsf{gibbon}\),

which represents that when an image \(\widetilde{v}\) is obtained by perturbing a panda’s photograph v by noise, then it will not be classified as the target label gibbon at all. This can be formalized using StatEL by:

$$\begin{aligned} \mathfrak {M}, \textit{w}_\mathsf{test}\models h_\mathsf{panda}(x) \supset \mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\mathop {\mathbb {P}_{0}} \psi _\mathsf{gibbon}(x) {.} \end{aligned}$$

However, this notion does not accept a negligible probability of misclassification, and does not cover the case where the human cannot recognize the perturbed image \(\widetilde{v}\) as a panda (e.g., when the perturbed image \(\widetilde{v}\) is obtained by linear displacement, rescaling, or rotation [2], then \(H(\widetilde{v}) \ne \mathsf{panda}\) may hold).

To overcome these issues, we introduce the following definition, which allows a small conditional probability \(\delta \) of misclassification.

Definition 9

(Targeted robustness) Let \(\delta \in [0, 1]\). Given a dataset \(\textit{w}_\mathsf{test}\), a classifier C satisfies probabilistic targeted robustness w.r.t. an actual label \(\ell \) and a target label \(\hat{\ell }_{\textsf {tar}}\) if for any input \(v \in {\texttt {supp}}(\sigma _{\textit{w}_\mathsf{test}}(x))\) from the dataset \(\textit{w}_\mathsf{test}\), and for any perturbed input \(\widetilde{v}\in \mathcal {D}\) s.t. \(\textit{d}(v, \widetilde{v}) \le \varepsilon \), we have:

$$\begin{aligned} \Pr [\, C(\widetilde{v}) = \hat{\ell }_{\textsf {tar}}\mid H(\widetilde{v}) = \ell \,] \le \delta {.} \end{aligned}$$
(1)

For instance, when the actual class \(\ell \) is \(\mathsf{panda}\) and the target label \(\hat{\ell }_{\textsf {tar}}\) is \(\mathsf{gibbon}\), the classifier C misclassifies a panda’s photograph as \(\mathsf{gibbon}\) with probability at most \(\delta \).
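As an illustration only, the following sketch estimates the conditional probability in (1) on a finite test dataset; the names classifier, human_oracle, and perturbations are hypothetical stand-ins for C, H, and a generator of perturbed inputs within the \(\varepsilon \)-ball.

```python
def targeted_robustness_holds(classifier, human_oracle, test_inputs,
                              perturbations, actual_label, target_label, delta):
    # Empirical check of Definition 9 on a finite test dataset.
    # perturbations(v) is assumed to yield inputs v_tilde with d(v, v_tilde) <= eps;
    # classifier plays the role of C and human_oracle the role of the oracle H.
    hits, total = 0, 0
    for v in test_inputs:
        for v_tilde in perturbations(v):
            if human_oracle(v_tilde) != actual_label:
                continue  # condition on H(v_tilde) = actual_label
            total += 1
            if classifier(v_tilde) == target_label:
                hits += 1
    # Vacuously robust if no perturbed input keeps the actual label.
    return total == 0 or hits / total <= delta
```

With actual_label set to panda and target_label to gibbon, this corresponds to an empirical check of (1) in the example above.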

Now we express this robustness notion with \(I = [1-\delta , 1]\) by using StatEL.

Proposition 2

(Targeted robustness) Let \(\delta \in [0, 1]\) and \(I = [1-\delta , 1]\). The probabilistic targeted robustness w.r.t. an actual label \(\ell \) and a target label \(\hat{\ell }_{\textsf {tar}}\) under a given test dataset \(\textit{w}_\mathsf{test}\) is expressed by \(\,\textit{w}_\mathsf{test}\models \textsf {TRobust}_{\ell , \hat{\ell }_{\textsf {tar}}, I}(x)\) where:

$$\begin{aligned} \textsf {TRobust}_{\ell , \hat{\ell }_{\textsf {tar}}, I}(x) {\mathop {=}\limits ^{\text{ def }}}\, \mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\bigl ( h_{\ell }(x) \supset \mathop {\mathbb {P}_{I}} \lnot \, \psi _{\hat{\ell }_{\textsf {tar}}}(x) \bigr ). \end{aligned}$$

Proof

Let \(w'\) be a possible world such that \((\textit{w}_\mathsf{test}, w')\in \mathcal {R}_{x}^{\varepsilon ,\textit{W}_{\textit{d}}}\). Then \(w'\) corresponds to a dataset obtained by perturbing each datum in \(\textit{w}_\mathsf{test}\). Let \(\widetilde{v}\in {\texttt {supp}}(\sigma _{w'}(x))\). Then \(\widetilde{v}\) represents a perturbed input. Let \(w'' = w'|_{h_{\ell }(x)}\). Then (1) is logically equivalent to \(w'' \models \mathop {\mathbb {P}_{[0, \delta ]}} \psi _{\hat{\ell }_{\textsf {tar}}}(x)\). By Definition 6, \(w' \models h_{\ell }(x) \supset \mathop {\mathbb {P}_{[0, \delta ]}} \psi _{\hat{\ell }_{\textsf {tar}}}(x)\). By \(I = [1-\delta , 1]\), \(w' \models h_{\ell }(x) \supset \mathop {\mathbb {P}_{I}} \lnot \, \psi _{\hat{\ell }_{\textsf {tar}}}(x)\). Therefore, the proposition follows from the semantics of \(\mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\). \({\square }\)

Since the \(L^p\)-distances are often regarded as reasonable approximations of human perceptual distances [10], they are used as distance constraints on the perturbation in many studies of targeted attacks (e.g., [10, 20, 40]). Our model can represent the robustness against these attacks by using the \(L^p\)-distance as the metric \(\textit{d}\) for \(\mathcal {R}_{x}^{\varepsilon ,\textit{W}_{\textit{d}}}\).

Fig. 4 Robustness notions and their relationships

6.4 Probabilistic robustness against non-targeted attacks

In this section, we formalize non-targeted attacks [32, 33], in which adversaries try to have inputs misclassified as arbitrary incorrect labels (i.e., not as a specific label such as gibbon). Compared with targeted attacks, this kind of attack is easier to mount but harder to defend against.

We first define the notion of robustness against non-targeted attacks as follows.

Definition 10

(Non-targeted robustness) Let \(\delta \in [0, 1]\). Given a dataset \(\textit{w}_\mathsf{test}\), a classifier C satisfies probabilistic non-targeted robustness w.r.t. an actual label \(\ell \) if for any input \(v \in {\texttt {supp}}(\sigma _{\textit{w}_\mathsf{test}}(x))\) from the dataset \(\textit{w}_\mathsf{test}\), and for any perturbed input \(\widetilde{v}\in \mathcal {D}\) s.t. \(\textit{d}(v, \widetilde{v}) \le \varepsilon \), we have:

$$\begin{aligned} \Pr [\, C(\widetilde{v}) = \ell \mid H(\widetilde{v}) = \ell \,] > 1 - \delta {.} \end{aligned}$$
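Analogously to the targeted case, a minimal sketch of an empirical check of this definition (with the same hypothetical classifier, human_oracle, and perturbations functions as above) is:

```python
def non_targeted_robustness_holds(classifier, human_oracle, test_inputs,
                                  perturbations, actual_label, delta):
    # Empirical check of Definition 10: Pr[C(v_tilde) = ell | H(v_tilde) = ell]
    # > 1 - delta, estimated over perturbed inputs that keep the actual label ell.
    correct, total = 0, 0
    for v in test_inputs:
        for v_tilde in perturbations(v):
            if human_oracle(v_tilde) != actual_label:
                continue
            total += 1
            correct += (classifier(v_tilde) == actual_label)
    return total > 0 and correct / total > 1 - delta
```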

Now we express this robustness notion with \(I = [1-\delta , 1]\) by using StatEL.

Proposition 3

(Non-targeted robustness) Let \(\delta \in [0, 1]\) and \(I = [1-\delta , 1]\). The probabilistic non-targeted robustness w.r.t. an actual label \(\ell \) under a test dataset \(\textit{w}_\mathsf{test}\) is expressed by \(\,\textit{w}_\mathsf{test}\models \textsf {Robust}_{\ell , I}(x)\) where:

$$\begin{aligned} \textsf {Robust}_{\ell , I}(x)&{\mathop {=}\limits ^{\text{ def }}}\, \mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\bigl ( h_{\ell }(x) \supset \mathop {\mathbb {P}_{I}} \psi _\ell (x) \bigr )\\&= \mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\textsf {Recall}_{\ell , I}(x) {.} \end{aligned}$$

Proof

The proof is analogous to that for Proposition 2. \(\square \)

6.5 Relationships among robustness notions

In this section, we present relationships among notions of robustness and performance, and discuss properties related to robustness.

We first present the following proposition, which is immediate from the definitions.

Proposition 4

(Relationships among notions) Let \(I \subseteq [0, 1]\) and \(\ell , \hat{\ell }_{\textsf {tar}}\in \texttt {L}\). Then we have:

  1. \(\textit{w}_\mathsf{test}\models \textsf {Robust}_{\ell , I}(x)\) implies \(\textit{w}_\mathsf{test}\models \textsf {TRobust}_{\ell , \hat{\ell }_{\textsf {tar}}, I}(x)\).

  2. \(\textit{w}_\mathsf{test}\models \textsf {Robust}_{\ell , I}(x)\) implies \(\mathfrak {M}, \textit{w}_\mathsf{test}\models \textsf {Recall}_{\ell , I}(x)\).

The first claim means that probabilistic non-targeted robustness is at least as strong as probabilistic targeted robustness for the same I. The second claim means that probabilistic non-targeted robustness implies recall in the absence of perturbation noise; note that this is immediate from the reflexivity of \(\mathcal {R}_{x}^{\varepsilon ,\textit{W}_{\textit{d}}}\).

Next we remark that our extension of StatEL can be used to describe situations in which adversarial attacks are mitigated. When we apply some mechanism T that preprocesses a given input to mitigate attacks on robustness, the probabilistic non-targeted robustness under this mitigation is expressed as \(\textit{w}_\mathsf{test}\models \varDelta _{T}\textsf {Robust}_{\ell , I}(x)\), where \(\varDelta _{T}\) is the modality for the dataset transformation T.
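As a small illustration, a preprocessing mechanism T composed with the classifier C might be sketched as follows (hypothetical names; the formal counterpart is the modality \(\varDelta _{T}\) applied to the robustness formula).

```python
def preprocessed_classifier(classifier, T):
    # Classifier obtained by applying a mitigation/preprocessing mechanism T
    # (e.g., a denoiser) to each input before classification; checking robustness
    # of this composed classifier corresponds to the formula Delta_T Robust.
    return lambda v: classifier(T(v))
```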

Finally, we recall that by Proposition 3, robustness can be regarded as recall in the presence of perturbation noise. This implies that for each property \(\varphi \) in the table of confusion (Table 1), we could regard \(\mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\varphi \) as a property for evaluating classification performance in the presence of adversarial inputs, although, as far as we are aware, this has not been formalized in the literature on the robustness of machine learning. For example, precision robustness \(\mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\textsf {Precision}_{\ell ,i}(x)\) represents that in the presence of perturbation noise, the prediction is correct with probability i given that it is positive. For another example, accuracy robustness \(\mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\textsf {Accuracy}_{\ell ,i}(x)\) represents that in the presence of perturbation noise, the prediction is correct (whether it is positive or negative) with probability i.
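For illustration, a rough empirical counterpart of these metrics on a perturbed test dataset might look as follows (hypothetical classifier and human_oracle functions; the dataset is assumed non-empty).

```python
def precision_and_accuracy(classifier, human_oracle, perturbed_inputs, label):
    # Empirical precision and accuracy of the classifier for class `label` on a
    # (non-empty) perturbed test dataset, as used to evaluate precision robustness
    # and accuracy robustness.
    tp = fp = tn = fn = 0
    for v in perturbed_inputs:
        predicted = (classifier(v) == label)
        actual = (human_oracle(v) == label)
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, accuracy
```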

Example 2

(Robustness of pedestrian detection) We illustrate robustness notions using the pedestrian detection in Example 1 in Sect. 5.2. We deal with a binary classifier C that detects whether a pedestrian is crossing the road in a photo image x.

The non-targeted robustness \(\mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\textsf {Recall}_{\ell , 0.9}(x)\) represents that, in the presence of perturbation noise added to the input image x, the classifier C detects a person crossing the road with probability 0.9 whenever a human can actually recognize one. This robustness is crucial for an autonomous car not to hit a pedestrian.

The precision robustness \(\mathop {\textsf {K}^{\varepsilon ,\textit{W}_{\textit{d}}}}\textsf {Precision}_{\ell ,0.9}(x)\) represents that, in the presence of perturbation noise added to x, a human can actually recognize a person crossing the road with probability 0.9 whenever the classifier C detects one. This type of robustness is important for an autonomous car to avoid stopping suddenly due to a false alarm (and thus to avoid being hit by the car behind).

7 Formalizing the fairness of classifiers

Many studies have proposed and investigated various notions of fairness in machine learning [5]. Informally, these fairness notions mean that the results of machine learning tasks are independent of sensitive attributes, e.g., gender, age, race, disease, or political/religious views. In recent years, there have also been studies on testing methods for the fairness of machine learning [1, 18, 42].

In this section, we formalize popular notions of fairness of supervised learning by using our extension of StatEL. Here, we focus on the fairness that should be maintained in the impact (i.e., the results of machine learning tasks) rather than in the treatment (i.e., the process of machine learning tasks). This is because previous research shows that many seemingly neutral features have statistical relationships with sensitive attributes, and hence simply ignoring or removing sensitive attributes during data preparation and training is often ineffective, or even harmful, for achieving the fairness and performance of learning tasks.

7.1 Basic ideas and notations

Various notions of fairness in supervised learning are classified into three categories: independence, separation, and sufficiency [5]. All of these have the form of (conditional) independence or its relaxation, and thus can be formalized using the modal operator \(\mathbin {\sim }_{x}^{\varepsilon ,D}\) for conditional indistinguishability (defined in Sect. 3.4) in our extension of StatEL.

In the formalization of fairness notions, we use a distributional Kripke model \(\mathfrak {M}=(\mathcal {W}, (\mathcal {R}_a)_{a\in \mathcal {A}}, (V_s)_{s\in \mathcal {S}})\). Recall that x, y, and \(\hat{y}\) are measurement variables, respectively, denoting the input datum, the actual class label (given by the oracle H), and the predicted label (output by the classifier C). Given a real world \(\textit{w}_\mathsf{test}\) (corresponding to a given test dataset), \(\sigma _{\textit{w}_\mathsf{test}}(x)\) is the probability distribution of C’s test input over \(\mathcal {D}\), \(\sigma _{\textit{w}_\mathsf{test}}(y)\) is the distribution of the actual label over \(\texttt {L}\), and \(\sigma _{\textit{w}_\mathsf{test}}(\hat{y})\) is the distribution of C’s output over \(\texttt {L}\).

Fairness notions are usually defined in terms of some sensitive attribute (e.g., gender, age, race, disease, political/religious view), which is defined as a tuple of subsets of the input data domain \(\mathcal {D}\). For example, a sensitive attribute based on ages can be defined as a pair of groups \(G_0\) (input data with ages 21 to 60) and \(G_1\) (ages 61 to 100). For each group \(G\subseteq \mathcal {D}\) of inputs, we introduce a static formula \(\eta _{G}(x)\) representing that an input x belongs to G. Formally, this is interpreted by:

$$\begin{aligned} \text{ For } \text{ each } \text{ state } s\in \mathcal {S}\text{, } ~ s\models \eta _{G}(x) ~ \text{ iff } ~ \sigma _{s}(x) \in G. \end{aligned}$$
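For illustration, such groups and their membership predicates might be sketched as follows (a hypothetical example assuming each input carries an age attribute; predicates of this kind are reused in the fairness sketches below).

```python
def in_group_age_21_60(v):
    # Hypothetical membership predicate for a group G_0 as in the example above:
    # input data whose age attribute lies between 21 and 60.
    return 21 <= v["age"] <= 60

def in_group_age_61_100(v):
    # Hypothetical membership predicate for the other group G_1 (ages 61 to 100).
    return 61 <= v["age"] <= 100
```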

Roughly speaking, a machine learning task is said to be fair if the performance of the task for a group \(G_0\)’s input is similar to that for another group \(G_1\)’s input. In the following sections, we formalize the three categories of fairness of classifiers and their relaxation. A summary of this formalization is presented in Table 2.

Table 2 Popular notions of fairness of machine learning

7.2 Independence (a.k.a. group fairness, statistical parity) and its relaxation

In this section, we explain and formalize the notion of independence [9], which is also known as group fairness [15], and its relaxed notion. Intuitively, independence means that the predicted label \(\hat{y}\) has no statistical relationship with membership in a sensitive group. For example, independence does not allow a bank’s lending rate to be correlated with a sensitive attribute such as gender.

We first present a relaxed notion of independence, called group fairness up to bias \(\varepsilon \) [15]. Intuitively, this is the property that the output distributions of the classifier are roughly identical when the input data belong to different groups.

Formally, this fairness notion is defined as follows.

Definition 11

(Independence, group fairness) Let \(G_0, G_1 \subseteq \mathcal {D}\) be sets of input data constituting a sensitive attribute. For each \(b = 0, 1\), let \(\mu _{G_b}\in \mathbb {D}\texttt {L}\) be the probability distribution of the predicted label \(\hat{\ell }\) output by a classifier C when an input v is sampled from a test dataset \(\textit{w}_\mathsf{test}\) and belongs to \(G_b\); i.e., for each \(\hat{\ell }\in \texttt {L}\),

$$\begin{aligned} \mu _{G_b}[\hat{\ell }\,]&{\mathop {=}\limits ^{\text{ def }}}\Pr [\, C(v) = \hat{\ell }\,\,|\, v {\mathop {\leftarrow }\limits ^{{\$}}}\sigma _{\textit{w}_\mathsf{test}}(x) \text{ and } v \in G_b \,] {.} \end{aligned}$$
(2)

Then, a classifier C satisfies the group fairness between groups \(G_0\) and \(G_1\) up to bias \(\varepsilon \) if \(\textit{D}_\mathsf{tv}( \mu _{G_0} \Vert \mu _{G_1} ) \le \varepsilon \), where \(\textit{D}_\mathsf{tv}\) is the total variation between distributions (defined in Sect. 2.2). A classifier C satisfies independence w.r.t. groups \(G_0\) and \(G_1\) if it satisfies the group fairness between \(G_0\) and \(G_1\) up to bias 0.
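As an illustration only, the following sketch checks group fairness up to bias \(\varepsilon \) on a finite test dataset (hypothetical classifier and group-membership predicates; this is an empirical check, not the formal semantics).

```python
from collections import Counter

def output_distribution(classifier, test_inputs, in_group):
    # Empirical distribution mu_G of the predicted label over the inputs of the
    # test dataset that belong to the group G (given as a membership predicate).
    labels = [classifier(v) for v in test_inputs if in_group(v)]
    return {l: c / len(labels) for l, c in Counter(labels).items()}

def total_variation(mu0, mu1):
    # Total variation distance D_tv between two distributions over labels.
    support = set(mu0) | set(mu1)
    return 0.5 * sum(abs(mu0.get(l, 0.0) - mu1.get(l, 0.0)) for l in support)

def group_fair(classifier, test_inputs, in_group0, in_group1, eps):
    # Group fairness up to bias eps (Definition 11): D_tv(mu_G0 || mu_G1) <= eps;
    # eps = 0 corresponds to independence.
    mu0 = output_distribution(classifier, test_inputs, in_group0)
    mu1 = output_distribution(classifier, test_inputs, in_group1)
    return total_variation(mu0, mu1) <= eps
```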

Now we express this fairness notion using our extension of StatEL as follows.

Proposition 5

(Independence, group fairness) The group fairness between groups \(G_0\) and \(G_1\) up to bias \(\varepsilon \) under a given test dataset \(\textit{w}_\mathsf{test}\) is expressed as \(\textit{w}_\mathsf{test}\models \textsf {GrpFair}_{\varepsilon }(x, \hat{y})\) where:

$$\begin{aligned}&\textsf {GrpFair}_{\varepsilon }(x, \hat{y}) {\mathop {=}\limits ^{\text { def }}}\\&\bigl ( \eta _{G_{0}}(x) \wedge \psi (x, \hat{y}) \bigr ) \mathbin {\sim }_{\hat{y}}^{\varepsilon ,\textit{D}_\mathsf {tv}}\bigl ( \eta _{G_{1}}(x) \wedge \psi (x, \hat{y}) \bigr ) {.} \end{aligned}$$

Independence (without bias \(\varepsilon \)) is expressed by \(\textit{w}_\mathsf{test}\models \textsf {GrpFair}_{0}(x, \hat{y})\).

Proof

Let \(w_b = \textit{w}_\mathsf{test}|_{\eta _{G_b}(x) \wedge \psi (x, \hat{y})}\). It follows from (2) that for each \(\hat{\ell }\in \texttt {L}\), \( \mu _{G_b}[\hat{\ell }\,] = \Pr [\, \sigma _{s}(\hat{y}) = \hat{\ell }\,\mid \, s {\mathop {\leftarrow }\limits ^{{\$}}}w_b \,], \) hence \(\mu _{G_b} = \sigma _{w_b}(\hat{y})\). Thus, by Definition 11, the group fairness between groups \(G_0\) and \(G_1\) up to bias \(\varepsilon \) is given by \(\textit{D}_\mathsf{tv}( \sigma _{w_0}(\hat{y}) \Vert \sigma _{w_1}(\hat{y}) ) \le \varepsilon \). Therefore, this proposition follows from Lemma 1. \({\square }\)

Example 3

(Independence in pedestrian detection) We illustrate independence using the pedestrian detection in Example 1 in Sect. 5.2. We deal with a binary classifier C that detects whether or not a pedestrian is crossing the road in an image x. We write \(\eta _{\text {m}}(x)\) (resp. \(\eta _{\text {w}}(x)\)) to represent that an image x includes a man (resp. woman) that may or may not be crossing the road. Let \(\psi (x, \hat{y})\) represent that given an input image x, the classifier C returns \(\hat{y}\) (that is either the detection of a person crossing the road or not).

Then, the independence between men and women \(\textsf {GrpFair}_{0}(x, \hat{y})\) \({{\mathop {=}\limits ^{\text{ def }}}} \bigl ( \eta _{\text {m}}(x) \wedge \psi (x, \hat{y}) \bigr ) \mathbin {\sim }_{\hat{y}}^{0,\textit{D}_\mathsf{tv}} \bigl ( \eta _{\text {w}}(x) \wedge \psi (x, \hat{y}) \bigr ) \) means that the probability of detecting a pedestrian crossing the road is the same for men and women. This fairness guarantees that men and women are equally detectable as pedestrians, and hence equally safe with respect to an autonomous car. Here, independence does not rely on the actual label y, i.e., on whether there is a pedestrian crossing the road that can be recognized by human eyes.

7.3 Separation (a.k.a. equalized odds) and its relaxation (equal opportunity)

In this section, we explain and formalize the notion of separation [5], which is also known as equalized odds [22], and its relaxed notion called equal opportunity [22]. The motivation behind these notions is to capture typical scenarios in which sensitive characteristics may have statistical relationships with the actual class label. For instance, even when some sensitive attribute is correlated with an actual default rate on loans, banks might want to have a different lending rate for people who have a higher default rate. However, independence (group fairness) does not allow this, since it requires that the lending rate should be statistically independent of the sensitive attribute.

To overcome this problem, the notion of separation allows statistical relationships between a sensitive attribute and the predicted label \(\hat{y}\) output by the classifier C to the extent that this is justified by the actual class label y. More precisely, separation means that the predicted label \(\hat{y}\) is conditionally independent of the membership in a sensitive group, given an actual class label y.

Formally, separation is defined as the property that recall (true positive rate) and specificity (true negative rate, explained in Table 1) are the same for all the groups, and equal opportunity is a special case of separation restricted to an advantageous class label.

Definition 12

(Separation and equal opportunity) Given a group \(G_b \subseteq \mathcal {D}\) and an actual class label \(\ell \), let \(\mu _{G_b,\ell }\in \mathbb {D}\texttt {L}\) be the probability distribution of the predicted label \(\hat{\ell }\) output by a classifier C when an input \(v\in G_b\) is sampled from a test dataset \(\textit{w}_\mathsf{test}\) and is associated with an actual label \(\ell \); i.e., for each \(\hat{\ell }\in \texttt {L}\),

$$\begin{aligned} \mu _{G_b,\ell }[\hat{\ell }\,]&{\mathop {=}\limits ^{\text{ def }}}\Pr [\, C(v) = \hat{\ell }\,|\, v {{\mathop {\leftarrow }\limits ^{{\$}}}} \sigma _{\textit{w}_\mathsf{test}}(x), v \in G_b, H(v)= \ell \,\,] {.} \end{aligned}$$
(3)

A classifier C satisfies separation between two groups \(G_0\) and \(G_1\) if \(\mu _{G_0,\ell } = \mu _{G_1,\ell }\) holds for all \(\ell \in \texttt {L}\). A classifier C satisfies equal opportunity of an advantageous label \(\ell \) w.r.t. a group \(G_0\) if \(\mu _{G_0,\ell } = \mu _{G_1,\ell }\) where \(G_1 = \mathcal {D}{\setminus } G_0\).
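For illustration, an empirical check of separation (and, by restricting the label set, of equal opportunity) might be sketched as follows, reusing the total_variation helper from the sketch in Sect. 7.2 together with the same hypothetical classifier and human_oracle functions.

```python
def conditional_output_distribution(classifier, human_oracle, test_inputs,
                                    in_group, actual_label):
    # Empirical distribution mu_{G, ell} of the predicted label over the inputs
    # in group G whose actual label (given by the oracle H) is ell.
    labels = [classifier(v) for v in test_inputs
              if in_group(v) and human_oracle(v) == actual_label]
    return {l: labels.count(l) / len(labels) for l in set(labels)} if labels else {}

def separation_holds(classifier, human_oracle, test_inputs,
                     in_group0, in_group1, label_set, eps=0.0):
    # Separation (equalized odds) up to bias eps: for every actual label ell,
    # D_tv(mu_{G0, ell} || mu_{G1, ell}) <= eps; eps = 0 gives exact separation,
    # and restricting label_set to one advantageous label gives equal opportunity.
    return all(
        total_variation(
            conditional_output_distribution(classifier, human_oracle,
                                            test_inputs, in_group0, l),
            conditional_output_distribution(classifier, human_oracle,
                                            test_inputs, in_group1, l)) <= eps
        for l in label_set)
```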

Now we express these two notions using our extension of StatEL as follows.

Proposition 6

(Separation) Let \(\gamma (x, \ell , \hat{y}) {\mathop {=}\limits ^{\text{ def }}}\psi (x, \hat{y}) \wedge h_{\ell }(x)\). The separation between two groups \(G_0\) and \(G_1\) under a given test dataset \(\textit{w}_\mathsf{test}\) is expressed as \(\textit{w}_\mathsf{test}\models \textsf {EqOdds}_{0}(x, \hat{y})\) where:

$$\begin{aligned}&\textsf {EqOdds}_{\varepsilon }(x, \hat{y}) {\mathop {=}\limits ^{\text{ def }}}\\&\bigwedge _{\ell \in \texttt {L}} \Bigl ( \bigl ( \eta _{G_0}(x) \wedge \gamma (x, \ell , \hat{y}) \bigr ) \mathbin {\sim }_{\hat{y}}^{\varepsilon ,\textit{D}_\mathsf{tv}} \bigl ( \eta _{G_1}(x) \wedge \gamma (x, \ell , \hat{y}) \bigr ) \Bigr ) {.} \end{aligned}$$

Proof

Let \(\ell \in \texttt {L}\) and \(w_{b,\ell } = \textit{w}_\mathsf{test}|_{\eta _{G_b}(x) \wedge \psi (x, \hat{y}) \wedge h_{\ell }(x)}\). It follows from (3) that:

$$\begin{aligned} \mu _{G_b,\ell }[\hat{\ell }\,] = \Pr [\, \sigma _{s}(\hat{y}) = \hat{\ell }\,\mid \, s {\mathop {\leftarrow }\limits ^{{\$}}}w_{b,\ell } \,], \end{aligned}$$

hence \(\mu _{G_b,\ell } = \sigma _{w_{b,\ell }}(\hat{y})\). Thus, by Definition 12, the separation between \(G_0\) and \(G_1\) is given by \(\sigma _{w_{0,\ell }}(\hat{y}) = \sigma _{w_{1,\ell }}(\hat{y})\) for all \(\ell \in \texttt {L}\). Therefore, this proposition follows from Proposition 1. \({\square }\)

It should be noted that for \(\varepsilon > 0\), \(\textsf {EqOdds}_{\varepsilon }(x, \hat{y})\) represents a relaxation of separation up to bias \(\varepsilon \) in terms of the total variation \(\textit{D}_\mathsf{tv}\).

Proposition 7

(Equal opportunity) Let \(\gamma (x, \ell , \hat{y}) {{\mathop {=}\limits ^{\text{ def }}}} \psi (x, \hat{y}) \wedge h_{\ell }(x)\). The equal opportunity of a label \(\ell \) w.r.t. a group \(G_0\) under a given test dataset \(\textit{w}_\mathsf{test}\) is expressed as \(\textit{w}_\mathsf{test}\models \textsf {EqOpp}(x, \hat{y})\) where:

$$\begin{aligned}&\textsf {EqOpp}(x, \hat{y}) {\mathop {=}\limits ^{\text{ def }}}\\&\bigl ( \eta _{G_0}(x) \wedge \gamma (x, \ell , \hat{y}) \bigr ) \mathbin {\sim }_{\hat{y}}^{0,\textit{D}_\mathsf{tv}} \bigl ( \lnot \eta _{G_0}(x) \wedge \gamma (x, \ell , \hat{y}) \bigr ) {.} \end{aligned}$$

Proof

The proof of this proposition is similar to that of Proposition 6. Let \(G_1 = \mathcal {D}{\setminus } G_0\). By \(\mu _{G_b,\ell } = \sigma _{w_{b,\ell }}(\hat{y})\), the equal opportunity of \(\ell \) w.r.t. \(G_0\) is given by \(\sigma _{w_{0,\ell }}(\hat{y}) = \sigma _{w_{1,\ell }}(\hat{y})\). Therefore, this proposition follows from Proposition 1. \({\square }\)

Example 4

(Separation in pedestrian detection) We illustrate separation using the pedestrian detection in Example 3, where a binary classifier C detects whether a pedestrian is crossing the road in an image x. Let \(\psi (x, \hat{y})\) (resp. \(h(x, y)\)) represent that, given an image x, the classifier C (resp. the human) returns \(\hat{y}\) (resp. y), representing either detection or not.

The inherent technical difficulty of detecting a female pedestrian may differ from that of detecting a male pedestrian, because, for example, physical appearance may tend to differ between women and men. If we take this possible difference into account, separation may be more suitable than independence.

The separation \(\textsf {EqOdds}_{0}(x, \hat{y})\) between men and women guarantees that the conditional probability of detecting a pedestrian crossing the road, given that the human can actually recognize one, is the same for men and women. This fairness implies that (from the viewpoint of a pedestrian crossing the road) male and female pedestrians are treated by an autonomous car no less fairly than by a human-driven car.

7.4 Sufficiency (a.k.a. conditional use accuracy equality)

In this section we explain and formalize the notion of sufficiency [5], which is also known as conditional use accuracy equality [6].

While separation guarantees the equality of recall among different groups, sufficiency requires the equality of precision. More precisely, sufficiency is defined as the property that precision (positive predictive value) and negative predictive value (presented as NPV in Table 1) are the same for all the groups, as defined below.

Definition 13

(Sufficiency) Given a group \(G_b \subseteq \mathcal {D}\) and a predicted label \(\hat{\ell }\), let \(\mu _{G_b,\hat{\ell }}\in \mathbb {D}\texttt {L}\) be the probability distribution of the actual class label \(\ell \) when an input \(v\in G_b\) is sampled from a test dataset \(\textit{w}_\mathsf{test}\) and the classifier C outputs the predicted label \(\hat{\ell }\); i.e., for each \(\ell \in \texttt {L}\),

$$\begin{aligned} \mu _{G_b,\hat{\ell }}[\ell \,]&{\mathop {=}\limits ^{\text{ def }}}\Pr [\, H(v) = \ell \,|\, v {{\mathop {\leftarrow }\limits ^{{\$}}}} \sigma _{\textit{w}_\mathsf{test}}(x), v \in G_b, C(v) = \hat{\ell }\,\,] {.} \end{aligned}$$
(4)

A classifier C satisfies sufficiency between two groups \(G_0\) and \(G_1\) if \(\mu _{G_0,\hat{\ell }} = \mu _{G_1,\hat{\ell }}\) holds for all \(\hat{\ell }\in \texttt {L}\).
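For illustration, the empirical counterpart of \(\mu _{G_b,\hat{\ell }}\) can be sketched as follows; the check itself mirrors the separation sketch in Sect. 7.3 with the roles of the actual and predicted labels swapped (hypothetical function names, as before).

```python
def actual_label_distribution(classifier, human_oracle, test_inputs,
                              in_group, predicted_label):
    # Empirical distribution mu_{G, l_hat} of the actual label over the inputs
    # in group G for which the classifier outputs the predicted label l_hat.
    labels = [human_oracle(v) for v in test_inputs
              if in_group(v) and classifier(v) == predicted_label]
    return {l: labels.count(l) / len(labels) for l in set(labels)} if labels else {}

# Sufficiency (up to bias eps) is then checked exactly as separation_holds above,
# with actual_label_distribution in place of conditional_output_distribution and
# with the conjunction ranging over predicted labels l_hat instead of actual labels.
```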

Then, this notion can be expressed using our extension of StatEL as follows.

Proposition 8

(Sufficiency) Let \(\gamma '(x,y,\hat{\ell }) {\mathop {=}\limits ^{\text{ def }}}\psi _{\hat{\ell }}(x) \wedge h(x, y)\). The sufficiency between two groups \(G_0\) and \(G_1\) under a given test dataset \(\textit{w}_\mathsf{test}\) is expressed as \(\textit{w}_\mathsf{test}\models \textsf {Sufficiency}_{0}(x, y)\) where:

$$\begin{aligned}&\textsf {Sufficiency}_{\varepsilon }(x, y) {\mathop {=}\limits ^{\text{ def }}}\\&\bigwedge _{\hat{\ell }\in \texttt {L}} \Bigl ( \bigl ( \eta _{G_0}(x) \wedge \gamma '(x,y,\hat{\ell }) \bigr ) \mathbin {\sim }_{y}^{\varepsilon ,\textit{D}_\mathsf{tv}} \bigl ( \eta _{G_1}(x) \wedge \gamma '(x,y,\hat{\ell }) \bigr ) \Bigr ) {.} \end{aligned}$$

Proof

Let \(\hat{\ell }\in \texttt {L}\) and \(w_{b,\hat{\ell }} = \textit{w}_\mathsf{test}|_{\eta _{G_b}(x) \wedge \psi _{\hat{\ell }}(x) \wedge h(x, y)}\). It follows from (4) that:

$$\begin{aligned} \mu _{G_b,\hat{\ell }}[\ell \,] = \Pr [\, \sigma _{s}(y) = \ell \,\mid \, s {\mathop {\leftarrow }\limits ^{{\$}}}w_{b,\hat{\ell }} \,], \end{aligned}$$

hence \(\mu _{G_b,\hat{\ell }} = \sigma _{w_{b,\hat{\ell }}}(y)\). Thus, by Definition 13, the sufficiency between \(G_0\) and \(G_1\) is given by \(\sigma _{w_{0,\hat{\ell }}}(y) = \sigma _{w_{1,\hat{\ell }}}(y)\) for all \(\hat{\ell }\in \texttt {L}\). Therefore, this proposition follows from Proposition 1. \({\square }\)

It should be noted that for \(\varepsilon > 0\), \(\textsf {Sufficiency}_{\varepsilon }(x, y)\) represents a relaxation of sufficiency up to bias \(\varepsilon \) in terms of the total variation \(\textit{D}_\mathsf{tv}\).

Example 5

(Sufficiency in pedestrian detection) We illustrate sufficiency using the pedestrian detection in Example 3, where a classifier C detects whether a pedestrian is crossing the road in an image x. As mentioned in Example 4, the inherent technical difficulty of detecting a male pedestrian may differ from that of detecting a female pedestrian. Whereas separation guarantees the equality of recall between men and women, sufficiency guarantees that of precision.

The sufficiency \(\textsf {Sufficiency}_{0}(x, y)\) between men and women implies that the conditional probability that there is no pedestrian crossing the road, given that C detects one, is the same for men and women. From the viewpoint of the car driver, when C raises a false alarm and stops the car suddenly, there is no bias as to whether men or women are more likely to have triggered the false alarm and to be blamed for it.

8 Related work

In this section, we provide a brief overview of related work on the specification of statistical machine learning and on epistemic logic for describing specification.

Desirable properties of statistical machine learning There have been a large number of papers on attacks and defences for deep neural networks [12, 40]. Compared to them, however, not much work has been done to explore the formal specification of various properties of machine learning. Seshia et al. [38] present a list of desirable properties of deep neural networks (DNNs), although most of the properties are stated informally without mathematical formulas. As for robustness, Dreossi et al. [13] propose a unifying formalization of adversarial input generation in a rigorous and organized manner, although they formalize and classify attacks (as optimization problems) rather than defining the robustness notions themselves.

Concerning fairness, Barocas et al. [5] survey various fairness notions and classify them into three categories: independence, separation, and sufficiency. Gajane [17] surveys the formalization of fairness notions for machine learning and presents some justification based on the social science literature.

Epistemic logic for describing specification Epistemic logic [44] has been studied to represent and reason about knowledge and belief [16, 21], and has been applied to describe various properties of distributed systems.

The BAN logic [8], proposed by Burrows, Abadi and Needham, is a notable example of epistemic logic used to model and verify the authentication in cryptographic protocols. To improve the formalization of protocols’ behaviors, some epistemic approaches integrate process calculi [11, 24].

Epistemic logic has also been used to formalize and reason about privacy properties, including anonymity [19, 29, 39], receipt-freeness of electronic voting protocols [25], and privacy policy for social network services [34]. Temporal epistemic logic is used to express information flow security policies [3].

Concerning the formalization of fairness notions, previous work in formal methods has modeled different kinds of fairness involving timing by using temporal logic rather than epistemic logic. As far as we know, no previous work has formalized fairness notions of machine learning by using modal logic.

Formalization of statistical properties In studies of philosophical logic, Lewis [31] presents the idea that when a random variable has various possible probability distributions, those distributions should be represented in distinct possible worlds. Bana [4] puts Lewis’s idea in a mathematically rigorous setting. Recently, a modal logic called statistical epistemic logic (StatEL) [27] has been proposed and used to formalize statistical hypothesis testing and the notion of differential privacy [14].

To describe statistical properties of machine learning models, this work uses StatEL to formalize the probabilistically chosen input to a learning model and the non-deterministically chosen dataset. However, we could possibly employ other logics (e.g., fuzzy logic [45] or Markov logic networks [37]) by extending them to deal with statistical sampling and non-deterministic inputs. Exploring the possibility of different formalizations using other logics is left for future work.

9 Conclusion

In this paper we proposed an epistemic approach to the modeling of supervised learning and its desirable properties. Specifically, we employed a distributional Kripke model in which each possible world corresponds to a possible dataset and modal operators are interpreted as transformation and testing on datasets. Then, we formalized various notions of the classification performance, robustness, and fairness of statistical classifiers by using our extension of statistical epistemic logic (StatEL). In this formalization, we clarified the relationships among properties of classifiers, and the relationship between classification performance and robustness.

We emphasize that this is the first attempt to use epistemic models and logical formulas to describe statistical properties of machine learning, and that it can serve as a starting point for developing theories of the formal specification of machine learning.

In future work, we are planning to extend our framework to formally reason about system-level properties of learning-based systems. We are also interested in developing a more general framework for the formal specification of machine learning associated with testing methods, as well as in implementing a prototype tool. Our future work will also include an extension of StatEL to formalize unsupervised learning and reinforcement learning.