
1 Introduction

Verification of assertions on machine learning (ML) models has received wide attention from the formal methods community in recent years, and multiple approaches have been developed for formal analysis of ML models, mostly focused on neural networks [9]. In this work we introduce the SMLP tool – Symbolic Machine Learning Prover – which aims to go beyond this mainstream in several ways: SMLP helps to approach the system’s design, optimization and verification as one process by offering multiple capabilities for design space exploration. These capabilities include methods for selecting which parameters to use in modeling the design for configuration optimization and verification; ensuring that the design is robust against environmental effects and manufacturing variations that are impossible to control; and ensuring robustness against malicious attacks from an adversary aiming to alter the intended configuration or mode of operation. Environmental effects like temperature fluctuation, electromagnetic interference, manufacturing variation, and product aging are especially critical for the correct and optimal operation of devices with analog components, which is our current focus.

To address these challenges, SMLP offers multiple modes of design space exploration, based on symbolic reasoning using SMT solvers guided by statistical and probabilistic methods. These modes are described in detail in Sect. 6. Their definition refers to the concept of stability of an assignment to the system’s parameters that satisfies all model constraints (which include the constraints defining the model itself and any constraints on the model’s interface). We will refer to such a parameter assignment satisfying the model constraints as a (stable) solution. Informally, stability of a solution means that any eligible assignment in the specified region around the solution also satisfies the required constraints. This notion is sometimes referred to as robustness. We work with parameterized systems, where parameters (also called knobs) can be tuned to optimize the system’s performance under all legitimate inputs. For example, in the circuit board design setting, the topological layout of circuits, distances, wire thickness, properties of dielectric layers, etc. can be such parameters, and the exploration goal would be to optimize the system performance under the system’s requirements [19]. The difference between knobs and inputs is that knob values are selected during the design phase, before the system goes into operation, whereas inputs remain free and get their values from the environment during the operation of the system. Knobs and inputs correspond to existentially and universally quantified variables, respectively, in the formal definition of model exploration tasks. Thus, in the usual meaning of verification all variables are inputs, in optimization all variables are knobs, and in synthesis some of the variables are knobs and the rest are inputs.

In this work, by a model we refer to an ML model that models the system under exploration. The main capabilities of SMLP for system exploration include:

  • assertion verification: Verifying assertions on the model’s interface.

  • parameter synthesis: Finding model parameter values such that design constraints are valid.

  • parameter optimization: Optimizing the model parameters under constraints.

  • stable optimized synthesis: Combining model parameter synthesis and optimization into one algorithm, enhanced by stability guarantees, to achieve safe, stable and optimal configurations.

  • root cause analysis: Generating root-causing hints in terms of a subset of parameters and their ranges that explain the failure.

  • model refinement: Targeted refinement of the model based on stability regions found by model exploration and on feedback from system in these stability regions.

Fig. 1. Exploration Cube

The model exploration cube in Fig. 1 provides a high-level and intuitive idea of how the model exploration modes supported in SMLP are related. The three dimensions of this cube represent synthesis (\(\searrow \)-axis), optimization (\(\rightarrow \)-axis) and stability (\(\uparrow \)-axis). On the bottom plane of the cube, the edges represent the synthesis and optimization problems in the following sense: synthesis with constraints configures the knob values in a way that guarantees that assertions are valid but, unlike optimization, does not guarantee optimality with respect to optimization objectives. On the other hand, optimization by itself is not aware of assertions on inputs of the system and only guarantees optimality with respect to knobs, not the validity of assertions in the configured system. We refer to the procedure that combines synthesis with optimization and results in an optimal design that satisfies assertions as optimized synthesis. The upper plane of the cube represents introducing stability requirements into synthesis (and, as a special case, into verification), optimization, and optimized synthesis. The formulas that make the definitions of stable verification, optimization, synthesis and optimized synthesis precise are discussed in Sect. 4.

Compared to digital design, it is fair to say that formal methods have had a limited success in the analog domain. A practical approach to this challenge is to use models as a way of abstraction that can be refined based on model analysis and feedback from the real system to narrow the gap between the model and the system to levels tolerable by stability requirements of the design. SMLP applies formal analysis to systems represented by ML models, and assists designers in product development, in particular, helps to refine the design to make it safe and optimized, see Sect. 8.

Fig. 2. SMLP Tool Architecture

2 SMLP Architecture

The SMLP tool architecture is depicted in Fig. 2. It consists of the following components: 1) Design of experiments (DOE); 2) System that can be sampled based on DOE; 3) ML model trained on the sampled data; 4) SMLP solver that handles different system exploration modes on a symbolic representation of the ML model; 5) Targeted model refinement loop.

SMLP supports multiple ways to generate training data, known under the name of Design of Experiments (DOE). These methods include full-factorial, fractional-factorial, Plackett-Burman, Box-Behnken, Box-Wilson, Sukharev-grid, and Latin-hypercube designs, among others, which aim at a smart sampling of the entire input space with a relatively small number of data samples. In Fig. 2, the leftmost box-shaped component, called doe, represents SMLP’s capabilities to generate test vectors to feed into the system and produce training data; the latter two components are represented by the boxes called system and data, respectively.
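
For illustration, the following is a minimal sketch of generating and scaling a Latin-hypercube design with the pyDOE package that SMLP builds on (see Sect. 7); the variable counts and ranges are made up for the example.

```python
import numpy as np
from pyDOE import lhs

# Latin-hypercube design: 32 samples over 3 inputs/knobs in [0, 1]^3
unit_design = lhs(3, samples=32)

# Scale each column to its declared range, e.g. two knobs and one input
lows = np.array([0.0, -1.0, 10.0])
highs = np.array([5.0, 1.0, 20.0])
design = lows + unit_design * (highs - lows)

# Each row of `design` is one test vector to run on the system
for vector in design:
    print(vector)
```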

In a simplified setting, SMLP can be applied directly to training data representing the input-output behaviour of the system, skipping the DOE step.

The component called ml model represents SMLP capabilities to train models; currently neural network, polynomial and tree-based regression models are supported. Modeling analog devices using polynomial models was proposed in the seminal work on Response Surface Methodology (RSM) [3], and has since been widely adopted by industry. Neural networks and tree-based models are increasingly used, owing to their exceptional accuracy and their simplicity, respectively.
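
As an illustration, a polynomial (RSM-style) and a tree-based regressor can be trained on sampled data along the following lines, here with scikit-learn (one of the packages SMLP uses, see Sect. 7) and synthetic data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))      # sampled knobs/inputs
y = X[:, 0] ** 2 - 2 * X[:, 1] * X[:, 2]   # system response (toy)

# Polynomial response-surface model
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)

# Tree-based regression model
tree_model = DecisionTreeRegressor(max_depth=4).fit(X, y)
```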

The component called solver pipeline represents the model exploration engines of SMLP (e.g., the connection to SMT solvers), which, besides a symbolic representation of the model, take as input several types of constraints and input sampling distributions specified on the model’s interface; these are represented by the component called constraints & distributions located at the lower-left corner of Fig. 2, and will be discussed in more detail in Sect. 4. The remaining components represent the main model exploration capabilities of SMLP.

Last but not least, the arrow connecting the ml model component back to the doe component represents a model refinement loop, which makes it possible to reduce the gap between the model and system responses in the input regions where it matters for the task at hand (there is no need to achieve a perfect match between the model and the system everywhere in the input space). The targeted model refinement loop is discussed in Sect. 6.7.

3 Symbolic Representation of Models and Constraints

We assume that the system’s interface consists of free inputs, knobs, and outputs. The set of inputs and/or knobs can be empty. For the sake of ML-based analysis, we build an ML model and represent it symbolically, and the aim is to analyze the system by exploring the model instead.

A domain \(\mathcal D\) is a Cartesian product of reals, integers and finite non-empty sets. A parameterized system can be represented as a function \(f: \mathcal {D}_{ par }\times \mathcal {D}_{ in } \rightarrow \mathcal {D}_{ out }\), where \(\mathcal {D}_{ par }, \mathcal {D}_{ in }, \mathcal {D}_{ out }\) are the domains of parameters (knobs), inputs and outputs, respectively. For simplicity of presentation we assume all domains are products of sets of reals, but the methods and implementation also apply to domains over integers and arbitrary finite sets. We consider formulas over \(\langle \mathbb R,0,1,\mathcal F,P\rangle \), where P contains the usual predicates \({<},{\le },{=}\), etc., and \(\mathcal F\) contains addition and multiplication with rational constants, and can also contain non-linear functions supported by SMT solvers, including polynomials, transcendental functions and, more generally, computable functions [6, 7, 10, 11, 15].

We extend the functions \(\mathcal F\) by functions definable by formulas, \(\mathcal F_D\): i.e., we assume \(f\in \mathcal {F_D}\) is represented by a formula \(F(x_1,\ldots ,x_n,y)\) over variables \(x_1,\ldots ,x_n\) corresponding to the n inputs and y corresponding to the output \(f(x_1,\ldots ,x_n)\). We assume that satisfiability of quantifier-free formulas over this language is decidable or, more generally, \(\delta \)-decidable [7, 15]. Let us note that even when the basic functions \(\mathcal F\) contain just linear functions, \(\mathcal F_D\) will contain, e.g., functions represented by neural networks with \({\text {ReLU}}\) activation functions, as well as decision trees and random forests. When representing parameterized systems using ML models we assume that parameters are treated as designated inputs to the ML model.
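
To make this concrete, the following sketch (not SMLP’s internal encoding) represents a one-neuron ReLU network by a quantifier-free formula \(F(x_1,x_2,y)\) using the Z3 Python API:

```python
from z3 import Real, If, Solver, sat

x1, x2, y = Real('x1'), Real('x2'), Real('y')

def relu(t):
    # ReLU is definable with the if-then-else term constructor
    return If(t > 0, t, 0)

# F(x1, x2, y): y equals the output of a one-neuron network
# with weights (2, -1) and bias 1, followed by ReLU
F = y == relu(2 * x1 - x2 + 1)

s = Solver()
s.add(F, x1 == 1, x2 == 0)
assert s.check() == sat
print(s.model()[y])                # network output: 3
```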

Throughout, \(p, x, y\) denote respectively knob, input and output variables (or variable vectors) in formulas, while \(r, z\) range over reals. Whenever we use a norm \(\Vert \cdot \Vert \), we refer to a norm representable in our language, such as the Chebyshev norm \((x_1,\ldots ,x_n)\mapsto \max \{|x_1|,\ldots ,|x_n|\}\).

4 Symbolic Representation of the ML Model Exploration

The main system exploration tasks handled by SMLP can be defined using \(\exists ^*\forall ^*\) formulas in the GEAR-fragment [4]:

$$\begin{aligned} \exists p ~\big [\eta (p) \wedge \forall p'~ \forall x y~[ \theta (p,p') \rightarrow (\varphi _M(p',x,y) \rightarrow \varphi _{ cond }(p',x,y)) ]\big ] \end{aligned}$$
(1)

where x ranges over inputs, y ranges over outputs, and \(p,p'\) range over knobs, \(\eta (p)\) are constraints on the knob configuration p, \(\varphi _{M}(p',x,y)\) defines the machine learning model, \(\theta (p,p')\) defines stability region for the configuration p, and \(\varphi _{ cond }(p',x,y)\) defines conditions that should hold in the stability region. An assignment to variables p that makes formula (1) true is called a \(\theta \)-stable solution to (1).

In our formalization \(\theta ,\eta \) and \(\varphi _{ cond }\) are quantifier free formulas in the language. These constraints and how they are implemented in SMLP are described below.

  • \(\eta (p)\) Constraints on values of knobs; this formula need not be a conjunction of constraints on individual knobs and can define more complex relations between the allowed values of different knobs. \(\eta (p)\) can be specified through the SMLP specification file (see Sect. 5).

  • \(\theta (p,p')\) Stability constraints that define a region around a candidate solution. This can be specified using either absolute or relative radius r in the specification file. This region corresponds to a ball (or box) around p: \(\theta (p,p')=\Vert p - p' \Vert \le r\). In general, our methods do not impose any restrictions on \(\theta \) apart from reflexivity.

  • \(\varphi _M(p,x,y)\) Constraints that define the function represented by the ML model M, thus \(\varphi _M(p,x,y) =(M(p,x) = y)\). In the ML model, knobs are represented as designated inputs (and can be treated in the same way as system inputs, or the ML model architecture can reflect the difference between inputs and knobs). \(\varphi _M(p,x,y)\) is computed by SMLP internally, based on the ML model specification.

  • \(\varphi _{ cond }(p,x,y)\) Conditions that should hold in the \(\theta \)-region of the solution. These conditions depend on the exploration mode and could be: (1) verification conditions, (2) model querying conditions, (3) parameter optimization conditions, or (4) parameter synthesis conditions. The exploration modes are described in Sect. 6.

Fig. 3. Example of SMLP’s format specifying the problem conditions for the displayed model of the system.

The SMLP solver is based on the specialized procedures \(\text {GearSAT}_\delta \) [4] and \(\text {GearSAT}_\delta \)-BO [5] for solving formulas in the GEAR fragment using quantifier-free SMT solvers. The \(\text {GearSAT}_\delta \) procedure interleaves search for candidate solutions using SMT solvers with exclusion of \(\theta \)-regions around counter-examples. \(\text {GearSAT}_\delta \)-BO combines \(\text {GearSAT}_\delta \) search with Bayesian optimization guidance. These procedures find solutions to GEAR formulas with user-defined accuracy \(\varepsilon \) (defined in Sect. 6.4), and they have been proven to be sound, (\(\delta \))-complete and terminating.
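
The following highly simplified sketch conveys the candidate/counterexample structure of the \(\text {GearSAT}_\delta \) loop; the toy model, constraints and radius are made up for illustration, and the actual procedure [4] additionally handles \(\delta \)-relaxations and general \(\theta \):

```python
from z3 import Real, Solver, And, Not, sat

p, pp, x, y = Real('p'), Real("p'"), Real('x'), Real('y')
r = 0.5                                   # stability radius

def model_f(p, x):  return p + x          # phi_M: y = M(p, x) (toy)
def cond(p, x, y):  return y <= 10        # phi_cond (toy)
def theta(p, pp):   return And(pp - p <= r, p - pp <= r)

candidates = Solver()
candidates.add(And(p >= 0, p <= 8))       # eta(p)
while candidates.check() == sat:
    p0 = candidates.model()[p]
    # Look for a counterexample in the theta-region of candidate p0
    cex = Solver()
    cex.add(theta(p0, pp), x >= 0, x <= 1,
            y == model_f(pp, x), Not(cond(pp, x, y)))
    if cex.check() != sat:
        print('stable solution:', p0)     # no counterexample exists
        break
    # Exclude the theta-region around the counterexample and retry
    p1 = cex.model()[pp]
    candidates.add(Not(theta(p, p1)))
```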

5 Problem Specification in SMLP

The specification file defines the problem conditions in a JSON-compatible format, whereas SMLP exploration modes are specified via command line options. Figure 3 depicts a toy system with two inputs, two knobs, and two outputs, and a matching specification file for model exploration in SMLP. For each variable it specifies its label (the name), its interface function (“input”, “knob”, or “output”), its type (“real”, “int”, or “set”, for categorical features), ranges for variables of real and int types, and, optionally, a grid of values that knobs are allowed to take within their respective declared ranges, independently from each other (unless there are constraints further restricting the multi-dimensional grid). Both integer- and real-typed knobs can be restricted to grids (but do not need to be). Additional fields alpha, beta, eta, assertions and objectives can optionally be specified, as shown in the example. These correspond to the predicates \(\alpha \), \(\beta \), \(\eta \), ‘\({\text {assert}}\)’ and the objective function \(o\) described in Sect. 6.

The details about the concrete format are described in the manual [8], also distributed with SMLP.
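
For concreteness, a hypothetical specification in the spirit of Fig. 3 could be produced as follows; the field names below are illustrative only, and the authoritative format is defined in the manual [8]:

```python
import json

# Hypothetical spec sketch: one input, one gridded knob, one output,
# plus optional alpha/eta/assertions/objectives fields
spec = {
    "variables": [
        {"label": "x1", "interface": "input",  "type": "real",
         "range": [0, 10]},
        {"label": "p1", "interface": "knob",   "type": "int",
         "range": [0, 8], "grid": [0, 2, 4, 8]},
        {"label": "y1", "interface": "output", "type": "real"},
    ],
    "alpha": "x1 > 2",                 # assumptions on inputs/knobs
    "eta": "p1 != 6",                  # constraints on knob values
    "assertions": {"a1": "y1 < 100"},
    "objectives": {"obj1": "y1"},
}

with open("toy.spec", "w") as f:
    json.dump(spec, f, indent=2)
```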

6 SMLP Exploration Modes of ML Models

In this section we describe the ML model exploration modes supported by SMLP, which are based on Formula (1).

6.1 Stable Parameter Synthesis

The goal of stable synthesis is to find values of the system parameters such that required conditions hold in the \(\theta \)-region of the parameters for all inputs. For this, SMLP solves Formula (1), where

$$\begin{aligned} \varphi _{ cond }(p,x,y)=\alpha (p,x) \rightarrow \beta (p,x,y)\text{. } \end{aligned}$$

Here, \(\alpha (p,x)\) restricts points in the region around the solutions to points of interest and \(\beta (p,x,y)\) is the requirement that these points should satisfy. The \(\alpha \) constraints define the domain of inputs and knobs and constraints on them, which play the role of assumptions in the assume-guarantee paradigm, while \(\beta \) constraints can be viewed as guarantees; they can express external/additional requirements on the system not covered by assertions. In the case of synthesis and optimization, \(\beta \) constraints can be used to express constraints that should be satisfied by the synthesized, respectively, optimized system. For example, consider \(\alpha (p,x) =(x_1 > x_2 + x_3)\), \(\beta (p,x,y) =y_1 > 2\cdot x_1\) and \(\theta =\Vert p - p'\Vert \le 0.5\). In this mode SMLP will find values of the parameters of the system such that for all parameters in the 0.5-radius region and all inputs with \(x_1 > x_2 + x_3\), the output value \(y_1\) is greater than \(2\cdot x_1\).
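
The following sketch checks, with Z3 quantifiers, whether a concrete knob value is a \(\theta \)-stable solution of such an instance; the linear toy model standing in for \(\varphi _M\) and the added input-domain constraint \(x_1 > 0\) (folded into \(\alpha \)) are made up, and SMLP’s actual encoding differs:

```python
from z3 import Reals, ForAll, Implies, And, Solver

pp, x1, x2, x3, y1 = Reals("pp x1 x2 x3 y1")

p_star = 3                                          # candidate knob value
theta = And(pp - p_star <= 0.5, p_star - pp <= 0.5) # |p* - p'| <= 0.5
alpha = And(x1 > x2 + x3, x1 > 0)                   # assumption + input domain
phi_M = y1 == pp * x1                               # toy model for phi_M
beta = y1 > 2 * x1

s = Solver()
s.add(ForAll([pp, x1, x2, x3, y1],
             Implies(And(theta, alpha, phi_M), beta)))
print(s.check())    # expected: sat, i.e. p* = 3 is theta-stable
```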

6.2 Verifying Assertions on a Model

For verifying an assertion \({\text {assert}}(p,x,y)\) on a model M under given parameters p we can simplify Formula (1) to:

$$ \eta (p) \wedge \forall p' ~\forall x y ~[ \theta (p,p') \rightarrow (\varphi _M(p',x,y) \rightarrow {\text {assert}}(p',x,y)) ]\text{. } $$

Since p is fixed, \(\eta (p)\) can be eliminated by evaluation. Further, if one is not concerned with stability, then \(\theta \) can be replaced with the identity and the problem reduces to a standard verification problem:

$$ ~\forall x y~(\varphi _M(p,x,y) \rightarrow {\text {assert}}(p,x,y))\text{. } $$

In the case of neural networks, there is a large range of verification tools to solve this problem, such as Marabou [16] and \(\alpha \),\(\beta \)-CROWN [20, 22]. Most of these tools rely on floating point computations, which can quickly accumulate errors. SMLP supports SMT solvers with arbitrary precision, which can produce exact results at the expense of computational cost. Nevertheless, dedicated ML solvers are very useful as they scale to much larger problems [9]. We are currently working on supporting dedicated ML solvers in SMLP, letting the user decide which trade-off to choose. SMLP also supports other ML models such as decision trees, random forests and polynomial models.

Fig. 4. SMLP max-min optimization. On both plots, p denotes the knobs. On the right plot we also consider inputs x (which are universally quantified) as part of f.

6.3 Querying Conditions on the Model

The task of querying the ML model for a stable witness to \({\text {query}}(p, x, y)\) consists in finding value assignments \(p^*,x^*\) for knobs p and inputs x that represent a solution to Eq. (2):

$$\begin{aligned} \exists p, x ~\big [ \eta (p) \wedge \forall p'~ \forall y~[ \theta (p,p') \rightarrow (\varphi _M(p',x,y) \rightarrow \varphi _{ cond }(p',x,y)) ]\big ] \end{aligned}$$
(2)

where

$$\begin{aligned} \varphi _{ cond }(p,x,y) =\alpha (p,x) \rightarrow {\text {query}}(p,x,y). \end{aligned}$$

Queries can be used to explore the model, e.g., to explore regions around failures, where the query corresponds to the negation of the assertion, or to explore near-optimal regions in optimization tasks, or other conditions of interest.

6.4 Stable Optimized Synthesis

In this subsection we consider the optimization problem for a real-valued function f (in our case, an ML model), extended in two ways: (1) we consider a \(\theta \)-stable maximum, to ensure that the objective function does not drop drastically in a close neighborhood of the configuration where its maximum is achieved; and (2) we assume that the objective function depends not only on knobs but also on inputs, and the function is maximized in the stability \(\theta \)-region of the knobs, for any values of inputs in their respective legal ranges. We explain these extensions using the two plots in Fig. 4.

The left plot represents the optimization problem for f(p, x) when f depends on knobs only (thus x is an empty vector), while the right plot represents the general setting where x is not empty (which is usually not considered in optimization research). In each plot, the blue threshold (in the form of a horizontal bar or a rectangle) marks the level the objective is guaranteed to keep in the stability region around the point where f reaches its (regular) maximum, and the red threshold marks the stable maximum, which is approximated by our optimization algorithms. In both plots, the regular maximum of f is not stable due to a sharp drop of f’s value in the stability region.

Let us first consider optimization without stability or inputs, i.e., the far lower corner of the exploration cube in Fig. 1. Given a formula \(\varphi _M\) encoding the model and an objective function \(o:\mathcal {D}_{ par }\times \mathcal {D}_{ out } \rightarrow \mathbb R\), the standard optimization problem solved by SMLP is stated as Formula (3).

$$\begin{aligned} o^* \;=\; \max _{p}\ \{\, o(p,y) \mid \eta (p) \wedge \varphi _M(p,y) \,\} \end{aligned}$$
(3)

A solution to this optimization problem is the pair \((p^*, o^*)\), where \(p^*\in \mathcal {D}_{ par }\) is a value of the parameters p on which the maximum \(o^*\) of the objective function \(o\) is achieved for the output y of the model on \(p^*\). In most cases it is not feasible to compute the maximum exactly. To deal with this, SMLP computes the maximum with a specified accuracy. Consider \(\varepsilon >0\). We refer to values \((\tilde{p},\tilde{z})\) as a solution to the optimization problem with accuracy \(\varepsilon \), or an \(\varepsilon \)-solution, if \(o^* - \tilde{z} \le \varepsilon \) holds and \(\tilde{z}\) is a lower bound on the objective, i.e., \(\forall y\, [\, \varphi _M(\tilde{p},y) \rightarrow o(\tilde{p},y) \ge \tilde{z}\,] \) holds.
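
One simple way to obtain an \(\varepsilon \)-solution, sketched below on a toy model, is binary search on a threshold z, asking a quantifier-free SMT solver whether some knob value achieves objective at least z; the \(\text {GearOPT}_\delta \) procedures [4, 5] used by SMLP are considerably more involved.

```python
from z3 import Real, Solver, sat

p, y = Real('p'), Real('y')

def achievable(z):
    # Is there a knob value p with eta(p), phi_M(p, y) and o(p, y) >= z?
    s = Solver()
    s.add(p >= 0, p <= 4)          # eta(p): knob range
    s.add(y == 6 * p - p * p)      # phi_M: toy model y = M(p)
    s.add(y >= z)                  # objective o(p, y) = y is at least z
    return s.check() == sat

lo, hi, eps = 0.0, 100.0, 1e-3     # assumed bounds on the objective
while hi - lo > eps:
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if achievable(mid) else (lo, mid)
print('maximum up to eps:', lo)    # approx. 9, attained at p = 3
```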

Now we consider stable optimized synthesis, i.e., the top right corner of the exploration cube. The problem can be formulated as Formula (4), expressing maximization of a lower bound z on the objective function \(o\) over parameter values under stable synthesis constraints.

$$\begin{aligned} \max _{p,\,z}\ \big [\, \eta (p) \wedge \forall p'\ \forall x\, y\ [\, \theta (p,p') \rightarrow (\varphi _M(p',x,y) \rightarrow \varphi _{ cond }^{\ge }(p',x,y,z)) \,] \,\big ] \end{aligned}$$
(4)

where

$$ \varphi _{ cond }^{\ge }(p',x,y,z) =\alpha (p',x) \rightarrow (\beta (p',x,y) \wedge o(p',x,y) \ge z). $$

The stable synthesis constraints are part of a GEAR formula and include the usual \(\eta , \alpha , \beta \) constraints together with the stability constraints \(\theta \). Equivalently, stable optimized synthesis can be stated as the max-min optimization problem of Formula (5),

$$\begin{aligned} \max _{p}\ \min _{p',\,x}\ \{\, z \mid \eta (p) \wedge \theta (p,p') \wedge \forall y\, [\, \varphi _M(p',x,y) \rightarrow \varphi _{ cond }^{\le }(p',x,y,z) \,] \,\} \end{aligned}$$
(5)

where

$$ \varphi _{ cond }^{\le }(p',x,y,z) =\alpha (p',x) \rightarrow (\beta (p',x,y) \wedge o(p',x,y) \le z)\text{. }$$

In Formula (5) the minimization over the stability region corresponds to the universally quantified x and \(p'\) ranging over this region in (4). An advantage of this formulation is that it can be adapted to define other aggregation functions over the objective’s values on the stability region. For example, in that way one can represent the max-mean optimization problem, where one wants to maximize the mean value of the function in the stability region rather than the min value (which amounts to maximizing the worst-case value of f in the stability region). Likewise, Formula (5) can be adapted to other interesting statistical properties of the distribution of values of f in the stability region.

We can explicitly incorporate assertions in stable optimized synthesis by defining \(\beta (p',x,y)=\beta '(p',x,y) \wedge {\text {assert}}(p',x,y)\), where \({\text {assert}}(p',x,y)\) are assertions required to be valid in the entire stability region around the selected configuration of knobs p. The notion of \(\varepsilon \)-solutions for these problems carries over from the one given above for Formula (3).

SMLP implements stable optimized synthesis based on the \(\text {GearOPT}_\delta \) and \(\text {GearOPT}_\delta \)-BO algorithms [4, 5], which are shown to be complete and terminating for this problem under mild conditions. These algorithms were further extended in SMLP to Pareto point computations to handle multiple objectives simultaneously.

6.5 Design of Experiments

Most DOE methods are based on understanding the multivariate distribution of legal value combinations of inputs and knobs in order to sample the system. When the number of system inputs and/or knobs is large (say, hundreds or more), the DOE may not generate a high-quality coverage of the system’s behavior that would enable training models with high accuracy. The model training process itself becomes less manageable as the number of input variables grows, and the resulting models are not explainable and thus cannot be trusted. One way to curb this problem is to select a subset of input features for DOE and for model training. The problem of combining feature selection with DOE generation and model training is an important research topic of practical interest, and SMLP supports multiple practically proven ways to select subsets of features and feature combinations as inputs to DOE and training, including the MRMR feature selection algorithm [13] and a Subgroup Discovery (SD) algorithm [1, 18, 21]. The MRMR algorithm selects a subset of features according to the principle of maximum relevance and minimum redundancy. It is widely used for selecting a subset of features for building accurate models, and is therefore useful for selecting a subset of features to be used in DOE; it is the default choice in SMLP for that usage. The SD algorithm selects regions in the input space relevant to the response, using heuristic statistical methods, and such regions can be prioritized for sampling in DOE algorithms.
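
For illustration, feature selection with the mrmr package that SMLP integrates (see Sect. 7) may look as follows, on synthetic data:

```python
import numpy as np
import pandas as pd
from mrmr import mrmr_regression

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(size=(500, 20)),
                 columns=[f'f{i}' for i in range(20)])
# Toy response depending mainly on f0 and f1
y = 3 * X['f0'] - X['f1'] + 0.1 * rng.normal(size=500)

# Select the 5 most relevant, least redundant features for DOE/training
selected = mrmr_regression(X=X, y=y, K=5)
print(selected)   # expected to rank f0 and f1 highly
```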

6.6 Root Cause Analysis

We view the problem of root cause analysis as dual to the stable optimized synthesis problem: while during optimization with stability we are searching for regions in the input space (or in other words, characterizing those regions) where the system response is good or excellent, the task of root-causing can be seen as searching for regions in the input space where the system response is not good (is unacceptable). Thus simply by swapping the definition of excellent vs unacceptable, we can apply SMLP to explore weaknesses and failing behaviors of the system.

Even if a number of counter-examples to an assertion are available, they represent discrete points in the input space, and it is not immediately clear which value assignments to which variables in these counter-examples are critical for explaining the failures. Root-causing capability in SMLP is currently supported through two independent approaches: (1) a Subgroup Discovery (SD) algorithm that searches the data for input regions with a higher ratio (thus, a higher probability) of failure; to be precise, SD algorithms support a variety of quality functions, which play the role of optimization objectives in the context of optimization; and (2) searching for stable witnesses to failures, which identifies input regions with a high probability of failure. These capabilities, together with the feature selection algorithms supported in SMLP, enable researchers to develop new root-causing capabilities that combine formal methods with statistical methods for root cause analysis.
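
As an illustration of the first approach, a subgroup discovery search for high-failure regions with the pysubgroup package that SMLP integrates (see Sect. 7) may look as follows, on synthetic data:

```python
import numpy as np
import pandas as pd
import pysubgroup as ps

rng = np.random.default_rng(0)
df = pd.DataFrame({'x1': rng.uniform(0, 10, 1000),
                   'x2': rng.uniform(0, 10, 1000)})
# Failures concentrate where x1 is large and x2 is small (toy)
df['fail'] = (df.x1 > 7) & (df.x2 < 3)

target = ps.BinaryTarget('fail', True)
searchspace = ps.create_selectors(df, ignore=['fail'])
task = ps.SubgroupDiscoveryTask(
    df, target, searchspace, result_set_size=5, depth=2,
    qf=ps.WRAccQF())                  # quality function to optimize
result = ps.BeamSearch().execute(task)
print(result.to_dataframe())          # ranked failure regions
```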

6.7 Model Refinement Loop

Support in SMLP for selecting DOE vectors to sample the system and generate a training set was discussed in Subsect. 6.5. Initially, when selecting sampling points for the system, it is unknown which regions in the input space are really relevant for the exploration task at hand. Therefore some DOE algorithms also incorporate sampling based on previous experience and familiarity with the design, such as sampling nominal cases and corner cases, when these are known. For the model exploration tasks supported by SMLP, it is not required to train a model that accurately matches the system everywhere in the legal search space of inputs and knobs. We only require a model that is an adequate representation of the system for the task at hand, meaning that solving the exploration task on the model solves it for the system as well. Therefore SMLP supports a targeted model refinement loop to enable solving system exploration tasks on the model instead. The idea is as follows: when a stable solution to a model exploration task is found, it is usually the case that there are not many training data points close to the stability region of that solution. This implies a high likelihood that the model does not accurately represent the system in the stability region of the solution. Therefore the system is sampled in the stability region of the solution, and these data samples are added to the initial training data to retrain the model and make it more adequate in the stability region of interest.
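
Schematically, the refinement loop can be summarized as in the following sketch; the callables train, explore, sample_system and dense_enough are placeholders for SMLP’s actual components, not its API:

```python
def refine(train, explore, sample_system, dense_enough,
           system, doe_data, spec, max_rounds=5):
    """Targeted model refinement loop (schematic sketch)."""
    data = doe_data
    solution = None
    for _ in range(max_rounds):
        model = train(data)               # fit ML model on current data
        solution = explore(model, spec)   # stable solution + theta-region
        region = solution.stability_region
        if dense_enough(data, region):    # enough samples near solution:
            break                         # model is trusted in the region
        # Otherwise sample the *system* inside the stability region
        # and retrain on the enlarged dataset
        data = data + sample_system(system, region)
    return solution
```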

7 Implementation

SMLP code is open-source and publicly available (Footnote 2). Its frontend is implemented in Python and its backend in C++, while the interface between the two is realized using the Boost library. For training tree-based and polynomial models we use the scikit-learn and pycaret packages, and for training neural networks we use the Keras package with TensorFlow. Our focus is on analyzing regression models arising from systems with analog pins and analog output, but classification models are also covered, as they can be reduced to binary classification with output values 0 and 1, or by treating the binary classification problem as a regression problem of predicting the probability of the output being 1 (the latter is usually preferable for finer analysis). For generating training data from a system, SMLP supports the DOE approaches available in the pyDOE package. The MRMR algorithm for feature selection is integrated in SMLP using the mrmr package, and the Subgroup Discovery algorithm is integrated using the pysubgroup package.

SMLP can use any external SMT solver which supports the SMT-LIB2 format as a back end of the GearSAT/OPT algorithms (via command line options), and also natively integrates Z3 via its Python interface. We have successfully experimented with Z3 [11], Yices [14], CVC5 [2], MathSAT [10] and ksmt [6].

8 Industrial Case Studies

Previous publications [4, 5] on SMLP report detailed experimental results on 10 real-life training datasets originating from the Electrical Validation and Signal Integrity domains. The output is a measurement of the quality of an analog signal between a transmitter and a receiver of a channel to a peripheral device. The datasets are freely available (Footnote 3): 5 transmitter (TX) datasets and their 5 receiver (RX) counterparts. The count of inputs and knobs together in these experiments, as well as in the current usage of the SMLP tool at Intel, is around 5 to 20 variables. In [4] the experimental evaluation is performed using the \(\textsc {GearSat}_\delta \) algorithm, and experimental results using the \(\textsc {GearOpt}_\delta \)-BO algorithm, which combines an SMT-based optimization procedure with Bayesian optimization, are reported in [5]. While these datasets are relatively small in terms of parameter counts, they are representative of modeling I/O devices at Intel, and SMLP has been useful in suggesting safe and optimized configurations for a number of real-life I/O devices in recent years.

Some of the challenges of design space exploration at Intel in the design stage of product development are described in [17]. That work focuses on design challenges for 112 Gb SerDes I/O serialization systems and uses Feature Range Analysis [17] as its ML analysis engine, which has initial support in SMLP in the subgroups mode. The parameters relevant to the system exploration include those characterizing the topological layout of circuits, physical characteristics, requirements for manufacturability, and more, such as relative locations of vias, distances between parallel wires, wire thickness, wire lengths, and properties of dielectric layers. The exploration goal is to optimize the system performance under the system’s requirements, i.e., to find one safe and optimal configuration that supports multiple modes of operation – in particular, to improve the timing and voltage margins, co-optimized with power and area requirements. SMLP is currently being applied to the analysis and optimization of such I/O systems, through analyzing NN, tree and polynomial models trained on design and lab data.

9 Future Work

Currently we are extending SMLP to support the ONNX format used by VNN-LIB [12], so that more specialized solvers for ML can be used alongside SMT solvers. We are working on combining different solving strategies into a user-definable solver pipeline of ML and SMT solvers within the SMLP framework. We recently released a new set of benchmarks and intend to release more real-life industrial datasets in the future (see Footnote 3).