1 Introduction

Context.

In many applications in engineering, science, and other domains, we face difficult optimization problems, entailing the tuning or design of certain variables to minimize an objective function, subject to the satisfaction of constraints. Such problems are characterized by a non-trivial relation between the input variables (referred to in the literature as tuning or design variables) and the objective and/or constraints. Examples of applications involve the design of systems with different physical mechanisms, e.g., mechanical, electronic, hydrodynamic, etc., whose fitness depends on their complex interaction with the environment. In many cases, hard-to-model factors, like the integration of digital and analog design, usage patterns, and environmental fluctuations, make the optimization problem even more challenging. Figure 1 illustrates the factors affecting the fitness of a complex system with respect to the design variables. Hence, closed-form mathematical expressions relating the design variables to the objective and constraints are not available, or are otherwise highly difficult to extract or solve. Instead, the only sensible approach to evaluate the fitness is through simulations and/or experiments, i.e., by sampling the fitness at individual sampling points. This problem class, where we only have access to the objective and constraint values via sampling, is called black-box (or global) optimization.

Fig. 1

Factors affecting the fitness of a complex system

Global optimization has attracted the interest of engineering practitioners and applied mathematicians for several decades and, in fact, remains an open research topic. With the recent rise of machine learning and hybrid systems design, technical applications call for more effective techniques to tune parameters in higher dimensions and in the presence of black-box constraints and evaluation noise.

Previous works. Most approaches to global optimization address a trade-off between two conflicting goals: exploitation, where one samples around the current best point to improve it, and exploration, where distant or high-uncertainty regions are sampled to learn more about the underlying functions. Many methods for global optimization have been developed; however, most of them can be grouped into four conceptual categories:

  1. Population-based methods: this category includes methods that maintain a group (“population”) of agents scattered throughout the optimization space, evaluating the objective/constraints at their respective locations. In succeeding iterations, these agents move based on heuristics [7], often inspired by animal behavior. “Generation-based” methods [4] are also included here, whose agents evolve by a mix of recombination and random mutation. While these methods have been highly popular due to their empirical performance and low computational burden, they need numerous function evaluations because of their batch-based paradigm. Hence, they are limited to “cheap” objectives and constraints, i.e., those that are easy to evaluate.

  2. Direct search techniques: these entail evaluating points along a chosen set of search directions, which can be randomly generated or along the cardinal directions, and comparing these points with the current best one (the “incumbent”). Example methods in this category are Compass Search [8], Generalized Pattern Search [17], and Mesh Adaptive Direct Search [1]. They have negligible computational burden due to their simplicity, perform well in practice, and are becoming more popular. However, convergence to the global optimum is not guaranteed, at least when such methods are used without modifications.

  3. Model-based methods: these iteratively refine a surrogate model from the data acquired so far. The surrogate model is then used in a “cheap” optimization to select the next point for sampling. Examples in this category include kriging-based methods [9] and the popular Bayesian optimization [6]. While these techniques receive wide attention, especially in the machine learning community, the surrogate modelling step entails a high computational burden. As a result, these methods are mostly limited to problems of lower dimensionality and smaller evaluation budgets.

  4. Lipschitz-based methods: their mechanism rests on the assumption that the underlying functions are Lipschitz continuous. These methods usually exploit information on the lowest possible bound of the functions in the unsampled regions to select the most promising point for sampling in the next iteration. In this category we count the Piyavskii–Shubert method from the 1970s [12, 16], the Dividing Rectangles (DIRECT) method [5], and the recently proposed LIPO/AdaLIPO algorithms [10]. However, using the lower bound as a prediction is optimistic: the value observed upon evaluation does not necessarily follow it.

Contributions. This chapter describes the first use of the Set Membership approach to build a black-box optimization method, addressing the shortcomings of previous works. While Set Membership approaches [11] have been used for system and function identification, filtering, and data-driven control, they had not yet been used for global optimization. With attractive properties like a simple model-building technique, non-parametric modelling, and uncertainty quantification, the Set Membership framework is a promising candidate for surrogate modelling, around which we can build a global optimization method. In the proposed Set Membership Global Optimization (SMGO) [14, 15], we build Set Membership models of the objective and constraints from data (assuming their Lipschitz continuity). These models are then used to intelligently trade off between exploitation and exploration when selecting the next point for sampling. The resulting method is shown to have theoretical convergence guarantees, low computational burden, and reproducibility of results. We describe the problem setup and the mechanism behind SMGO, followed by a discussion of its properties. Lastly, we summarize two case studies in which SMGO was successfully used in different engineering design problems.

2 Problem Setup

We consider the problem of finding a point that minimizes the scalar objective function \(f(\boldsymbol{x})\), subject to the satisfaction of one or more constraints \(g_s(\boldsymbol{x}), s = 1,\ldots ,S\). The point \(\boldsymbol{x} \doteq [x_1 ~ x_2 ~ \ldots ~ x_D]\) is also referred to as the decision variable vector, where D is the dimensionality of the problem. Furthermore, we consider a search set \(\mathcal {X} \subset \mathbb {R}^D\) that is convex and compact. This is a very common assumption in practical applications; in fact, most engineering design problems simply define respective search ranges \([\underline{x}_d, \overline{x}_d]\) for the decision variables \(x_d, ~ d = 1,\ldots ,D\), which makes \(\mathcal {X}\) a hyperrectangle.

We assume to have no access to the closed-form analytical expressions, nor to any derivative/gradient information, for the objective f and all constraints \(g_s\). Instead, we only have access to their function values by sampling individual test points \(\boldsymbol{x}\), using experiments, simulations, or a combination of both. Hence, f and \(g_s\) are what the literature calls hidden or black-box functions. In addition, we make no assumptions on their convexity, nor even on the number of distinct local/global minima in the search space. Nevertheless, we do make an assumption regarding their regularity:

Assumption 1

The objective function f, and all constraint functions \(g_s\) are locally Lipschitz continuous throughout the considered search space \(\mathcal {X}\), with their respective finite (but unknown) Lipschitz constants \(\gamma , \rho _1, \ldots , \rho _S\):

$$ f \in \mathcal {F}(\gamma ), ~ g_1 \in \mathcal {F}(\rho _1), \ldots ,~ g_S \in \mathcal {F}(\rho _S) $$

where

$$ \mathcal {F}(\eta ) \doteq \left\{ h: \left| h(\boldsymbol{x}_1) - h(\boldsymbol{x}_2)\right| ~\le \eta \left\| \boldsymbol{x}_1 - \boldsymbol{x}_2\right\| , \forall \boldsymbol{x}_1,\boldsymbol{x}_2 \in \mathcal {X} \right\} . $$
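For intuition, the class \(\mathcal {F}(\eta )\) can be related to data: any two samples of a function in \(\mathcal {F}(\eta )\) bound \(\eta \) from below by their slope. A minimal sketch of such a data-driven lower bound follows (the function name is ours; [15] refines estimates of this kind into the \(\tilde{\gamma }\) used by the algorithm):

```python
import numpy as np

def lipschitz_lower_bound(X, z):
    """Largest pairwise slope |z_i - z_j| / ||x_i - x_j|| over the samples:
    a lower bound on the Lipschitz constant of any function in F(eta)
    that interpolates the data (X, z)."""
    X, z = np.asarray(X, float), np.asarray(z, float)
    est = 0.0
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            dist = np.linalg.norm(X[i] - X[j])
            if dist > 0.0:
                est = max(est, abs(z[i] - z[j]) / dist)
    return est
```

Any admissible \(\eta \) must be at least this value; since \(\gamma \) is unknown, SMGO works with inflated data-driven estimates rather than assuming it is given.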

The above assumption is reasonable and, in fact, one that is made in most problems involving physical systems, where the rates of change are finite. Furthermore, we make an assumption on the acquired values of f and \(g_s\):

Assumption 2

The values of the objective function f and all constraints \(g_s\) can be evaluated at any point \(\boldsymbol{x}^{(i)}_{} \in \mathcal {X}\) without noise, in a setup referred to as an exact evaluation:

$$ z^{(i)}_{} = f(\boldsymbol{x}^{(i)}_{}),~ c^{(i)}_{1} = g_1(\boldsymbol{x}^{(i)}_{}),\ldots ,~ c^{(i)}_{S} = g_S(\boldsymbol{x}^{(i)}_{}). $$

At any chosen point \(\boldsymbol{x}^{(i)}_{}\), we assume that each evaluation gives access to both the values of \(f(\boldsymbol{x}^{(i)}_{})\) and all \(g_s(\boldsymbol{x}^{(i)}_{})\), in what we call a “synchronous evaluation”.
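The synchronous, exact evaluation protocol can be sketched as follows, on a toy one-dimensional problem of our own invention (the functions f and g1 and the search interval are illustrative, not from the chapter):

```python
import math

# Toy black-box problem (illustrative only): minimize f(x) over x in [0, 4],
# subject to g1(x) >= 0. In a real application these would be wrapped
# experiments or simulations, not closed-form functions.
def f(x):
    return math.sin(3.0 * x) + 0.5 * x

def g1(x):
    return 2.0 - x

def evaluate(x):
    """One synchronous, exact evaluation (Assumption 2): a single
    experiment/simulation returns z = f(x) and all c_s = g_s(x)."""
    return f(x), [g1(x)]
```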

When considering constraints, we adopt the convention that \(g_s\) is satisfied at \(\boldsymbol{x}\) if \(g_s(\boldsymbol{x}) \ge 0\). We consider that the feasible set \(\mathcal {G}\), which is the intersection of the respective satisfaction sets of the \(g_s\), exists and has a strictly positive measure:

Assumption 3

Consider the feasible set \(\mathcal {G} \doteq \mathcal {X} \cap \left\{ \cap _{s=1}^{S} \{ \boldsymbol{x} : g_s(\boldsymbol{x}) \ge 0 \} \right\} \). We assume that

$$ \mathcal {L}(\mathcal {G}) > 0, $$

where \(\mathcal {L}\) is the operator for the Lebesgue measure.

Under the above assumptions, we can then declare that at least one global minimizer \(\boldsymbol{x}^*\) exists, defined as

$$\begin{aligned} \boldsymbol{x}^* \in \mathcal {X}^* \doteq \big \{ \boldsymbol{x} \in \mathcal {G} ~\big |~ \forall \boldsymbol{x}' \in \mathcal {G}, f(\boldsymbol{x}') \ge f(\boldsymbol{x}) \big \}. \end{aligned}$$
(1)

The corresponding minimum objective is \(z^* = f(\boldsymbol{x}^*)\).

3 Set Membership Global Optimization (SMGO)

3.1 Algorithm

Overview The Set Membership Global Optimization (SMGO) algorithm, discussed in [14, 15], is a new global optimization technique that uses the Set Membership approach to strategically trade off between exploitation (sampling around the current best evaluated point, to improve on the current best objective) and exploration (sampling around undiscovered regions of the search space, to learn more about the functions). The general flow of the algorithm is shown in Fig. 2.

Fig. 2

SMGO algorithm logic

Data set and model update Let us denote a new sample at iteration n by a tuple \(\mathring{\boldsymbol{x}}^{(n)} \doteq (\boldsymbol{x}^{(n)}, z^{(n)}, \boldsymbol{c}^{(n)})\), composed of the sampled point \(\boldsymbol{x}^{(n)}\), the corresponding objective value \(z^{(n)}\), and the vector of constraint values \(\boldsymbol{c}^{(n)}_{} \doteq [c^{(n)}_{1}, \ldots , c^{(n)}_{S}]\). In this step, we iteratively introduce the new entry to the data set \(\boldsymbol{X}^{\langle n-1 \rangle }_{}\), building \(\boldsymbol{X}^{\langle n \rangle }_{}\):

$$ \boldsymbol{X}^{\langle n \rangle }_{} = \boldsymbol{X}^{\langle n-1 \rangle }_{} \cup \mathring{\boldsymbol{x}}^{(n)}. $$

Given the data set \(\boldsymbol{X}^{\langle n \rangle }_{}\), we identify the best tuple \(\mathring{\boldsymbol{x}}^{*\langle n \rangle }\) as follows:

$$ \mathring{\boldsymbol{x}}^{*\langle n \rangle } = (\boldsymbol{x}^{*\langle n \rangle },z^{*\langle n \rangle },\boldsymbol{c}^{*\langle n \rangle }) \doteq \arg \min _{\mathring{\boldsymbol{x}}^{(i)} \in \boldsymbol{X}^{\langle n \rangle }_{}} z^{(i)}_{}, ~\text {s.t.}~ \boldsymbol{c}^{(i)}_{} \ge 0. $$
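The selection of the best tuple can be sketched directly: among the samples whose constraint values are all nonnegative, take the one with the lowest objective (function and variable names are ours):

```python
def best_feasible(samples):
    """Best tuple (x, z, c) in the data set: lowest objective value z
    among samples whose constraint values c are all nonnegative.
    Returns None if no feasible sample exists yet."""
    feasible = [s for s in samples if all(ci >= 0.0 for ci in s[2])]
    if not feasible:
        return None
    return min(feasible, key=lambda s: s[1])
```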

Furthermore, the estimates \(\tilde{\gamma }^{\langle n \rangle }_{}, \tilde{\rho }^{\langle n \rangle }_{1}, \ldots , \tilde{\rho }^{\langle n \rangle }_{S}\) of the Lipschitz constants \(\gamma , \rho _1, \ldots , \rho _S\) are then updated, which refines the SM model (further details on calculating these estimates are discussed in [15]). From this information, we build the SM upper and lower bounds for f, denoted as \(\overline{f}^{\langle n \rangle }(\boldsymbol{x})\) and \(\underline{f}^{\langle n \rangle }(\boldsymbol{x})\), as illustrated in Fig. 3. We also define the central estimate \(\tilde{f}^{\langle n \rangle }_{}(\boldsymbol{x}) \doteq \frac{1}{2}\left( \overline{f}^{\langle n \rangle }(\boldsymbol{x})+\underline{f}^{\langle n \rangle }(\boldsymbol{x})\right) \), and the uncertainty \(\lambda ^{\langle n \rangle }_{}(\boldsymbol{x}) \doteq \overline{f}^{\langle n \rangle }(\boldsymbol{x})-\underline{f}^{\langle n \rangle }(\boldsymbol{x})\). Analogously, we denote the upper and lower bounds, central estimate, and uncertainty for a constraint \(g_s\) as \(\overline{g}^{\langle n \rangle }_{s}(\boldsymbol{x})\), \(\underline{g}^{\langle n \rangle }_{s}(\boldsymbol{x})\), \(\tilde{g}^{\langle n \rangle }_{s}(\boldsymbol{x})\), and \(\pi ^{\langle n \rangle }_{s}(\boldsymbol{x})\), respectively.
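Concretely, for samples \((\boldsymbol{x}^{(i)}, z^{(i)})\) and a Lipschitz estimate \(\tilde{\gamma }\), the SM bounds are the tightest values consistent with the data: \(\overline{f}(\boldsymbol{x}) = \min _i \big (z^{(i)} + \tilde{\gamma }\,\|\boldsymbol{x}-\boldsymbol{x}^{(i)}\|\big )\) and \(\underline{f}(\boldsymbol{x}) = \max _i \big (z^{(i)} - \tilde{\gamma }\,\|\boldsymbol{x}-\boldsymbol{x}^{(i)}\|\big )\). A minimal sketch of these quantities (function name ours):

```python
import numpy as np

def sm_bounds(x, X, z, gamma):
    """SM bounds at query point x for a gamma-Lipschitz function,
    given sampled points X and values z: the intersection of the
    Lipschitz cones anchored at each sample."""
    x, X, z = (np.asarray(a, float) for a in (x, X, z))
    d = np.linalg.norm(X - x, axis=1)    # distances to all samples
    f_up = np.min(z + gamma * d)         # upper bound  f_bar(x)
    f_lo = np.max(z - gamma * d)         # lower bound  f_underbar(x)
    f_c = 0.5 * (f_up + f_lo)            # central estimate  f_tilde(x)
    lam = f_up - f_lo                    # uncertainty  lambda(x)
    return f_up, f_lo, f_c, lam
```

Note that at a sampled point the bounds collapse (\(\lambda = 0\), since the evaluations are exact), and the uncertainty grows with the distance from the samples.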

From the SM-based models we can estimate the regions \(\tilde{\mathcal {G}}_s\) that satisfy the corresponding constraints \(g_s\), shaded in the second and third rows of Fig. 3. Using a weighting factor \(\Delta \in [0,1]\) that we refer to as the risk parameter [15], we define the satisfaction region estimate \(\tilde{\mathcal {G}}_s\) as

$$ \tilde{\mathcal {G}}_s \doteq \left\{ \boldsymbol{x} \in \mathcal {X} : \Delta \tilde{g}_s(\boldsymbol{x}) + (1-\Delta )\underline{g}_s(\boldsymbol{x}) \ge 0 \right\} . $$

A setting of \(\Delta =0\) uses the most cautious estimate of the satisfaction regions, taking the SM lower bounds \(\underline{g}_s\) as a worst-case estimate. On the other hand, \(\Delta =1\) leads to the most lenient satisfaction estimate, using \(\tilde{g}_s\), and a much larger \(\tilde{\mathcal {G}}_s\). The satisfaction region estimates for different \(\Delta \) values are shown in Fig. 4. Given all \(\tilde{\mathcal {G}}_s\), we define the estimated feasible region as

$$ \tilde{\mathcal {G}} \doteq \cap _{s=1}^S \tilde{\mathcal {G}}_s. $$

However, we note that even with the most cautious setting \(\Delta =0\), constraint violations might still occur in the setting we treat, because we do not know the Lipschitz constants of f and all \(g_s\) (see Assumption 1).
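The \(\Delta \)-weighted membership test for \(\tilde{\mathcal {G}}\) can be sketched as follows, with one check per constraint (a sketch under the definitions above; names and data layout are ours):

```python
import numpy as np

def estimated_feasible(x, constraint_data, delta):
    """Check x against the Delta-weighted satisfaction estimate of every
    constraint; x is estimated feasible iff all checks pass.

    constraint_data: list of (Xs, cs, rho) per constraint, with sampled
    points Xs, constraint values cs, and Lipschitz estimate rho."""
    x = np.asarray(x, float)
    for Xs, cs, rho in constraint_data:
        Xs, cs = np.asarray(Xs, float), np.asarray(cs, float)
        d = np.linalg.norm(Xs - x, axis=1)
        g_up = np.min(cs + rho * d)      # SM upper bound  g_bar_s(x)
        g_lo = np.max(cs - rho * d)      # SM lower bound  g_underbar_s(x)
        g_c = 0.5 * (g_up + g_lo)        # central estimate g_tilde_s(x)
        if delta * g_c + (1.0 - delta) * g_lo < 0.0:
            return False                 # x outside G_tilde_s
    return True
```

With `delta = 0` only points whose worst-case lower bound is nonnegative pass, while `delta = 1` accepts any point whose central estimate is nonnegative.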

Fig. 3

Set Membership models of objective and constraints from samples

Fig. 4

Satisfaction region estimates from different \(\Delta \) values

Generation of Candidate Points From the samples, we methodically generate a set \(\boldsymbol{E}^{\langle n \rangle }_{}\) of candidate points, from which we select the next sampling point \(\boldsymbol{x}^{(n+1)}_{}\) via exploitation or exploration. The candidate point generation method is highly flexible and can be adjusted according to the needs of the user. Moreover, this subroutine can be skipped entirely, in which case the exploitation/exploration routines are treated as continuous-space optimization routines.

The method suggested in [15] to generate candidate points is iterative: for every incoming sampled point \(\boldsymbol{x}^{(n)}_{}\), it generates candidate points in the positive and negative cardinal directions up to the boundaries of \(\mathcal {X}\), and further ones in the directions of all existing sampled points in \(\boldsymbol{X}^{\langle n \rangle }_{}\). This candidate point generation approach is illustrated in Fig. 5 for a two-dimensional example.
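A simplified sketch of this generation step is given below. For brevity we place a single midpoint per segment, whereas the method in [15] generates several points along each segment; the function name and the hyperrectangular bounds interface are ours:

```python
import numpy as np

def new_candidates(x_new, X_old, lb, ub):
    """Candidate points generated for a new sample x_new over the
    hyperrectangle [lb, ub]: one midpoint toward the boundary along each
    +/- cardinal direction, plus one midpoint toward every previously
    sampled point in X_old."""
    x_new, lb, ub = (np.asarray(a, float) for a in (x_new, lb, ub))
    cands = []
    for d in range(len(x_new)):
        for bound in (lb[d], ub[d]):
            p = x_new.copy()
            p[d] = 0.5 * (x_new[d] + bound)   # midpoint to boundary
            cands.append(p)
    for x_old in np.asarray(X_old, float):
        cands.append(0.5 * (x_new + x_old))   # midpoint to old sample
    return np.array(cands)
```

In two dimensions, a new sample thus contributes four boundary-direction candidates plus one per existing sample, consistent with the \(\mathcal {O}(Dn+n^2)\) growth of the candidate set discussed later.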

Fig. 5

Generation of candidate points, with the method used in [15]

Exploitation We now attempt to select a sampling point from the candidate points lying in a small region around the best sampled point, referred to as the trust region. Furthermore, we only choose from candidate points that are also estimated to be feasible, i.e., those belonging to \(\tilde{\mathcal {G}}\). The metric used to choose the exploitation candidate point \(\boldsymbol{x}^{\langle n \rangle }_{\theta }\) is based on its promise to improve on the current best objective value, prioritizing a lower central estimate and, to a small degree, a higher uncertainty.

The chosen exploitation point \(\boldsymbol{x}^{\langle n \rangle }_{\theta }\), if it exists, is subjected to an expected improvement condition (EIC) test [14, 15], to evaluate whether it is worth an expensive experiment or simulation. This condition is checked using the SM bounds; in particular, we test whether the lower bound \(\underline{f}(\boldsymbol{x}^{\langle n \rangle }_{\theta })\) improves on the best sampled objective value by at least a set threshold \(\eta \), as in Fig. 6. If the EIC is passed, \(\boldsymbol{x}^{\langle n \rangle }_{\theta }\) is assigned as the next sampling point:

$$ \boldsymbol{x}^{(n+1)}_{} \leftarrow \boldsymbol{x}^{\langle n \rangle }_{\theta }, $$

otherwise, we skip it and choose \(\boldsymbol{x}^{(n+1)}_{}\) via the exploration routine, discussed next.
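The exploitation selection and the EIC check can be sketched as follows, on the objective model only. The merit weights (and the weight `beta` on the uncertainty term) are our own illustrative choice, not those of [15]:

```python
import numpy as np

def exploit_with_eic(cands, X, z, gamma, z_best, eta, beta=0.1):
    """Pick the exploitation candidate by a merit favoring a low central
    estimate and, mildly, a high uncertainty; then apply the expected
    improvement condition: accept only if the SM lower bound improves
    on z_best by at least eta. Returns None if the EIC fails."""
    X, z = np.asarray(X, float), np.asarray(z, float)
    best, best_merit = None, np.inf
    for x in np.asarray(cands, float):
        d = np.linalg.norm(X - x, axis=1)
        f_up, f_lo = np.min(z + gamma * d), np.max(z - gamma * d)
        merit = 0.5 * (f_up + f_lo) - beta * (f_up - f_lo)
        if merit < best_merit:
            best, best_merit = x, merit
    d = np.linalg.norm(X - best, axis=1)
    if np.max(z - gamma * d) <= z_best - eta:   # EIC passed
        return best
    return None
```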

Fig. 6

Expected improvement test

Exploration The exploration subroutine of SMGO attempts to discover the shape of the functions by sampling a point in the high-uncertainty regions. In contrast to exploitation, where we restrict ourselves to feasible candidate points within the trust region, we now choose a candidate point from throughout \(\mathcal {X}\). A merit function is designed to pick the point with the highest uncertainty with respect to the objective and constraints, prioritizing points with a higher number of (estimated) satisfied constraints. The chosen exploration point \(\boldsymbol{x}^{\langle n \rangle }_{\phi }\) is then directly assigned as the sampling point for the next iteration \(n+1\):

$$ \boldsymbol{x}^{(n+1)}_{} \leftarrow \boldsymbol{x}^{\langle n \rangle }_{\phi }. $$
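A minimal sketch of exploration follows, keeping only the objective-uncertainty term of the merit; the full merit in [15] additionally weighs constraint uncertainties and the number of estimated satisfied constraints:

```python
import numpy as np

def explore(cands, X, z, gamma):
    """Among all candidate points, pick the one with the largest SM
    uncertainty lambda(x) = f_bar(x) - f_underbar(x) for the objective."""
    X, z = np.asarray(X, float), np.asarray(z, float)

    def uncertainty(x):
        d = np.linalg.norm(X - x, axis=1)
        return np.min(z + gamma * d) - np.max(z - gamma * d)

    cands = np.asarray(cands, float)
    return cands[int(np.argmax([uncertainty(x) for x in cands]))]
```

Since the uncertainty grows with the distance from the samples, this choice steers sampling toward the least-explored regions of \(\mathcal {X}\), which underlies the dense-sampling argument in the convergence discussion below.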

4 Algorithm Properties

Convergence In [15], the proposed SMGO-\(\Delta \) is proven to converge to a feasible point whose objective is within a finite precision \(\varepsilon > 0\) of the absolute minimum \(z^*\), assuming only Lipschitz continuity of the underlying black-box functions f and all \(g_s\). It is shown that consecutive exploitation routines will eventually fail the expected improvement test, which, due to the algorithm logic (see Fig. 2), allows exploration samplings to be performed infinitely often. Furthermore, the design of the exploration routine, through the candidate point generation technique and the exploration merit function, causes a progressively dense distribution of points to be sampled throughout the search space \(\mathcal {X}\). As a result, we can approach the optimal point \(\boldsymbol{x}^*\) up to any finite radius within a finite number of samplings, and correspondingly, the best sampled objective will be \(\varepsilon \)-optimal with respect to \(z^*\). More details on the convergence proofs are provided in [15].

Computational complexity The practical implementation of SMGO is based on keeping a database of candidate points, storing their respective SM-based bounds. As this database is used as a look-up table for the exploitation and exploration routines, most of the computations are devoted to updating it at every iteration. The computational complexity of SMGO is thus mostly determined by the number of candidate points generated, which, for the candidate point generation mechanism described in [15], results in \(\mathcal {O}(Dn+n^2)\) complexity at iteration n. More discussion of the SMGO computational complexity and of iterative implementations can be found in [14, 15].

Implementation aspects Important concerns arise in most practical implementations, which we need to address in building SMGO. The most apparent is the presence of noise and/or disturbances of unknown bounds. In this case, assuming that these bounds are finite (but without assuming anything about the noise distribution), we can estimate the noise bounds using the method proposed in [2], and integrate this information into the construction of the SM bounds. The exploitation and exploration routines are then performed as usual.

As SMGO uses a methodical approach to generating candidate points, the results are completely reproducible from one run to another, i.e., given the same starting point, SMGO will produce the same result and the same sampling history, assuming the absence of noise. However, this same methodical generation of candidate points severely limits the possible search directions, especially during early iterations. To mitigate this, SMGO allocates a fixed number of candidate points at the start of the algorithm, scattered around \(\mathcal {X}\) according to a pseudo-random distribution. This ensures that even during the initial iterations, SMGO has more options for sampling locations, while still maintaining reproducibility (because the pseudo-random distribution can be duplicated between runs).
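The reproducible initial scattering can be sketched with a seeded pseudo-random generator (a sketch only; the chapter does not prescribe a specific generator):

```python
import numpy as np

def initial_candidates(lb, ub, n_points, seed=0):
    """Scatter a fixed number of initial candidate points over the
    hyperrectangle [lb, ub] with a seeded generator, so the scatter
    (and hence the whole run) is reproducible between executions."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    return lb + rng.random((n_points, len(lb))) * (ub - lb)
```

Running the function twice with the same seed yields the same scatter, which is what preserves run-to-run reproducibility.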

5 Sample Applications

5.1 Experiment-Based Controller Tuning

In this first application, we consider the design of a proportional-integral (PI) controller for a tabletop wooden disk elevation system. As shown in Fig. 7, the objective is to achieve the best disk elevation tracking performance, i.e., to minimize the elevation tracking error with respect to a set reference height. Even in this seemingly simple system, several non-trivial mechanisms are at play, including imperfections in the 3D-printed blower nozzle, the non-linear aerodynamic response of the blower, friction in the guide rails, and looseness of the wooden disk.

Fig. 7

Control design for a wooden disk elevation system

We applied SMGO to tune the PI controller, and the results show that the transient response of the controller tuned via black-box optimization (SMGO) outperforms that of one tuned from an estimated system model. More information regarding the system model, the particularities of the optimization setup, and the results can be found in [3].

5.2 Plant-Controller Co-Design

Another application that involves non-convex optimization is plant-controller co-design. Consider a CubeSat to be designed for optical missions, i.e., taking images of ground targets. In this case, our objective is to minimize the attitude (pointing) error of the CubeSat, to satisfy its optical mission. In addition, we are constrained by a minimum percentage of communication time per flyby over the ground station (GS), located in Kiruna, Sweden, and by a maximum average power consumption per camera task. The design variables in this case are the following: the sliding mode controller tuning for the reaction wheels (RWs), the sizing of the magnetic rods, and the sizing of the hysteretic materials. This design problem is highly difficult because of the interactions between the passives design (magnetic rods and hysteretic materials) and the RW controller, and the non-trivial interactions with the environment, in particular the Earth's magnetic field. A diagram illustrating the complexity of the optimization problem is shown in Fig. 8, and more information regarding the system description and the non-trivial interactions between the design variables and the objective/constraints can be found in [13].

Fig. 8

Plant-controller co-design for an optical CubeSat

The SMGO algorithm was used for the design process, interfaced with a MATLAB/Simulink-based CubeSat model. Simulations of image acquisition tasks and GS communication scenarios were run with different passive magnet, hysteretic material, and RW controller tunings. For comparison, other commonly used design strategies, like independent design, sequential design, and Latin hypercube-based sampling, were tested as well. As detailed in the results in [13], the SMGO-based design was found to have the best attitude tracking performance, while satisfying the operational constraints on GS communication and power consumption.

6 Conclusions

In this contribution, the Set Membership Global Optimization (SMGO) algorithm was introduced, a new approach to non-convex optimization based on the Set Membership framework. The approach uses the data-derived Set Membership model bounds and uncertainties to intelligently trade off between exploitation and exploration when deciding on the next sampling point. We provided an overview of its theoretical properties, in particular its convergence to the global optimal value up to any finite precision, as well as several implementation concerns. We also provided an overview of two test cases in which SMGO was used: controller design for a disk levitation system, and simulation-based plant-controller co-design for an optical spacecraft.