Personalized Bayesian optimization for noisy problems

Wang, Xilu; Jin, Yaochu

doi:10.1007/s40747-023-01020-8

Personalized Bayesian optimization for noisy problems

Original Article
Open access
Published: 05 April 2023

Volume 9, pages 5745–5760, (2023)
Cite this article

Download PDF

You have full access to this open access article

Complex & Intelligent Systems Aims and scope Submit manuscript

Personalized Bayesian optimization for noisy problems

Download PDF

1532 Accesses
Explore all metrics

Abstract

In many real-world applications of interest, several related optimization tasks can be encountered, where each task is associated with a specific context or personalized information. Moreover, the amount of available data for each task may be highly limited due to the expensive cost involved. Although Bayesian optimization (BO) has emerged as a promising paradigm for handling black-box optimization problems, addressing such a sequence of optimization tasks can be intractable due to the cold start issues in BO. The key challenge is to speed up the optimization by leveraging the transferable information, while taking the personalization into consideration. In this paper, optimization problems with personalized variables are formally defined at first. Subsequently, a personalized evolutionary Bayesian algorithm is proposed to consider the personalized information and the measurement noise. Specifically, a contextual Gaussian process is used to jointly learn a surrogate model in different contexts with regard to the varying personalized parameter, and an evolutionary algorithm is tailored for optimizing an acquisition function for handling the presence of personalized information. Finally, we demonstrate the effectiveness of the proposed algorithm by testing it on widely used single- and multi-objective benchmark problems with personalized variables.

Simple Surrogate Model Assisted Optimization with Covariance Matrix Adaptation

Stratified Bayesian Optimization

Deterministic global optimization with Gaussian processes embedded

Article Open access 25 June 2021

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Expensive black-box optimization problems are ubiquitous in real-world engineering and design optimization problems, where the problem is unknown besides the observed function values and the available number of function evaluations is severely limited. Bayesian optimization (BO) has emerged as an efficient method for optimizing black-box problems due to its data efficiency, which originates from a surrogate model approximating the true expensive objective functions [17]. Traditional BO typically focuses on a single task each time and starts a search from scratch assuming each task is isolated, which is called cold start [20]. However, several related tasks may be encountered in some real-world applications. As the standard BO does not consider task relatedness, a number of costly function evaluations are required to construct an efficient surrogate model even in cases several related tasks are encountered, making it inapplicable in practice [18].

Recently, transfer learning methods have been incorporated with BO to alleviate the cold start issue by leveraging the information from related tasks. A representative work is multi-task Gaussian processes [24], which is proposed to extend the GP to learn the inter-task similarities. Multi-task Gaussian processes have been successfully adopted various applications, such as hyperparameter optimization of machine learning models [20] and biomedical engineering [7]. The problem setting considered in this paper is motivated by problems in practice, such as optimizing a complex system subject to varying environmental conditions, advertising on different web pages, and developing personalized treatment for patients. In these problems, a sequence of related black-box optimization tasks should be addressed subjected to given context/personalized information in each round, which can be defined as expensive optimization with personalized/contextual information. For example, recently personalized medicine has been developed to shed light on the potential advantages of incorporating personal information, i.e., each person’s unique clinical, environmental, and genetic information, into the treatment. Take the transcranial alternating current stimulation (tACS) [1] as an example; tACS utilizes an alternating current delivered via multiple electrodes placed on the scalp, which is capable of propagating through the scalp and modulating the activity of the underlying neurons. However, it is problematic to identify the optimal tACS parameters, i.e., the current and frequency, for different individuals, as tailoring the treatment based on tACS is financially costly and time-consuming and may be perturbed by noise. In each round, the personalized/contextual information is available for a patient, and the aim is to search for the optimal parameters for tACS simulations.

We formally define the considered optimization problems as follows:

$$\begin{aligned} {\varvec{x}}^{*}(p)={\text {argmin}}_{{\varvec{x}} \in \mathcal {X}} f({\varvec{x}}, p), \end{aligned}$$

(1)

where ${\varvec{x}}=\left( x_{1}, x_{2}, \ldots , x_{d}\right) $ is the decision vector with d decision variables, $\mathcal {X}$ denotes the decision space, p denotes a variable that indicates the personalized/contextual information, such as personal characteristics and contextual information from the environment, and f denotes the expensive black-box objective function, where each function evaluation requires to perform time- and resource-consuming simulations or physical experiments. Note that the black-box function f has no closed form, but can be evaluated at any location ${\varvec{x}}$ in the domain. As a result, the evaluation may contain measurement noise $\varepsilon $, which can be formulated as

$$\begin{aligned} y({\varvec{x}}, p)=f({\varvec{x}}, p)+\varepsilon , \end{aligned}$$

(2)

where y denotes the noisy observation/output of the costly simulations or experiments. For instance, in tACS simulations, the decision variables include the alternating frequency and the current strength, and p is the personalized information of each patient. For this problem, the best inferred parameter combination differs along the value of p due to the individual difference. Hence, the optimal parameter ${\varvec{x}}^{*}(p)$ is not defined globally, but specifically to variable p. Given a group of participants associated with their personal characteristic, the goal is to find the optimal tACS parameters for each individual. Moreover, the objective function f is black-box that can only be evaluated by the time-consuming tACS simulations.

In this work, we attempt to address the above-mentioned computationally expensive optimization problems with personalized information and measurement noise. To this end, a personalized Bayesian optimization algorithm (PBO-EA) with an evolutionary algorithm (EA) to solve the inner optimization in BO is proposed to take the personalized variable into consideration. More specifically, personalized Gaussian processes (PGPs) are employed to jointly learn surrogate models across different contexts by utilizing the personalized information. Hence, for a test data with a specific value of personalized variable, the PGP can provide predictions associated with a confidence level. For the given personalized information, a modified evolutionary algorithm is used to optimize an acquisition function for identifying new samples. Both single-objective and multi-objective benchmark problems are modified to test the proposed method on the considered problem setting.

In the rest of the paper, Bayesian optimization for black-box global optimization problems, including Gaussian processes and acquisition functions, is introduced first. Following that, the proposed personalized evolutionary Bayesian optimization algorithm is presented. To validate the effectiveness of the proposed algorithm for expensive optimization problems with personalized variables and output noise, the single-objective and multi-objective benchmark suites are modified and the experimental results are summarized in “ Experimental studies”. Finally, we draw a conclusion and present some lines of future work.

Bayesian optimization

In this section, Bayesian optimization [17] is briefly introduced, including its two key components, i.e., Gaussian processes and acquisition functions. Suppose we aim to optimize an expensive black-box function $f: \mathcal {X} \rightarrow {\mathbb {R}}$, where $\mathcal {X} \subset {\mathbb {R}}^d$ is a compact and convex set. We can access only the possible noisy evaluations $y=f({\varvec{x}})+\varepsilon $ at any query point ${\varvec{x}}$. Formally, the goal is to find the global optimum

$$\begin{aligned} {\varvec{x}}^{*}={\text {argmin}}_{{\varvec{x}} \in \mathcal {X}} f({\varvec{x}}). \end{aligned}$$

(3)

However, the limited evaluation budget resulting from the costly function evaluations makes it hard for an algorithm to converge to the global optimum. Bayesian optimization has been emerged as a popular methodology for global optimization of expensive black-box functions due to its high sample efficiency. A key component of Bayesian optimization is a surrogate model that trained by the observed data to approximate the true objective function. The surrogate model can replace the expensive evaluation by providing predictions on the queried locations. Bayesian optimization often adopts Gaussian processes (GPs) as the surrogate model, especially in the small data regime. Gaussian processes provide predictions associated with uncertainty estimates that is important to guide the global search. Afterwards, an acquisition function (AF) is carefully designed to balance the exploration and exploitation based on the predictions. Instead of optimizing the expensive objective function, the acquisition function is optimized to identify the next query point. In the following, more details of Gaussian processes and acquisition functions are presented.

Gaussian processes

The Gaussian process (GP) is characterized by its prior mean function $m(\cdot )$ and covariance function (or kernel matrix) $k(\cdot ,\cdot )$ [16]. Consider a finite collection of data pairs $({\varvec{X}},{\varvec{y}})$ of the unknown function $y=f(\textbf{X})+\epsilon $ with $\epsilon \sim \mathcal {N}\left( 0, \sigma _{\epsilon }^{2}\right) $, where ${\varvec{X}}=[{\varvec{x}}^{1}, {\varvec{x}}^{2},..., {\varvec{x}}^{N}]^{T}$, ${\varvec{y}} =[y^{1}, y^{2},..., y^{N}]^{T}$, we assume that the observed data are drawn from a multivariate Gaussian distribution with a Gaussian prior

$$\begin{aligned} y \sim \mathcal {G} \mathcal {P}\left( m({\varvec{x}}), k\left( {\varvec{x}}, {\varvec{x}}^{\prime }\right) \right) . \end{aligned}$$

(4)

Therefore, we obtained the predictive distribution for y at a new observation ${\varvec{x}}$ that also follows a Gaussian distribution. Its mean $\mu $ and variance $\sigma ^2$ are given by

$$\begin{aligned} \begin{aligned} \mu ({\varvec{x}})&=K\left( {\varvec{x}}, {\varvec{X}}\right) \left( K({\varvec{X}}, {\varvec{X}}\right) +\sigma _{\epsilon }^{2} \textbf{I})^{-1} {\varvec{y}}\\ \sigma ^2 ({\varvec{x}})&=K\left( {\varvec{x}}, {\varvec{x}}\right) -K\left( {\varvec{x}}, {\varvec{X}}\right) (K({\varvec{X}}, {\varvec{X}})+\sigma _{\epsilon }^{2} \textbf{I})^{-1} K\left( {\varvec{X}},{\varvec{x}}\right) , \end{aligned} \end{aligned}$$

(5)

where $k\left( {\varvec{x}}; X\right) $ presents a correlation vector between ${\varvec{x}}$ and each element ${\varvec{x}}^{i}$ in ${\varvec{X}}$, $K({\varvec{X}};{\varvec{X}})$ is a covariance matrix whose element is calculated by the covariance (kernel) function. Commonly used covariance functions include the squared exponential and Matérn covariance functions.

Acquisition functions

Once the GP is constructed, the next step is to select the next query point, i.e., to select a new point to be evaluated using the expensive function. In Bayesian optimization, instead of optimizing the true objective function, an AF is optimized to identify new samples. Generally, optimizing the AF by evolutionary algorithms is called evolutionary Bayesian optimization (EBO). AFs typically incorporate both the mean and the variance of the GP prediction to achieve a trade-off between exploration and exploitation. A fruitful line of research has been done on designing acquisition functions, including expected improvement (EI) [11] and lower confidence bound (LCB) [8]. Moreover, some extensions of EI accounting for measurement noise have been proposed.

Let $f^*$ denote the optimum obtained so far, and $\Phi (\cdot )$ and $\phi (\cdot )$ denote the normal cumulative distribution function (CDF), and probability density function (PDF) of the standard normal random variable, respectively. A commonly used acquisition function is called lower bound confidence (LCB) [19]. LCB is designed to balance the exploration and exploitation by combining the uncertainty with the predicted objective values

$$\begin{aligned} LCB\left( {\varvec{x}} \right) =\mu \left( {\varvec{x}} \right) -\kappa \sigma \left( {\varvec{x}} \right) , \end{aligned}$$

(6)

where $\kappa $ presents a trade-off constant. LCB implicitly prefers points whose predicted mean value $\mu $ is small and the corresponding standard deviation $\sigma $ is large.

Alternatively, expected improvement (EI) [14] calculates the expected improvement with respect to $f^*$

$$\begin{aligned}&EI({\varvec{x}}) ={\mathbb {E}}\left[ \max \left( 0, f^*-f({\varvec{x}})\right) \right] \nonumber \\&=\left( f^*-\mu (\textbf{x})\right) \Phi \left( \frac{f^*-\mu ({\varvec{x}})}{\sigma ({\varvec{x}})}\right) +\sigma ({\varvec{x}}) \phi \left( \frac{f^*-\mu ({\varvec{x}})}{\sigma ({\varvec{x}})}\right) ,\nonumber \\ \end{aligned}$$

(7)

where ${\mathbb {E}}$ denotes the expectation value, and $\Phi $ and $\phi $ are the Gaussian CDF and PDF, respectively.

For noisy observations, i.e., the objective function is subject to noise, the standard EI will face two key challenges [15]: (1) the current best solution is not well defined, and (2) the prediction uncertainty associated with the current best fitness values is not accounted for. Hence, a variant of EI for handling noisy optimization, known as Augmented EI (AEI) [10], is introduced

$$\begin{aligned} \begin{aligned}&AEI({\varvec{x}})=\\ {}&E\left[ \max \left( \mu \left( {\varvec{x}}^{*}\right) -f({\varvec{x}})), 0\right) \right] \cdot \left( 1-\frac{\sigma _{\varepsilon }}{\sqrt{\sigma ^{2}({\varvec{x}})+\sigma _{\varepsilon }^{2}}}\right) , \end{aligned} \end{aligned}$$

(8)

where ${\varvec{x}}^{*}$ stands for the current ‘effective best solution’, which is determined as explained below, $\sigma _{\varepsilon }^{2}$ denotes the noise level. In Eq. (8), the expectation is conditional given the past data and given estimates of the correlation parameters.

To determine the effective best solution ${\varvec{x}}^{*}$, we introduce a utility function, denoted as $Utility({\varvec{x}})$, to account for the uncertainty associated with the predicted objective values. In general, the form of the utility function may be selected according to the user’s preference. In AEI, the following formula is used:

$$\begin{aligned} Utility({\varvec{x}})=-\mu ({\varvec{x}})-c \sigma ({\varvec{x}}), \end{aligned}$$

(9)

where c is a constant that can reflect the degree of risk aversion. We select $c=1$ as our default, while implies a willingness to trade 1 unit of the predicted objective value for 1 unit of the standard deviation of prediction uncertainty. Consequently, the new sample is identified by maximizing Utility function and $\mu \left( {\varvec{x}}^{*}\right) $ denotes the prediction on ${\varvec{x}}^{*}$ provided by the surrogate model.

Similarly, in [9], another variant of EI criterion, called Profile EI (PEI), for contextual optimization is proposed. Recall that the standard EI measures the expected improvement on the current best location. However, for a given new context, the current best location may misguide the search for the unseen context. Hence, an alternative for the current best location is introduced as follows:

$$\begin{aligned} T({\varvec{x}}):=\min \left( \max \mu ({\varvec{x}}), {\varvec{x}}^*\right) , \end{aligned}$$

(10)

where m(x) is the mean value of x predicted by the personalized Gaussian process. Consequently, the PEI is

$$\begin{aligned} PEI({\varvec{x}}) \equiv E\left[ \max \left( T({\varvec{x}})-f({\varvec{x}}), 0\right) \right] . \end{aligned}$$

(11)

Personalized evolutionary Bayesian optimization

Despite the abundance of applications of Bayesian optimization, a few methods have been proposed to investigate expensive black-box function with personalized variables and measurement noise. In this section, a personalized evolutionary Bayesian optimization is introduced. While the existing work mainly focused on single-objective optimization, we attempt to extend the proposed algorithm to multi-objective optimization problems with personalized variables. The pseudo code of the proposed algorithm for single/multi-objective problems is outlined in Algorithm 1.

Personalized Gaussian processes

The commonly used surrogate model in BO, Gaussian processes, does not take the personalized information into consideration. That is, the standard surrogate model approximates $f({\varvec{x}})$ instead of $f({\varvec{x}},p)$. To model the objective function $f({\varvec{x}},p)$ in presence of personalized variables, there are three possible methods: (1) we assume that each individual with a specific value of the personalized variable is not related to each other. Under this assumption, a single Gaussian process should be constructed for each individual and BO should be performed separately on each individual. Unfortunately, we only have very limited number of data for each individual, posing challenges for constructing separate surrogate models with good quality. (2) Alternatively, we can ignore the influence of the personalized variables on the objective function by assuming that all individuals with varying personal characteristics share a same objective function. Hence, we can reduce the optimization problem with personalized parameters to a standard optimization problem. However, this is not the case in practical problems. (3) we assume that these individuals with different personalized information share some similarities. Hence, we can learn a personalized Gaussian process by leveraging the similarity between different individuals. For the considered problem setting, the observed data for a number of individuals with different personal characteristics can be augmented together to train a single GP model, enhancing the estimate of the model parameters. To achieve this, a contextual GP [12] is employed in our work to jointly learn a personalized surrogate model, called a personalized GP (PGP).

The contextual GP is designed for contextual bandit problem where varying environmental conditions are considered. To leverage the contextual information, an additional kernel function is defined over the context space, which is used in conjunction with a kernel function over the input features. Such a composite kernel function in the contextual GP allows us to analysis different contexts within a single GP model. Motivated by the success of the contextual GP, we present a product kernel for the optimization with personalized parameters. This product kernel is constructed using kernels defined over the decision variables and the context (the personalized parameter). Hence, the covariance becomes

$$\begin{aligned} k\left( \left\{ {\varvec{x}}_{i}, p_{i}\right\} ;\left\{ {\varvec{x}}_{j}, p_{j}\right\} \right) =k\left( {\varvec{x}}_{i}, {\varvec{x}}_{j}\right) \otimes k\left( p_{i}, p_{j}\right) , \end{aligned}$$

(12)

where $k\left( {\varvec{x}}_{i}, {\varvec{x}}_{j}\right) $ and $k\left( p_{i}, p_{j}\right) $ are the kernel over search space and the context, respectively. $\otimes $ denotes Kronecker product. The core idea behind this product kernel is to describe the data-context pairs by constructing separate kernels on different variables. As a result, the observed data in one context can impact the predictions in another context as the model can capture the correlation between multiple optimization tasks with varying contexts. In this work, the squared exponential kernel is used.

Optimization of acquisition functions

Having constructed the contextual Gaussian process, the nest step is the design and optimization of an AF to search for the new sample conditional on the personalized parameter. In our work, while EI is adopted as the AF for single-objective problems with personalized variables, a variant of LCB is used in multi-objective optimization problems (MOPs). Consider a m-objective optimization problem with personalized variables, one personalized GP is constructed for each objective. Hence, the PGPs can predict mean and variance of a candidate solution ${\varvec{x}}$ on all objectives, denoted as $\varvec{\sigma }({\varvec{x}})=\left\{ \sigma _{i}({\varvec{x}})\right\} _{i=1}^m$ and $\varvec{\mu }({\varvec{x}})=\left\{ \mu _{i}({\varvec{x}})\right\} _{i=1}^m$, respectively. In this way, the original LCB shown in Eq. (6) can be extended to MOPs as follows:

$$\begin{aligned} \varvec{mLCB}\left( {\varvec{x}} \right) =\varvec{\mu } \left( {\varvec{x}} \right) -\kappa \varvec{\sigma } \left( {\varvec{x}} \right) , \end{aligned}$$

(13)

where k is a parameter to manage the trade-off between exploration and exploitation. Note that $\varvec{mLCB}\left( {\varvec{x}} \right) $ is a vector with a length of m.

Generally, AFs are multi-modal, non-convex, and difficult to be optimized. While various kinds of optimization methods have been introduced to solve the inner optimization within the BO framework, there is a lack of studies exploring the optimization of acquisition functions in the presence of the personalized/contextual information and the measurement noise. In the considered problems, the personalized information for each individual is given before performing BO. Hence, the value for the personalized parameter is fixed when optimizing the acquisition function with the surrogate model.

Evolutionary algorithms (EAs), a class of population-based methods, have been successfully applied to various non-convex optimization problems [3]. EAs’ efficiency originates from the fact that they are insensitive to local optima and do not require gradient information. While EAs have emerged as a powerful method to optimization AFs in BO [25], they are not applicable to the problem considered here as the personalized/contextual information is not considered. For the considered problem, the personalized information is given in each round of BO. Hence, the value of the personalized variable is fix when optimizing the AF. Without loss of generality, we suggest a standard genetic algorithm (GA) and a reference vector guided EA (RVEA) [2] as the single-objective and multi-objective optimizer for the acquisition function optimization given the personalized information, respectively. Specifically, in both GA and RVEA, a real number coding scheme, the simulated binary crossover [4], and the polynomial mutation [5] are employed.

Experimental studies

Test problems

We consider nine commonly used single-objective optimization problems [15, 21], including six two-dimensional problems, i.e., Goldstein, Rosenbrock, Branin, Six-hump camel, Dropwave, Beale and Ackley function with ten decision variables, and Hartmann function with three and six decision variables, respectively. The DTLZ [6] test suite is used to test the performance of the proposed algorithm on MOPs. The modified counterparts for DTLZ1 and DTLZ3 by reducing the complexity to a reasonable level are denoted as DTLZ1a and DTLZ3a, respectively, and shown in the Supplementary material. As recommended in [6], the number of decision variables for the test instances is set to $n=m+K-1$, where $K=5$ is adopted for DTLZ1, $K=10$ is used for DTLZ2 to DTLZ6 as well as DTLA3a, and $K=20$ is employed in DTLZ7. m represents the number of objectives. Here, we set $m=3$.

Note that these benchmarks are designed without considering the personalized variables. Hence, to involve personalized variables into the objective function, we vary the parameters in the benchmark problems, so that different values for the personalized variable will generate different objective functions. For example, the original Rosenbrock function is

$$\begin{aligned} f({\varvec{x}})=\left[ \sum _{i=1}^{D-1}\left( 100\left( \bar{x}_{i+1}-\bar{x}_{i}^{2}\right) ^{2}+\left( 1-\bar{x}_{i}\right) ^{2}\right) \right] , \end{aligned}$$

(14)

where $\bar{x}_{i}=15 x_{i}-5$, for all $i=1,2,3,4$. To generated a variant of Rosenbrock function for the considered problem, we treat a constant in Eq. (14) as the personalized variable p. The different value of p is calculated by multiplying the original value of the selected constant with values of samples s drawn from a normal distribution. Hence, the corresponding Rosenbrock function is formulated as

$$\begin{aligned} f({\varvec{x}},p)=\left[ \sum _{i=1}^{D-1}\left( p\left( \bar{x}_{i+1}-\bar{x}_{i}^{2}\right) ^{2}+\left( 1-\bar{x}_{i}\right) ^{2}\right) \right] \end{aligned}$$

(15)

with $p=100*s$.

On the other hand, the observations in the real-word experiments may be noisy. To test the influence of the noise, we add a Gaussian noise $noise \sim N\left( 0, \sigma _\varepsilon ^{2}\right) $ to the output of each benchmark problem. Varying $\sigma _\varepsilon ^{2}$ of the distribution from 0 to 1 can change the level of noise. Hence, the final Rosenbrock function used in our experiments is

$$\begin{aligned} f({\varvec{x}},p)=\left[ \sum _{i=1}^{3}\left( p\left( \bar{x}_{i+1}-\bar{x}_{i}^{2}\right) ^{2}+\left( 1-\bar{x}_{i}\right) ^{2}\right) \right] \times (1+\sigma _{\varepsilon }) \end{aligned}$$

(16)

with $p=100*s$. In our experiments, we test each algorithm on the benchmark problems with different levels of noise, i.e., $\sigma _\varepsilon ^{2}=0, 0.05, 0.1, 0.5$ and 1. Moreover, ten personalized values are generated for each benchmark problem to test each algorithm on different optimization problems with varying personalized information.

Similar to personalized single-objective benchmark problems, we also modified the selected multi-objective test problems by treating a parameter in the original benchmark problem as the personalized variable.

Table 1 Statistical results obtained by PBO-EA, BO, PBO-SQR, PBO-LCB, PBO-AEI, and PBO-PEI on benchmark problems with $noise level=0$

Full size table

Table 2 Statistical results obtained by PBO-EA, BO, PBO-SQR, PBO-LCB, PBO-AEI, and PBO-PEI on benchmark problems with $noise level=0.05$

Full size table

Table 3 Statistical results obtained by PBO-EA, BO, PBO-SQR, PBO-LCB, PBO-AEI, and PBO-PEI on benchmark problems with $noise level=0.1$

Full size table

Table 4 Statistical results obtained by PBO-EA, BO, PBO-SQR, PBO-LCB, PBO-AEI, and PBO-PEI on benchmark problems with $noise level=0.5$

Full size table

Table 5 Statistical results obtained by PBO-EA, BO, PBO-SQR, PBO-LCB, PBO-AEI, and PBO-PEI on benchmark problems with $noise level=1$

Full size table

Experimental settings

Hence, we tested a standard evolutionary algorithm (i.e., genetic algorithm), and a BFGS-based sequential quadratic programming (SQR) provided by an MATLAB function. We denote these two methods as PBO-EA and PBO-SQR, respectively. Note that the expected improvement (EI) is adopted as the acquisition function to identify new samples in both methods.

Based on the above analysis, we attempt to solve the optimization problems with the personalized variable from the perspective of the surrogate model, the design, and the optimization of the acquisition function. To validate each component in the proposed personalized Bayesian optimization framework, shown in Algorithm 1, the experiment include four major parts:

1.
To validate the effectiveness of the personalized GP, the proposed PBO-EA is compared with the standard BO with the standard GP. While the standard GP ignores the personalized information, the contextual Gaussian process in PBO-EA attempts to jointly learn an approximation of the true objective function over different contexts.
2.
A variant of PBO-EA is introduced to test the advantage of the use of GA. While PBO-EA use GA to optimize EI, the variant employs a BFGS-based sequential quadratic programming (SQR) provided by a MATLAB function, called PBO-SQR.
3.
Different AFs are tested on the problems with personalized variable and measurement noise. In addition to EI, the two variants of EI for handling measurement noise, i.e., AEI and PEI, and LCB are adopted, denoted as PBO-AEI, PBO-PEI, and PBO-LCB, respectively.
4.
To test the effectiveness of PGP on MOPs with personalized variables, we suggest to compare a PGP-assisted multi-objective evolutionary algorithm (MOEA) and a GP-assisted one, denoted as PGP-MOEA and GP-MOEA, respectively. In both PGP-MOEA and GP-MOEA, a commonly used MOEA (we use RVEA [2]) is performed to optimize $\varvec{mLCB}$ to select u new samples. Parameter k in $\varvec{mLCB}$ is set to 2 and u is set to 5.

We run each algorithm on each benchmark problem for ten independent times, and the Wilcoxon rank sum test [23] is calculated to compare the mean over ten independent runs obtained by PBO-EA and by the compared algorithms at a significance level of 0.05. Symbol “(+)” indicates that the proposed algorithm outperforms the compared algorithm statistically significantly, while “(−)” means that the compared algorithm performs better than PBO, and “($\approx $)” means that there is no significant difference between them. The mean and standard deviation (Std) of ten runs for the single-objective problems with different noise levels obtained by the algorithms are recorded in Tables 1, 2, 3, 4, 5 and Tables SI–SV in the Supplementary material. Note that we use notation ‘S’ to indicate tables and figures in the Supplementary material to avoid confusion. For MOPs, the hypervolume (HV) [22] metric is adopted to assess the performance of the algorithms. HV provides a combined information of the convergence and diversity of the obtained set of solutions. All HV values presented in this work are normalized to [0, 1]. Algorithms achieving a larger HV value are better. The results in terms of HV values obtained by PGP-MOEA and GP-MOEA are summarized in Table 6.

All algorithms are implemented in Matlab R2019a on an Intel Core i7 with 2.21 GHz CPU. The general parameter settings in the experiments are given as follows:

1.
We initial ten personalized parameters. For each personalized value, the size of the initial training data is ten. Hence, there are $10*10$ initial data in total.
2.
We randomly generate ten personalized values, i.e., $\textbf{p}=[p_1,p_2,\cdots ,p_{10}]$, and the algorithm runs ten iterations for each personalized value. Hence, the maximum number of function evaluations $FE=10*10+10*10$.
3.
As ten personalized values (contexts) are considered for each benchmark problem, there are $10*12$ optimization tasks in total.
4.
For evolutionary algorithm: the population size is set to 60 and the maximum number of generations is set to 20.
5.
One new sample is selected for single-objective problems, and five new samples ($u=5$) are selected for MOPs.

Experimental results

1.
Comparison of surrogate models: The results obtained by the standard BO and PBO-EA on single-objective problems with different levels of noise are presented in Tables 1, 2, 3, 4, 5 and Tables SI–SV, respectively. We can see that the algorithm with the personalized GP, i.e., PBO-EA, shows better overall performance than the standard BO. More specifically, PBO-EA shows significantly better or similar performance than the standard BO on 74 out of 80 optimization tasks without noise, indicating the benefits of the use of personalized information for modelling the objective function. However, the performance of both BO and PBO-EA will degrade with the increase of the noise level.
2.
Comparison of optimization methods: We can see that PBO-EA always shows better performance than PBO-SQR, indicating the effectiveness of EAs for optimizing the acquisition function. A possible explanation is that EAs are population-based optimization methods and do not require the gradient of the objective function, allowing them to get out of local optima. However, it is interesting to see that PBO-SQR shows better performance than PBO-EA on Ackley function. The reason may be that SQR methods are suitable for solving Ackley function, because Ackley function is characterized by a nearly flat outer region, and a large hole at the centre, which follows the assumption of SQR method.
3.
Comparison of the acquisition functions: We tested four acquisition functions within the PBO framework, i.e., EI, LCB, AEI, and PEI. According to the results in Tables 1, 2, 3, 4, 5 and Tables SI–SV, we can observe that the EI shows better performance than the others on problems with different levels of noise. Among all the tested acquisition functions, LCB shows the worst performance on most test instances and the level of noise significantly degrades its performance. It is interesting to see that when the noise level is large, i.e., $\sigma _\varepsilon ^{2}=1$, EI, PEI, and AEI show similar performance. A possible explanation is that AEI involves a penalty to account for the noise variance of the next evaluation. Specially, AEI penalizes data points with a small prediction variance and therefore enhances exploration. The small noise level may impact the surrogate model slightly, and the enhanced exploration may hinder slow convergence, which is not desired when a very limited number of fitness evaluations are available. By contrast, when heavy noise may lead to poor surrogate models, both PEI, AEI, and EI cannot evaluate the candidate solutions correctly, resulting in similar performance.
4.
Comparison on MOPs with personalized variables: According to the statistical results in terms of the mean and std HV values in Table 6, it is clear that PGP-MOEA significantly outperforms GP-MOEA on 38 out of 54 optimization tasks resulting from varying values of the personalized variable. Moreover, they show similar performance with regard to convergence and diversity indicated by HV values on the remaining ones. This observation indicates that personalized GPs can provide better predictions than the standard GPs in the considered problem, which further enhances the optimization performance of PGP-MOEA. This confirms the effectiveness of the personalized GP for handling optimization problems where personalization is taken into consideration.

To gain a deeper insight into the performance achieved by each algorithm, we plot the change of the mean and the variance of the best objective values over the number of contexts (the value of the personalized variable). As shown in Fig. 1, PBO-EA can achieve better performance on the selected test instances for different personalized values. Specifically, while the best value found by the standard BO on different personalized values varies a lot, PBO can achieve a more stable performance with the change of the personalized variable, indicating the benefits of PGPs. Compared with the SQR method, the EA shows more stable and better results, especially for Beal function and Rosenbrock function. This reveals the advantage of EAs for optimization of the acquisition function that are multi-model and complex. By comparing the different acquisition functions, we can see that the EI acquisition function shows promising performance compared with the other acquisition functions. Interestingly, both AEI and PEI achieve competitive performance on all the test instances, especially when the noise level increases. Similarly, we plot the non-dominated solution set with the median HV value among 20 runs obtained by PGP-MOEA and GP-MOEA on DTLZ2 and DTLZ1a, respectively. As shown in Fig. 2 and Fig. S1, we can see that PGP-MOEA can achieve a better performance in terms of convergence and diversity.

Table 6 Statistical results obtained by PGP-MOEA and GP-MOEA with the same number of real function evaluations

Full size table

Conclusion

In this paper, we consider solving expensive optimization problems with personalized decision variables and observation noise, which cannot be efficiently solved by standard Bayesian optimization methods. We consider both single-objective and multi-objective optimization, and introduce the corresponding test benchmark problems. Then, we propose a personalized Bayesian optimization algorithm by jointly learning surrogate models over different contexts and accounting for the observation noise. More specifically, a composite kernel is introduced to measure the similarity of decision variables and personalized variables simultaneously, making it possible to transfer knowledge between different individuals. To reduce the impact of noise on the optimization, we test different kinds of acquisition functions and it turns out that EI always achieves competitive performance with different noise levels.

The proposed algorithm is tested on sets of widely used benchmark problems for different personalized information. Our experimental results demonstrate that the proposed algorithm achieves significantly better performance than the standard Bayesian optimization methods on most test instances studied in this work. Comparisons are also carried out to investigate the effectiveness of the contextual Gaussian process and acquisition functions used in the proposed algorithm. The empirical results confirm that the Gaussian processes considering the personalized information result in the good performance of the proposed algorithm.

The research on optimization problems involving personalized/contextual information is still in its infancy and demands for further investigations. On the one hand, it is challenging to construct effective surrogate models over various contexts, as only a very limited number of data are available in each context. On the other hand, for a given value for the personalized variable, the search of new samples to be evaluated by the true expensive objective functions should be further investigated, since optimization of the acquisition function will become increasingly challenging. Moreover, the assumptions that optimization problems are noise-free may hardly hold in practice, rendering poor optimization performance. Hence, it is interesting yet challenging to investigate noise-handling methods in Bayesian optimization of real-world problems.

References

Antal A, Paulus W (2013) Transcranial alternating current stimulation (tACS). Front Hum Neurosci 7:317
Article Google Scholar
Cheng R, Jin Y, Olhofer M, Sendhoff B (2016) A reference vector guided evolutionary algorithm for many-objective optimization. IEEE Trans Evol Comput 20(5):773–791
Article Google Scholar
Dipankar D, Zbigniew M (2013) Evolutionary algorithms in engineering applications. Springer Science & Business Media
Deb K, Beyer H-G (2001) Self-adaptive genetic algorithms with simulated binary crossover. Evol Comput 9(2):197–221
Article Google Scholar
Deb Kalyanmoy, Deb Debayan et al (2014) Analysing mutation schemes for real-parameter genetic algorithms. Int J Artif Intell Soft Comput 4(1):1–28
Google Scholar
Deb Kalyanmoy, Thiele Lothar, Laumanns Marco, Zitzler Eckart (2002) Scalable multi-objective optimization test problems. In Proceedings of the 2002 Congress on Evolutionary Computation. CEC’02 (Cat. No. 02TH8600), volume 1, pages 825–830. IEEE
Robert Dürichen, Pimentel Marco AF, Lei Clifton, Achim Schweikard, Clifton David A (2014) Multitask Gaussian processes for multivariate physiological time-series analysis. IEEE Trans Biomed Eng 62(1):314–322
Google Scholar
Emmerich Michael TM, Giannakoglou Kyriakos C, Boris Naujoks (2006) Single-and multiobjective evolutionary optimization assisted by gaussian random field metamodels. IEEE Trans Evol Comput 10(4):421–439
Article Google Scholar
Ginsbourger David, Baccou Jean, Chevalier Clément, Perales Frédéric, Garland Nicolas, Monerie Yann (2014) Bayesian adaptive reconstruction of profile optima and optimizers. SIAM/ASA J Uncertainty Quantification 2(1):490–510
Article MathSciNet MATH Google Scholar
Deng Huang, Allen Theodore T, Notz William I, Ning Zeng (2006) Global optimization of stochastic black-box systems via sequential Kriging meta-models. J Global Opt 34(3):441–466
Article MathSciNet Google Scholar
Jones Donald R, Matthias Schonlau, Welch William J (1998) Efficient global optimization of expensive black-box functions. J Global Opt 13(4):455–492
Article MathSciNet MATH Google Scholar
Krause Andreas, Ong Cheng Soon (2011) Contextual Gaussian process bandit optimization. In Nips, pages 2447–2455
McKay Michael D, Beckman Richard J, Conover William J (2000) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 42(1):55–61
Article Google Scholar
Močkus Jonas (1975) On Bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference, pages 400–404. Springer
Picheny Victor, Wagner Tobias, Ginsbourger David (2013) A benchmark of Kriging-based infill criteria for noisy optimization. Struct Multidisciplinary Opt 48(3):607–626
Article Google Scholar
Rasmussen Carl Edward (2003) Gaussian processes in machine learning. In Summer school on Machine Learning, pages 63–71. Springer
Bobak Shahriari, Kevin Swersky, Ziyu Wang, Adams Ryan P, Nando De Freitas (2015) Taking the human out of the loop: a review of Bayesian optimization. Proc IEEE 104(1):148–175
Snoek Jasper, Larochelle Hugo, Adams Ryan P (2012) Practical Bayesian optimization of machine learning algorithms. Advances in neural information processing systems, 25
Srinivas Niranjan, Krause Andreas, Kakade Sham M, Seeger Matthias (2010) Gaussian process optimization in the bandit setting: No regret and experimental design. pages 1015–1022
Swersky Kevin, Snoek Jasper, Adams Ryan P (2013) Multi-task Bayesian optimization. Advances in neural information processing systems, 26
Anh Tran, Mike Eldred, Tim Wildey, Scott McCann, Jing Sun, Visintainer Robert J (2022) aphBO-2GP-3B: a budgeted asynchronous parallel multi-acquisition functions for constrained Bayesian optimization on high-performing computing architecture. Struct Multidisciplinary Opt 65(4):1–45
MathSciNet Google Scholar
While Lyndon, Hingston Philip, Barone Luigi, Huband Simon (2006) A faster algorithm for calculating hypervolume. IEEE Trans Evol Comput 10(1):29–38
Article Google Scholar
Wilcoxon Frank, Katti SK, Wilcox Roberta A (1963) Critical values and probability levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test, volume 1. American Cyanamid Pearl River (NY)
Williams Chris, Bonilla Edwin V, Chai Kian M (2007) Multi-task Gaussian process prediction. Advances in Neural Information Processing Systems, pages 153–160
Zhang Qingfu, Liu Wudong, Tsang Edward, Virginas Botond (2009) Expensive multiobjective optimization by MOEA/D with Gaussian process model. IEEE Trans Evol Comput 14(3):456–474
Article Google Scholar

Download references

Acknowledgements

This work was supported by an Alexander von Humboldt Professorship for Artificial Intelligence endowed to Y. Jin by the German Federal Ministry of Education and Research.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Faculty of Technology, Bielefeld University, Bielefeld, 33619, Germany
Xilu Wang & Yaochu Jin

Authors

Xilu Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yaochu Jin
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yaochu Jin.

Ethics declarations

Conflict of interest

All authors declare that there are no financial and personal relationships with other people or organizations that could inappropriately influence (bias) our work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wang, X., Jin, Y. Personalized Bayesian optimization for noisy problems. Complex Intell. Syst. 9, 5745–5760 (2023). https://doi.org/10.1007/s40747-023-01020-8

Download citation

Received: 02 October 2022
Accepted: 18 February 2023
Published: 05 April 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s40747-023-01020-8

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Personalized Bayesian optimization for noisy problems

Abstract

Similar content being viewed by others