1 Introduction

We consider as our reference problem black-box optimization: the objective function and/or constraints are analytically unknown, and evaluating them can be very expensive and noisy. In black-box situations, since we cannot assume any prior knowledge about the objective function \(f\left( x \right)\), any functional form is a priori admissible, and the value of the function at one point says nothing about its value at other points: the only way to develop a problem-specific algorithm is to assume a model of \(f\left( x \right)\) and to learn it through a sample of function values.

Such an algorithm must be sample efficient because function evaluations dominate the overall cost. This problem has been addressed in several fields under different names, including active learning (Kruschke et al. 2008; Griffiths et al. 2008; Wilson et al. 2015), Bayesian optimization (BO) (Zhigljavsky and Zilinskas 2007; Candelieri et al. 2018; Archetti and Candelieri 2019), hyperparameter optimization (Eggensperger et al. 2019), Lipschitz global optimization (Sergeyev et al. 2020a, b; Lera et al. 2021), and others.

In BO, a probabilistic surrogate model of the objective function is built to summarize our a priori beliefs about the objective function and the informative value of new observations. Two probabilistic frameworks are usually considered: Gaussian processes (GPs) and random forests (RFs), which offer alternative ways to update the beliefs as new data arrive and to provide an estimate of the expected value of the objective function along with the uncertainty in this estimate.

Both GP and RF provide surrogate models, and the major driver of the choice is the type of design variables: continuous variables are better handled by GPs, while integer/categorical and conditional ones are better handled by RFs.

In both cases the next point to sample is chosen on the basis of its informative value, through the maximization of an acquisition function. This choice brings up the so-called exploration versus exploitation dilemma, where exploration means devoting resources to learning more about possible solutions, while exploitation means devoting resources to improving on solutions already identified in previous phases. The search for the new point must strike an effective balance between these two needs.

Psychologists have extensively studied how humans balance exploration and exploitation (Kruschke et al. 2008; Mehlhorn et al. 2015), with recent attention to the links between modern machine learning algorithms and psychological processes (Gershman 2018; Schulz et al. 2016; Gopnik et al. 2017). Psychological research has so far mostly focused on how people learn functions according to a protocol in which an input is presented to participants and they are asked to predict the corresponding output value. They then observe the true output value (usually noisy) in order to update their own “predictive model”, that is, to adjust their internal representation of the underlying function. Psychologists have largely focused on GPs: the issues of GP regression, kernel composition for different degrees of smoothness, and safe optimization, in their relation to cognition, are studied in a recent survey by Schulz et al. (2018b). Exploration is realized by adding a so-called uncertainty bonus to the estimated values, yielding the upper confidence bound (UCB) algorithm (Srinivas et al. 2010). In Wu et al. (2018) the human search strategy for rewards under limited search horizons is analysed, concluding that GPs offer the best model for generalization and UCB the best solution to the exploration/exploitation dilemma.

A significant application of RF is given in Plonsky et al. (2019) as a hybrid model combining machine learning and decision mechanisms. A key observation driving the above research is that human learners are remarkably fast at adapting to unfamiliar environments. Psychologists are investigating the intriguing gap between the capabilities of human and machine learning.

Most previous research findings on human learning concern function learning, because it is related to a probabilistic perspective on predictability and provides a proxy for generalization capability. In contrast to function learning, optimization is not yet widely considered in the literature; Borji and Itti (2013) considered a simple 1-D optimization problem.

The approach presented in this paper was sketched in Candelieri et al. (2019); the present paper has been significantly augmented and rewritten. A set of new computational results concerns the use of RF, alongside GP, as the surrogate model. The set of references has also been enlarged, and the whole perspective has been widened to reflect that learning and optimization of black-box functions are two faces of the same coin. Moreover, other global optimization strategies, both deterministic and stochastic, have also been considered in this study, leading to the main result that BO offers a reasonable unifying framework of human function learning, active sampling and optimization.

The structure of the paper is as follows: Sect. 2 outlines the methodological background of BO, including the basis of the two main surrogate models, GP and RF, and the management of the exploration/exploitation dilemma. Section 3 is devoted to the experimental set-up, and Sect. 4 reports the experimental results about the behavioural patterns of humans in optimizing black-box functions, compared to both BO and five other global optimization strategies. Section 5 outlines the conclusions of this study and perspectives for future work.

2 Methodological background

This section provides the underlying methodological framework of the study. The global optimization problem we consider is defined as:

$$ \mathop {\max }\limits_{{x \in \chi \subset {\mathbb{R}}^{d} }} f\left( x \right) $$

where the search space \(\chi\) is generally box-bounded and \(f\left( x \right)\) is black-box, meaning that no gradient information is available and that we only have access to noisy observations of \(f\left( x \right)\), which are computationally expensive.

2.1 Gaussian processes

GPs are a powerful nonparametric model for both regression and classification. One way to interpret a GP is as a distribution over functions, with inference taking place directly in the space of functions (Williams and Rasmussen 2006). A GP is a collection of random variables, any finite number of which have a joint Gaussian distribution. A GP is completely specified by its mean function \(\mu \left( x \right)\) and covariance function \({\text{cov}}\left( {f\left( x \right),f\left( {x^{\prime}} \right)} \right) = k\left( {x,x^{\prime}} \right)\):

$$ \begin{aligned} & \mu \left( x \right) = {\mathbb{E}}\left[ {f\left( x \right)} \right] \\ & {\text{cov}}\left( {f\left( x \right),f\left( {x^{\prime}} \right)} \right) = k\left( {x,x^{\prime}} \right) = {\mathbb{E}}\left[ {\left( {f\left( x \right) - \mu \left( x \right){ }} \right)\left( {f\left( {x^{\prime}} \right) - \mu \left( {x^{\prime}} \right){ }} \right)} \right] \\ \end{aligned} $$

and will be denoted by: \(f\left( x \right)\sim GP\left( {\mu \left( x \right),k\left( {x,x^{\prime}} \right)} \right)\). This means that the behaviour of the model can be controlled entirely through the mean and covariance.

Usually, for notational simplicity, the prior mean function is taken to be zero, although this is not necessary. The covariance function plays a critical role in GP modelling, as it specifies the distribution over functions, depending on a sample \(X_{1:n} = \left\{ {x_{1} , \ldots ,x_{n} } \right\}\). More precisely, \(f\left( {X_{1:n} } \right)\sim {\mathcal{N}}\left( {0,K\left( {X_{1:n} ,X_{1:n} } \right)} \right)\), with \(K\left( {X_{1:n} ,X_{1:n} } \right)\) an \(n \times n\) matrix whose entries are \(K_{ij} = k\left( {x_{i} ,x_{j} } \right)\), where \(k\) is a kernel function specifying a prior about the smoothness of the function to be approximated and optimized.

We usually have access only to noisy function values, denoted by \(y = f\left( x \right) + \varepsilon\), with \(\varepsilon\) an additive independent identically distributed Gaussian noise with variance \(\lambda^{2}\).

Let \(y = \left( {y_{1} , \ldots ,y_{n} } \right)\) denote a set of \(n\) noisy function evaluations, then the resulting covariance matrix becomes \(K\left( {X_{1:n} ,X_{1:n} } \right) + \lambda^{2} I\).

Let \(D_{1:n} = \left\{ {\left( {x_{i} ,y_{i} } \right)} \right\}_{i = 1, \ldots ,n}\) denote the set containing both the locations and the associated function evaluations; the predictive equations for GP regression are then obtained by conditioning the joint Gaussian prior distribution on \(D_{1:n}\):

$$ \begin{aligned} & \mu \left( x \right) = {\mathbb{E}}\left[ {f\left( x \right)|D_{1:n} ,x} \right] = k\left( {x,X_{1:n} } \right)\left[ {K\left( {X_{1:n} ,X_{1:n} } \right) + \lambda^{2} I} \right]^{ - 1} y \\ & \sigma^{2} \left( x \right) = k\left( {x,x} \right) - k\left( {x,X_{1:n} } \right)\left[ {K\left( {X_{1:n} ,X_{1:n} } \right) + \lambda^{2} I} \right]^{ - 1} k\left( {X_{1:n} ,x} \right) \\ \end{aligned} $$

where \(k\left( {x,X_{1:n} } \right) = \left[ {k\left( {x,x_{1} } \right), \ldots , k\left( {x,x_{n} } \right)} \right]\).

The covariance function is the crucial ingredient in a GP predictor, as it encodes our assumptions about the function to approximate: function evaluations near a given point should be informative about the prediction at that point. Under the GP view, it is the covariance function that defines nearness or similarity. Once the prior mean and the kernel are chosen, they are updated with the observations of \(f\left( x \right)\) to obtain the posterior distribution \(f{(}x {|} D_{1:n} )\); this allows us to compute the expected value of the function at any point and to quantify its uncertainty through the predicted variance.
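As an illustration, the predictive equations above fit in a few lines of R. The following is a minimal sketch for the 1-D case, assuming a fixed squared exponential kernel with length-scale \(\ell\) and known noise variance \(\lambda^{2}\); hyperparameter estimation by maximum likelihood is omitted.

```r
# Minimal GP regression sketch (1-D, fixed hyperparameters).
k_se <- function(a, b, l = 1) exp(-(a - b)^2 / (2 * l^2))

gp_predict <- function(x_new, X, y, l = 1, lambda = 0.1) {
  K  <- outer(X, X, k_se, l = l) + lambda^2 * diag(length(X))  # K + lambda^2 I
  kx <- outer(x_new, X, k_se, l = l)                           # k(x, X_{1:n})
  mu <- as.vector(kx %*% solve(K, y))                          # posterior mean
  # posterior variance: k(x,x) - k(x,X) [K + lambda^2 I]^{-1} k(X,x)
  sigma2 <- k_se(x_new, x_new, l) - rowSums(kx * t(solve(K, t(kx))))
  list(mu = mu, sigma2 = pmax(sigma2, 0))  # clamp tiny negative values
}

# Usage: five noisy evaluations of a hidden 1-D function
X <- c(0.1, 0.3, 0.5, 0.7, 0.9)
y <- sin(2 * pi * X) + rnorm(5, sd = 0.1)
pred <- gp_predict(seq(0, 1, by = 0.01), X, y)
```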

Examples of covariance (aka kernel) functions:

$$ \begin{gathered} {\text{Squared Exponential (SE) kernel:}}\quad k_{{{\text{SE}}}} \left( {x,x^{\prime}} \right) = e^{{ - \frac{{\left( {x - x^{\prime}} \right)^{2} }}{{2\ell^{2} }}}} . \hfill \\ {\text{Exponential kernel:}}\quad k_{{{\text{Exp}}}} \left( {x,x^{\prime}} \right) = e^{{ - \frac{{\left| {x - x^{\prime}} \right|}}{\ell }}} . \hfill \\ {\text{Power exponential kernel:}}\quad k_{{{\text{PowExp}}}} \left( {x,x^{\prime}} \right) = e^{{ - \left( {\frac{{\left| {x - x^{\prime}} \right|}}{\ell }} \right)^{p} }} . \hfill \\ {\text{Mat}}{\acute{\text{e}}}{\text{rn kernels:}}\quad k_{{{\text{Mat}}}} \left( {x,x^{\prime}} \right) = \frac{{2^{1 - \nu } }}{\Gamma \left( \nu \right)}\left( {\frac{{\left| {x - x^{\prime}} \right|\sqrt {2\nu } }}{\ell }} \right)^{\nu } K_{\nu } \left( {\frac{{\left| {x - x^{\prime}} \right|\sqrt {2\nu } }}{\ell }} \right). \hfill \\ \end{gathered} $$

The hyperparameter \(\ell\), in all the reported kernels, is the characteristic length-scale, typically updated by maximum likelihood estimation. The hyperparameter \(\nu > 0\) in the Matérn kernel governs the smoothness of the GP samples, which are \(\left\lceil \nu \right\rceil - 1\) times differentiable; \({\Gamma }\left( \nu \right)\) is the gamma function and \(K_{\nu }\) is the modified Bessel function of the second kind. The most widely adopted versions, specifically in the machine learning community and the ones considered in this paper, are \(\nu = 3/2\) and \(\nu = 5/2\). The Matérn kernel encodes the expected smoothness of the target function explicitly.
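For reference, the same kernels can be written as R functions of the distance \(r = \left| {x - x^{\prime}} \right|\). This is a minimal sketch; besselK from base R provides \(K_{\nu }\), and the closed forms for \(\nu = 3/2\) and \(\nu = 5/2\) are the standard ones.

```r
# Kernels as functions of r = |x - x'|; l is the characteristic length-scale.
k_exp    <- function(r, l = 1) exp(-r / l)
k_powexp <- function(r, l = 1, p = 2) exp(-(r / l)^p)  # p = 2 recovers the SE
                                                       # kernel up to rescaling of l
# General Matern kernel; besselK (base R) is the modified Bessel function
# of the second kind.
k_matern <- function(r, l = 1, nu = 3/2) {
  z   <- sqrt(2 * nu) * r / l
  out <- (2^(1 - nu) / gamma(nu)) * z^nu * besselK(z, nu)
  out[r == 0] <- 1  # limit for r -> 0
  out
}
# Closed forms for the two versions used in this paper:
k_matern32 <- function(r, l = 1) (1 + sqrt(3) * r / l) * exp(-sqrt(3) * r / l)
k_matern52 <- function(r, l = 1)
  (1 + sqrt(5) * r / l + 5 * r^2 / (3 * l^2)) * exp(-sqrt(5) * r / l)
```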

2.2 Random forest

Random forest (RF) is an ensemble learning method, based on decision trees, for both classification and regression problems (Ho 1995). According to the originally proposed implementation, RF generates a multitude of decision trees at training time and outputs the mode of the classes (classification) or the mean/median of the predictions (regression) of the individual trees.

Although originally designed and presented as a machine learning algorithm, RF is also an effective and efficient alternative to GP for implementing BO. Since RF consists of an ensemble of different regressors (i.e. decision trees), it is possible to compute—as for GP—both \(\mu \left( x \right)\) and \(\sigma \left( x \right)\), simply as the mean and standard deviation of the individual regressors' outputs. Due to their different natures, RF and GP result in significantly different surrogate models: while GP is well suited to model smooth functions over search spaces spanned by continuous variables, RF can also deal with discrete and conditional variables.

Moreover, fitting an RF is computationally more efficient than fitting a GP; indeed, the kernel matrix inversion required to fit a GP, whose cost grows cubically with the number of observations, is a well-known computational bottleneck.
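A minimal R sketch of the RF surrogate follows. It uses the randomForest package for illustration (mlrMBO, used in this study, provides its own RF surrogate); predict.all = TRUE exposes the individual trees' predictions, from which \(\mu \left( x \right)\) and \(\sigma \left( x \right)\) are computed.

```r
library(randomForest)

# mu(x) and sigma(x) as mean and standard deviation over the trees.
rf_predict <- function(model, X_new) {
  per_tree <- predict(model, X_new, predict.all = TRUE)$individual
  list(mu    = rowMeans(per_tree),      # surrogate mean
       sigma = apply(per_tree, 1, sd))  # surrogate uncertainty
}

# Usage on a 2-D design
X <- data.frame(x1 = runif(50), x2 = runif(50))
y <- with(X, -(x1 - 0.5)^2 - (x2 - 0.5)^2) + rnorm(50, sd = 0.01)
model <- randomForest(X, y, ntree = 500)
rf_predict(model, data.frame(x1 = 0.5, x2 = 0.5))
```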

2.3 The acquisition functions

The acquisition function is the mechanism implementing the trade-off between exploration and exploitation in BO. More precisely, any acquisition function aims to guide the search towards points with potentially high values of the objective function, either because the prediction of \(f\left( x \right)\), based on the probabilistic surrogate model, is high or because the uncertainty, also based on the same model, is high (or both). Indeed, exploitation means considering the area offering the best chance to improve over the current solution, while exploration means moving towards less explored regions of the search space, where predictions based on the surrogate model are more uncertain (higher variance). There are many acquisition functions; we describe only those used in this study. Probability of improvement (PI) (Kushner 1964) and expected improvement (EI) (Mockus 1975) measure, respectively, the probability and the expected value of the improvement over the best observed value of \(f\left( x \right)\), given the predictive distribution of the probabilistic surrogate model. More precisely, they are defined as follows:

$$ \begin{aligned} & {\text{EI}}\left( x \right) = \left( {\mu \left( x \right) - y^{ + } } \right){\Phi }\left( Z \right) + \sigma \left( x \right)\phi \left( Z \right) \\ & {\text{PI}}\left( x \right) = {\Phi }\left( Z \right) \\ \end{aligned} $$

with \(Z = \frac{{\mu \left( x \right) - y^{ + } }}{{\sigma \left( x \right)}}\), \(y^{ + }\) the best value observed so far, and \({\Phi }\) and \(\phi\), respectively, the cumulative distribution function and the probability density function of the standard normal distribution (for \(\sigma \left( x \right) = 0\), \({\text{EI}}\left( x \right)\) is set to 0). The next evaluation point, \(x_{n + 1}\), is obtained by maximizing \({\text{EI}}\left( x \right)\) or \({\text{PI}}\left( x \right)\). More recently, the upper/lower confidence bound (Srinivas et al. 2010), denoted by UCB/LCB, has become widely used. It is an acquisition function that manages the exploration–exploitation trade-off by being optimistic in the face of uncertainty; UCB and LCB are used, respectively, for maximization and minimization problems:

$$ {\text{UCB}}\left( x \right) = \mu \left( x \right) + \beta \sigma \left( x \right)\;{\text{and}}\;{\text{LCB}}\left( x \right) = \mu \left( x \right) - \beta \sigma \left( x \right) $$

where \(\beta \ge 0\) is the parameter managing the trade-off between exploration and exploitation (\(\beta = 0\) yields pure exploitation, while higher values of \(\beta\) emphasize exploration by inflating the weight of the model uncertainty). In Srinivas et al. (2010), a policy for updating the value of \(\beta\) along function evaluations is provided, together with a proof of its convergence. In the case of a maximization problem, the next point is chosen as \(x_{n + 1} = \mathop {{\text{argmax}}}\limits_{x \in X} {\text{UCB}}\left( x \right)\), while in the case of a minimization problem it is selected as \(x_{n + 1} = \mathop {{\text{argmin}}}\limits_{x \in X} {\text{LCB}}\left( x \right)\).
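A minimal R sketch of the three acquisition functions, written for a maximization problem on top of any surrogate returning \(\mu \left( x \right)\) and \(\sigma \left( x \right)\) (such as the GP and RF sketches above); y_best denotes \(y^{ + }\).

```r
# Probability of improvement
acq_pi  <- function(mu, sigma, y_best) pnorm((mu - y_best) / sigma)

# Expected improvement (closed form under a Gaussian predictive distribution)
acq_ei  <- function(mu, sigma, y_best) {
  z <- (mu - y_best) / sigma
  ifelse(sigma > 0, (mu - y_best) * pnorm(z) + sigma * dnorm(z), 0)
}

# Upper confidence bound; beta = 0 reduces to pure exploitation of mu(x)
acq_ucb <- function(mu, sigma, beta = 1) mu + beta * sigma
```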

2.4 Bayesian optimization

Algorithm 1 summarizes a general BO schema, where the acquisition function, whichever it is, is denoted by \(\alpha \left( {x,D_{1:n} } \right)\). In order to keep the schema as general as possible, we specify neither the probabilistic surrogate model nor, in the case of a GP, the kernel.

In this study we have used both GP (with the kernel types presented in the previous section) and RF as probabilistic surrogate models. The three acquisition functions previously described have been used with both surrogates.

Algorithm 1: the general BO schema (figure)
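As an illustration of this schema, the following is a minimal, self-contained R sketch of the BO loop for a 1-D box-bounded problem, reusing the gp_predict and acq_ucb sketches given earlier. The inner maximization of the acquisition function is done here by dense grid search purely for simplicity; the study itself uses L-BFGS for GP and focus-search for RF (see Sect. 3.3).

```r
bo_loop <- function(f, X, y, N = 20, beta = 1) {
  grid <- seq(0, 1, length.out = 1000)       # 1-D box-bounded search space
  for (n in seq_len(N)) {
    pred   <- gp_predict(grid, X, y)         # update the surrogate on D_{1:n}
    alpha  <- acq_ucb(pred$mu, sqrt(pred$sigma2), beta = beta)
    x_next <- grid[which.max(alpha)]         # maximize the acquisition function
    X <- c(X, x_next)                        # query f and augment the dataset
    y <- c(y, f(x_next))
  }
  list(x_best = X[which.max(y)], y_best = max(y))
}

# Usage with a toy (normally unknown) objective
f   <- function(x) exp(-(x - 0.3)^2 / 0.01)
X0  <- runif(3)
res <- bo_loop(f, X0, f(X0))
```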

3 Experimental set-up

3.1 User interface

The game-playing software has been developed by implementing 15 stimuli, i.e. 15 different 2D functions. Each subject was informed about the goal of the experiment and the available number of clicks before starting. Subjects started by clicking at any point on the screen and obtaining the corresponding value; previously clicked points and their values remained on the screen until the end of the trial game. More specifically, points are coloured and resized according to the associated score (i.e. the value of \(f\) at that point), providing visual feedback about the distribution of the scores collected so far.

Moreover, the software manages three different game modalities. At the start of each game, along with the test function, the modality is also randomly chosen among the following.

1. Find a point whose evaluation is close to \(f^{*} = f\left( {x^{*} } \right)\), without knowing \(f^{*}\) a priori. From an optimization point of view, this means that the human player searches for \(y_{n}^{ + } = \mathop {\max }\limits_{i = 1, \ldots ,n} y_{i}\), with \(y_{i} = f\left( {x_{i} } \right)\) the value observed at the \(i\)th location sequentially chosen by the player. This is equivalent to optimizing the simple regret (without knowing \(f^{*}\)):

$$ \mathop {\min }\limits_{n = 1, \ldots ,N} \left( {f^{*} - y_{n}^{ + } } \right) $$

2. As in the previous game modality, the goal is to find a point whose evaluation is close to \(f^{*}\), but in this case the human player knows \(f^{*}\) before making his/her choices. Thus, also in this case, the player optimizes the simple regret; here we are interested in understanding if and how knowledge of \(f^{*}\) modifies humans' strategies with respect to the first game modality.

3. Maximize the cumulative value of the evaluations at the selected points, without knowing \(f^{*}\) a priori. In this case the human player tries to maximize \(\mathop \sum \nolimits_{i = 1}^{N} y_{i}\), which is equivalent to minimizing the cumulative regret:

$$ \min \mathop \sum \limits_{i = 1}^{N} \left( {f^{*} - y_{i} } \right) $$

Our software collects all the choices made by each player in every game and stores the data into a database whose structure is summarized in Fig. 1. In the “games” table, each row (i.e. record) represents a single point, and the associated function value, in a game, along with the player's identifier, the test function and the game modality. Games are uniquely identified by including the timestamp of the end of the game; an illustrative record is sketched below.
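For illustration, a record of the “games” table can be thought of as follows; all column names here are assumptions based on the description above, not the actual schema.

```r
# One record of the "games" table (column names are hypothetical).
record <- data.frame(
  player_id = "P017",                 # player's identifier
  stimulus  = "styblinski_tang",      # test function
  modality  = 1,                      # game modality (1, 2 or 3)
  x1 = 0.42, x2 = 0.77,               # selected point
  y  = -12.3,                         # associated function value
  game_end  = "2019-05-06 14:32:10"   # timestamp identifying the game
)
```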

Fig. 1
figure 1

Entity–relationship (ER) diagram of the database

Figure 2 shows a frame of the game.

Fig. 2
figure 2

An example of a game play: each dot is a location selected by the player. As feedback, the associated numerical value is visualized, and the size and colour of the dot are set accordingly

3.2 Procedure

For each player, at each iteration, GP and RF models are fitted on the points observed so far, and the three acquisition functions described in Sect. 2.3 are then used to select the next point to query. Each of the proposed points is compared, via Euclidean distance, to the corresponding choice made by the human player.

Moreover, for the GP, five kernels have been used, in order to consider different possible degrees of smoothness in approximating the objective function.

The choices of the human player and of the global optimization algorithm are considered pointwise compliant if the distance between the point chosen by the human player and that chosen by the “algorithmic player” is less than a given threshold \(\tau\). Finally, the strategy of the human player is assimilated to the acquisition function that is most frequently pointwise compliant along a trial game. The procedures for GP-based and RF-based BO are summarized in the following pseudo-codes:

Algorithm 2: the procedure for GP-based BO (figure)

Finally, for a given kernel \(\overline{k}\), the search strategy of participant \(\overline{p}\) is assimilated to the most frequently compliant acquisition function in the series \(s^{p,k} = \left\{ {s^{p,k,n} } \right\}_{n = m:m + N}\).

Algorithm 3: the procedure for RF-based BO (figure)

Finally, the search strategy of participant \(\overline{p}\) is assimilated to the most frequently compliant acquisition function in the series \(s^{p} = \left\{ {s^{p,n} } \right\}_{n = m:m + N}\).
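To make the compliance check concrete, the following is a simplified R sketch of the common core of Algorithm 2 and Algorithm 3. Here propose() is a placeholder for a fit-and-suggest step (e.g. via mlrMBO) and is an assumption of this sketch, as is the simplification of recording, at each iteration, only the closest acquisition function when it falls within \(\tau\).

```r
# human_x: N x 2 matrix of the player's clicks (in order); human_y: scores.
# propose(X_obs, y_obs, acq): point that the given acquisition would select.
compliance_series <- function(human_x, human_y, propose, tau = 0.15) {
  sapply(2:nrow(human_x), function(n) {
    X_obs <- human_x[1:(n - 1), , drop = FALSE]
    y_obs <- human_y[1:(n - 1)]
    d <- sapply(c("PI", "EI", "UCB"), function(acq) {
      x_alg <- propose(X_obs, y_obs, acq)     # algorithmic player's choice
      sqrt(sum((human_x[n, ] - x_alg)^2))     # Euclidean distance
    })
    if (min(d) < tau) names(which.min(d)) else NA_character_
  })
}

# The player's strategy is the most frequently compliant acquisition:
# names(which.max(table(na.omit(series))))
```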

3.3 Software resources and analysis

Software resources consist of two components developed in R. The first component is the procedure computing the distance between the human's and the global optimization algorithm's choices during each game, while the second aggregates all the computed distances and generates the related statistics. More precisely, the procedure implemented by the first component is summarized in Algorithm 2 and Algorithm 3 above, for GP-based and RF-based BO, respectively. Since this study also considers other global optimization strategies—detailed in the following—the first component has been specialized for each of the other strategies. (For ease of reading, we do not report every single implementation; they can be easily obtained starting from Algorithm 3.)

The implementations of Algorithm 2 and Algorithm 3 are based on the R package “mlrMBO”, which offers both GP and RF as surrogate models, along with the acquisition functions adopted in this study. The L-BFGS algorithm is used to optimize the acquisition function based on GP over a continuous search space; for RF, the “focus-search” algorithm (Bischl et al. 2017) is adopted: it can handle numeric, discrete and mixed search spaces, also involving conditional variables. Focus-search starts with a large set of random points at which the acquisition function is evaluated. The search space is then shrunk around the current best point, and a new random sampling is performed within the “focused” space. Shrinkage is iterated up to a maximum number of iterations, and the entire procedure can be restarted multiple times to mitigate the risk of convergence to a local optimum. Finally, the best point over all restarts and iterations is returned as the solution.
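The following is a minimal, continuous-only R sketch of the focus-search idea just described (the actual mlrMBO implementation also handles discrete, mixed and conditional search spaces); alpha is any acquisition function taking a 2-D point.

```r
focus_search <- function(alpha, lower, upper, n_points = 1000,
                         n_iters = 5, n_restarts = 3, shrink = 0.5) {
  best_x <- NULL; best_a <- -Inf
  for (r in seq_len(n_restarts)) {
    lo <- lower; up <- upper                     # reset the box at each restart
    for (i in seq_len(n_iters)) {
      X <- cbind(runif(n_points, lo[1], up[1]),  # random sample within the
                 runif(n_points, lo[2], up[2]))  # current focused space
      a <- apply(X, 1, alpha)
      j <- which.max(a)
      if (a[j] > best_a) { best_a <- a[j]; best_x <- X[j, ] }
      half <- shrink * (up - lo) / 2             # shrink the box around
      lo <- pmax(lower, X[j, ] - half)           # the current best point
      up <- pmin(upper, X[j, ] + half)
    }
  }
  best_x                                         # best over restarts/iterations
}
```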

As already mentioned, we have also compared the humans’ strategies with other deterministic and stochastic global optimization techniques, more specifically:

  • Random search (RS): we used the R package “randomsearch”, which implements a simple RS function.

  • DIRECT: we used the R package “nlopt” for nonlinear optimization, which offers, among others, the DIRECT optimization algorithm based on Jones et al. (1993).

  • Genetic algorithm (GA): we used the R package “GA”, a toolbox implementing GA for stochastic optimization (Scrucca 2013, 2016).

  • Particle swarm optimization (PSO): we used the R package “pso”, which offers a standard PSO implementation based on Banks et al. (2007).

  • Simulated annealing (SA): we used the R package “optim_sa”, which offers an implementation of SA based on Laarhoven and Aarts (1987).

We have then computed the distances, in terms of \(x\), between the best location found by the humans at the end of each game and those provided by each of the other five global optimization techniques and by BO. Let \(x^{ + }\) denote the location found by the player (human or algorithmic) such that \(y^{ + } = f\left( {x^{ + } } \right)\) is the best value observed over a specific trial game by that player.
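The comparison uses the plain Euclidean distance between the two best locations; e.g., in R:

```r
# Distance, in terms of x, between the human's and an algorithm's best
# locations (both 2-D points) for one trial game.
dist_best <- function(x_plus_human, x_plus_alg)
  sqrt(sum((x_plus_human - x_plus_alg)^2))
```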

3.4 Participants and analytical settings

In this section, we further detail the features of the data collected and used in this study. Initially, around 70 volunteers were enrolled; while they used the software to play and collect data, their feedback about the user-friendliness of the interface was also used to improve the software. Due to this work-in-progress nature of the game-playing software, not all the collected games were well suited for the analysis. Thus, we decided to restrict the analysis to the largest homogeneous set, that is:

  • 60 subjects.

  • One test function (i.e. Styblinski-Tang).

  • One modality (i.e. finding a point whose evaluation is close to \(f^{*} = f\left( {x^{*} } \right)\), without knowing \(f^{*}\) a priori).

With respect to Algorithm 2 and Algorithm 3, we declare here the values used for \(\beta\) in the UCB equation, \(\beta \in \left\{ {0.0, 0.5, 1.0} \right\}\), and for the threshold, \(\tau \in \left\{ {0.10, 0.15} \right\}\).

4 Experimental results

4.1 Experiment 1: Gaussian process

The following figures summarize the main results of the comparison between human players' strategies and GP-based BO. We start with the case \(\beta = 1.0\) in UCB.

From Fig. 3, EI is preferred, indicating a dominant exploitative behaviour among humans. Indeed, \(\beta = 1.0\) in UCB gives some chance to exploration, depending on the value of \(\sigma \left( x \right)\). The statistical significance of this result was evaluated via Fisher's test for each pair consisting of EI versus another acquisition function. Given a pair of acquisition functions, the null hypothesis of the test is that neither is more compliant than the other. As summarized in Table 1, the hypothesis is rejected (p value < 0.05) for the exponential and Matérn kernels, while it is accepted for the power exponential and squared exponential kernels. A sketch of how such a test can be set up in R is given after Table 1.

Fig. 3
figure 3

Number of human players whose strategy is compliant with respect to kernel type and acquisition function, with the threshold \(\tau\) set to 0.10 and \(\beta = 1\) in UCB. The last bar represents the number of non-compliant players

Table 1 Fisher’s test p values for each pair related to the most frequently compliant acquisition function in Fig. 3 (i.e. EI) and the others
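As an illustration of the test set-up, one such pairwise comparison can be run in base R as follows. The 2×2 contingency table contrasting compliant versus non-compliant participants is our reading of the test construction, and the counts below are purely illustrative, not the study's data.

```r
# Hypothetical counts: participants compliant with EI vs. another
# acquisition function, out of 60 participants.
total <- 60
tab <- matrix(c(25, total - 25,     # EI:    compliant / not compliant
                12, total - 12),    # other: compliant / not compliant
              nrow = 2, byrow = TRUE,
              dimnames = list(c("EI", "other"), c("compliant", "not")))
fisher.test(tab)$p.value            # reject the null hypothesis if p < 0.05
```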

The results reported in Fig. 4 are coherent with those previously depicted in Fig. 3. Moreover, the number of human players whose strategy is not compliant with BO is reduced, consistent with the less restrictive value of the threshold \(\tau\). We applied Fisher's test to evaluate the statistical significance of the outcomes; the results are summarized in Table 2. According to Fisher's test, EI is always the most compliant acquisition function, except for the squared exponential kernel.

Fig. 4
figure 4

Number of human players whose strategy is compliant with respect to kernel type and acquisition function, with the threshold \(\tau\) set to 0.15 and \(\beta = 1\) in UCB. The last bar represents the number of non-compliant players

Table 2 Fisher's test p values for each pair related to the most frequently compliant acquisition function in Fig. 4 (i.e. EI) and the others. The hypothesis that one acquisition function is significantly more compliant than the other is accepted at the 0.05 confidence level

The charts in Fig. 5 show that, as \(\beta\) is reduced, UCB gains a larger share of participants. Again, we applied Fisher's test to evaluate the statistical significance of the outcomes; the results are summarized in Table 3. According to Fisher's test, EI is still the most compliant acquisition function for most of the kernels, even if the p values increase with respect to the previous cases.

Fig. 5
figure 5

Number of human players whose strategy is compliant with respect to kernel type and acquisition function, with the threshold \(\tau\) set to 0.15 and \(\beta = 0.5\) in UCB. The last bar represents the number of non-compliant players

Table 3 Fisher’s test p values for each pair related to the most frequently compliant acquisition function in Fig. 5 (i.e. EI) and the others

Figure 6 again confirms that, for human players, “greed is good”: \(\beta = 0\) means no exploration in UCB, whose fully exploitative version gets the largest share of participants. In this case we performed Fisher's test for each pair of UCB (instead of EI) versus another acquisition function, since UCB is now the most frequently compliant option. The results are summarized in Table 4. Although UCB with \(\beta = 0\) is the most frequently compliant acquisition function, its share does not differ significantly from that of EI. In any case, this result confirms the exploitative nature of humans, who largely adopt strategies close to the EI and “purely exploitative” UCB acquisition functions.

Fig. 6
figure 6

Number of human players whose strategy is compliant with respect to kernel type and acquisition function, with the threshold \(\tau\) set to 0.15 and \(\beta = 0\) in UCB. The last bar represents the number of non-compliant players

Table 4 Fisher’s test p values for each pair related to the most frequently compliant acquisition function in Fig. 6 (i.e. UCB) and the others

Although maximizing EI or UCB with \(\beta = 0\) should analytically lead to the same next location \(x\) to evaluate, the numerical results can be quite different, due to the different shapes of these two acquisition functions. Indeed, when \(\mu \left( x \right)\) is lower than the best value observed so far, \(y^{ + }\), over a large portion of the search space, EI is almost completely flat there, as depicted in the example of Fig. 7. Consequently, whatever optimization strategy is used to maximize EI may easily fail, leading to no improvement of \(y^{ + }\) over new function evaluations. On the contrary, UCB with \(\beta = 0\) coincides with \(\mu \left( x \right)\) (Fig. 7): this allows convergence towards a local optimum of \(\mu \left( x \right)\) and, possibly, an improvement over \(y^{ + }\).

Fig. 7
figure 7

An example of why maximizing EI and UCB with \(\beta = 0\) might numerically lead to different solutions: since EI is almost completely flat over the search space, any optimization algorithm may easily fail in maximizing it, contrary to UCB with \(\beta = 0\)

4.2 Experiment 2: random forest

The same analysis has also been performed using RF as the surrogate model instead of GP. Figure 8 shows that PI, a notoriously exploitative acquisition function, gets the largest share of compliant players when UCB accounts for exploration (\(\beta = 1\)).

Fig. 8
figure 8

Number of human players whose strategy is compliant with respect to the different acquisition functions, with RF as surrogate model, under different settings of the threshold \(\tau\) and of the \(\beta\) value in UCB

Although the exploitative nature of human choices is confirmed also in this case, we observed a shift from EI to PI (which is even more exploitative than EI). This is basically due to the combination of the type of acquisition function and the shape of the surrogate model, which is piecewise constant for RF and continuous for GP.

The situation changes when the exploration component in UCB is reduced: with \(\beta = 0.5\) and \(\beta = 0.0\), UCB dominates the choices among the compliant players. Summarizing, also in the case of RF, humans' nature appears to be exploitative. We performed Fisher's test to evaluate, for each combination of \(\tau\) and \(\beta\), whether the BO algorithm associated with the largest share of compliant participants can statistically be considered the most compliant one. The results are reported in Table 5.

Table 5 Fisher's test p values for each pair related to the most frequently compliant acquisition function in Fig. 8, for the four cases given by the values of the threshold \(\tau\) and of UCB's \(\beta\)

4.3 Comparison against other global optimization techniques

In this section we consider, among all the possibilities, the BO approach that proved to be most compliant with the humans' strategies overall. Let BO** denote this specific approach, namely: GP surrogate with Matérn 3/2 kernel, EI as acquisition function and \(\tau = 0.15\). Indeed, as reported in Fig. 4, this BO configuration is associated with the largest share (i.e. 43) of participants compliant with an optimization strategy.

We computed all the distances between the optimal solutions (i.e. \(x^{ + }\)) found by humans and those provided by BO** and the other five global optimization techniques cited in Sect. 3.3 (i.e. RS, DIRECT, GA, PSO and SA).

According to the boxplot in Fig. 9, the optimal solutions provided by BO** are, on average, closer to the humans' ones than those provided by the other techniques.

Fig. 9
figure 9

Distances between the best solutions found by the humans and those provided by each global optimization strategy. BO** denotes the overall most compliant BO algorithm (i.e. GP surrogate with Matérn 3/2 kernel, EI acquisition function, from Fig. 4)

More interestingly, the density plot in Fig. 10 shows that the distributions of the distances are basically bimodal for all the global optimization techniques but BO**. Indeed, for BO** the density associated with small values of the distance is around twice that of the other global optimization techniques, showing that BO** is closer to the strategies applied by the human players.

Fig. 10
figure 10

Density plot of the distances between the best solutions found by the humans and those provided by each global optimization strategy

Beyond the visual representation in Fig. 10, a further confirmation that BO is closer to human search comes from statistical tests. Since the distances are not normally distributed, we could not apply ANOVA and therefore used nonparametric tests. According to Friedman's test, the distance values differ significantly among the techniques (p value < 0.001). To better investigate these differences, we then performed pairwise Wilcoxon tests (a sketch of this testing pipeline in R follows the list), obtaining that:

  • distance values related to RS, GA, PSO and SA are, pairwise, not significantly different (p value > 0.192).

  • distance values related to DIRECT are significantly different from those of all the other techniques (p value < 0.05) except RS (p value = 0.106). This occurs because the peaks in the distributions of both DIRECT and RS correspond to distance values slightly greater than those of the other techniques.

  • distance values related to BO** are significantly different from those of DIRECT and RS (p value < 0.001). Although BO**, in Fig. 10, is clearly different from every other optimization technique, the Wilcoxon test suggests that the hypothesis that they are similar cannot be rejected for GA, SA and PSO, although with borderline significance levels: p value = 0.062 for BO** versus GA and SA, and p value = 0.073 for BO** versus PSO. This is mainly because BO**, GA, SA and PSO have their highest peak around the same distance value, even if BO**'s peak is higher than the others.
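For completeness, a minimal sketch of this nonparametric testing pipeline in base R follows; dist_mat is assumed to be a games-by-techniques matrix of distances, and the data below are random placeholders, not the study's data.

```r
techniques <- c("BO**", "RS", "DIRECT", "GA", "PSO", "SA")
dist_mat <- matrix(runif(60 * 6), ncol = 6,
                   dimnames = list(NULL, techniques))   # placeholder data

friedman.test(dist_mat)   # global test across techniques (rows = games)

# Pairwise Wilcoxon tests; the p-value adjustment method is a choice
# left open in this sketch.
pairwise.wilcox.test(as.vector(dist_mat),
                     rep(techniques, each = nrow(dist_mat)),
                     p.adjust.method = "none")
```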

5 Conclusions

It is important to remark that the aim of the analysis performed in this study is not to compare the effectiveness of different techniques in solving the global optimization problem, but to compare human search strategies with those embodied in global optimization algorithms. The results show that BO is closest to the human search strategy and therefore provides valuable insight into how humans deal with the trade-off between immediate reward (exploitation) and risk propensity (exploration). This can be explained by two major factors. First, the most coherent model of learning is provided by Bayesian learning, as recognized in the cognitive sciences community; BO, as model-based active learning, is thus revealed by humans' reward-driven sampling behaviour and embodies the underlying decision-making strategies. Second, the flexibility of BO, as recognized also in the machine learning community, is larger than that of other optimization techniques and is given by the freedom to choose the probabilistic surrogate model (GP or RF)—and the kernel in the case of GP—and the acquisition function.

Since the flexibility of BO is structural, while in the other optimization techniques considered it is merely parametric, BO can more naturally model the internal representation of the function that humans maintain when they search for the optimum. In this paper this conclusion is supported by a wide set of statistical tests: the number of BO-compliant participants is very high (fewer than 10% of participants were non-compliant with all models and acquisition functions). This result is also visually captured by Figs. 9 and 10. Moreover, nonparametric statistical testing provides meaningful outcomes. The authors acknowledge that the statistical significance should be improved: indeed, the analysis is being extended, through Amazon Mechanical Turk, to a larger sample of participants, each one performing all the stimuli and game modalities available in the game-playing software.

An interesting result comes from the analysis of which space model, that is which GP kernel, is implied by human search: based on the limited set of results presented in this paper, the kernel is not a major factor in determining compliance with GP-based BO. On the other hand, the main driver of compliance is the acquisition function, and its parametrization in the case of UCB: the value of \(\beta\) significantly affects the relative importance of UCB over the search processes performed by the compliant participants. We can then conclude that exploitation-oriented acquisition functions get a consistently larger share, meaning that “greed is good”: human behaviour when looking for rewards in black-box situations is dominantly exploitative.