
Heteroscedastic Bayesian optimization using generalized product of experts

Journal of Global Optimization

Abstract

In many real-world optimization problems, observations are corrupted by heteroscedastic noise, whose level depends on the input location. Bayesian optimization (BO) is an efficient approach to global optimization of black-box functions, but the performance of a Gaussian process (GP) surrogate can degrade under changing noise levels because of its homoscedastic noise assumption. A generalized product of experts (GPOE) model, however, builds independent GP experts on subsets of the observations, each with its own set of hyperparameters, and is therefore flexible enough to capture changing noise levels. In this paper we propose a heteroscedastic Bayesian optimization algorithm that combines the GPOE model with two modifications of existing acquisition functions capable of representing and penalizing heteroscedastic noise across the input space. We evaluate the GPOE-based BO (GPOEBO) model on six synthetic global optimization functions corrupted with heteroscedastic noise, as well as on two real-world scientific datasets. The results show that GPOEBO improves accuracy compared to other methods.
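To make the noise-penalizing acquisition idea from the abstract concrete, the sketch below shows one illustrative variant (our own, in the spirit of noise-penalized expected improvement; the paper's two acquisition modifications are not reproduced here): standard expected improvement for maximization, discounted by the model's predicted noise level at the candidate point. The trade-off constant `gamma` is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def noise_penalized_ei(mu, sigma, noise_std, best_f, gamma=0.5):
    """Expected improvement (maximization) minus a heteroscedastic-noise penalty.

    mu, sigma : GP posterior mean and standard deviation at the candidate point
    noise_std : predicted input-dependent noise level at the candidate point
    gamma     : illustrative trade-off constant (hypothetical, not from the paper)
    """
    z = (mu - best_f) / sigma
    ei = (mu - best_f) * norm.cdf(z) + sigma * norm.pdf(z)  # standard EI
    return ei - gamma * noise_std                           # penalize noisy regions
```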



Author information

Correspondence to Saulius Tautvaišas.


Additional experiment details

1.1 Comparing different data partitioning strategies for GP experts

To assess the effect of the data assignment strategy, we consider two data partitioning schemes: random disjoint partitioning and k-means partitioning. In the random strategy, we partition the data \({\mathcal {D}}_{n}\) into \(\textit{M}\) subsets, allocating each expert a random subset of \(n_{i}\) data points drawn without replacement. This guarantees that each expert receives a unique set of data points, ensuring diversity across experts. The k-means allocation strategy groups data points with similar characteristics, allowing experts to specialize in distinct data patterns. We run the k-means algorithm to identify \(\textit{M}\) cluster centers, one per expert, and for each cluster center we query a BallTree for its \(n_{i}\) nearest data points, which are then assigned to the corresponding \(i\)-th expert.
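A minimal sketch of both allocation schemes, assuming scikit-learn's KMeans and BallTree (the text names the BallTree query explicitly; the function names and seed handling are our own):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import BallTree

def random_partition(X, y, M, seed=0):
    """Split the n observations into M disjoint random subsets, one per expert."""
    idx = np.random.default_rng(seed).permutation(len(X))
    return [(X[p], y[p]) for p in np.array_split(idx, M)]

def kmeans_partition(X, y, M, n_i):
    """Give each of the M experts the n_i points nearest to one k-means center."""
    centers = KMeans(n_clusters=M, n_init=10).fit(X).cluster_centers_
    _, nearest = BallTree(X).query(centers, k=n_i)  # (M, n_i) index array
    return [(X[nb], y[nb]) for nb in nearest]       # subsets may overlap
```

Note that, unlike the random scheme, the k-means scheme can assign the same point to several experts when cluster centers lie close together.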

To find the best-performing data partitioning strategy for PoE and GPOE, we use the same experimental parameters as in Sect. 4.1. Table 2 shows the optimization results for the random and k-means data partitioning strategies. Overall, the GPOE model with random partitioning shows better results on lower-dimensional functions, while k-means outperforms random partitioning only on the higher-dimensional functions (Hartmann6D and Sphere).

Table 2 The mean and standard deviation of the absolute error between the function value at the best found point and the actual function maximum, for all test functions
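For context, PoE and GPOE differ only in how the experts' Gaussian predictions are fused. A minimal sketch of the generalized product-of-experts aggregation, following Cao and Fleet's formulation (the uniform weights \(\beta_i = 1/M\) are a common default and an assumption here, not necessarily the paper's exact choice):

```python
import numpy as np

def gpoe_predict(means, variances, betas=None):
    """Fuse M experts' Gaussian predictions at one test point via generalized PoE.

    The aggregated precision is a beta-weighted sum of expert precisions;
    betas that sum to 1 (here the uniform default 1/M) keep the combined
    variance from collapsing as redundant experts are multiplied together.
    Plain PoE is recovered with betas = 1 for every expert.
    """
    means, variances = np.asarray(means), np.asarray(variances)
    if betas is None:
        betas = np.full(means.shape, 1.0 / len(means))
    precision = np.sum(betas / variances)
    mean = np.sum(betas * means / variances) / precision
    return mean, 1.0 / precision
```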

1.2 Performance sensitivity to the number of points per expert

The optimization performance of the expert models varies with the number of points assigned to each expert. Figure 5 shows the effect of the number of data points per expert on optimization performance. Performance varies between functions, but overall it improves (the absolute error gets closer to zero) as the number of points per expert increases. Detailed results are provided in Tables 3, 4 and 5.

Fig. 5 The effect of the number of data points per expert on optimization performance

Table 3 The mean and standard deviation of the absolute error between the function value at the best found point and the actual function maximum for all test functions, with \(3 \times D\) points allocated to each GP expert
Table 4 The mean and standard deviation of the absolute error between the function value at the best found point and the actual function maximum for all test functions, with \(2 \times D\) points allocated to each GP expert
Table 5 The mean and standard deviation of the absolute error between the function value at the best found point and the actual function maximum for all test functions, with \(1 \times D\) points allocated to each GP expert


Cite this article

Tautvaišas, S., Žilinskas, J. Heteroscedastic Bayesian optimization using generalized product of experts. J Glob Optim (2023). https://doi.org/10.1007/s10898-023-01333-5
