Budget-limited distribution learning in multifidelity problems

Numerische Mathematik

Abstract

Multifidelity methods are widely used for estimating quantities of interest (QoI) in computational science by employing numerical simulations of differing costs and accuracies. Many methods approximate numerical statistics, e.g., scalar statistics, that capture only limited information about the QoI. Further quantification of uncertainty, e.g., for risk assessment, failure probabilities, or confidence intervals, requires estimation of the full distribution. In this paper, we generalize the ideas in (Xu et al. in SIAM J Sci Comput 44(1):A150–A175, 2022) to develop a multifidelity method that approximates the full distribution of scalar-valued QoI. The main advantage of our approach over alternative methods is that we require no particular relationships between the high- and lower-fidelity models (e.g., a model hierarchy), and we do not assume any knowledge of model statistics, including correlations and other cross-model statistics, before the procedure starts. Under suitable assumptions in this framework, we prove convergence in the 1-Wasserstein metric of an algorithmically constructed distributional emulator produced by an exploration–exploitation strategy. We also prove that crucial policy actions taken by our algorithm are budget-asymptotically optimal. Numerical experiments are provided to support our theoretical analysis.


Notes

  1. For example, one such assumption on model cost behavior is that models with higher correlation to the high-fidelity model also incur higher cost.

  2. A sequence of probability measures \(\{P_k\}\) defined on a metric space is called \(\delta \)-tight if for every \({\varepsilon }>0\), there exist a compact measurable set K and a sequence \(\delta _k\downarrow 0\) such that \(P_k(K^{\delta _k})>1-{\varepsilon }\) for every k, where \(K^{\delta _k}: = \{x: \text {dist}(x, K)<\delta _k\}\).

  3. When the variance ratio between \({\varepsilon }_S\) and Y is small, \(Y\approx X_S^\top \beta _S\), so that adding the noise emulator has little impact on the accuracy of the resulting estimator. When the ratio is moderate, adding \({\varepsilon }_S\) as an independent component will degrade the quality of the estimator if the independence assumption is violated.

References

  1. Beran, R., Le Cam, L., Millar, P.: Convergence of stochastic empirical measures. J. Multivar. Anal. 23(1), 159–168 (1987)

  2. Bobkov, S., Ledoux, M.: One-Dimensional Empirical Measures, Order Statistics, and Kantorovich Transport Distances. Vol. 261, No. 1259. American Mathematical Society (2019)

  3. Bubeck, S., Cesa-Bianchi, N., et al.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. 5(1), 1–122 (2012)

  4. Cambanis, S., Simons, G., Stout, W.: Inequalities for E k(X, Y) when the marginals are fixed. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 36(4), 285–294 (1976)

  5. Chatterjee, S.: Lecture Notes on Stein’s Method. Stanford Lecture Notes (2007)

  6. Cohen, A., DeVore, R.: Approximation of high-dimensional parametric PDEs. Acta Numer. 24, 1–159 (2015). https://doi.org/10.1017/S0962492915000033. (ISSN: 1474-0508)

  7. Farcas, I.-G.: Context-aware model hierarchies for higher-dimensional uncertainty quantification. PhD thesis. Technische Universität München (2020)

  8. Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002). https://doi.org/10.1111/j.1751-5823.2002.tb00178.x

  9. Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008)

  10. Giles, M.B., Nagapetyan, T., Ritter, K.: Multilevel Monte Carlo approximation of distribution functions and densities. SIAM/ASA J. Uncertain. Quantif. 3(1), 267–295 (2015). https://doi.org/10.1137/140960086

  11. Giles, M.B., Nagapetyan, T., Ritter, K.: Adaptive multilevel Monte Carlo approximation of distribution functions. arXiv preprint arXiv:1706.06869 (2017)

  12. Gorodetsky, A.A., Geraci, G., Eldred, M.S., Jakeman, J.D.: A generalized approximate control variate framework for multifidelity uncertainty quantification. J. Comput. Phys. 408, 109257 (2020). https://doi.org/10.1016/j.jcp.2020.109257

  13. Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods. Methuen, London (1964)

  14. Ishigami, T., Homma, T.: An importance quantification technique in uncertainty analysis for computer models. In: Proceedings, First International Symposium on Uncertainty Modeling and Analysis. IEEE Computer Society Press (1990). https://doi.org/10.1109/isuma.1990.151285

  15. Koenker, R.: Fundamentals of quantile regression. In: Quantile Regression, pp. 26–67. Cambridge University Press, Cambridge (2001). https://doi.org/10.1017/ccol0521845734.002

  16. Krumscheid, S., Nobile, F.: Multilevel Monte Carlo approximation of functions. SIAM/ASA J. Uncertain. Quantif. 6(3), 1256–1293 (2018). https://doi.org/10.1137/17m1135566

  17. Lai, T.L., Wei, C.Z.: Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Stat. 10(1), 154–166 (1982). https://doi.org/10.1214/aos/1176345697

  18. Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2020)

  19. Lu, D., Zhang, G., Webster, C., Barbier, C.: An improved multilevel Monte Carlo method for estimating probability distribution functions in stochastic oil reservoir simulations. Water Resour. Res. 52(12), 9642–9660 (2016). https://doi.org/10.1002/2016wr019475

  20. Massart, P.: The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. Ann. Probab. 18, 1269–1283 (1990)

  21. Pan, X., Zhou, W.-X.: Multiplier bootstrap for quantile regression: non-asymptotic theory under random design. Inf. Inference J. IMA (2020). https://doi.org/10.1093/imaiai/iaaa006

  22. Panaretos, V.M., Zemel, Y.: Statistical aspects of Wasserstein distances. Annu. Rev. Stat. Appl. 6, 405–431 (2019)

  23. Peherstorfer, B.: Multifidelity Monte Carlo estimation with adaptive low-fidelity models. SIAM/ASA J. Uncertain. Quantif. 7(2), 579–603 (2019)

  24. Peherstorfer, B., Willcox, K., Gunzburger, M.: Optimal model management for multifidelity Monte Carlo estimation. SIAM J. Sci. Comput. 38(5), A3163–A3194 (2016). https://doi.org/10.1137/15m1046472

  25. Peherstorfer, B., Willcox, K., Gunzburger, M.: Survey of multifidelity methods in uncertainty propagation, inference, and optimization. SIAM Rev. 60(3), 550–591 (2018)

  26. Qian, E., Peherstorfer, B., O’Malley, D., Vesselinov, V.V., Willcox, K.: Multifidelity Monte Carlo estimation of variance and sensitivity indices. SIAM/ASA J. Uncertain. Quantif. 6(2), 683–706 (2018). https://doi.org/10.1137/17m1151006

  27. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria (2020)

  28. Robinson, P.M.: Root-N-consistent semiparametric regression. Econometrica 56(4), 931 (1988). https://doi.org/10.2307/1912705

  29. Schaden, D., Ullmann, E.: On multilevel best linear unbiased estimators. SIAM/ASA J. Uncertain. Quantif. 8(2), 601–635 (2020). https://doi.org/10.1137/19m1263534

  30. Schaden, D., Ullmann, E.: Asymptotic analysis of multilevel best linear unbiased estimators. SIAM/ASA J. Uncertain. Quantif. 9(3), 953–978 (2021)

  31. Vershynin, R.: High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47. Cambridge University Press, Cambridge (2018)

  32. Villani, C.: The metric side of optimal transportation. In: Topics in Optimal Transportation. Graduate Studies in Mathematics, vol. 58, pp. 205–235. American Mathematical Society (2003). https://doi.org/10.1090/gsm/058/08

  33. Williams, D.: Probability with Martingales. Cambridge University Press, Cambridge (1991)

  34. Xu, Y., Keshavarzzadeh, V., Kirby, R.M., Narayan, A.: A bandit-learning approach to multifidelity approximation. SIAM J. Sci. Comput. 44(1), A150–A175 (2022)

Acknowledgements

We would like to thank the referees for their time and helpful comments which significantly improved the presentation of the manuscript. Y. Xu and A. Narayan are partially supported by National Science Foundation DMS-1848508. A. Narayan is partially supported by the Air Force Office of Scientific Research award FA9550-20-1-0338. Y. Xu would like to thank Dr. Xiaoou Pan for clarifying a uniform consistency result in quantile regression. We also thank Dr. Ruijian Han for a careful reading of an early draft, and for providing several comments that improved the presentation of the manuscript.

Author information

Correspondence to Yiming Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Proof of Lemma 4.1

Let \(|S| = s\). We first prove (28a). Recall the definition of Y and \(Y'\):

$$\begin{aligned}&Y = X_S^\top \beta _S + {\varepsilon }_S&Y' = X_S^\top {\widehat{\beta }}_S + {\widehat{{\varepsilon }}}_S \end{aligned}$$

where \({\varepsilon }_S, {\widehat{{\varepsilon }}}_S\) are independent of \(X_S\) by Assumption 2.2.

Define the true empirical distribution of \({\varepsilon }_S\) using exploration samples as

$$\begin{aligned} {\widetilde{{\varepsilon }}}_S\sim \frac{1}{m}\sum _{\ell \in [m]}\delta _{Y_\ell -X_{S,\ell }^\top \beta _S}. \end{aligned}$$

By the additive property of the \(W_1\) metric under independence [22] and the triangle inequality, we can upper bound the average \(W_1\) distance between \(F_Y\) and \(F_{Y'}\) by conditioning on the exploration data:

$$\begin{aligned}&{{\mathbb {E}}}\left[ W_1(F_Y, F_{Y'})|Z_S, Y_{{\text {epr}}}\right] \nonumber \\&\quad \le \ {{\mathbb {E}}}[W_1(F_{X_S^\top \beta _S}, F_{X_S^\top {\widehat{\beta }}_S})|Z_S, Y_{{\text {epr}}}] +{{\mathbb {E}}}[W_1(F_{{\varepsilon }_S}, F_{{\widehat{{\varepsilon }}}_S})|Z_S, Y_{{\text {epr}}}]\nonumber \\&\quad \le \ {{\mathbb {E}}}[W_1(F_{X_S^\top \beta _S}, F_{X_S^\top {\widehat{\beta }}_S})|Z_S, Y_{{\text {epr}}}] +{{\mathbb {E}}}[W_1(F_{{\varepsilon }_S}, F_{{\widetilde{{\varepsilon }}}_S})|Z_S, Y_{{\text {epr}}}] \nonumber \\&\qquad +{{\mathbb {E}}}[W_1(F_{{\widetilde{{\varepsilon }}}_S}, F_{{\widehat{{\varepsilon }}}_S})|Z_S, Y_{{\text {epr}}}]. \end{aligned}$$
(59)

For the first term in (59), note \({{\widehat{\beta }}}_S-\beta _S\) satisfies

$$\begin{aligned} {{\widehat{\beta }}}_S-\beta _S\sim \mathcal {N}(0, \sigma _S^2(Z^\top _SZ_S)^{-1}). \end{aligned}$$
(60)

Averaging out the randomness of exploration noise, we have that, almost surely,

$$\begin{aligned} {{\mathbb {E}}}[W_1(F_{X_S^\top \beta _S}, F_{X_S^\top {\widehat{\beta }}_S})|Z_S]&\le \left( {{\mathbb {E}}}[W^2_2(F_{X_S^\top \beta _S}, F_{X_S^\top {\widehat{\beta }}_S})|Z_S]\right) ^{1/2}\nonumber \\&\le \left( {{\mathbb {E}}}[|X_S^\top ({{\widehat{\beta }}}_S-\beta _S)|^2 |Z_S]\right) ^{1/2}\nonumber \\&=\left( \hbox {tr}({{\mathbb {E}}}[X_S X_S^\top ] {{\mathbb {E}}}[({{\widehat{\beta }}}_S-\beta _S)({{\widehat{\beta }}}_S-\beta _S)^\top | Z_S])\right) ^{1/2}\nonumber \\&= \left( \frac{\sigma _S^2}{m}\hbox {tr}(\Lambda _S(m^{-1}Z_S^\top Z_S)^{-1})\right) ^{1/2}\nonumber \\&\simeq \sqrt{\frac{s+1}{m}}\sigma _S, \end{aligned}$$
(61)

where the first step uses Jensen’s inequality, and the last step follows from the law of large numbers and Assumption 2.3.

For the second term in (59), note that \({\widetilde{{\varepsilon }}}_S\) is the empirical distribution of \({\varepsilon }_S\) based on m exploration samples, which does not depend on \(Z_S\) under Assumption 2.2. Applying the nonasymptotic estimates on the convergence rate of empirical measures in Lemma 3.2 yields

$$\begin{aligned} {{\mathbb {E}}}[W_1\left( F_{{\varepsilon }_S}, F_{{\widetilde{{\varepsilon }}}_S}\right) |Z_S] = {{\mathbb {E}}}[W_1\left( F_{{\varepsilon }_S}, F_{{\widetilde{{\varepsilon }}}_S}\right) ]\le \frac{J_1(F_{{\varepsilon }_S})}{\sqrt{m}}, \end{aligned}$$
(62)

where \(J_1\) is defined in (10).

For the third term in (59), consider the natural coupling between \({\widehat{{\varepsilon }}}_S\) and \({\widetilde{{\varepsilon }}}_S\): \({\widetilde{{\varepsilon }}}^{\leftarrow }_S({\widetilde{\tau }}_\ell ) = {\widehat{{\varepsilon }}}^{\leftarrow }_S({\widehat{\tau }}_\ell )\), where \(^{\leftarrow }\) denotes the preimage of a map and

$$\begin{aligned}&{\widetilde{\tau }}_\ell = Y_\ell -X_{S,\ell }^\top \beta _S&{\widehat{\tau }}_\ell = Y_\ell -X_{S,\ell }^\top {{\widehat{\beta }}}_S. \end{aligned}$$
(63)

In this case,

$$\begin{aligned}&{{\mathbb {E}}}[W_1(F_{{\widetilde{{\varepsilon }}}_S}, F_{{\widehat{{\varepsilon }}}_S})|Z_S]\le ({{\mathbb {E}}}[W^2_2(F_{{\widetilde{{\varepsilon }}}_S}, F_{{\widehat{{\varepsilon }}}_S})|Z_S])^{1/2}\le ({{\mathbb {E}}}[|{\widetilde{{\varepsilon }}}_S-{\widehat{{\varepsilon }}}_S|^2|Z_S])^{1/2}\nonumber \\&\quad =\left( \frac{1}{m}\sum _{\ell \in [m]}{{\mathbb {E}}}[(X^\top _{S,\ell }({{\widehat{\beta }}}_S-\beta _S))^2|Z_S]\right) ^{1/2}\nonumber \\&\quad = \sqrt{\frac{s+1}{m}}\sigma _S. \end{aligned}$$
(64)

Putting (61), (62), (64) together finishes the proof of (28a).

We next prove (28b). Conditioned on \(Z_S\) and \(Y_{\text {epr}}\), \(Y'\) is a random variable with bounded r-th moments for all \(r>2\). Appealing to Lemma 3.2 and averaging over the exploration noise, we have

$$\begin{aligned} {{\mathbb {E}}}\left[ W_1\left( {\widehat{F}}_{Y,S}, F_{Y'}\right) |Z_S, Y_{\text {epr}}\right]&\le \frac{{{\mathbb {E}}}[J_1(F_{Y'})|Z_S, Y_{\text {epr}}]}{\sqrt{N_S}}\Longrightarrow {{\mathbb {E}}}\left[ W_1\left( {\widehat{F}}_{Y,S}, F_{Y'}\right) |Z_S\right] \nonumber \\&\le \frac{{{\mathbb {E}}}[J_1(F_{Y'})|Z_S]}{\sqrt{N_S}}, \end{aligned}$$
(65)

where \(J_1\) is defined in (10). The desired result would follow if we can show that \({{\mathbb {E}}}[J_1(F_{Y'})|Z_S]\) converges to \(J_1(F_{Y})\) a.s. as \(m\rightarrow \infty \). To this end, we introduce the following intermediate random variables:

$$\begin{aligned}&Y'' = X_S^\top \beta _S + {\widetilde{{\varepsilon }}}_S&Y''' = X_S^\top \beta _S + {\widehat{{\varepsilon }}}_S. \end{aligned}$$

We will prove the desired result by verifying the following convergence statements respectively:

(a) \(|{{\mathbb {E}}}[J_1(F_{Y})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y''})|Z_S]|\rightarrow 0\) a.s.;

(b) \(|{{\mathbb {E}}}[J_1(F_{Y''})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S]|\rightarrow 0\) a.s.;

(c) \(|{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y'})|Z_S]|\rightarrow 0\) a.s.

Without loss of generality, we assume \(\text {supp}({\varepsilon }_S)\subseteq [-1, 1]\) and \(\Vert \beta _S\Vert _2 = 1\); the general case follows similarly after an appropriate rescaling involving the constant C in Assumption 2.5.

We introduce the following quantity for our analysis:

$$\begin{aligned} K_m^* = \max _{\ell \in [m]}\Vert X_{S,\ell }\Vert _2. \end{aligned}$$
(66)

It is clear that \(K_m^*\) depends only on \(Z_S\). Under Assumption 2.4, \(X_{S,\ell }\)’s are i.i.d. sub-exponential random variables with uniformly bounded sub-exponential norm. By Lemma 3.6,

$$\begin{aligned}&K_m^*\lesssim \log m&a.s., \end{aligned}$$
(67)

where the implicit constant is realization-dependent.

To prove (a), we condition on \(Z_S\) and \(Y_{\text {epr}}\). Using (13)–(15) in Lemma 3.5,

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y''})|Z_S, Y_{\text {epr}}]|\nonumber \\&\quad \le \ {{\mathbb {E}}}\left[ \int _{{\mathbb {R}}}\sqrt{| F_Y(y)-F_{Y''}(y)|} dy| Z_S, Y_{\text {epr}}\right] \nonumber \\&\quad =\ {{\mathbb {E}}}\left[ \int _{{\mathbb {R}}}\sqrt{\left| \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z) dF_{{\varepsilon }_S}(z) - \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z)dF_{{\widetilde{{\varepsilon }}}_S}(z)\right| } dy| Z_S, Y_{\text {epr}}\right] . \end{aligned}$$
(68)

Under Assumption 2.4, since \(\Vert \beta _S\Vert _2 = 1\), for every y the function \(z\mapsto F_{X_S^\top \beta _S}(y-z)\) is \(C_{\text {Lip}}\)-Lipschitz. By the Kantorovich–Rubinstein duality (6),

$$\begin{aligned} \left| \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z) dF_{{\varepsilon }_S}(z) - \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z)dF_{{\widetilde{{\varepsilon }}}_S}(z)\right| \le C_{\text {Lip}}W_1({\varepsilon }_S, {\widetilde{{\varepsilon }}}_S). \end{aligned}$$
(69)

Meanwhile, note \({\text {supp}}({\varepsilon }_S) = {\text {supp}}({\widetilde{{\varepsilon }}}_S)\subseteq [-1,1]\). This combined with the fact that \(X_S^\top \beta _S\) is sub-exponential implies that

$$\begin{aligned}&\left| \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z) dF_{{\varepsilon }_S}(z) - \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z)dF_{{\widetilde{{\varepsilon }}}_S}(z)\right| \nonumber \\&\quad =\ \left| \int _{{\mathbb {R}}}1-F_{X_S^\top \beta _S}(y-z) dF_{{\varepsilon }_S}(z) - \int _{{\mathbb {R}}}1-F_{X_S^\top \beta _S}(y-z)dF_{{\widetilde{{\varepsilon }}}_S}(z)\right| \nonumber \\&\quad \le \ \frac{1}{2}\left( M_1[F_{X_S^\top \beta _S}](y) + M_1[1-F_{X_S^\top \beta _S}](y)\right) \nonumber \\&\quad \le \ \exp \left( -\frac{\max \{|y-1|, |y+1|\}}{C}\right) \nonumber \\&\quad \le \ \exp \left( -\frac{|y|}{2C}\right)&|y|\ge 2, \end{aligned}$$
(70)

where C is a constant depending only on the sub-exponential norm of \(\Vert X_S\Vert _2\), and \(M_1\) is the 1-local maximum operator in Definition 3.4. Substituting (69) and (70) into (68) and applying a truncation argument,

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y''})|Z_S, Y_{\text {epr}}]|\nonumber \\&\le \ {{\mathbb {E}}}\left[ \int _{|y|<\max \{2, 4C\log m\}}\sqrt{W_1({\varepsilon }_S, {\widetilde{{\varepsilon }}}_S)} dy + \int _{|y|\ge \max \{2, 4C\log m\}}\exp \left( -\frac{|y|}{4C}\right) dy|Z_S, Y_{\text {epr}}\right] \nonumber \\&\le \ {{\mathbb {E}}}\left[ (4+8C\log m)\sqrt{W_1({\varepsilon }_S, {\widetilde{{\varepsilon }}}_S)} + \frac{2}{m}|Z_S, Y_{\text {epr}}\right] . \end{aligned}$$

Since \(\sqrt{W_1({\varepsilon }_S, {\widetilde{{\varepsilon }}}_S)}\) is independent of \(Z_S\), taking expectation over the exploration noise together with Jensen’s inequality and Lemma 3.2 yields

$$\begin{aligned} |{{\mathbb {E}}}[J_1(F_{Y})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y''})|Z_S]|\le (4+8C\log m)\,{{\mathbb {E}}}[W_1({\varepsilon }_S, {\widetilde{{\varepsilon }}}_S)]^{1/2} + \frac{2}{m}\lesssim \frac{\log m}{\sqrt{m}}\rightarrow 0. \end{aligned}$$
(71)

To prove (b), note that conditioning on \(Z_S\) and \(Y_{\text {epr}}\), the difference between \({\widetilde{\tau }}_\ell \) and \({\widehat{\tau }}_\ell \) is bounded as follows:

$$\begin{aligned} |{\widetilde{\tau }}_\ell -{\widehat{\tau }}_\ell |=|X_{S,\ell }^\top ({{\widehat{\beta }}}_S-\beta _S)| \le \Vert X_{S,\ell }\Vert _2\Vert {{\widehat{\beta }}}_S-\beta _S\Vert _2{\mathop {\le }\limits ^{(66)}}K_m^*\delta ,\qquad \delta := \Vert {{\widehat{\beta }}}_S-\beta _S\Vert _2, \end{aligned}$$
(72)

where \({\widehat{\tau }}_\ell , {\widetilde{\tau }}_\ell \) are defined in (63). Moreover, since \(\text {supp}({\varepsilon }_S)\subseteq [-1, 1]\), \(|{\widetilde{\tau }}_\ell |\le 1\). This combined with (72) implies

$$\begin{aligned}&{\text {supp}}({\widetilde{{\varepsilon }}}_S)\cup {\text {supp}}({\widehat{{\varepsilon }}}_S)\subseteq [-r, r]&r = 1 + K_m^*\delta . \end{aligned}$$
(73)

The rest is similar to the proof of statement (a),

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y''})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S, Y_{\text {epr}}]|\nonumber \\&\quad \le \ {{\mathbb {E}}}\left[ \int _{{\mathbb {R}}}\sqrt{|F_{Y''}(y)-F_{Y'''}(y)|}dy|Z_S, Y_{\text {epr}}\right] \nonumber \\&\quad \le \ {{\mathbb {E}}}\left[ \int _{{\mathbb {R}}}\sqrt{\frac{1}{m}\sum _{\ell \in [m]}|F_{X_S^\top \beta _S}(y-{\widehat{\tau }}_\ell )-F_{X_S^\top \beta _S}(y-{\widetilde{\tau }}_\ell )|}dy|Z_S, Y_{\text {epr}}\right] . \end{aligned}$$

It is easy to verify using the Lipschitz assumption and the tail bound of \(X_S^\top \beta _S\) that

$$\begin{aligned} \frac{1}{m}\sum _{\ell \in [m]}|F_{X_S^\top \beta _S}(y-{\widehat{\tau }}_\ell )-F_{X_S^\top \beta _S}(y-{\widetilde{\tau }}_\ell )|&\le C_{\text {Lip}}K_m^*\delta \\ \frac{1}{m}\sum _{\ell \in [m]}|F_{X_S^\top \beta _S}(y-{\widehat{\tau }}_\ell )-F_{X_S^\top \beta _S}(y-{\widetilde{\tau }}_\ell )|&\le \exp \left( -\frac{|y|}{2C}\right)&|y|\ge 2r. \end{aligned}$$

Thus,

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y''})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S, Y_{\text {epr}}]|\\&\quad \le \ {{\mathbb {E}}}\left[ \int _{|y|<\max \{2r, 4C\log m\}}\sqrt{C_{\text {Lip}}K_m^*\delta } dy \right. \\&\left. \qquad + \int _{|y|\ge \max \{2r, 4C\log m\}}\exp \left( -\frac{|y|}{4C}\right) dy|Z_S, Y_{\text {epr}}\right] \\&\quad \le \ {{\mathbb {E}}}\left[ (4r + 8C\log m)\sqrt{C_{\text {Lip}}K_m^*\delta } +\frac{2}{m}|Z_S, Y_{\text {epr}}\right] . \end{aligned}$$

Averaging out exploration noise and applying Jensen’s inequality,

$$\begin{aligned} \begin{aligned} |{{\mathbb {E}}}[J_1(F_{Y''})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S]|&\lesssim {{\mathbb {E}}}\left[ \left( K_m^*\delta \right) ^{3/2} +\log m \sqrt{K_m^*\delta } + \frac{2}{m}|Z_S\right] \\&\lesssim (K_m^*)^{3/2}{{\mathbb {E}}}[\delta ^2|Z_S]^{3/4} + \log m\sqrt{K_m^*}{{\mathbb {E}}}[\delta ^2|Z_S]^{1/4} + \frac{1}{m}\\&{\mathop {\lesssim }\limits ^{(60), (67)}}\frac{(\log m)^{3/2}}{m^{1/4}}\rightarrow 0&a.s. \end{aligned} \end{aligned}$$

To prove (c), recall from (73) that conditioning on \(Z_S\) and \(Y_{\text {epr}}\), \({\text {supp}}({\widehat{{\varepsilon }}}_S)\subseteq [-r ,r]\). Applying Lemma 3.5,

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y'})|Z_S, Y_{\text {epr}}]|\nonumber \\ {}&\le {{\mathbb {E}}}\left[ \left\| M_{r}[|F_{X_S^\top {{\widehat{\beta }}}_S}-F_{X_S^\top \beta _S}|]\right\| ^{1/2}_{L^{1/2}_{{\mathbb {R}}}}| Z_S, Y_{\text {epr}}\right] . \end{aligned}$$
(74)

If \(\delta <1/2\), then \(1/2<\Vert {{\widehat{\beta }}}_S\Vert _2< 3/2\). In this case, the \(C_1, C_2, C_3\) in Lemma 3.7 are absolute constants. According to Lemma 3.7 with \(p = 1/2\),

$$\begin{aligned} \left\| M_{r}[|F_{X_S^\top {{\widehat{\beta }}}_S}-F_{X_S^\top \beta _S}|]\right\| ^{1/2}_{L^{1/2}_{{\mathbb {R}}}}\lesssim (r+ 1)\delta ^{5/12}\log \left( 1/\delta \right) \le (r+1)\delta ^{1/4}, \end{aligned}$$
(75)

where we used \(\log (1/\delta )<\delta ^{-1/6}\) when \(\delta \le 1/2\).

If \(\delta \ge 1/2\), the same result in Lemma 3.7 implies

$$\begin{aligned} \left\| M_{r}[|F_{X_S^\top {{\widehat{\beta }}}_S}-F_{X_S^\top \beta _S}|]\right\| ^{1/2}_{L^{1/2}_{{\mathbb {R}}}}\lesssim (r + 1+\delta )\delta . \end{aligned}$$
(76)

Substituting (75) and (76) into (74) yields that

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y'})|Z_S, Y_{\text {epr}}]|\\ {}&\quad \lesssim {{\mathbb {E}}}[(r+1)\delta ^{1/4} + (r + 1+\delta )\delta | Z_S, Y_{\text {epr}}]. \end{aligned}$$

Taking expectation over the exploration noise and applying Jensen’s inequality,

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y'})|Z_S]|\\&\quad \lesssim \ {{\mathbb {E}}}[\delta ^{1/4}|Z_S] + {{\mathbb {E}}}[\delta |Z_S]+K_m^*{{\mathbb {E}}}[\delta ^{5/4}|Z_S]+(K_m^*)^2{{\mathbb {E}}}[\delta ^2|Z_S]\\&\quad \lesssim \ {{\mathbb {E}}}[\delta ^2|Z_S]^{1/8} + {{\mathbb {E}}}[\delta ^2|Z_S]^{1/2}+K_m^*{{\mathbb {E}}}[\delta ^{2}|Z_S]^{5/8}+(K_m^*)^2{{\mathbb {E}}}[\delta ^2|Z_S]\\&{\mathop {\lesssim }\limits ^{(60), (67)}}\ \frac{1}{m^{1/8}}\rightarrow 0&a.s. \end{aligned}$$

Combining statements (a), (b), and (c) proves (28b).

A quantile regression framework

Quantile regression offers an alternative approach to simulating Y through a random coefficient interpretation [15]. For any \(S\subseteq [n]\) and \(\tau \in (0,1)\), we assume the conditional \(\tau \)-th quantile of Y given \(X_S\) satisfies

$$\begin{aligned} F^{-1}_{Y|X_S}(\tau ) = X_S^\top \beta _S(\tau ), \end{aligned}$$
(77)

where \(\beta _S(\tau )\) is the \(\tau \)-th quantile coefficient vector. (77) is a standard quantile regression formulation, and can be used to model heteroscedastic noise effects. Given the exploration data, \(\beta _S(\tau )\) is estimated by minimizing the empirical check-function loss:

$$\begin{aligned}&{\widehat{\beta }}_S(\tau ) = {{\,\mathrm{arg\,min}\,}}_{\beta \in {{\mathbb {R}}}^{s+1}}\frac{1}{m}\sum _{\ell \in [m]}\rho _\tau (Y_\ell - X^\top _{{\text {epr}},\ell }\beta )&\rho _\tau (x) = x(\tau - {\varvec{1}}_{x<0}). \end{aligned}$$

Substituting \({\widehat{\beta }}_S(\tau )\) into (77) yields the plug-in approximation

$$\begin{aligned} {\widehat{F}}^{-1}_{Y|X_S}(\tau ) = X_S^\top {\widehat{\beta }}_S(\tau ). \end{aligned}$$
(78)

As opposed to (23), (78) provides a way to simulate Y based on \(X_S\) via inverse transform sampling:

$$\begin{aligned} Y' = X_S^\top {\widehat{\beta }}_S(U),\qquad U\sim \text {Uniform}(0,1) \text { independent of } X_S. \end{aligned}$$
(79)

In our case, \(X_{{\text {epr}}, \ell }, \ell \in [m]\), are i.i.d. samples, so (77) fits into the random-design quantile regression framework analyzed in [21], where the authors established a strong consistency result for \({\widehat{\beta }}_S(\tau )\) under suitable conditions. The consistency result can further be shown to hold uniformly for all \(\tau \in [\delta ,1-\delta ]\) for any fixed \(\delta >0\), which justifies the asymptotic behavior of the procedure in (78)–(79) as \(m, N_S\rightarrow \infty \).

In the quantile regression framework, obtaining the optimal choices for m and S is much harder than in the linear regression setup. The AETC-d-q algorithm in Sect. 6 implements (79) with m set to the adaptive exploration rate given by AETC-d, S the corresponding model subset used for exploitation, and U approximated via \(\frac{1}{K}\sum _{j\in [K]}\delta _{\frac{j}{K+1}}\) with \(K=100\); a sketch of this pipeline follows.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, Y., Narayan, A. Budget-limited distribution learning in multifidelity problems. Numer. Math. 153, 171–212 (2023). https://doi.org/10.1007/s00211-022-01337-5
