Budget-limited distribution learning in multifidelity problems

Numerische Mathematik

Abstract

Multifidelity methods are widely used for estimating quantities of interest (QoI) in computational science by employing numerical simulations of differing costs and accuracies. Many methods approximate numerical statistics, e.g., scalar statistics, that capture only limited information about the QoI. Further quantification of uncertainty, e.g., for risk assessment, failure probabilities, or confidence intervals, requires estimation of the full distribution. In this paper, we generalize the ideas in (Xu et al. in SIAM J Sci Comput 44(1):A150–A175, 2022) to develop a multifidelity method that approximates the full distribution of scalar-valued QoI. The main advantage of our approach over alternative methods is that we require no particular relationships between the high- and lower-fidelity models (e.g., a model hierarchy), and we do not assume any knowledge of model statistics, including correlations and other cross-model statistics, before the procedure starts. Under suitable assumptions in this framework, we prove convergence in the 1-Wasserstein metric of an algorithmically constructed distributional emulator produced by an exploration–exploitation strategy. We also prove that crucial policy actions taken by our algorithm are budget-asymptotically optimal. Numerical experiments are provided to support our theoretical analysis.


Notes

  1. For example, one such assumption on model cost behavior is that models with higher correlation to the high-fidelity model also incur higher cost.

  2. A sequence of probability measures \(\{P_k\}\) defined on a metric space is called \(\delta \)-tight if for every \({\varepsilon }>0\), there exist a compact measurable set K and a sequence \(\delta _k\downarrow 0\) such that \(P_k(K^{\delta _k})>1-{\varepsilon }\) for every k, where \(K^{\delta _k}: = \{x: \text {dist}(x, K)<\delta _k\}\).

  3. When the variance ratio between \({\varepsilon }_S\) and Y is small, \(Y\approx X_S^\top \beta _S\), so that adding the noise emulator has little impact on the accuracy of the resulting estimator. When the ratio is moderate, adding \({\varepsilon }_S\) as an independent component will degrade the quality of the estimator if the independence assumption is violated.

References

  1. Beran, R., Le Cam, L., Millar, P.: Convergence of stochastic empirical measures. J. Multivar. Anal. 23(1), 159–168 (1987)

  2. Bobkov, S., Ledoux, M.: One-Dimensional Empirical Measures, Order Statistics, and Kantorovich Transport Distances. Vol. 261, No. 1259. American Mathematical Society (2019)

  3. Bubeck, S., Cesa-Bianchi, N., et al.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Found. Trends Mach. Learn. 5(1), 1–122 (2012)

  4. Cambanis, S., Simons, G., Stout, W.: Inequalities for E k(X, Y) when the marginals are fixed. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 36(4), 285–294 (1976)

  5. Chatterjee, S.: Lecture Notes on Stein’s Method. Stanford Lecture Notes (2007)

  6. Cohen, A., DeVore, R.: Approximation of high-dimensional parametric PDEs. Acta Numer. 24, 1–159 (2015). https://doi.org/10.1017/S0962492915000033. (ISSN: 1474-0508)

  7. Farcas, I.-G.: Context-aware model hierarchies for higher-dimensional uncertainty quantification. PhD thesis. Technische Universität München (2020)

  8. Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002). https://doi.org/10.1111/j.1751-5823.2002.tb00178.x

  9. Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008)

  10. Giles, M.B., Nagapetyan, T., Ritter, K.: Multilevel Monte Carlo approximation of distribution functions and densities. SIAM/ASA J. Uncertain. Quantif. 3(1), 267–295 (2015). https://doi.org/10.1137/140960086

  11. Giles, M.B., Nagapetyan, T., Ritter, K.: Adaptive multilevel Monte Carlo approximation of distribution functions. arXiv preprint arXiv:1706.06869 (2017)

  12. Gorodetsky, A.A., Geraci, G., Eldred, M.S., Jakeman, J.D.: A generalized approximate control variate framework for multifidelity uncertainty quantification. J. Comput. Phys. 408, 109257 (2020). https://doi.org/10.1016/j.jcp.2020.109257

  13. Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods. Methuen, London (1964)

  14. Ishigami, T., Homma, T.: An importance quantification technique in uncertainty analysis for computer models. In: Proceedings, First International Symposium on Uncertainty Modeling and Analysis. IEEE Computer Society Press (1990). https://doi.org/10.1109/isuma.1990.151285

  15. Koenker, R.: Fundamentals of quantile regression. In: Quantile Regression, pp. 26–67. Cambridge University Press, Cambridge (2001). https://doi.org/10.1017/ccol0521845734.002

  16. Krumscheid, S., Nobile, F.: Multilevel Monte Carlo approximation of functions. SIAM/ASA J. Uncertain. Quantif. 6(3), 1256–1293 (2018). https://doi.org/10.1137/17m1135566

  17. Lai, T.L., Wei, C.Z.: Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Stat. 10(1), 154–166 (1982). https://doi.org/10.1214/aos/1176345697

  18. Lattimore, T., Szepesvári, C.: Bandit Algorithms. Cambridge University Press, Cambridge (2020)

  19. Lu, D., Zhang, G., Webster, C., Barbier, C.: An improved multilevel Monte Carlo method for estimating probability distribution functions in stochastic oil reservoir simulations. Water Resour. Res. 52(12), 9642–9660 (2016). https://doi.org/10.1002/2016wr019475

  20. Massart, P.: The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. Ann. Probab. 18, 1269–1283 (1990)

  21. Pan, X., Zhou, W.-X.: Multiplier bootstrap for quantile regression: non-asymptotic theory under random design. Inf. Inference J. IMA (2020). https://doi.org/10.1093/imaiai/iaaa006

  22. Panaretos, V.M., Zemel, Y.: Statistical aspects of Wasserstein distances. Annu. Rev. Stat. Appl. 6, 405–431 (2019)

  23. Peherstorfer, B.: Multifidelity Monte Carlo estimation with adaptive low-fidelity models. SIAM/ASA J. Uncertain. Quantif. 7(2), 579–603 (2019)

  24. Peherstorfer, B., Willcox, K., Gunzburger, M.: Optimal model management for multifidelity Monte Carlo estimation. SIAM J. Sci. Comput. 38(5), A3163–A3194 (2016). https://doi.org/10.1137/15m1046472

  25. Peherstorfer, B., Willcox, K., Gunzburger, M.: Survey of multifidelity methods in uncertainty propagation, inference, and optimization. SIAM Rev. 60(3), 550–591 (2018)

  26. Qian, E., Peherstorfer, B., O’Malley, D., Vesselinov, V.V., Willcox, K.: Multifidelity Monte Carlo estimation of variance and sensitivity indices. SIAM/ASA J. Uncertain. Quantif. 6(2), 683–706 (2018). https://doi.org/10.1137/17m1151006

  27. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria (2020)

  28. Robinson, P.M.: Root-N-consistent semiparametric regression. Econometrica 56(4), 931 (1988). https://doi.org/10.2307/1912705

  29. Schaden, D., Ullmann, E.: On multilevel best linear unbiased estimators. SIAM/ASA J. Uncertain. Quantif. 8(2), 601–635 (2020). https://doi.org/10.1137/19m1263534

  30. Schaden, D., Ullmann, E.: Asymptotic analysis of multilevel best linear unbiased estimators. SIAM/ASA J. Uncertain. Quantif. 9(3), 953–978 (2021)

  31. Vershynin, R.: High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47. Cambridge University Press, Cambridge (2018)

  32. Villani, C.: The metric side of optimal transportation. In: Topics in Optimal Transportation. Graduate Studies in Mathematics, vol. 58, pp. 205–235. American Mathematical Society (2003). https://doi.org/10.1090/gsm/058/08

  33. Williams, D.: Probability with Martingales. Cambridge University Press, Cambridge (1991)

  34. Xu, Y., Keshavarzzadeh, V., Kirby, R.M., Narayan, A.: A bandit-learning approach to multifidelity approximation. SIAM J. Sci. Comput. 44(1), A150–A175 (2022)

Acknowledgements

We would like to thank the referees for their time and helpful comments which significantly improved the presentation of the manuscript. Y. Xu and A. Narayan are partially supported by National Science Foundation DMS-1848508. A. Narayan is partially supported by the Air Force Office of Scientific Research award FA9550-20-1-0338. Y. Xu would like to thank Dr. Xiaoou Pan for clarifying a uniform consistency result in quantile regression. We also thank Dr. Ruijian Han for a careful reading of an early draft, and for providing several comments that improved the presentation of the manuscript.

Author information

Correspondence to Yiming Xu.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Proof of Lemma 4.1

Let \(|S| = s\). We first prove (28a). Recall the definition of Y and \(Y'\):

$$\begin{aligned}&Y = X_S^\top \beta _S + {\varepsilon }_S&Y' = X_S^\top {\widehat{\beta }}_S + {\widehat{{\varepsilon }}}_S \end{aligned}$$

where \({\varepsilon }_S, {\widehat{{\varepsilon }}}_S\) are independent of \(X_S\) by Assumption 2.2.

Define the true empirical distribution of \({\varepsilon }_S\) using exploration samples as

$$\begin{aligned} {\widetilde{{\varepsilon }}}_S\sim \frac{1}{m}\sum _{\ell \in [m]}\delta _{Y_\ell -X_{S,\ell }^\top \beta _S}. \end{aligned}$$

By the additive property of the \(W_1\) metric under independence [22] and the triangle inequality, we can upper bound the average \(W_1\) distance between \(F_Y\) and \(F_{Y'}\) by conditioning on the exploration data:

$$\begin{aligned}&{{\mathbb {E}}}\left[ W_1(F_Y, F_{Y'})|Z_S, Y_{{\text {epr}}}\right] \nonumber \\&\quad \le \ {{\mathbb {E}}}[W_1(F_{X_S^\top \beta _S}, F_{X_S^\top {\widehat{\beta }}_S})|Z_S, Y_{{\text {epr}}}] +{{\mathbb {E}}}[W_1(F_{{\varepsilon }_S}, F_{{\widehat{{\varepsilon }}}_S})|Z_S, Y_{{\text {epr}}}]\nonumber \\&\quad \le \ {{\mathbb {E}}}[W_1(F_{X_S^\top \beta _S}, F_{X_S^\top {\widehat{\beta }}_S})|Z_S, Y_{{\text {epr}}}] +{{\mathbb {E}}}[W_1(F_{{\varepsilon }_S}, F_{{\widetilde{{\varepsilon }}}_S})|Z_S, Y_{{\text {epr}}}] \nonumber \\&\qquad +{{\mathbb {E}}}[W_1(F_{{\widetilde{{\varepsilon }}}_S}, F_{{\widehat{{\varepsilon }}}_S})|Z_S, Y_{{\text {epr}}}]. \end{aligned}$$
(59)

For the first term in (59), note \({{\widehat{\beta }}}_S-\beta _S\) satisfies

$$\begin{aligned} {{\widehat{\beta }}}_S-\beta _S\sim \mathcal {N}(0, \sigma _S^2(Z^\top _SZ_S)^{-1}). \end{aligned}$$
(60)

Averaging out the randomness of exploration noise, we have that, almost surely,

$$\begin{aligned} {{\mathbb {E}}}[W_1(F_{X_S^\top \beta _S}, F_{X_S^\top {\widehat{\beta }}_S})|Z_S]&\le \left( {{\mathbb {E}}}[W^2_2(F_{X_S^\top \beta _S}, F_{X_S^\top {\widehat{\beta }}_S})|Z_S]\right) ^{1/2}\nonumber \\&\le \left( {{\mathbb {E}}}[|X_S^\top ({{\widehat{\beta }}}_S-\beta _S)|^2 |Z_S]\right) ^{1/2}\nonumber \\&=\left( \hbox {tr}({{\mathbb {E}}}[X_S X_S^\top ] {{\mathbb {E}}}[({{\widehat{\beta }}}_S-\beta _S)({{\widehat{\beta }}}_S-\beta _S)^\top | Z_S])\right) ^{1/2}\nonumber \\&= \left( \frac{\sigma _S^2}{m}\hbox {tr}(\Lambda _S(m^{-1}Z_S^\top Z_S)^{-1})\right) ^{1/2}\nonumber \\&\simeq \sqrt{\frac{s+1}{m}}\sigma _S, \end{aligned}$$
(61)

where the first step uses Jensen’s inequality, and the last step follows from the law of large numbers and Assumption 2.3.

For the second term in (59), note that \({\widetilde{{\varepsilon }}}_S\) is the empirical distribution of \({\varepsilon }_S\) based on m exploration samples, which does not depend on \(Z_S\) under Assumption 2.2. Applying the nonasymptotic estimates on the convergence rate of empirical measures in Lemma 3.2 yields

$$\begin{aligned} {{\mathbb {E}}}[W_1\left( F_{{\varepsilon }_S}, F_{{\widetilde{{\varepsilon }}}_S}\right) |Z_S] = {{\mathbb {E}}}[W_1\left( F_{{\varepsilon }_S}, F_{{\widetilde{{\varepsilon }}}_S}\right) ]\le \frac{J_1(F_{{\varepsilon }_S})}{\sqrt{m}}, \end{aligned}$$
(62)

where \(J_1\) is defined in (10).

For the third term in (59), consider the natural coupling between \({\widehat{{\varepsilon }}}_S\) and \({\widetilde{{\varepsilon }}}_S\): \({\widetilde{{\varepsilon }}}^{\leftarrow }_S({\widetilde{\tau }}_\ell ) = {\widehat{{\varepsilon }}}^{\leftarrow }_S({\widehat{\tau }}_\ell )\), where \(^{\leftarrow }\) denotes the preimage of a map and

$$\begin{aligned}&{\widetilde{\tau }}_\ell = Y_\ell -X_{S,\ell }^\top \beta _S&{\widehat{\tau }}_\ell = Y_\ell -X_{S,\ell }^\top {{\widehat{\beta }}}_S. \end{aligned}$$
(63)

In this case,

$$\begin{aligned}&{{\mathbb {E}}}[W_1(F_{{\widetilde{{\varepsilon }}}_S}, F_{{\widehat{{\varepsilon }}}_S})|Z_S]\le ({{\mathbb {E}}}[W^2_2(F_{{\widetilde{{\varepsilon }}}_S}, F_{{\widehat{{\varepsilon }}}_S})|Z_S])^{1/2}\le ({{\mathbb {E}}}[|{\widetilde{{\varepsilon }}}_S-{\widehat{{\varepsilon }}}_S|^2|Z_S])^{1/2}\nonumber \\&\quad =\left( \frac{1}{m}\sum _{\ell \in [m]}{{\mathbb {E}}}[(X^\top _{S,\ell }({{\widehat{\beta }}}_S-\beta _S))^2|Z_S]\right) ^{1/2}\nonumber \\&\quad = \sqrt{\frac{s+1}{m}}\sigma _S. \end{aligned}$$
(64)

Putting (61), (62), (64) together finishes the proof of (28a).

We next prove (28b). Conditioned on \(Z_S\) and \(Y_{\text {epr}}\), \(Y'\) is a random variable with bounded r-th moments for all \(r>2\). Appealing to Lemma 3.2 and averaging over the exploration noise, we have

$$\begin{aligned} {{\mathbb {E}}}\left[ W_1\left( {\widehat{F}}_{Y,S}, F_{Y'}\right) |Z_S, Y_{\text {epr}}\right]&\le \frac{{{\mathbb {E}}}[J_1(F_{Y'})|Z_S, Y_{\text {epr}}]}{\sqrt{N_S}}\Longrightarrow {{\mathbb {E}}}\left[ W_1\left( {\widehat{F}}_{Y,S}, F_{Y'}\right) |Z_S\right] \nonumber \\&\le \frac{{{\mathbb {E}}}[J_1(F_{Y'})|Z_S]}{\sqrt{N_S}}, \end{aligned}$$
(65)

where \(J_1\) is defined in (10). The desired result would follow if we can show that \({{\mathbb {E}}}[J_1(F_{Y'})|Z_S]\) converges to \(J_1(F_{Y})\) a.s. as \(m\rightarrow \infty \). To this end, we introduce the following intermediate random variables:

$$\begin{aligned}&Y'' = X_S^\top \beta _S + {\widetilde{{\varepsilon }}}_S&Y''' = X_S^\top \beta _S + {\widehat{{\varepsilon }}}_S. \end{aligned}$$

We will prove the desired result by verifying the following convergence statements respectively:

(a) \(|{{\mathbb {E}}}[J_1(F_{Y})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y''})|Z_S]|\rightarrow 0\) a.s.;

(b) \(|{{\mathbb {E}}}[J_1(F_{Y''})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S]|\rightarrow 0\) a.s.;

(c) \(|{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y'})|Z_S]|\rightarrow 0\) a.s.

Without loss of generality, we assume \(\text {supp}({\varepsilon }_S)\subseteq [-1, 1]\) and \(\Vert \beta _S\Vert _2 = 1\); the general case follows similarly after an appropriate rescaling involving the constant C in Assumption 2.5.

We introduce the following quantity for our analysis:

$$\begin{aligned} K_m^* = \max _{\ell \in [m]}\Vert X_{S,\ell }\Vert _2. \end{aligned}$$
(66)

It is clear that \(K_m^*\) depends only on \(Z_S\). Under Assumption 2.4, \(X_{S,\ell }\)’s are i.i.d. sub-exponential random variables with uniformly bounded sub-exponential norm. By Lemma 3.6,

$$\begin{aligned}&K_m^*\lesssim \log m&a.s., \end{aligned}$$
(67)

where the implicit constant is realization-dependent.

To prove (a), we condition on \(Z_S\) and \(Y_{\text {epr}}\). Using (13)–(15) in Lemma 3.5,

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y''})|Z_S, Y_{\text {epr}}]|\nonumber \\&\quad \le \ {{\mathbb {E}}}\left[ \int _{{\mathbb {R}}}\sqrt{| F_Y(y)-F_{Y''}(y)|} dy| Z_S, Y_{\text {epr}}\right] \nonumber \\&\quad =\ {{\mathbb {E}}}\left[ \int _{{\mathbb {R}}}\sqrt{\left| \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z) dF_{{\varepsilon }_S}(z) - \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z)dF_{{\widetilde{{\varepsilon }}}_S}(z)\right| } dy| Z_S, Y_{\text {epr}}\right] . \end{aligned}$$
(68)

Under Assumption 2.4, since \(\Vert \beta _S\Vert _2 = 1\), for every y the function \(z\mapsto F_{X_S^\top \beta _S}(y-z)\) is \(C_{\text {Lip}}\)-Lipschitz. By the Kantorovich–Rubinstein duality (6),

$$\begin{aligned} \left| \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z) dF_{{\varepsilon }_S}(z) - \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z)dF_{{\widetilde{{\varepsilon }}}_S}(z)\right| \le C_{\text {Lip}}W_1({\varepsilon }_S, {\widetilde{{\varepsilon }}}_S). \end{aligned}$$
(69)

Meanwhile, note \({\text {supp}}({\varepsilon }_S) = {\text {supp}}({\widetilde{{\varepsilon }}}_S)\subseteq [-1,1]\). This combined with the fact that \(X_S^\top \beta _S\) is sub-exponential implies that

$$\begin{aligned}&\left| \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z) dF_{{\varepsilon }_S}(z) - \int _{{\mathbb {R}}}F_{X_S^\top \beta _S}(y-z)dF_{{\widetilde{{\varepsilon }}}_S}(z)\right| \nonumber \\&\quad =\ \left| \int _{{\mathbb {R}}}1-F_{X_S^\top \beta _S}(y-z) dF_{{\varepsilon }_S}(z) - \int _{{\mathbb {R}}}1-F_{X_S^\top \beta _S}(y-z)dF_{{\widetilde{{\varepsilon }}}_S}(z)\right| \nonumber \\&\quad \le \ \frac{1}{2}\left( M_1[F_{X_S^\top \beta _S}](y) + M_1[1-F_{X_S^\top \beta _S}](y)\right) \nonumber \\&\quad \le \ \exp \left( -\frac{\max \{|y-1|, |y+1|\}}{C}\right) \nonumber \\&\quad \le \ \exp \left( -\frac{|y|}{2C}\right)&|y|\ge 2, \end{aligned}$$
(70)

where C is a constant depending only on the sub-exponential norm of \(\Vert X_S\Vert _2\), and \(M_1\) is the 1-local maximum operator in Definition 3.4. Substituting (69) and (70) into (68) and applying a truncation argument,

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y''})|Z_S, Y_{\text {epr}}]|\nonumber \\&\le \ {{\mathbb {E}}}\left[ \int _{|y|<\max \{2, 4C\log m\}}\sqrt{W_1({\varepsilon }_S, {\widetilde{{\varepsilon }}}_S)} dy + \int _{|y|\ge \max \{2, 4C\log m\}}\exp \left( -\frac{|y|}{4C}\right) dy|Z_S, Y_{\text {epr}}\right] \nonumber \\&\le \ {{\mathbb {E}}}\left[ (4+8C\log m)\sqrt{W_1({\varepsilon }_S, {\widetilde{{\varepsilon }}}_S)} + \frac{2}{m}|Z_S, Y_{\text {epr}}\right] . \end{aligned}$$

Since \(\sqrt{W_1({\varepsilon }_S, {\widetilde{{\varepsilon }}}_S)}\) is independent of \(Z_S\), taking expectation over the exploration noise together with Jensen’s inequality and Lemma 3.2 yields

$$\begin{aligned} |{{\mathbb {E}}}[J_1(F_{Y})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y''})|Z_S]|\le (4+8C\log m)\,{{\mathbb {E}}}[W_1({\varepsilon }_S, {\widetilde{{\varepsilon }}}_S)]^{1/2} + \frac{2}{m}\lesssim \frac{\log m}{\sqrt{m}}\rightarrow 0. \end{aligned}$$
(71)

To prove (b), note that conditioning on \(Z_S\) and \(Y_{\text {epr}}\), the difference between \({\widetilde{\tau }}_\ell \) and \({\widehat{\tau }}_\ell \) is bounded as follows:

$$\begin{aligned} |{\widetilde{\tau }}_\ell -{\widehat{\tau }}_\ell |=|X_{S,\ell }^\top ({{\widehat{\beta }}}_S-\beta _S)| \le \Vert X_{S,\ell }\Vert _2\Vert {{\widehat{\beta }}}_S-\beta _S\Vert _2{\mathop {\le }\limits ^{(66)}}K_m^*\delta ,\qquad \delta := \Vert {{\widehat{\beta }}}_S-\beta _S\Vert _2, \end{aligned}$$
(72)

where \({\widehat{\tau }}_\ell , {\widetilde{\tau }}_\ell \) are defined in (63). Moreover, since \(\text {supp}({\varepsilon }_S)\subseteq [-1, 1]\), \(|{\widetilde{\tau }}_\ell |\le 1\). This combined with (72) implies

$$\begin{aligned}&{\text {supp}}({\widetilde{{\varepsilon }}}_S)\cup {\text {supp}}({\widehat{{\varepsilon }}}_S)\subseteq [-r, r]&r = 1 + K_m^*\delta . \end{aligned}$$
(73)

The rest is similar to the proof of statement (a),

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y''})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S, Y_{\text {epr}}]|\nonumber \\&\quad \le \ {{\mathbb {E}}}\left[ \int _{{\mathbb {R}}}\sqrt{|F_{Y''}(y)-F_{Y'''}(y)|}dy|Z_S, Y_{\text {epr}}\right] \nonumber \\&\quad \le \ {{\mathbb {E}}}\left[ \int _{{\mathbb {R}}}\sqrt{\frac{1}{m}\sum _{\ell \in [m]}|F_{X_S^\top \beta _S}(y-{\widehat{\tau }}_\ell )-F_{X_S^\top \beta _S}(y-{\widetilde{\tau }}_\ell )|}dy|Z_S, Y_{\text {epr}}\right] . \end{aligned}$$

It is easy to verify using the Lipschitz assumption and the tail bound of \(X_S^\top \beta _S\) that

$$\begin{aligned} \frac{1}{m}\sum _{\ell \in [m]}|F_{X_S^\top \beta _S}(y-{\widehat{\tau }}_\ell )-F_{X_S^\top \beta _S}(y-{\widetilde{\tau }}_\ell )|&\le C_{\text {Lip}}K_m^*\delta \\ \frac{1}{m}\sum _{\ell \in [m]}|F_{X_S^\top \beta _S}(y-{\widehat{\tau }}_\ell )-F_{X_S^\top \beta _S}(y-{\widetilde{\tau }}_\ell )|&\le \exp \left( -\frac{|y|}{2C}\right)&|y|\ge 2r. \end{aligned}$$

Thus,

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y''})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S, Y_{\text {epr}}]|\\&\quad \le \ {{\mathbb {E}}}\left[ \int _{|y|<\max \{2r, 4C\log m\}}\sqrt{C_{\text {Lip}}K_m^*\delta } dy \right. \\&\left. \qquad + \int _{|y|\ge \max \{2r, 4C\log m\}}\exp \left( -\frac{|y|}{4C}\right) dy|Z_S, Y_{\text {epr}}\right] \\&\quad \le \ {{\mathbb {E}}}\left[ (4r + 8C\log m)\sqrt{C_{\text {Lip}}K_m^*\delta } +\frac{2}{m}|Z_S, Y_{\text {epr}}\right] . \end{aligned}$$

Averaging out exploration noise and applying Jensen’s inequality,

$$\begin{aligned} \begin{aligned} |{{\mathbb {E}}}[J_1(F_{Y''})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S]|&\lesssim {{\mathbb {E}}}\left[ \left( K_m^*\delta \right) ^{3/2} +\log m \sqrt{K_m^*\delta } + \frac{2}{m}|Z_S\right] \\&\lesssim (K_m^*)^{3/2}{{\mathbb {E}}}[\delta ^2|Z_S]^{3/4} + \log m\sqrt{K_m^*}{{\mathbb {E}}}[\delta ^2|Z_S]^{1/4} + \frac{1}{m}\\&{\mathop {\lesssim }\limits ^{(60), (67)}}\frac{(\log m)^{3/2}}{m^{1/4}}\rightarrow 0&a.s. \end{aligned} \end{aligned}$$

To prove (c), recall from (73) that conditioning on \(Z_S\) and \(Y_{\text {epr}}\), \({\text {supp}}({\widehat{{\varepsilon }}}_S)\subseteq [-r ,r]\). Applying Lemma 3.5,

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y'})|Z_S, Y_{\text {epr}}]|\nonumber \\ {}&\le {{\mathbb {E}}}\left[ \left\| M_{r}[|F_{X_S^\top {{\widehat{\beta }}}_S}-F_{X_S^\top \beta _S}|]\right\| ^{1/2}_{L^{1/2}_{{\mathbb {R}}}}| Z_S, Y_{\text {epr}}\right] . \end{aligned}$$
(74)

If \(\delta <1/2\), then \(1/2<\Vert {{\widehat{\beta }}}_S\Vert _2< 3/2\). In this case, the \(C_1, C_2, C_3\) in Lemma 3.7 are absolute constants. According to Lemma 3.7 with \(p = 1/2\),

$$\begin{aligned} \left\| M_{r}[|F_{X_S^\top {{\widehat{\beta }}}_S}-F_{X_S^\top \beta _S}|]\right\| ^{1/2}_{L^{1/2}_{{\mathbb {R}}}}\lesssim (r+ 1)\delta ^{5/12}\log \left( 1/\delta \right) \le (r+1)\delta ^{1/4}, \end{aligned}$$
(75)

where we used \(\log (1/\delta )<\delta ^{-1/6}\) when \(\delta \le 1/2\).

If \(\delta \ge 1/2\), the same result in Lemma 3.7 implies

$$\begin{aligned} \left\| M_{r}[|F_{X_S^\top {{\widehat{\beta }}}_S}-F_{X_S^\top \beta _S}|]\right\| ^{1/2}_{L^{1/2}_{{\mathbb {R}}}}\lesssim (r + 1+\delta )\delta . \end{aligned}$$
(76)

Substituting (75) and (76) into (74) yields that

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S, Y_{\text {epr}}]-{{\mathbb {E}}}[J_1(F_{Y'})|Z_S, Y_{\text {epr}}]|\\ {}&\quad \lesssim {{\mathbb {E}}}[(r+1)\delta ^{1/4} + (r + 1+\delta )\delta | Z_S, Y_{\text {epr}}]. \end{aligned}$$

Taking expectation over the exploration noise and applying Jensen’s inequality,

$$\begin{aligned}&|{{\mathbb {E}}}[J_1(F_{Y'''})|Z_S]-{{\mathbb {E}}}[J_1(F_{Y'})|Z_S]|\\&\quad \lesssim \ {{\mathbb {E}}}[\delta ^{1/4}|Z_S] + {{\mathbb {E}}}[\delta |Z_S]+K_m^*{{\mathbb {E}}}[\delta ^{5/4}|Z_S]+(K_m^*)^2{{\mathbb {E}}}[\delta ^2|Z_S]\\&\quad \lesssim \ {{\mathbb {E}}}[\delta ^2|Z_S]^{1/8} + {{\mathbb {E}}}[\delta ^2|Z_S]^{1/2}+K_m^*{{\mathbb {E}}}[\delta ^{2}|Z_S]^{5/8}+(K_m^*)^2{{\mathbb {E}}}[\delta ^2|Z_S]\\&{\mathop {\lesssim }\limits ^{(60), (67)}}\ \frac{1}{m^{1/8}}\rightarrow 0&a.s. \end{aligned}$$

Combining statements (a), (b), and (c) proves (28b).

A quantile regression framework

Quantile regression offers an alternative approach to simulating Y through a random coefficient interpretation [15]. For any \(S\subseteq [n]\) and \(\tau \in (0,1)\), we assume the conditional \(\tau \)-th quantile of Y given \(X_S\) satisfies

$$\begin{aligned} F^{-1}_{Y|X_S}(\tau ) = X_S^\top \beta _S(\tau ), \end{aligned}$$
(77)

where \(\beta _S(\tau )\) is the \(\tau \)-th quantile coefficient vector. (77) is a standard quantile regression formulation, and can be used to model heteroscedastic noise effects. Given the exploration data, \(\beta _S(\tau )\) is estimated by minimizing the empirical check-function loss:

$$\begin{aligned}&{\widehat{\beta }}_S(\tau ) = {{\,\mathrm{arg\,min}\,}}_{\beta \in {{\mathbb {R}}}^{s+1}}\frac{1}{m}\sum _{\ell \in [m]}\rho _\tau (Y_\ell - X^\top _{{\text {epr}},\ell }\beta )&\rho _\tau (x) = x(\tau - {\varvec{1}}_{x<0}). \end{aligned}$$

Substituting \({\widehat{\beta }}_S(\tau )\) into (77) yields the plug-in approximation

$$\begin{aligned} {\widehat{F}}^{-1}_{Y|X_S}(\tau ) = X_S^\top {\widehat{\beta }}_S(\tau ). \end{aligned}$$
(78)

As opposed to (23), (78) provides a way to simulate Y based on \(X_S\) via inverse transform sampling:

$$\begin{aligned} Y' = X_S^\top {\widehat{\beta }}_S(U),\qquad U\sim \text {Uniform}(0,1) \text { independent of } X_S. \end{aligned}$$
(79)

In our case, \(X_{{\text {epr}}, \ell }, \ell \in [m]\), are i.i.d. samples, so (77) fits into the random-design quantile regression framework analyzed in [21], where the authors established a strong consistency result for \({\widehat{\beta }}_S(\tau )\) under suitable conditions. The consistency result can further be shown to hold uniformly for all \(\tau \in [\delta ,1-\delta ]\) for any fixed \(\delta >0\), which justifies the asymptotic behavior of the procedure in (78)–(79) as \(m, N_S\rightarrow \infty \).

In the quantile regression framework, obtaining the optimal choices for m and S is much harder than in the linear regression setup. The AETC-d-q algorithm in Sect. 6 implements (79) with m set to the adaptive exploration rate given by AETC-d, S the corresponding model subset used for exploitation, and U approximated via \(\frac{1}{K}\sum _{j\in [K]}\delta _{\frac{j}{K+1}}\) with \(K=100\); a sketch of this pipeline follows.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xu, Y., Narayan, A. Budget-limited distribution learning in multifidelity problems. Numer. Math. 153, 171–212 (2023). https://doi.org/10.1007/s00211-022-01337-5
