Abstract
With the advancement of computer architectures, computational models are increasingly used to solve complex problems in many scientific applications such as nuclear physics and climate research. However, the potential of such models is often hindered because they tend to be computationally expensive and consequently ill-suited for uncertainty quantification. Furthermore, they are usually not calibrated with real-time observations. We develop a computationally efficient algorithm based on variational Bayes inference (VBI) for the calibration of computer models with Gaussian processes. Unfortunately, the standard fast-to-compute gradient estimates based on subsampling are biased under the calibration framework because the data are conditionally dependent, which diminishes the efficiency of VBI. In this work, we adopt a pairwise decomposition of the data likelihood using vine copulas that separates the information on the dependence structure in the data from their marginal distributions, leading to computationally efficient, unbiased gradient estimates and thus scalable calibration. We provide empirical evidence for the computational scalability of our methodology together with an average-case analysis and describe all the details necessary for an efficient implementation of the proposed algorithm. We also demonstrate the opportunities our method offers to practitioners on a real data example through the calibration of the Liquid Drop Model of nuclear binding energies.





References
Aicher, C., Ma, Y.A., Foti, N.J., Fox, E.B.: Stochastic gradient MCMC for state space models. SIAM J. Math. Data Sci. 1(3), 555–587 (2019). https://doi.org/10.1137/18M1214780
Ambrogioni, L., Lin, K., Fertig, E., Vikram, S., Hinne, M., Moore, D., van Gerven, M.: Automatic structured variational inference. In: Banerjee, A., Fukumizu, K. (eds.) Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR, Proceedings of Machine Learning Research, vol 130, pp. 676–684, https://proceedings.mlr.press/v130/ambrogioni21a.html (2021)
Audi, G., Wapstra, A., Thibault, C.: The AME2003 atomic mass evaluation: (ii). Tables, graphs and references. Nucl. Phys. A 729, 337–676 (2003)
Bauer, M., van der Wilk, M., Rasmussen, C.E.: Understanding probabilistic sparse Gaussian process approximations. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, NeurIPS’16, pp. 1533–1541, http://dl.acm.org/citation.cfm?id=3157096.3157268 (2016)
Bayarri, M.J., Berger, J.O., Paulo, R., Sacks, J., Cafeo, J.A., Cavendish, J., Lin, C.H., Tu, J.: A framework for validation of computer models. Technometrics 49, 138–154 (2007). https://doi.org/10.1198/004017007000000092
Bedford, T., Cooke, R.M.: Vines-a new graphical model for dependent random variables. Ann. Stat. 30(4), 1031–1068 (2002)
Benzaid, D., Bentridi, S., Kerraci, A., Amrani, N.: Bethe-Weizsäcker semiempirical mass formula coefficients 2019 update based on AME2016. Nucl. Sci. Tech. 31, 9 (2020). https://doi.org/10.1007/s41365-019-0718-8
Bertsch, G.F., Bingham, D.: Estimating parameter uncertainty in binding-energy models by the frequency-domain bootstrap. Phys. Rev. Lett. 119, 252501 (2017)
Bertsch, G.F., Sabbey, B., Uusnäkki, M.: Fitting theories of nuclear binding energies. Phys. Rev. C 71, 054311 (2005). https://doi.org/10.1103/PhysRevC.71.054311
Bethe, H.A., Bacher, R.F.: Nuclear physics A. Stationary states of nuclei. Rev. Mod. Phys. 8, 82–229 (1936). https://doi.org/10.1103/RevModPhys.8.82
Bottou, L., Le Cun, Y., Bengio, Y.: Global training of document processing systems using graph transformer networks. In: Proceedings of Computer Vision and Pattern Recognition (CVPR), IEEE, pp. 489–493, http://leon.bottou.org/papers/bottou-97 (1997)
Brechmann, E.C., Joe, H.: Truncation of vine copulas using fit indices. J. Multivar. Anal. 138, 19–33 (2015)
Brechmann, E.C., Czado, C., Aas, K.: Truncated regular vines in high dimensions with application to financial data. Can. J. Stat. 40(1), 68–85 (2012)
Casella, G., Robert, C.P.: Rao-Blackwellisation of sampling schemes. Biometrika 83(1), 81–94 (1996)
Chib, S., Greenberg, E.: Understanding the Metropolis-Hastings algorithm. Am. Stat. 49, 327–335 (1995)
Cooke, R., Kurowicka, D.: Uncertainty Analysis With High Dimensional Dependence Modelling. Wiley, London (2006)
Deng, W., Zhang, X., Liang, F., Lin, G.: An adaptive empirical Bayesian method for sparse deep learning. In: Advances in Neural Information Processing Systems 32, Curran Associates, Inc., pp 5563–5573, http://papers.nips.cc/paper/8794-an-adaptive-empirical-bayesian-method-for-sparse-deep-learning.pdf (2019)
Dissmann, J., Brechmann, E., Czado, C., Kurowicka, D.: Selecting and estimating regular vine copulae and application to financial returns. Comput. Stat. Data Anal. 59, 52–69 (2013). https://doi.org/10.1016/j.csda.2012.08.010
Dobaczewski, J., Nazarewicz, W., Reinhard, P.G.: Error estimates of theoretical models: a guide. J. Phys. G Nucl. Part. Phys. 41(7), 074001 (2014). https://doi.org/10.1088/0954-3899/41/7/074001
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
Fayans, S.A.: Towards a universal nuclear density functional. J. Exp. Theor. Phys. Lett. 68(3), 169–174 (1998). https://doi.org/10.1134/1.567841
Fortunato, M., Blundell, C., Vinyals, O.: Bayesian recurrent neural networks. arXiv preprint arXiv: 1704.02798 (2017)
Geffner, T., Domke, J.: Using large ensembles of control variates for variational inference. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, 31st edn. Curran Associates Inc., Red Hook (2018)
Goldberger, A.: Econometric Theory. Wiley Publications in Statistics, London (1966)
Gu, M., Wang, L.: Scaled Gaussian stochastic process for computer model calibration and prediction. SIAM/ASA J. Uncertain. Quantif. 6(4), 1555–1583 (2018). https://doi.org/10.1137/17M1159890
Han, S., Liao, X., Dunson, D., Carin, L.: Variational Gaussian copula inference. In: Gretton, A., Robert, C.C. (eds.) Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR, Cadiz, Spain, Proceedings of Machine Learning Research, vol 51, pp. 829–838, https://proceedings.mlr.press/v51/han16.html (2016)
Higdon, D., Kennedy, M., Cavendish, J.C., Cafeo, J.A., Ryne, R.D.: Combining field data and computer simulations for calibration and prediction. SIAM J. Sci. Comput. 26, 448–466 (2005). https://doi.org/10.1137/S1064827503426693
Higdon, D., Gattiker, J., Williams, B., Rightley, M.: Computer model calibration using high-dimensional output. J. Am. Stat. Assoc. 103, 570–583 (2008)
Higdon, D., McDonnell, J.D., Schunck, N., Sarich, J., Wild, S.M.: A Bayesian approach for parameter estimation and prediction using a computationally intensive model. J. Phys. G Nucl. Part. Phys. 42(3), 034009 (2015). https://doi.org/10.1088/0954-3899/42/3/034009
Hoffman, M., Blei, D.: Stochastic structured variational inference. In: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR, San Diego, CA, vol 38, pp. 361–369, http://proceedings.mlr.press/v38/hoffman15.html (2015)
Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. J. Mach. Learn. Res. 14, 1303–1347 (2013)
Hoffman, M.D., Gelman, A.: The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1351–1381 (2014)
Ireland, D.G., Nazarewicz, W.: Enhancing the interaction between nuclear experiment and theory through information and statistics. J. Phys. G Nucl. Part. Phys. 42(3), 030301 (2015). https://doi.org/10.1088/0954-3899/42/3/030301
Johnston, J.: Econometric Methods. McGraw-Hill, New York (1976)
Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: An introduction to variational methods for graphical models. Mach. Learn. 37, 183–233 (1999)
Kejzlar, V., Neufcourt, L., Nazarewicz, W., Reinhard, P.G.: Statistical aspects of nuclear mass models. J. Phys. G Nucl. Part. Phys. 47(9), 094001 (2020). https://doi.org/10.1088/1361-6471/ab907c
Kejzlar, V., Son, M., Bhattacharya, S., Maiti, T.: A fast and calibrated computer model emulator: an empirical Bayes approach. Stat. Comput. 31(4), 1–26 (2021). https://doi.org/10.1007/s11222-021-10024-8
Kennedy, M.C., O’Hagan, A.: Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 63, 425–464 (2001). https://doi.org/10.1111/1467-9868.00294
King, G.B., Lovell, A.E., Neufcourt, L., Nunes, F.M.: Direct comparison between Bayesian and frequentist uncertainty quantification for nuclear reactions. Phys. Rev. Lett. 122, 232502 (2019)
Kingma, D.P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., Welling, M.: Improved variational inference with inverse autoregressive flow. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, 29th edn. Curran Associates, Inc., Red Hook (2016)
Kirson, M.W.: Mutual influence of terms in a semi-empirical mass formula. Nucl. Phys. A 798(1), 29–60 (2008). https://doi.org/10.1016/j.nuclphysa.2007.10.011
Kobyzev, I., Prince, S.J., Brubaker, M.A.: Normalizing flows: an introduction and review of current methods. IEEE Trans. Pattern Anal. Mach. Intell. 43(11), 3964–3979 (2021). https://doi.org/10.1109/tpami.2020.2992934
Kortelainen, M., Lesinski, T., Moré, J.J., Nazarewicz, W., Sarich, J., Schunck, N., Stoitsov, M.V., Wild, S.M.: Nuclear energy density optimization. Phys. Rev. C 82(2), 024313 (2010). https://doi.org/10.1103/PhysRevC.82.024313
Kortelainen, M., McDonnell, J., Nazarewicz, W., Reinhard, P.G., Sarich, J., Schunck, N., Stoitsov, M.V., Wild, S.M.: Nuclear energy density optimization: large deformations. Phys. Rev. C 85, 024304 (2012). https://doi.org/10.1103/PhysRevC.85.024304
Kortelainen, M., McDonnell, J., Nazarewicz, W., Olsen, E., Reinhard, P.G., Sarich, J., Schunck, N., Wild, S.M., Davesne, D., Erler, J., Pastore, A.: Nuclear energy density optimization: shell structure. Phys. Rev. C 89, 054314 (2014). https://doi.org/10.1103/PhysRevC.89.054314
Krane, K.: Introductory Nuclear Physics. Wiley, London (1987)
Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., Blei, D.M.: Automatic differentiation variational inference. J. Mach. Learn. Res. 18(14), 1–45 (2017)
Lopez-Paz, D., Hernández-Lobato, JM., Zoubin, G.: Gaussian process vine copulas for multivariate dependence. In: Dasgupta, S., McAllester, D. (eds) Proceedings of the 30th International Conference on Machine Learning, PMLR, Atlanta, Georgia, USA, Proceedings of Machine Learning Research, vol 28, pp. 10–18, https://proceedings.mlr.press/v28/lopez-paz13.html (2013)
Ma, Y.A., Chen, T., Fox, E.: A complete recipe for stochastic gradient MCMC. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, 28th edn., pp. 2917–2925. Curran Associates, Inc., Red Hook (2015)
McDonnell, J.D., Schunck, N., Higdon, D., Sarich, J., Wild, S.M., Nazarewicz, W.: Uncertainty quantification for nuclear density functional theory and information content of new measurements. Phys. Rev. Lett. 114(12), 122501 (2015). https://doi.org/10.1103/PhysRevLett.114.122501
Miller, A., Foti, N., D’Amour, A., Adams, R.P.: Reducing reparameterization gradient variance. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, 30th edn. Curran Associates, Inc., Red Hook (2017)
Morris, M.D., Mitchell, T.J.: Exploratory designs for computational experiments. J. Stat. Plan. Inference 43(3), 381–402 (1995). https://doi.org/10.1016/0378-3758(94)00035-T
Myers, W.D., Swiatecki, W.J.: Nuclear masses and deformations. Nucl. Phys. 81(2), 1–60 (1966). https://doi.org/10.1016/S0029-5582(66)80001-9
Neiswanger, W., Wang, C., Xing, E.P.: Asymptotically exact, embarrassingly parallel MCMC. In: Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, AUAI Press, Arlington, VA, UAI’14, pp. 623–632, http://dl.acm.org/citation.cfm?id=3020751.3020816 (2014)
Papamakarios, G., Pavlakou, T., Murray, I.: Masked autoregressive flow for density estimation. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, 30th edn. Curran Associates, Inc., Red Hook (2017)
Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S., Lakshminarayanan, B.: Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res. 22(57), 1–64 (2021)
Peterson, C., Anderson, J.R.: A mean field theory learning algorithm for neural networks. Complex Syst. 1, 995–1019 (1987)
Plumlee, M.: Bayesian calibration of inexact computer models. J. Am. Stat. Assoc. 112, 1274–1285 (2017). https://doi.org/10.1080/01621459.2016.1211016
Plumlee, M.: Computer model calibration with confidence and consistency. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 81(3), 519–545 (2019). https://doi.org/10.1111/rssb.12314
Plumlee, M., Joseph, V.R., Yang, H.: Calibrating functional parameters in the ion channel models of cardiac cells. J. Am. Stat. Assoc. 111, 500–509 (2016)
Pollard, D., Chang, W., Haran, M., Applegate, P., DeConto, R.: Large ensemble modeling of the last deglacial retreat of the West Antarctic ice sheet: comparison of simple and advanced statistical techniques. Geosci. Model Dev. 9(5), 1697–1723 (2016)
Quiñonero-Candela, J., Rasmussen, C.E.: A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 6, 1939–1959 (2005)
Ranganath, R., Gerrish, S., Blei, D.: Black box variational inference. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR, Proceedings of Machine Learning Research 33, 814–822 (2014)
Ranganath, R., Tran, D., Blei, DM.: Hierarchical variational models. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning – Volume 48, JMLR, ICML’16, pp. 2568–2577 (2016)
Reinhard, P.G., Bender, M., Nazarewicz, W., Vertse, T.: From finite nuclei to the nuclear liquid drop: Leptodermous expansion based on self-consistent mean-field theory. Phys. Rev. C 73, 014309 (2006). https://doi.org/10.1103/PhysRevC.73.014309
Rezende, D., Mohamed, S.: Variational inference with normalizing flows. In: Bach F, Blei D (eds) Proceedings of the 32nd International Conference on Machine Learning, PMLR, Lille, France, Proceedings of Machine Learning Research, vol 37, pp. 1530–1538, https://proceedings.mlr.press/v37/rezende15.html (2015)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer Texts in Statistics, New York (2005)
Ross, S.M.: Simulation, 4th edn. Academic Press Inc., Orlando (2006)
Ruiz, FJR., Titsias, MK., Blei, DM.: Overdispersed black-box variational inference. In: Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, AUAI Press, Arlington, Virginia, USA, UAI’16, pp. 647-656 (2016)
Sexton, D.M.H., Murphy, J.M., Collins, M., Webb, M.J.: Multivariate probabilistic projections using imperfect climate models Part i: outline of methodology. Clim. Dyn. 38(11), 2513–2542 (2012)
Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris 8, 229–231 (1959)
Smith, M.S., Loaiza-Maya, R., Nott, D.J.: High-dimensional copula variational approximation through transformation. J. Comput. Graph. Stat. (2020). https://doi.org/10.1080/10618600.2020.1740097
Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 4, 26–31 (2012)
Titsias, M.: Variational learning of inducing variables in sparse Gaussian processes. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5, 567–574 (2009)
Tran, D., Blei, DM., Airoldi, EM.: Copula variational inference. In: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, MIT Press, Cambridge, MA, NeurIPS’15, pp. 3564–3572, http://dl.acm.org/citation.cfm?id=2969442.2969637 (2015)
Tran, D., Ranganath, R., Blei, DM.: Hierarchical implicit models and likelihood-free variational inference. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NeurIPS’17, pp. 5529–5539, http://dl.acm.org/citation.cfm?id=3295222.3295304 (2017)
Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends Mach. Learn. 1(1–2), 1–305 (2008). https://doi.org/10.1561/2200000001
Wang, Y., Blei, D.M.: Frequentist consistency of variational Bayes. J. Am. Stat. Assoc. 114(527), 1147–1161 (2018)
Weilbach, C., Beronov, B., Wood, F., Harvey, W.: Structured conditional continuous normalizing flows for efficient amortized inference in graphical models. In: Chiappa S, Calandra R (eds) Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR, Proceedings of Machine Learning Research, vol 108, pp. 4441–4451, https://proceedings.mlr.press/v108/weilbach20a.html (2020)
Weizsäcker, C.F.v.: Zur Theorie der Kernmassen. Z. Phys. 96(7), 431–458 (1935). https://doi.org/10.1007/BF01337700
Williams, B., Higdon, D., Gattiker, J., Moore, L., McKay, M., Keller-McNulty, S.: Combining experimental data and computer simulations, with an application to flyer plate experiments. Bayesian Anal. 1(4), 765–792 (2006)
Xie, F., Xu, Y.: Bayesian projected calibration of computer models. J. Am. Stat. Assoc. 116(536), 1965–1982 (2021). https://doi.org/10.1080/01621459.2020.1753519
Yuan, C.: Uncertainty decomposition method and its application to the liquid drop model. Phys. Rev. C 93, 034310 (2016). https://doi.org/10.1103/PhysRevC.93.034310
Zeiler, M.D.: Adadelta: an adaptive learning rate method. arXiv:1212.5701 (2012)
Zhang, L., Jiang, Z., Choi, J., Lim, C.Y., Maiti, T., Baek, S.: Patient-specific prediction of abdominal aortic aneurysm expansion using Bayesian calibration. IEEE J. Biomed. Health Inform. (2019). https://doi.org/10.1109/JBHI.2019.2896034
Acknowledgements
The authors thank the reviewers and the Editor for their helpful comments and ideas. This work was supported in part through computational resources and services provided by the Institute for Cyber-Enabled Research at Michigan State University.
Funding
The research is partially supported by the National Science Foundation funding DMS-1952856, DMS-2124605, DMS-1924724, and OAC-2004601.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Scalable algorithm with truncated C-vine copulas
Here we present the details of the C-vine based versions of Algorithm 1 and Algorithm 2. First, we can decompose the log-likelihood \(\log p({\varvec{d}}|{\varvec{\phi }})\) using a C-vine as
where
This now yields the following expression for the ELBO gradient:
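For reference, and in the notation used in the proofs of Appendix B, the standard C-vine pair-copula decomposition of Bedford and Cooke (2002) gives a sketch of the form this expansion takes (the conditional distribution functions \(F(\cdot \,|\, \cdot )\) are written generically here and the display is not necessarily the article's exact expression):
\[
\log p({\varvec{d}}|{\varvec{\phi }}) \;=\; \sum _{k=1}^{N} \log p_k(d_k|{\varvec{\phi }}) \;+\; \sum _{j=1}^{N-1} \sum _{i=1}^{N-j} \log c_{j,(j+i); 1, \dots , (j-1)}\bigl(F(d_j|d_1,\dots ,d_{j-1},{\varvec{\phi }}),\, F(d_{j+i}|d_1,\dots ,d_{j-1},{\varvec{\phi }})\bigr),
\]
so that the double sum contains \(N(N-1)/2\) pair-copula terms, matching the range of the index \(K\) in Proposition 4 below.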
Analogously to Proposition 1, the following proposition establishes a noisy unbiased estimate of the gradient (52) based on the C-vine copula decomposition.
Proposition 4
Let \(\tilde{{\mathcal {L}}}_C({\varvec{\lambda }})\) be an estimate of the ELBO gradient \(\nabla _{{\varvec{\lambda }}} {\mathcal {L}}({\varvec{\lambda }})\) defined as
where \(K \sim U(1, \dots , \frac{N(N-1)}{2})\), and \(I_C\) is the bijection
then \(\tilde{{\mathcal {L}}}_C({\varvec{\lambda }})\) is unbiased, i.e., \({\mathbb {E}}(\tilde{{\mathcal {L}}}_C({\varvec{\lambda }})) = \nabla _{{\varvec{\lambda }}} {\mathcal {L}}({\varvec{\lambda }})\).
Again, \(\tilde{{\mathcal {L}}}_C({\varvec{\lambda }})\) can be relatively costly to compute for large datasets due to the recursive nature of the copula density computations. We now carry out the same development using an l-truncated C-vine as in the case of Propositions 2 and 3.
Proposition 5
If the copula of \(p({\varvec{d}}|{\varvec{\phi }})\) is distributed according to an l-truncated C-vine, we can rewrite
where
and
Let us now replace the full log-likelihood \(\log p({\varvec{d}}|{\varvec{\phi }})\) in the definition of the ELBO with the likelihood based on a truncated vine copula. This yields the l-truncated ELBO for the l-truncated C-vine
with its gradient
Consequently, we get the following proposition that establishes the noisy unbiased estimate of \(\nabla _{{\varvec{\lambda }}} {\mathcal {L}}_{C_l}({\varvec{\lambda }})\).
Proposition 6
Let \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\) be an estimate of the ELBO gradient \(\nabla _{{\varvec{\lambda }}} {\mathcal {L}}_{C_l}({\varvec{\lambda }})\) defined as
where \(K \sim U(1, \dots , \frac{l(2N-(l + 1))}{2})\), and \(I_{C_l}\) is the bijection
then \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\) is unbiased, i.e., \({\mathbb {E}}(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})) = \nabla _{{\varvec{\lambda }}} {\mathcal {L}}_{C_l}({\varvec{\lambda }})\).
Algorithm 3 presents the version of Algorithm 1 based on the truncated C-vine decomposition.
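To make concrete the property of Propositions 4 and 6 that Algorithms 3 and 4 rely on, the following is a minimal, self-contained Python sketch: it checks numerically that replacing the sum of likelihood terms by a single uniformly sampled term, rescaled by the total number of terms, leaves the score-function gradient estimate unbiased. The toy Gaussian terms, the one-dimensional variational family, and all names are illustrative stand-ins, not the authors' implementation.

```python
# Toy check of the subsampling property behind Algorithms 3 and 4:
# a single uniformly sampled term, rescaled by M, has the same expectation
# as the full sum inside the score-function gradient estimator.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(1.5, 1.0, size=50)   # toy "observations"
M = len(y)                          # number of likelihood terms

def f(k, phi):
    # stand-in for one marginal/pair-copula log-density term log f_k(phi)
    return -0.5 * (y[k] - phi) ** 2

def grad_log_q(phi, mu, sig):
    # score of q(phi | lambda) = N(mu, sig^2) with respect to mu
    return (phi - mu) / sig ** 2

mu, sig, draws = 0.0, 1.0, 200_000
phis = rng.normal(mu, sig, size=draws)          # phi ~ q(phi | lambda)

# full-sum estimator of the likelihood part of the ELBO gradient (w.r.t. mu)
full = np.mean(grad_log_q(phis, mu, sig)
               * np.sum([f(k, phis) for k in range(M)], axis=0))

# subsampled estimator: one term K ~ U{0, ..., M-1} per draw, rescaled by M
Ks = rng.integers(M, size=draws)
sub = np.mean(grad_log_q(phis, mu, sig) * M * f(Ks, phis))

print(f"full-sum gradient estimate:   {full:.2f}")
print(f"subsampled gradient estimate: {sub:.2f}")
```

The two printed estimates agree up to Monte Carlo error; in Algorithms 3 and 4 the same rescaling argument is applied per uniformly sampled pair-copula term.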

A.1 Variance reduction
Let us now consider the MC approximation of the gradient estimator \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\). The \(j^{th}\) component of the estimator with Rao-Blackwellization is
where \({\tilde{f}}^j_{(K)}({\varvec{\phi }})\) denotes the components of \(f^{C_l}_{I_{C_l}(K)}({\varvec{\phi }})\) that include \(\phi _j\).
We can again use control variates to reduce the variance of the MC approximation of the gradient estimator \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\). In particular, we consider the following \(j^{th}\) element of the Rao-Blackwellized MC approximation of the gradient estimator \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\) with control variates
where \({\hat{a}}^C_j\) is the sample estimate of the \(j^{th}\) component of the optimal control variate scale \(a^*\) based on S (or fewer) independent draws from the variational distribution. Namely,
where
and \(\psi ^C({\varvec{\phi }}) = \nabla _{{\varvec{\lambda }}} \log q(\phi |{\varvec{\lambda }})\).
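The scale estimate here is the standard one from black-box variational inference (Ranganath et al. 2014): componentwise, a ratio of a sample covariance to a sample variance. A minimal sketch, with illustrative array names (rows index the S draws, columns the components of \({\varvec{\lambda }}\)); this is not the authors' code:

```python
import numpy as np

def control_variate_scale(g, h):
    """Componentwise a_hat_j = Cov(g_j, h_j) / Var(h_j) from S draws (rows)."""
    g_c = g - g.mean(axis=0)
    h_c = h - h.mean(axis=0)
    return (g_c * h_c).mean(axis=0) / ((h_c ** 2).mean(axis=0) + 1e-12)

rng = np.random.default_rng(0)
h = rng.normal(size=(64, 3))                        # score values, E_q[h] = 0
g = 2.0 * h + rng.normal(scale=0.1, size=(64, 3))   # gradient integrand per draw
print(control_variate_scale(g, h))                  # approximately [2, 2, 2]
# the variance-reduced estimate is the draw-average of (g - a_hat * h),
# which keeps the same mean because E_q[h] = 0
```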
As in the case of the D-vine, we now derive the final Algorithm 4. Again, instead of taking samples from \(q({\varvec{\phi }}| {\varvec{\lambda }})\) to approximate the gradient estimates, we take samples from an overdispersed distribution \(r({\varvec{\phi }}|{\varvec{\lambda }}, \tau )\). Combining Rao-Blackwellization, control variates, and importance sampling, we have the following \(j^{th}\) component of the MC approximation of the gradient estimator \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\)
where \({\varvec{\phi }}[s] \sim r({\varvec{\phi }}|{\varvec{\lambda }}, \tau )\) and \(w({\varvec{\phi }}[s] ) = q({\varvec{\phi }}[s] |{\varvec{\lambda }}) / r({\varvec{\phi }}[s]|{\varvec{\lambda }}, \tau )\) with \({\tilde{a}}^C_j\) being the estimate of the \(j^{th}\) component of

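The importance-sampling step above amounts to drawing from the overdispersed proposal and reweighting each term by \(w = q/r\). A small illustrative sketch under simple Gaussian assumptions (the integrand and all names are toys, not the article's quantities):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sig, tau, S = 0.5, 0.3, 1.5, 256

phi = rng.normal(mu, tau * sig, size=S)                    # phi[s] ~ r(.|lambda, tau)
w = norm.pdf(phi, mu, sig) / norm.pdf(phi, mu, tau * sig)  # w(phi[s]) = q / r

def g(phi):
    # toy gradient integrand; in Algorithm 4 this would be the
    # Rao-Blackwellized term with its control variate subtracted
    return (phi - mu) / sig ** 2 * np.cos(phi)

grad_estimate = np.mean(w * g(phi))   # unbiased for E_q[g] since E_r[w g] = E_q[g]
print(grad_estimate)
```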
Appendix B Proofs
Proof of Proposition 1
Since \(P(K = k) = \frac{2}{N(N-1)}\), we have directly from the definition of expectation
The final equality is the consequence of the uniqueness of the pairs of variables in the conditioned sets of the copula density \(c_{i,(i+j); (i+1), \dots , (i+j-1)}\), and that \(\frac{N(N-1)}{2}\) is the number of unordered pairs of N variables. \(\square \)
Proof of Proposition 2
It is sufficient to show that for \(l \in \{1, \dots , N-1\}\) the following equality holds:
where
To show this, let us consider the summation
For \(l = 1\), we get
and for \(l \ge 2\)
Note that in the case of \(l = N-1\), the last summation consists of only one element \(\log p_1(d_1|{\varvec{\phi }}) + \log p_{1+l}(d_{1+l}|{\varvec{\phi }})\). By careful examination of the two cases above, we get the following results. For \(2l \le N\):
where the middle term disappears in the case \(2l = N\), and for \(2l > N\):
If we now check that \(a_i\) equals the factors in front of the log-likelihoods in the two cases above, the proof of Proposition 2 is complete. Note that once we check the equality for \(a_i\), the same directly translates to \(b_{i+j}\) since \(b_{i+j}\) is \(a_i\) with indices set to \(i+j\) instead of \(i\). Indeed, for \(2l \le N\)
and for \(2l > N\)
\(\square \)
Proof of Proposition 3
By the construction of an R-vine (see Cooke and Kurowicka 2006), each tree \({\mathcal {T}}_i\), \(i = 1, \dots , N-1\), has exactly \(N-i\) edges (these are the unique conditioned variable pairs). For any R-vine truncated at level \(l \in \{1, \dots , N-1\}\), the number of edges is therefore
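Writing this count out explicitly (elementary arithmetic, independent of the particular vine):
\[
\sum _{i=1}^{l} (N-i) \;=\; lN - \frac{l(l+1)}{2} \;=\; \frac{l\,\bigl(2N-(l+1)\bigr)}{2}.
\]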
The rest of the proof is identical with that of Proposition 1 due to the uniqueness of the conditioned variable pairs in the copula density \(c_{i,(i+j); (i+1), \dots , (i+j-1)}\), but in this case \(P(K = k) = \frac{2}{l(2N - (l+1))}\). \(\square \)
Proof of Proposition 4
The proof is identical with that of Proposition 1 since each conditioned pair in the copula density \(c_{j,(j+i); 1, \dots , (j-1)}\) is unique as well. \(\square \)
Proof of Proposition 5
It is sufficient to show that for \(l \in \{1, \dots , N-1\}\) the following equality holds:
where
To show this, let us consider the following summation
Now, for \(l = 1\), we have
For \(l \ge 2\), we have
Therefore we can rewrite
Overall,
Since \(j \in \{1, \dots , l\}\) and
equality (57) holds. \(\square \)
Proof of Proposition 6
The proof is identical with that of Proposition 3 since each conditioned pair in the copula density \(c_{j,(j+i); 1, \dots , (j-1)}\) is unique, and a C-vine is a special case of R-vine. \(\square \)
Appendix C Simulation: truncation level
Figure 6 and Table 4 show how the values of the variational parameters change with increasing truncation level \(l\) for the calibration simulation described in Sect. 5.1.1. The differences between the \(L^2\) norms of the variational parameters are provided both for all the variational parameters and for the variational approximations of the calibration parameters only. Figure 6 exhibits a clear elbow shape, with minimal change in the variational parameters for \(l \ge 3\).
Appendix D Simulation: memory profile
Here we present the memory profiles (Fig. 7) for the MH algorithm, the NUTS, and Algorithm 2 under the simulation scenario studied in Sect. 5. These were recorded during a one-hour period of running the algorithms. The MH algorithm and the NUTS were implemented in Python 3.0 using the PyMC3 module version 3.5. The memory profiles were measured using the memory-profiler module version 0.55.0 in Python 3.0. Algorithm 2 (the variational calibration) was also implemented in Python 3.0. The code was run on the high performance computing cluster at the Institute for Cyber-Enabled Research at Michigan State University.
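For readers who want to reproduce a profile like Fig. 7, the following is a minimal sketch (not the code used for the article) of recording a memory trace with the memory-profiler package; run_sampler is a hypothetical stand-in for one of the profiled algorithms.

```python
import time
from memory_profiler import memory_usage

def run_sampler():
    # stand-in for an MH / NUTS / variational calibration run; the allocations
    # below simply give the profiler something to record
    chains = []
    for _ in range(50):
        chains.append([0.0] * 100_000)
        time.sleep(0.01)

# sample resident memory (in MiB) every 0.1 s while run_sampler executes
trace = memory_usage((run_sampler, (), {}), interval=0.1)
print(f"{len(trace)} samples, peak memory {max(trace):.1f} MiB")
```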
Appendix E Application: liquid drop model (LDM)
E.1 GP specifications
In the case of the LDM \(E_B(Z,N)\), we consider a GP prior with mean zero and covariance function
Similarly, we consider a GP prior for the systematic discrepancy \(\delta (Z,N)\) with mean zero and covariance function
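As a generic illustration only (not necessarily the article's exact kernels), covariance functions in this setting are commonly of squared-exponential type over the relevant inputs \(x\) (which for the emulator include \((Z,N)\) and the calibration parameters), with a placeholder marginal scale \(\eta \) and length scales \(\ell _m\):
\[
k(x, x') \;=\; \eta \, \exp \Bigl( - \sum _m \frac{(x_m - x'_m)^2}{2\, \ell _m^2} \Bigr).
\]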
E.2 Experimental design
Kennedy and O’Hagan (2001) recommend selecting the calibration inputs for the model runs so that any plausible value \({\varvec{\theta }}\) of the true calibration parameter is covered. In this context, we consider the space of calibration parameters to be centered at the least squares estimates \({\hat{{\varvec{\theta }}}}_{L_2}\) and broad enough to contain the majority of the values provided by the nuclear physics literature (Weizsäcker 1935; Bethe and Bacher 1936; Myers and Swiatecki 1966; Kirson 2008; Benzaid et al. 2020). Table 5 gives the lower and upper bounds for the parameter space, namely \(\text {Lower bound} = {\hat{\theta }}_{L_2} - 15 \times SE({\hat{\theta }}_{L_2})\) and \(\text {Upper bound} = {\hat{\theta }}_{L_2} + 15 \times SE({\hat{\theta }}_{L_2})\), where \(SE({\hat{\theta }}_{L_2})\) is given by standard linear regression theory.
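One way such a design could be generated in practice (not necessarily the routine used for the article; the estimates and standard errors below are hypothetical placeholders, not the values of Tables 3 and 5) is a Latin hypercube scaled to the box described above:

```python
import numpy as np
from scipy.stats import qmc

theta_hat = np.array([15.4, 16.9, 22.6, 0.70])   # hypothetical LS estimates
se = np.array([0.10, 0.30, 0.30, 0.01])          # hypothetical standard errors
lower, upper = theta_hat - 15 * se, theta_hat + 15 * se

sampler = qmc.LatinHypercube(d=len(theta_hat), seed=0)
design = qmc.scale(sampler.random(n=100), lower, upper)   # 100 calibration inputs
print(design.shape)                                       # (100, 4)
```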
E.3 Prior distributions
First, we consider independent Gaussian prior distributions centered at the LS estimates \({\hat{{\varvec{\theta }}}}_{L_2}\) (Table 3) with standard deviations \(7.5 \times SE({\hat{{\varvec{\theta }}}}_{L_2})\), so that the calibration parameters used for generating the model runs are covered roughly within two standard deviations of the priors. Namely,
The prior distributions for the hyperparameters of the GPs were selected as \(\text {Gamma}(\alpha , \beta )\), with shape parameter \(\alpha \) and scale parameter \(\beta \), so that they represent vague knowledge about the scale of these parameters suggested by the literature on nuclear mass models (Weizsäcker 1935; Bethe and Bacher 1936; Myers and Swiatecki 1966; Fayans 1998; Kirson 2008; McDonnell et al. 2015; Kortelainen et al. 2010, 2012, 2014; Benzaid et al. 2020; Kejzlar et al. 2020). In particular, the error scale \(\sigma \) is in the majority of nuclear applications within units of MeV; therefore we set
with the scale of the systematic error being
to allow for this quantity to range between the units and tens of MeV. It is also reasonable to assume that the mass of a given nucleus is correlated mostly with its neighbors on the nuclear chart. We express this notion through these reasonably wide prior distributions
Finally, the majority of the masses in the training dataset of 2000 experimental binding energies fall into the range [1000, 2000] MeV (1165 masses, to be precise). We consider the following prior distribution for the parameter \(\eta _f\) to reflect the scale of the experimental binding energies:
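A sketch of how priors of this kind could be encoded with scipy.stats; the Gamma shape and scale values below are hypothetical placeholders (the actual values appear in the article's displays), and the LS estimates and standard errors are likewise illustrative:

```python
import numpy as np
from scipy.stats import norm, gamma

theta_hat = np.array([15.4, 16.9, 22.6, 0.70])   # hypothetical LS estimates
se = np.array([0.10, 0.30, 0.30, 0.01])          # hypothetical standard errors

# independent Gaussian priors centered at the LS estimates, sd = 7.5 * SE
theta_priors = [norm(loc=m, scale=7.5 * s) for m, s in zip(theta_hat, se)]

# vague Gamma(shape=alpha, scale=beta) priors for the GP hyperparameters;
# alpha and beta are placeholders, chosen only so that, e.g., the error scale
# sigma sits at the level of units of MeV and eta_f near the scale of the
# experimental binding energies
sigma_prior = gamma(a=2.0, scale=1.0)
eta_f_prior = gamma(a=2.0, scale=750.0)

print(theta_priors[0].mean(), sigma_prior.mean(), eta_f_prior.mean())
```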
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kejzlar, V., Maiti, T. Variational inference with vine copulas: an efficient approach for Bayesian computer model calibration. Stat Comput 33, 18 (2023). https://doi.org/10.1007/s11222-022-10194-z