
Variational inference with vine copulas: an efficient approach for Bayesian computer model calibration

Abstract

With advancements in computer architectures, the use of computational models has proliferated for solving complex problems in many scientific applications such as nuclear physics and climate research. However, the potential of such models is often hindered because they tend to be computationally expensive and consequently ill-suited for uncertainty quantification. Furthermore, they are usually not calibrated with real-time observations. We develop a computationally efficient algorithm based on variational Bayes inference (VBI) for the calibration of computer models with Gaussian processes. Unfortunately, the standard fast-to-compute gradient estimates based on subsampling are biased under the calibration framework due to the conditionally dependent data, which diminishes the efficiency of VBI. In this work, we adopt a pairwise decomposition of the data likelihood using vine copulas that separates the information on the dependence structure in the data from their marginal distributions and leads to computationally efficient, unbiased gradient estimates, and thus to scalable calibration. We provide empirical evidence for the computational scalability of our methodology together with an average case analysis, and we describe all the necessary details for an efficient implementation of the proposed algorithm. We also demonstrate the opportunities given by our method to practitioners on a real data example through calibration of the Liquid Drop Model of nuclear binding energies.



References

  • Aicher, C., Ma, Y.A., Foti, N.J., Fox, E.B.: Stochastic gradient mcmc for state space models. SIAM J. Math. Data Sci. 1(3), 555–587 (2019). https://doi.org/10.1137/18M1214780


  • Ambrogioni, L., Lin, K., Fertig, E., Vikram, S., Hinne, M., Moore, D., van Gerven, M.: Automatic structured variational inference. In: Banerjee, A., Fukumizu, K. (eds.) Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR, Proceedings of Machine Learning Research, vol 130, pp. 676–684, https://proceedings.mlr.press/v130/ambrogioni21a.html (2021)

  • Audi, G., Wapstra, A., Thibault, C.: The AME2003 atomic mass evaluation: (ii). Tables, graphs and references. Nucl. Phys. A 729, 337–676 (2003)


  • Bauer, M., van der Wilk, M., Rasmussen, CE.: Understanding probabilistic sparse gaussian process approximations. In: Proceedings of the 30th International Conference on Neural Information Processing Systems, NeurIPS’16, pp. 1533–1541, http://dl.acm.org/citation.cfm?id=3157096.3157268 (2016)

  • Bayarri, M.J., Berger, J.O., Paulo, R., Sacks, J., Cafeo, J.A., Cavendish, J., Lin, C.H., Tu, J.: A framework for validation of computer models. Technometrics 49, 138–154 (2007). https://doi.org/10.1198/004017007000000092


  • Bedford, T., Cooke, R.M.: Vines-a new graphical model for dependent random variables. Ann. Stat. 30(4), 1031–1068 (2002)


  • Benzaid, D., Bentridi, S., Kerraci, A., Amrani, N.: Bethe-Weizsäcker semiempirical mass formula coefficients 2019 update based on AME2016. Nucl. Sci. Tech. 31, 9 (2020). https://doi.org/10.1007/s41365-019-0718-8


  • Bertsch, G.F., Bingham, D.: Estimating parameter uncertainty in binding-energy models by the frequency-domain bootstrap. Phys. Rev. Lett. 119, 252501 (2017)


  • Bertsch, G.F., Sabbey, B., Uusnäkki, M.: Fitting theories of nuclear binding energies. Phys. Rev. C 71, 054311 (2005). https://doi.org/10.1103/PhysRevC.71.054311


  • Bethe, H.A., Bacher, R.F.: Nuclear physics A. Stationary states of nuclei. Rev. Mod. Phys. 8, 82–229 (1936). https://doi.org/10.1103/RevModPhys.8.82


  • Bottou, L., Le Cun, Y., Bengio, Y.: Global training of document processing systems using graph transformer networks. In: Proceedings of Computer Vision and Pattern Recognition (CVPR), IEEE, pp. 489–493, http://leon.bottou.org/papers/bottou-97 (1997)

  • Brechmann, E.C., Joe, H.: Truncation of vine copulas using fit indices. J. Multivar. Anal. 138, 19–33 (2015)


  • Brechmann, E.C., Czado, C., Aas, K.: Truncated regular vines in high dimensions with application to financial data. Can. J. Stat. 40(1), 68–85 (2012)


  • Casella, G., Robert, C.P.: Rao-blackwellisation of sampling schemes. Biometrika 83(1), 81–94 (1996)


  • Chib, S., Greenberg, E.: Understanding the metropolis-hastings algorithm. Am. Stat. 49, 327–335 (1995)


  • Cooke, R., Kurowicka, D.: Uncertainty Analysis With High Dimensional Dependence Modelling. Wiley, London (2006)


  • Deng, W., Zhang, X., Liang, F., Lin, G.: An adaptive empirical bayesian method for sparse deep learning. In: Advances in Neural Information Processing Systems 32, Curran Associates, Inc., pp 5563–5573, http://papers.nips.cc/paper/8794-an-adaptive-empirical-bayesian-method-for-sparse-deep-learning.pdf (2019)

  • Dissmann, J., Brechmann, E., Czado, C., Kurowicka, D.: Selecting and estimating regular vine copulae and application to financial returns. Comput. Stat. Data Anal. 59, 52–69 (2013). https://doi.org/10.1016/j.csda.2012.08.010


  • Dobaczewski, J., Nazarewicz, W., Reinhard, P.G.: Error estimates of theoretical models: a guide. J. Phys. G Nucl. Part. Phys. 41(7), 074001 (2014). https://doi.org/10.1088/0954-3899/41/7/074001


  • Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)


  • Fayans, S.A.: Towards a universal nuclear density functional. J. Exp. Theor. Phys. Lett. 68(3), 169–174 (1998). https://doi.org/10.1134/1.567841


  • Fortunato, M., Blundell, C., Vinyals, O.: Bayesian recurrent neural networks. arXiv preprint arXiv: 1704.02798 (2017)

  • Geffner, T., Domke, J.: Using large ensembles of control variates for variational inference. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, 31st edn. Curran Associates Inc., Red Hook (2018)


  • Goldberger, A.: Econometric theory. Wiley publications in statistics, London (1966)


  • Gu, M., Wang, L.: Scaled Gaussian stochastic process for computer model calibration and prediction. SIAM/ASA J. Uncertain. Quantif. 6(4), 1555–1583 (2018). https://doi.org/10.1137/17M1159890


  • Han, S., Liao, X., Dunson, D., Carin, L.: Variational gaussian copula inference. In: Gretton, A., Robert, C. C. (eds) Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR, Cadiz, Spain, Proceedings of Machine Learning Research, vol 51, pp. 829–838, https://proceedings.mlr.press/v51/han16.html (2016)

  • Higdon, D., Kennedy, M., Cavendish, J.C., Cafeo, J.A., Ryne, R.D.: Combining field data and computer simulations for calibration and prediction. SIAM J. Sci. Comput. 26, 448–466 (2005). https://doi.org/10.1137/S1064827503426693


  • Higdon, D., Gattiker, J., Williams, B., Rightley, M.: Computer model calibration using high-dimensional output. J. Am. Stat. Assoc. 103, 570–583 (2008)


  • Higdon, D., McDonnell, J.D., Schunck, N., Sarich, J., Wild, S.M.: A Bayesian approach for parameter estimation and prediction using a computationally intensive model. J. Phys. G Nucl. Part. Phys. 42(3), 034009 (2015). https://doi.org/10.1088/0954-3899/42/3/034009


  • Hoffman, M., Blei, D.: Stochastic structured variational inference. In: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, PMLR, San Diego, CA, vol 38, pp. 361–369, http://proceedings.mlr.press/v38/hoffman15.html (2015)

  • Hoffman, M.D., Blei, D.M., Wang, C., Paisley, J.: Stochastic variational inference. J. Mach. Learn. Res. 14, 1303–1347 (2013)


  • Homan, M.D., Gelman, A.: The no-U-turn sampler: adaptively setting path lengths in hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1351–1381 (2014)


  • Ireland, D.G., Nazarewicz, W.: Enhancing the interaction between nuclear experiment and theory through information and statistics. J. Phys. G Nucl. Part. Phys. 42(3), 030301 (2015). https://doi.org/10.1088/0954-3899/42/3/030301


  • Johnston, J.: Econometric Methods. McGraw-Hill, New York (1976)


  • Jordan, M.I., Ghahramani, Z., Jaakkola, T.S., Saul, L.K.: An introduction to variational methods for graphical models. Mach. Learn. 37, 183–233 (1999)


  • Kejzlar, V., Neufcourt, L., Nazarewicz, W., Reinhard, P.G.: Statistical aspects of nuclear mass models. J. Phys. G Nucl. Part. Phys. 47(9), 094001 (2020). https://doi.org/10.1088/1361-6471/ab907c


  • Kejzlar, V., Son, M., Bhattacharya, S., Maiti, T.: A fast and calibrated computer model emulator: an empirical bayes approach. Stat. Comput. 31(4), 1–26 (2021). https://doi.org/10.1007/s11222-021-10024-8


  • Kennedy, M.C., O’Hagan, A.: Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 63, 425–464 (2001). https://doi.org/10.1111/1467-9868.00294


  • King, G.B., Lovell, A.E., Neufcourt, L., Nunes, F.M.: Direct comparison between Bayesian and frequentist uncertainty quantification for nuclear reactions. Phys. Rev. Lett. 122, 232502 (2019)


  • Kingma, D.P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., Welling, M.: Improved variational inference with inverse autoregressive flow. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, 29th edn. Curran Associates, Inc., Red Hook (2016)


  • Kirson, M.W.: Mutual influence of terms in a semi-empirical mass formula. Nucl. Phys. A 798(1), 29–60 (2008). https://doi.org/10.1016/j.nuclphysa.2007.10.011


  • Kobyzev, I., Prince, S.J., Brubaker, M.A.: Normalizing flows: an introduction and review of current methods. IEEE Trans. Pattern Anal. Mach. Intell. 43(11), 3964–3979 (2021). https://doi.org/10.1109/tpami.2020.2992934


  • Kortelainen, M., Lesinski, T., Moré, J.J., Nazarewicz, W., Sarich, J., Schunck, N., Stoitsov, M.V., Wild, S.M.: Nuclear energy density optimization. Phys. Rev. C 82(2), 024313 (2010). https://doi.org/10.1103/PhysRevC.82.024313


  • Kortelainen, M., McDonnell, J., Nazarewicz, W., Reinhard, P.G., Sarich, J., Schunck, N., Stoitsov, M.V., Wild, S.M.: Nuclear energy density optimization: large deformations. Phys. Rev. C 85, 024304 (2012). https://doi.org/10.1103/PhysRevC.85.024304


  • Kortelainen, M., McDonnell, J., Nazarewicz, W., Olsen, E., Reinhard, P.G., Sarich, J., Schunck, N., Wild, S.M., Davesne, D., Erler, J., Pastore, A.: Nuclear energy density optimization: shell structure. Phys. Rev. C 89, 054314 (2014). https://doi.org/10.1103/PhysRevC.89.054314


  • Krane, K.: Introductory Nuclear Physics. Wiley, London (1987)


  • Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., Blei, D.M.: Automatic differentiation variational inference. J. Mach. Learn. Res. 18(14), 1–45 (2017)


  • Lopez-Paz, D., Hernández-Lobato, JM., Zoubin, G.: Gaussian process vine copulas for multivariate dependence. In: Dasgupta, S., McAllester, D. (eds) Proceedings of the 30th International Conference on Machine Learning, PMLR, Atlanta, Georgia, USA, Proceedings of Machine Learning Research, vol 28, pp. 10–18, https://proceedings.mlr.press/v28/lopez-paz13.html (2013)

  • Ma, Y.A., Chen, T., Fox, E.: A complete recipe for stochastic gradient mcmc. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, 28th edn., pp. 2917–2925. Curran Associates, Inc., Red Hook (2015)


  • McDonnell, J.D., Schunck, N., Higdon, D., Sarich, J., Wild, S.M., Nazarewicz, W.: Uncertainty quantification for nuclear density functional theory and information content of new measurements. Phys. Rev. Lett. 114(12), 122501 (2015). https://doi.org/10.1103/PhysRevLett.114.122501


  • Miller, A., Foti, N., D’Amour, A., Adams, R.P.: Reducing reparameterization gradient variance. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, 30th edn. Curran Associates, Inc., Red Hook (2017)


  • Morris, M.D., Mitchell, T.J.: Exploratory designs for computational experiments. J. Stat. Plan. Inference 43(3), 381–402 (1995). https://doi.org/10.1016/0378-3758(94)00035-T


  • Myers, W.D., Swiatecki, W.J.: Nuclear masses and deformations. Nucl. Phys. 81(2), 1–60 (1966). https://doi.org/10.1016/S0029-5582(66)80001-9


  • Neiswanger, W., Wang, C., Xing, EP.: Asymptotically exact, embarrassingly parallel mcmc. In: Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, AUAI Press, Arlington, VA, UAI’14, pp. 623–632, http://dl.acm.org/citation.cfm?id=3020751.3020816 (2014)

  • Papamakarios, G., Pavlakou, T., Murray, I.: Masked autoregressive flow for density estimation. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems, 30th edn. Curran Associates, Inc., Red Hook (2017)


  • Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S., Lakshminarayanan, B.: Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res. 22(57), 1–64 (2021)


  • Peterson, C., Anderson, J.R.: A mean field theory learning algorithm for neural networks. Complex Syst. 1, 995–1019 (1987)


  • Plumlee, M.: Bayesian calibration of inexact computer models. J. Am. Stat. Assoc. 112, 1274–1285 (2017). https://doi.org/10.1080/01621459.2016.1211016


  • Plumlee, M.: Computer model calibration with confidence and consistency. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 81(3), 519–545 (2019). https://doi.org/10.1111/rssb.12314


  • Plumlee, M., Joseph, V.R., Yang, H.: Calibrating functional parameters in the ion channel models of cardiac cells. J. Am. Stat. Assoc. 111, 500–509 (2016)


  • Pollard, D., Chang, W., Haran, M., Applegate, P., DeConto, R.: Large ensemble modeling of the last deglacial retreat of the West Antarctic ice sheet: comparison of simple and advanced statistical techniques. Geosci. Model Dev. 9(5), 1697–1723 (2016)


  • Quiñonero-Candela, J., Rasmussen, C.E.: A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 6, 1939–1959 (2005)


  • Ranganath, R., Gerrish, S., Blei, D.: Black box variational inference. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, PMLR, Proceedings of Machine Learning Research 33, 814–822 (2014)

  • Ranganath, R., Tran, D., Blei, DM.: Hierarchical variational models. In: Proceedings of the 33rd International Conference on International Conference on Machine Learning – Volume 48, JMLR, ICML’16, pp. 2568–2577 (2016)

  • Reinhard, P.G., Bender, M., Nazarewicz, W., Vertse, T.: From finite nuclei to the nuclear liquid drop: Leptodermous expansion based on self-consistent mean-field theory. Phys. Rev. C 73, 014309 (2006). https://doi.org/10.1103/PhysRevC.73.014309


  • Rezende, D., Mohamed, S.: Variational inference with normalizing flows. In: Bach F, Blei D (eds) Proceedings of the 32nd International Conference on Machine Learning, PMLR, Lille, France, Proceedings of Machine Learning Research, vol 37, pp. 1530–1538, https://proceedings.mlr.press/v37/rezende15.html (2015)

  • Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)


  • Robert, C., Casella, G.: Monte Carlo Statistical Methods. Springer Texts in Statistics, New York (2005)


  • Ross, S.M.: Simulation, 4th edn. Academic Press Inc., Orlando (2006)


  • Ruiz, FJR., Titsias, MK., Blei, DM.: Overdispersed black-box variational inference. In: Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, AUAI Press, Arlington, Virginia, USA, UAI’16, pp. 647-656 (2016)

  • Sexton, D.M.H., Murphy, J.M., Collins, M., Webb, M.J.: Multivariate probabilistic projections using imperfect climate models Part i: outline of methodology. Clim. Dyn. 38(11), 2513–2542 (2012)


  • Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris 8, 229–231 (1959)


  • Smith, M.S., Loaiza-Maya, R., Nott, D.J.: High-dimensional copula variational approximation through transformation. J. Comput. Graph. Stat. (2020). https://doi.org/10.1080/10618600.2020.1740097


  • Tieleman, T., Hinton, G.: Lecture 6.5—RmsProp: divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 4, 26–31 (2012)


  • Titsias, M.: Variational learning of inducing variables in sparse Gaussian processes. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5, 567–574 (2009)

  • Tran, D., Blei, DM., Airoldi, EM.: Copula variational inference. In: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2, MIT Press, Cambridge, MA, NeurIPS’15, pp. 3564–3572, http://dl.acm.org/citation.cfm?id=2969442.2969637 (2015)

  • Tran, D., Ranganath, R., Blei, DM.: Hierarchical implicit models and likelihood-free variational inference. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NeurIPS’17, pp. 5529–5539, http://dl.acm.org/citation.cfm?id=3295222.3295304 (2017)

  • Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Found. Trends® Mach. Learn. 1(1–2), 1–305 (2008). https://doi.org/10.1561/2200000001


  • Wang, Y., Blei, D.M.: Frequentist consistency of variational Bayes. J. Am. Stat. Assoc. 114(527), 1147–1161 (2018)


  • Weilbach, C., Beronov, B., Wood, F., Harvey, W.: Structured conditional continuous normalizing flows for efficient amortized inference in graphical models. In: Chiappa S, Calandra R (eds) Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR, Proceedings of Machine Learning Research, vol 108, pp. 4441–4451, https://proceedings.mlr.press/v108/weilbach20a.html (2020)

  • Weizsäcker, C.F.v.: Zur Theorie der Kernmassen. Z. Phys. 96(7), 431–458 (1935). https://doi.org/10.1007/BF01337700

  • Williams, B., Higdon, D., Gattiker, J., Moore, L., McKay, M., Keller-McNulty, S.: Combining experimental data and computer simulations, with an application to flyer plate experiments. Bayesian Anal. 1(4), 765–792 (2006)


  • Xie, F., Xu, Y.: Bayesian projected calibration of computer models. J. Am. Stat. Assoc. 116(536), 1965–1982 (2021). https://doi.org/10.1080/01621459.2020.1753519


  • Yuan, C.: Uncertainty decomposition method and its application to the liquid drop model. Phys. Rev. C 93, 034310 (2016). https://doi.org/10.1103/PhysRevC.93.034310


  • Zeiler, M.D.: Adadelta: an adaptive learning rate method. arXiv:1212.5701 (2012)

  • Zhang, L., Jiang, Z., Choi, J., Lim, C.Y., Maiti, T., Baek, S.: Patient-specific prediction of abdominal aortic aneurysm expansion using Bayesian calibration. IEEE J. Biomed. Health Inform. (2019). https://doi.org/10.1109/JBHI.2019.2896034


Acknowledgements

The authors thank the reviewers and the Editor for their helpful comments and ideas. This work was supported in part through computational resources and services provided by the Institute for Cyber-Enabled Research at Michigan State University.

Funding

The research is partially supported by the National Science Foundation funding DMS-1952856, DMS-2124605, DMS-1924724, and OAC-2004601.

Author information


Corresponding author

Correspondence to Vojtech Kejzlar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Scalable algorithm with truncated C-vine copulas

Here we present the details of the C-vine based versions of Algorithm 1 and Algorithm 2. First, we can decompose the log-likelihood \(\log p({\varvec{d}}|{\varvec{\phi }})\) using a C-vine as

$$\begin{aligned} \log p({\varvec{d}}|{\varvec{\phi }}) = \sum _{j = 1}^{N-1} \sum _{i = 1}^{N-j} f^C_{j, j+i}({\varvec{\phi }}), \end{aligned}$$
(50)

where

$$\begin{aligned} \begin{aligned} f^C_{j, j +i}({\varvec{\phi }})&= \log c_{j,(j+i); 1, \dots , (j-1)} \\&\quad +\frac{1}{N-1}\big (\log p(d_j|{\varvec{\phi }}) + \log p(d_{j+i}|{\varvec{\phi }})\big ). \end{aligned} \end{aligned}$$
(51)

This now yields the following expression for the ELBO gradient:

$$\begin{aligned} \begin{aligned} \nabla _\lambda {\mathcal {L}}(\lambda )&= \sum _{j = 1}^{N-1} \sum _{i = 1}^{N-j}{\mathbb {E}}_q\bigg [\nabla _{{\varvec{\lambda }}} \log q({\varvec{\phi }}|{\varvec{\lambda }})( f^C_{j, j+i}({\varvec{\phi }}))\bigg ] \\&\quad -{\mathbb {E}}_q\bigg [\nabla _{{\varvec{\lambda }}} \log q({\varvec{\phi }}|{\varvec{\lambda }})\log \frac{ q({\varvec{\phi }}|{\varvec{\lambda }})}{p({\varvec{\phi }})}\bigg ]. \end{aligned} \end{aligned}$$
(52)

Analogously to Proposition 1, we have the following proposition, which establishes a noisy unbiased estimate of the gradient (52) using the C-vine copula decomposition.

Proposition 4

Let \(\tilde{{\mathcal {L}}}_C({\varvec{\lambda }})\) be an estimate of the ELBO gradient \(\nabla _{{\varvec{\lambda }}} {\mathcal {L}}({\varvec{\lambda }})\) defined as

$$\begin{aligned} \begin{aligned} \tilde{{\mathcal {L}}}_C({\varvec{\lambda }})&= \frac{N(N-1)}{2}{\mathbb {E}}_q\bigg [\nabla _{{\varvec{\lambda }}}\log q({\varvec{\phi }}|{\varvec{\lambda }})( f^C_{I_C(K)}({\varvec{\phi }}))\bigg ] \\&\quad -{\mathbb {E}}_q\bigg [\nabla _{{\varvec{\lambda }}} \log q({\varvec{\phi }}|{\varvec{\lambda }})\log \frac{ q({\varvec{\phi }}|{\varvec{\lambda }})}{ p({\varvec{\phi }})}\bigg ], \end{aligned} \end{aligned}$$

where \(K \sim U(1, \dots , \frac{N(N-1)}{2})\), and \(I_C\) is the bijection

$$\begin{aligned} \begin{aligned}&I_C:\{1, \dots , \frac{N(N-1)}{2}\} \\&\quad \rightarrow \{(j,j+i): i \in \{1, \dots , N-j\} \text { for } j \in \{1, \dots N-1\}\}, \end{aligned} \end{aligned}$$

then \(\tilde{{\mathcal {L}}}_C({\varvec{\lambda }})\) is unbiased i.e., \({\mathbb {E}}(\tilde{{\mathcal {L}}}_C({\varvec{\lambda }})) = \nabla _{{\varvec{\lambda }}} {\mathcal {L}}({\varvec{\lambda }})\).
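In practice, the bijection \(I_C\) is simply a fixed enumeration of the conditioned pairs of the C-vine. The following is a minimal sketch (our own illustration, with hypothetical function names, not code from the paper) of one such enumeration and of drawing the single pair index K uniformly at random, as required by the estimator above:

```python
import numpy as np

def cvine_pairs(N):
    """Enumerate the conditioned pairs (j, j+i) of a C-vine on N variables,
    tree by tree: j = 1, ..., N-1 and i = 1, ..., N-j.  Any fixed enumeration
    of this set can play the role of the bijection I_C."""
    return [(j, j + i) for j in range(1, N) for i in range(1, N - j + 1)]

def sample_pair(N, rng):
    """Draw K uniformly over the N(N-1)/2 pair indices and return I_C(K)."""
    pairs = cvine_pairs(N)
    assert len(pairs) == N * (N - 1) // 2
    return pairs[rng.integers(len(pairs))]

rng = np.random.default_rng(0)
print(sample_pair(5, rng))  # one of the 10 conditioned pairs of a 5-variable C-vine
```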

Again, \(\tilde{{\mathcal {L}}}_C({\varvec{\lambda }})\) can be relatively costly to compute for large datasets due to the recursive nature of the copula density computations. We now carry out the same development using an l-truncated C-vine as in the case of Proposition 2 and Proposition 3.

Proposition 5

If the copula of \(p({\varvec{d}}|{\varvec{\phi }})\) is distributed according to an l-truncated C-vine, we can rewrite

$$\begin{aligned} \log p({\varvec{d}}|{\varvec{\phi }}) = \sum _{j = 1}^{l} \sum _{i = 1}^{N-j} f^{C_l}_{j, j+i}({\varvec{\phi }}), \end{aligned}$$
(53)

where

$$\begin{aligned} \begin{aligned} f^{C_l}_{j, j+i}({\varvec{\phi }})&= \log c_{j,(j+i); 1, \dots , (j-1)}+ \frac{1}{a_j}\log p(d_j|{\varvec{\phi }}) \\&\quad +\frac{1}{b_{j+i}} \log p(d_{j+i}|{\varvec{\phi }}), \end{aligned} \end{aligned}$$
(54)

and

$$\begin{aligned} a_j&=N-1, \\ b_{j+i}&= (N-1-l)\mathbb {1}_{j + i \le l} + l. \end{aligned}$$

Let us now replace the full log-likelihood \(\log p({\varvec{d}}|{\varvec{\phi }})\) in the definition of the ELBO with the likelihood based on a truncated vine copula. This yields the l-truncated ELBO for the l-truncated C-vine

$$\begin{aligned} {\mathcal {L}}_{C_l}({\varvec{\lambda }}) = {\mathbb {E}}_q\bigg [\sum _{j = 1}^{l} \sum _{i = 1}^{N-j} f^{C_l}_{j, j+i}({\varvec{\phi }})\bigg ] - KL(q({\varvec{\phi }}|{\varvec{\lambda }})||p({\varvec{\phi }}))\nonumber \\ \end{aligned}$$
(55)

with its gradient

$$\begin{aligned} \begin{aligned} \nabla _{{\varvec{\lambda }}} {\mathcal {L}}_{C_l}({\varvec{\lambda }})&= \sum _{j = 1}^{l} \sum _{i = 1}^{N-j}{\mathbb {E}}_q\bigg [\nabla _{{\varvec{\lambda }}} \log q({\varvec{\phi }}|{\varvec{\lambda }})(f^{C_l}_{j, j+i}({\varvec{\phi }}))\bigg ] \\&\quad -{\mathbb {E}}_q\bigg [\nabla _{{\varvec{\lambda }}} \log q({\varvec{\phi }}|{\varvec{\lambda }})\log \frac{q({\varvec{\phi }}|{\varvec{\lambda }})}{p({\varvec{\phi }})}\bigg ]. \end{aligned} \end{aligned}$$

Consequently, we get the following proposition that establishes the noisy unbiased estimate of \(\nabla _{{\varvec{\lambda }}} {\mathcal {L}}_{C_l}({\varvec{\lambda }})\).

Proposition 6

Let \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\) be an estimate of the ELBO gradient \(\nabla _{{\varvec{\lambda }}} {\mathcal {L}}_{C_l}({\varvec{\lambda }})\) defined as

$$\begin{aligned} \begin{aligned}&\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }}) \\&\quad =\frac{l(2N-(l + 1))}{2}{\mathbb {E}}_q\bigg [\nabla _{{\varvec{\lambda }}} \log q({\varvec{\phi }}|{\varvec{\lambda }})( f^{C_l}_{I_{C_l}(K)}({\varvec{\phi }}))\bigg ] \\&\qquad -{\mathbb {E}}_q\bigg [\nabla _{{\varvec{\lambda }}} \log q(\phi |{\varvec{\lambda }})\log \frac{ q(\phi |{\varvec{\lambda }})}{ p(\phi )}\bigg ], \end{aligned} \end{aligned}$$

where \(K \sim U(1, \dots , \frac{l(2N-(l + 1))}{2})\), and \(I_{C_l}\) is the bijection

$$\begin{aligned} \begin{aligned}&I_{C_l}:\{1, \dots , \frac{l(2N-(l + 1))}{2}\} \\&\quad \rightarrow \{(j,j+i): i \in \{1, \dots , N-j\} \text { for } j \in \{1, \dots l\}\}, \end{aligned} \end{aligned}$$

then \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\) is unbiased i.e., \({\mathbb {E}}(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})) = \nabla _{{\varvec{\lambda }}} {\mathcal {L}}_{C_l}({\varvec{\lambda }})\).

Algorithm 3 presents the version of Algorithm 1 based on the truncated C-vine decomposition.
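Since the body of Algorithm 3 appears only as an image in the published version, we include below a minimal, self-contained sketch of the kind of stochastic optimization loop it describes. This is an illustration under simplifying assumptions, not the authors' implementation: a Gaussian mean-field family over a single scalar parameter, a toy set of pair terms f_pair, and a plain Robbins–Monro step size; the actual algorithm additionally uses Rao-Blackwellization, control variates, and adaptive learning rates.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_q(phi, lam):
    """Log-density of the variational family q(phi | lam) = N(mu, exp(2*rho))."""
    mu, rho = lam
    return -0.5 * (phi - mu) ** 2 / np.exp(2 * rho) - rho - 0.5 * np.log(2 * np.pi)

def grad_log_q(phi, lam):
    """Score function: gradient of log q(phi | lam) with respect to lam = (mu, rho)."""
    mu, rho = lam
    sigma2 = np.exp(2 * rho)
    return np.array([(phi - mu) / sigma2, (phi - mu) ** 2 / sigma2 - 1.0])

def elbo_grad_estimate(lam, pairs, f_pair, log_prior, S=16):
    """Single-pair, score-function gradient estimate mirroring the structure of
    Propositions 4 and 6: sample one pair term uniformly, scale it by the number
    of pair terms M, and subtract the score-weighted KL term."""
    mu, rho = lam
    M = len(pairs)
    j, k = pairs[rng.integers(M)]                      # K ~ U(1, ..., M)
    phis = mu + np.exp(rho) * rng.standard_normal(S)   # S draws from q
    g = np.zeros(2)
    for phi in phis:
        score = grad_log_q(phi, lam)
        g += score * (M * f_pair(j, k, phi) - (log_q(phi, lam) - log_prior(phi)))
    return g / S

# Toy stand-ins (not the calibration model): three pair terms and a N(0, 1) prior.
pairs = [(1, 2), (1, 3), (2, 3)]
f_pair = lambda j, k, phi: -0.5 * (phi - 0.5 * (j + k)) ** 2
log_prior = lambda phi: -0.5 * phi ** 2

lam = np.array([0.0, 0.0])                             # (mean, log-scale) of q
for t in range(5000):
    lam = lam + 0.05 / (1.0 + t) ** 0.7 * elbo_grad_estimate(lam, pairs, f_pair, log_prior)
print(lam)
```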


A.1 Variance reduction

Let us now consider the MC approximation of the gradient estimator \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\). The \(j^{th}\) component of the estimator with Rao-Blackwellization is

$$\begin{aligned} \begin{aligned}&\frac{1}{S}\sum _{s=1}^{S}\bigg [\frac{l(2N-(l + 1))}{2}\nabla _{{\varvec{\lambda }}_j} \log q(\phi _j[s]|{\varvec{\lambda }}_j)\big ({\tilde{f}}^j_{(K)}({\varvec{\phi }}[s]) \\&\quad -\frac{2\log \frac{ q(\phi _j[s]|{\varvec{\lambda }}_j)}{p(\phi _j[s])}}{l(2N-(l + 1))}\big )\bigg ], \end{aligned} \end{aligned}$$

where \({\tilde{f}}^j_{(K)}({\varvec{\phi }})\) denotes the components of \(f^{C_l}_{I_{C_l}(K)}({\varvec{\phi }})\) that include \(\phi _j\).

We can again use control variates to reduce the variance of the MC approximation of the gradient estimator \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\). In particular, we consider the following \(j^{th}\) element of the Rao-Blackwellized MC approximation of the gradient estimator \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\) with control variates

$$\begin{aligned} \begin{aligned}&\tilde{{\mathcal {L}}}_{C_l}^{CV}({\varvec{\lambda }})_j \\&\quad =\sum _{s=1}^{S}\bigg [\frac{l(2N-(l + 1))}{2S}\nabla _{{\varvec{\lambda }}_j} \log q(\phi _j[s]|{\varvec{\lambda }}_j)( {\tilde{f}}^j_{(K)}({\varvec{\phi }}[s]) \\&\qquad -\frac{2(\log \frac{ q(\phi _j[s]|{\varvec{\lambda }}_j)}{p(\phi _j[s])} + {\hat{a}}^C_j)}{l(2N-(l + 1))})\bigg ], \end{aligned} \end{aligned}$$

where \({\hat{a}}^C_j\) is the sample estimate of the \(j^{th}\) component of the optimal control variate scale \(a^*\) based on S (or fewer) independent draws from the variational distribution. Namely,

$$\begin{aligned} a^* = \frac{{{\mathbb {C}}ov}_q(\xi ^C({\varvec{\phi }}),\psi ^C({\varvec{\phi }}))}{{{\mathbb {V}}ar}_q(\psi ^C({\varvec{\phi }}))}, \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} \xi ^C({\varvec{\phi }})&= \frac{l(2N-(l + 1))}{2}\nabla _{{\varvec{\lambda }}} \log q(\phi |{\varvec{\lambda }})\bigg (f^{C_l}_{I_{C_l}(K)}({\varvec{\phi }}) \\&\quad - \frac{2}{l(2N-(l + 1))} \log \frac{ q(\phi |{\varvec{\lambda }})}{p(\phi )}\bigg ) \end{aligned} \end{aligned}$$

and \(\psi ^C({\varvec{\phi }}) = \nabla _{{\varvec{\lambda }}} \log q(\phi |{\varvec{\lambda }})\).
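The scale \({\hat{a}}^C_j\) is just a ratio of a sample covariance to a sample variance, computed componentwise from draws of \(\xi ^C\) and \(\psi ^C\). A small sketch (the function name and array layout are our assumptions):

```python
import numpy as np

def control_variate_scale(xi_samples, psi_samples):
    """Componentwise sample estimate of a* = Cov_q(xi, psi) / Var_q(psi).

    xi_samples, psi_samples: arrays of shape (S, dim) whose rows hold draws of
    xi^C(phi) and psi^C(phi) = grad_lambda log q(phi | lambda); these draws
    should be independent of the ones used in the gradient estimate itself."""
    xi_c = xi_samples - xi_samples.mean(axis=0)
    psi_c = psi_samples - psi_samples.mean(axis=0)
    cov = (xi_c * psi_c).sum(axis=0) / (len(xi_samples) - 1)
    var = (psi_c ** 2).sum(axis=0) / (len(xi_samples) - 1)
    return cov / var
```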

As in the case of the D-vine, we now derive the final Algorithm 4. Again, instead of taking the samples from \(q({\varvec{\phi }}| {\varvec{\lambda }})\) to approximate the gradient estimates, we take samples from an overdispersed distribution \(r({\varvec{\phi }}|{\varvec{\lambda }}, \tau )\). Combining Rao-Blackwellization, control variates, and importance sampling, we have the following \(j^{th}\) component of the MC approximation of the gradient estimator \(\tilde{{\mathcal {L}}}_{C_l}({\varvec{\lambda }})\)

$$\begin{aligned}&\tilde{{\mathcal {L}}}_{C_l}^{OCV}({\varvec{\lambda }})_j\\&\quad =\sum _{s=1}^{S}\bigg [\frac{l(2N-(l + 1))}{2S}\nabla _{{\varvec{\lambda }}_j} \log q(\phi _j[s]|{\varvec{\lambda }}_j)({\tilde{f}}^j_{(K)}({\varvec{\phi }}[s])\\&\qquad -\frac{2(\log \frac{ q(\phi _j[s]|{\varvec{\lambda }}_j)}{p(\phi _j[s])} + {\tilde{a}}^C_j)}{l(2N-(l + 1))})w(\phi _j[s])\bigg ], \end{aligned}$$

where \({\varvec{\phi }}[s] \sim r({\varvec{\phi }}|{\varvec{\lambda }}, \tau )\) and \(w({\varvec{\phi }}[s] ) = q({\varvec{\phi }}[s] |{\varvec{\lambda }}) / r({\varvec{\phi }}[s]|{\varvec{\lambda }}, \tau )\) with \({\tilde{a}}^C_j\) being the estimate of the \(j^{th}\) component of

$$\begin{aligned} a^*_O=\frac{{\mathbb {C}}ov_r[\xi ^C({\varvec{\phi }})w(\phi ) ,\psi ^C({\varvec{\phi }})w(\phi )]}{{\mathbb {V}}ar_r[\psi ^C({\varvec{\phi }})w(\phi )]}. \end{aligned}$$
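Algorithm 4 itself is shown as an image in the published version. As an illustration of the importance-sampling ingredient, the sketch below constructs, for a single Gaussian component of \(q\), an overdispersed proposal \(r(\phi |{\varvec{\lambda }}, \tau )\) by inflating the scale by the dispersion coefficient \(\tau > 1\) and returns the corresponding weights \(w = q/r\); this particular construction of \(r\) is an assumption made for the example, not necessarily the one used in the paper.

```python
import numpy as np
from scipy.stats import norm

def overdispersed_draws_and_weights(mu, sigma, tau, S, rng):
    """Draw phi[s] ~ r, where r is q = N(mu, sigma^2) with its scale inflated
    to tau * sigma (tau > 1), and return the importance weights w = q / r."""
    phis = rng.normal(mu, tau * sigma, size=S)
    w = norm.pdf(phis, mu, sigma) / norm.pdf(phis, mu, tau * sigma)
    return phis, w

rng = np.random.default_rng(2)
phis, w = overdispersed_draws_and_weights(mu=0.0, sigma=1.0, tau=1.5, S=8, rng=rng)
```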

Appendix B Proofs

Proof of Proposition 1

Since \(P(K = k) = \frac{2}{N(N-1)}\), we have directly from the definition of expectation

$$\begin{aligned}&{\mathbb {E}}(\tilde{{\mathcal {L}}}_D({\varvec{\lambda }})) \\&\quad =\frac{N(N-1)}{2} \sum _{k=1}^{\frac{N(N-1)}{2}}\frac{2}{N(N-1)}{\mathbb {E}}_q\\ {}&\qquad \times \bigg [\nabla _{{\varvec{\lambda }}}\log q({\varvec{\phi }}|{\varvec{\lambda }})( f^D_{I_D(k)}({\varvec{\phi }}))\bigg ] \\&\qquad - {\mathbb {E}}_q\bigg [\nabla _{{\varvec{\lambda }}} \log q({\varvec{\phi }}|{\varvec{\lambda }})\log \frac{ q({\varvec{\phi }}|{\varvec{\lambda }})}{ p({\varvec{\phi }})}\bigg ] = \nabla _{{\varvec{\lambda }}} {\mathcal {L}}({\varvec{\lambda }}). \end{aligned}$$

The final equality is a consequence of the uniqueness of the pairs of variables in the conditioned sets of the copula density \(c_{i,(i+j); (i+1), \dots , (i+j-1)}\) and of the fact that \(\frac{N(N-1)}{2}\) is the number of unordered pairs of N variables. \(\square \)

Proof of Proposition 2

It is sufficient to show that for \(l \in \{1, \dots , N-1\}\) the following equality holds:

$$\begin{aligned} \begin{aligned}&\sum _{j = 1}^{l} \sum _{i = 1}^{N-j}\bigg [\frac{1}{a_i}\log p(d_i|{\varvec{\phi }}) + \frac{1}{b_{i+j}} \log p(d_{i+j}|{\varvec{\phi }})\bigg ] \\&\quad =\sum _{k = 1}^N \log p({\varvec{d}}_k|{\varvec{\phi }}), \end{aligned} \end{aligned}$$
(56)

where

$$\begin{aligned} a_i&= 2l - \bigg [(l+1-i)\mathbb {1}_{i \le l} + (l - N +i)\mathbb {1}_{i> N - l}\bigg ], \\ b_{i+j}&= 2l - \bigg [(l+1-j-i)\mathbb {1}_{i + j \le l} \\&\quad +(l - N +j+i)\mathbb {1}_{i + j > N - l}\bigg ]. \end{aligned}$$

To show this, let us consider the summation

$$\begin{aligned}&\sum _{j = 1}^{l} \sum _{i = 1}^{N-j}\bigg [\log p(d_i|{\varvec{\phi }}) + \log p(d_{i+j}|{\varvec{\phi }})\bigg ]\\&\quad =\sum _{j = 1}^{l} \bigg [(\log p(d_1|{\varvec{\phi }}) + \log p(d_{1+j}|{\varvec{\phi }})) + \dots \\&\qquad +(\log p(d_{N-j}|{\varvec{\phi }}) + \log p(d_{N}|{\varvec{\phi }}))\bigg ]. \end{aligned}$$

For \(l = 1\), we get

$$\begin{aligned}&\sum _{j = 1}^{l} \sum _{i = 1}^{N-j}\bigg [\log p(d_i|{\varvec{\phi }}) + \log p(d_{i+j}|{\varvec{\phi }})\bigg ]\\&\quad =(\log p(d_1|{\varvec{\phi }}) + \log p(d_{2}|{\varvec{\phi }})) + \dots \\&\qquad +(\log p(d_{N-1}|{\varvec{\phi }}) + \log p(d_{N}|{\varvec{\phi }})), \end{aligned}$$

and for \(l \ge 2\)

$$\begin{aligned}&\sum _{j = 1}^{l} \sum _{i = 1}^{N-j}\bigg [\log p(d_i|{\varvec{\phi }}) + \log p(d_{i+j}|{\varvec{\phi }})\bigg ]\\&\quad =\bigg [(\log p(d_1|{\varvec{\phi }}) + \log p(d_{2}|{\varvec{\phi }})) + \dots \\&\qquad +(\log p(d_{N-1}|{\varvec{\phi }}) + \log p(d_{N}|{\varvec{\phi }}))\bigg ] +\dots \\ {}&\qquad + \bigg [(\log p(d_1|{\varvec{\phi }}) + \log p(d_{1+l}|{\varvec{\phi }})) + \dots \\&\qquad +(\log p(d_{N-l}|{\varvec{\phi }}) + \log p(d_{N}|{\varvec{\phi }}))\bigg ]. \end{aligned}$$

Note that in the case of \(l = N-1\), the last summation consists of only one element, \(\log p(d_1|{\varvec{\phi }}) + \log p(d_{1+l}|{\varvec{\phi }})\). By careful examination of the two cases above, we get the following results. For \(2l \le N\):

$$\begin{aligned}&\sum _{j = 1}^{l} \sum _{i = 1}^{N-j}\bigg [\log p(d_i|{\varvec{\phi }}) + \log p(d_{i+j}|{\varvec{\phi }})\bigg ]\\&\quad =\sum _{k = 1}^l (l + k -1)\log p(d_k|{\varvec{\phi }}) + \sum _{k =l + 1}^{N-l} 2l \log p(d_k|{\varvec{\phi }}) \\&\qquad +\sum _{k =N - l + 1}^{N} (N - k + l)\log p(d_k|{\varvec{\phi }}), \end{aligned}$$

where the middle term disappears in the case \(2l = N\), and for \(2l > N\):

$$\begin{aligned}&\sum _{j = 1}^{l} \sum _{i = 1}^{N-j}\bigg [\log p(d_i|{\varvec{\phi }}) + \log p(d_{i+j}|{\varvec{\phi }})\bigg ]\\&\quad =\sum _{k = 1}^{N-l} (l + k -1)\log p(d_k|{\varvec{\phi }}) \\&\qquad +\sum _{k = N - l + 1}^{l} (N-1) \log p(d_k|{\varvec{\phi }}) \\&\qquad + \sum _{k =l+1}^{N} (N - k + l)\log p(d_k|{\varvec{\phi }}). \end{aligned}$$

If we now check that \(a_i\) equals the factors in front of the log-likelihoods in the two cases above, the proof of Proposition 2 is complete. Note that once we check the equality for \(a_i\), the same directly translates to \(b_{i+j}\), since \(b_{i+j}\) is \(a_i\) with indices set to \(i+j\) instead of i. Indeed, for \(2l \le N\)

$$\begin{aligned} a_{i} = {\left\{ \begin{array}{ll} l +i -1 &{} i\le l\\ 2l &{} l< i \le N -l\\ N - i + l &{} N - l < i \end{array}\right. }, \end{aligned}$$

and for \(2l > N\)

$$\begin{aligned} a_{i} = {\left\{ \begin{array}{ll} l +i -1 &{} i\le N - l\\ N -1 &{} N - l< i \le l\\ N - i + l &{} l < i \end{array}\right. }. \end{aligned}$$

\(\square \)

Proof of Proposition 3

By the construction of an R-vine (see Cooke and Kurowicka 2006), each tree \({\mathcal {T}}_i\), for \(i = 1, \dots , N-1\), has exactly \(N-i\) edges (these are the unique conditioned variable pairs). For any R-vine truncated at level \(l \in \{1, \dots , N-1\}\), the number of edges is

$$\begin{aligned} \sum _{i = 1}^l (N - i) = lN - \frac{l(l+1)}{2} = \frac{l(2N - (l+1))}{2}. \end{aligned}$$

The rest of the proof is identical with that of Proposition 1 due to the uniqueness of the conditioned variable pairs in the copula density \(c_{i,(i+j); (i+1), \dots , (i+j-1)}\), but in this case \(P(K = k) = \frac{2}{l(2N - (l+1))}\). \(\square \)

Proof of Proposition 4

The proof is identical with that of Proposition 1 since each conditioned pair in the copula density \(c_{j,(j+i); 1, \dots , (j-1)}\) is unique as well. \(\square \)

Proof of Proposition 5

It is sufficient to show that for \(l \in \{1, \dots , N-1\}\) the following equality holds:

$$\begin{aligned} \begin{aligned}&\sum _{j = 1}^{l} \sum _{i = 1}^{N-j}\bigg [\frac{1}{a_j}\log p(d_j|{\varvec{\phi }}) + \frac{1}{b_{j+i}} \log p(d_{j+i}|{\varvec{\phi }})\bigg ] \\&\quad =\sum _{k = 1}^N \log p({\varvec{d}}_k|{\varvec{\phi }}), \end{aligned} \end{aligned}$$
(57)

where

$$\begin{aligned} a_j&= N-1, \\ b_{j + i}&= (N-1-l)\mathbb {1}_{j + i \le l} + l. \end{aligned}$$

To show this, let us consider the following summation

$$\begin{aligned}&\sum _{j = 1}^{l} \sum _{i = 1}^{N-j}\bigg [\log p(d_j|{\varvec{\phi }}) + \log p(d_{j+i}|{\varvec{\phi }})\bigg ]\\&\quad = \sum _{j = 1}^{l} \bigg [(N - j)\log p(d_j|{\varvec{\phi }}) + \sum _{i = 1}^{N-j}\log p(d_{j+i}|{\varvec{\phi }})\bigg ] \\&\quad = \sum _{j = 1}^{l}(N - j)\log p(d_j|{\varvec{\phi }}) \\&\qquad +\sum _{j = 1}^{l}\bigg [\log p(d_{j+1}|{\varvec{\phi }}) + \dots + \log p(d_{N}|{\varvec{\phi }})\bigg ]. \end{aligned}$$

Now, for \(l = 1\), we have

$$\begin{aligned}&\sum _{j = 1}^{l}\bigg [\log p(d_{j+1}|{\varvec{\phi }}) + \dots + \log p(d_{N}|{\varvec{\phi }})\bigg ] \\&\quad =\log p(d_{2}|{\varvec{\phi }}) + \dots + \log p(d_{N}|{\varvec{\phi }}). \end{aligned}$$

For \(l \ge 2\), we have

$$\begin{aligned}&\sum _{j = 1}^{l}\bigg [\log p(d_{j+1}|{\varvec{\phi }}) + \dots + \log p(d_{N}|{\varvec{\phi }})\bigg ]\\&\quad = \bigg [\log p(d_{2}|{\varvec{\phi }}) + \dots + \log p(d_{N}|{\varvec{\phi }})\bigg ] + \dots \\&\qquad + \bigg [\log p(d_{l+1}|{\varvec{\phi }}) + \dots + \log p(d_{N}|{\varvec{\phi }})\bigg ]. \end{aligned}$$

Therefore we can rewrite

$$\begin{aligned}&\sum _{j = 1}^{l}\bigg [\log p(d_{j+1}|{\varvec{\phi }}) + \dots + \log p(d_{N}|{\varvec{\phi }})\bigg ]\\&\quad = \sum _{j = 1}^{l}(j-1) \log p(d_j|{\varvec{\phi }}) + \sum _{j = l+1}^{N}l \log p(d_j|{\varvec{\phi }}). \end{aligned}$$

Overall,

$$\begin{aligned}&\sum _{j = 1}^{l} \sum _{i = 1}^{N-j}\bigg [\log p(d_j|{\varvec{\phi }}) + \log p(d_{j+i}|{\varvec{\phi }})\bigg ] \\&\quad = \sum _{j = 1}^{l}(N - j)\log p(d_j|{\varvec{\phi }}) + \sum _{j = 1}^{l}(j-1) \log p(d_j|{\varvec{\phi }}) \\&\qquad +\sum _{j = l+1}^{N}l \log p(d_j|{\varvec{\phi }})\\&\quad = \sum _{k=1}^{l}(N-1)\log p(d_k|{\varvec{\phi }}) + \sum _{k=l+1}^{N}l \log p(d_k|{\varvec{\phi }}). \end{aligned}$$

Since \(j \in \{1, \dots , l\}\) and

$$\begin{aligned} b_{j + i} = {\left\{ \begin{array}{ll} N - 1 &{} j + i\le l \\ l &{} j + i > l \end{array}\right. }, \end{aligned}$$

equality (57) holds. \(\square \)
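For small N and l, the identity (57) and the stated form of the weights can also be verified numerically; below is a short sketch in which random values merely play the role of the log-marginals \(\log p(d_k|{\varvec{\phi }})\):

```python
import numpy as np

def check_identity(N, l):
    """Check Eq. (57): with a_j = N-1 and b_{j+i} = N-1 if j+i <= l else l,
    the weighted pair sums reproduce each log-marginal exactly once."""
    rng = np.random.default_rng(0)
    logp = rng.standard_normal(N + 1)   # logp[k] stands in for log p(d_k | phi)
    lhs = 0.0
    for j in range(1, l + 1):
        for i in range(1, N - j + 1):
            b = N - 1 if j + i <= l else l
            lhs += logp[j] / (N - 1) + logp[j + i] / b
    return np.isclose(lhs, logp[1:].sum())

print(all(check_identity(N, l) for N in range(3, 12) for l in range(1, N)))  # True
```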

Proof of Proposition 6

The proof is identical with that of Proposition 3 since each conditioned pair in the copula density \(c_{j,(j+i); 1, \dots , (j-1)}\) is unique, and a C-vine is a special case of R-vine. \(\square \)

Appendix C Simulation: truncation level

Figure 6 and Table 4 show the changes in the values of the variational parameters with increasing truncation level l for the calibration simulation described in Sect. 5.1.1. The \(L^2\) norms of the differences between variational parameters are provided both for all the variational parameters and for the variational approximations of the calibration parameters only. Figure 6 exhibits a clear elbow shape with minimal change of the variational parameters for \(l \ge 3\).
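A sketch of how such norms can be computed, under our reading of Fig. 6 that the comparison is between consecutive truncation levels:

```python
import numpy as np

def successive_l2_differences(lambdas_by_level):
    """Given a dict {l: lambda_l} of fitted variational parameter vectors indexed
    by truncation level, return the L2 norms ||lambda_{l+1} - lambda_l|| whose
    flattening locates the elbow."""
    levels = sorted(lambdas_by_level)
    return {l2: np.linalg.norm(lambdas_by_level[l2] - lambdas_by_level[l1])
            for l1, l2 in zip(levels, levels[1:])}
```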

Fig. 6: \(L^2\) norm values for the differences between variational parameters obtained via Algorithm 2 with increasing values of the truncation level l. The symbol \({\varvec{\lambda }}_\theta \) denotes the variational parameters of the variational approximations for the calibration parameters

Table 4 The values of \(L^2\) norms depicted in Fig. 6

Appendix D Simulation: memory profile

Here we present the memory profiles (Fig. 7) for the MH algorithm, the NUTS, and Algorithm 2 under the simulation scenario studied in Sect. 5. These were recorded over a one-hour period of running the algorithms. The MH algorithm and the NUTS were implemented in Python 3 using the PyMC3 module, version 3.5. The memory profiles were measured using the memory-profiler module, version 0.55.0. The VC was also implemented in Python 3. The code was run on the high performance computing cluster at the Institute for Cyber-Enabled Research at Michigan State University.
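A minimal sketch of how such profiles can be recorded with the memory-profiler package; run_calibration is a hypothetical stand-in for one of the samplers or for Algorithm 2:

```python
import numpy as np
from memory_profiler import memory_usage

def run_calibration():
    # Placeholder workload; replace with the MH, NUTS, or variational run.
    chunks = [np.random.standard_normal((1000, 1000)) for _ in range(5)]
    return sum(a.sum() for a in chunks)

# memory_usage samples the resident memory (in MiB) of the call once per
# `interval` seconds until the function returns.
profile = memory_usage((run_calibration, (), {}), interval=1.0)
print(len(profile), max(profile))
```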

Fig. 7: Recorded memory profiles of Algorithm 2, the MH algorithm, and the NUTS over a period of 1 h under the simulation scenario with \(n = 0.5 \times 10 ^ 4\), \(n = 1 \times 10 ^ 4\), and \(n = 2 \times 10 ^ 4\)

Appendix E Application: liquid drop model (LDM)

E.1 GP specifications

In the case of the LDM \(E_B(Z,N)\), we consider the GP prior with mean zero and covariance function

$$\begin{aligned}&\eta _{\gamma } \\&\quad \times \text {exp}\left( -\frac{\Vert Z - Z' \Vert ^2}{2\nu ^2_Z} -\frac{\Vert N - N' \Vert ^2}{2\nu ^2_N} -\frac{\Vert \theta _{\textrm{vol}} - \theta _{\textrm{vol}}' \Vert ^2}{2\nu ^2_1} \right. \\&\left. \quad - \frac{\Vert \theta _{\textrm{surf}} - \theta _{\textrm{surf}}' \Vert ^2}{2\nu ^2_2} -\frac{\Vert \theta _{\textrm{sym}} - \theta _{\textrm{sym}}' \Vert ^2}{2\nu ^2_3} -\frac{\Vert \theta _{\textrm{C}} - \theta _{\textrm{C}}' \Vert ^2}{2\nu ^2_4}\right) . \end{aligned}$$

Similarly, we consider a GP prior for the systematic discrepancy \(\delta (Z,N)\) with mean zero and covariance function

$$\begin{aligned} \eta _\delta \times \exp {\left( -\frac{\Vert Z - Z' \Vert ^2}{2l^2_Z} -\frac{\Vert N - N' \Vert ^2}{2l^2_N}\right) }. \end{aligned}$$
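Both covariance functions are products of squared-exponential factors; below is a small sketch of a direct implementation (the function names are ours, not from the paper):

```python
import numpy as np

def sq_exp(x, xp, length):
    """One squared-exponential factor exp(-||x - x'||^2 / (2 * length^2))."""
    d = np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)
    return np.exp(-np.sum(d ** 2) / (2.0 * length ** 2))

def k_emulator(zn, theta, zn_p, theta_p, eta_gamma, nu_Z, nu_N, nu_theta):
    """Covariance of the GP prior on the LDM output: a product of factors in
    (Z, N) and in the four calibration inputs (theta_vol, theta_surf,
    theta_sym, theta_C) with length-scales nu_1, ..., nu_4."""
    k = eta_gamma * sq_exp(zn[0], zn_p[0], nu_Z) * sq_exp(zn[1], zn_p[1], nu_N)
    for t, tp, nu in zip(theta, theta_p, nu_theta):
        k *= sq_exp(t, tp, nu)
    return k

def k_discrepancy(zn, zn_p, eta_delta, l_Z, l_N):
    """Covariance of the GP prior on the systematic discrepancy delta(Z, N)."""
    return eta_delta * sq_exp(zn[0], zn_p[0], l_Z) * sq_exp(zn[1], zn_p[1], l_N)
```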

E.2 Experimental design

Kennedy and O’Hagan (2001) recommend selecting the calibration inputs for the model runs so that any plausible value \({\varvec{\theta }}\) of the true calibration parameter is covered. In this context, we consider the space of calibration parameters to be centered at the least squares estimates \({\hat{{\varvec{\theta }}}}_{L_2}\) and broad enough to contain the majority of values provided by the nuclear physics literature (Weizsäcker 1935; Bethe and Bacher 1936; Myers and Swiatecki 1966; Kirson 2008; Benzaid et al. 2020). Table 5 gives the lower and upper bounds for the parameter space so that \(\text {Lower bound} = {\hat{\theta }}_{L_2} - 15 \times SE({\hat{\theta }}_{L_2})\) and \(\text {Upper bound} = {\hat{\theta }}_{L_2} + 15 \times SE({\hat{\theta }}_{L_2})\). Here \(SE({\hat{\theta }}_{L_2})\) is given by standard linear regression theory.
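A sketch of how such a design box can be produced from the least-squares fit, assuming the four LDM terms are collected in an \(n \times 4\) design matrix (the names and layout are our assumptions):

```python
import numpy as np

def ls_design_bounds(X, y, width=15.0):
    """Least-squares estimates of the LDM coefficients and the +/- width * SE
    box used to generate the model-run design.  X holds the volume, surface,
    symmetry, and Coulomb terms; y holds the experimental binding energies."""
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta_hat
    sigma2 = resid @ resid / (len(y) - X.shape[1])     # usual unbiased estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return theta_hat - width * se, theta_hat + width * se
```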

Table 5 The space of calibration parameters used for generating the outputs of the semi-empirical mass formula (47)

E.3 Prior distributions

First, we consider independent Gaussian prior distributions centered at the LS estimates \({\hat{{\varvec{\theta }}}}_{L_2}\) (see Table 3) with standard deviations \(7.5 \times SE({\hat{{\varvec{\theta }}}}_{L_2})\), so that the calibration parameters used for generating the model runs are covered roughly within two standard deviations of the priors. Namely,

$$\begin{aligned} \theta _{\text {vol}}&\sim \mathcal {N}(15.42, 0.203), \\ \theta _{\text {surf}}&\sim \mathcal {N}(16.91, 0.645), \\ \theta _{\text {sym}}&\sim \mathcal {N}(22.47, 0.525), \\ \theta _{\text {C}}&\sim \mathcal {N}(0.69, 0.015). \end{aligned}$$

The prior distributions for the hyperparameters of the GPs were selected as \(\text {Gamma}(\alpha , \beta )\), with shape parameter \(\alpha \) and scale parameter \(\beta \), so that they represent vague knowledge about the scale of these parameters given by the literature on nuclear mass models (Weizsäcker 1935; Bethe and Bacher 1936; Myers and Swiatecki 1966; Fayans 1998; Kirson 2008; McDonnell et al. 2015; Kortelainen et al. 2010, 2012, 2014; Benzaid et al. 2020; Kejzlar et al. 2020). In particular, the error scale \(\sigma \) lies within units of MeV in the majority of nuclear applications, and therefore we set

$$\begin{aligned} \sigma \sim \text {Gamma}(2,1), \end{aligned}$$

with the scale of the systematic error being

$$\begin{aligned} \eta _\delta \sim \text {Gamma}(10,1), \end{aligned}$$

to allow this quantity to range from units to tens of MeV. It is also reasonable to assume that the mass of a given nucleus is correlated mostly with the masses of its neighbors on the nuclear chart. We express this notion through the following reasonably wide prior distributions:

$$\begin{aligned} l_Z&\sim \text {Gamma}(10,1), \\ l_N&\sim \text {Gamma}(10,1), \\ \nu _Z&\sim \text {Gamma}(10,1), \\ \nu _N&\sim \text {Gamma}(10,1), \\ \nu _i&\sim \text {Gamma}(10,1), \quad i = 1,2,3,4. \end{aligned}$$

Finally, the majority of the masses in the training dataset of 2000 experimental binding energies fall into the range [1000, 2000] MeV (1165 masses, to be precise). We consider the following prior distribution for the parameter \(\eta _f\) to reflect the scale of the experimental binding energies:

$$\begin{aligned} \eta _f \sim \text {Gamma}(110,10). \end{aligned}$$
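For reference, the complete prior specification above can be written down compactly, e.g., with scipy.stats, assuming the \(\text {Gamma}(\alpha , \beta )\) convention above maps to shape a = \(\alpha \) and scale = \(\beta \):

```python
from scipy.stats import gamma, norm

# Calibration-parameter priors (Gaussian; standard deviations = 7.5 * SE).
prior_theta = {
    "vol":  norm(loc=15.42, scale=0.203),
    "surf": norm(loc=16.91, scale=0.645),
    "sym":  norm(loc=22.47, scale=0.525),
    "C":    norm(loc=0.69,  scale=0.015),
}

# GP hyperparameter priors, Gamma(shape alpha, scale beta).
prior_hyper = {
    "sigma": gamma(a=2, scale=1),
    "eta_delta": gamma(a=10, scale=1),
    "l_Z": gamma(a=10, scale=1), "l_N": gamma(a=10, scale=1),
    "nu_Z": gamma(a=10, scale=1), "nu_N": gamma(a=10, scale=1),
    **{f"nu_{i}": gamma(a=10, scale=1) for i in (1, 2, 3, 4)},
    "eta_f": gamma(a=110, scale=10),
}
```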

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kejzlar, V., Maiti, T. Variational inference with vine copulas: an efficient approach for Bayesian computer model calibration. Stat Comput 33, 18 (2023). https://doi.org/10.1007/s11222-022-10194-z
