1 Introduction

Bayesian methods have become very popular in many domains of science and engineering in recent years, as they allow for obtaining estimates of parameters of interest as well as comparing competing models in a principled way (Robert and Casella 2004; Luengo et al. 2020). The Bayesian quantities can generally be expressed as integrals involving the posterior density. They can be divided into two main categories: posterior expectations and marginal likelihoods (useful for model selection purposes).

Generally, computational methods are required for the approximation of these integrals, e.g., Monte Carlo algorithms such as Markov chain Monte Carlo (MCMC) and importance sampling (IS) (Robert and Casella 2004; Luengo et al. 2020; Rainforth et al. 2020). Typically, practitioners apply an MCMC or IS algorithm to approximate the posterior density \(\bar{\pi }(\textbf{x})\) by a set of samples, which is used in turn to estimate posterior expectations \(\mathbb {E}_{\bar{\pi }}[f(\textbf{x})]\) of some function \(f(\textbf{x})\). Although this is a sensible strategy when \(f(\textbf{x})\) is not known in advance and/or we are interested in computing several posterior expectations with respect to different functions, it is suboptimal when the target function \(f(\textbf{x})\) is known in advance, since it is completely agnostic to \(f(\textbf{x})\). Incorporating knowledge of \(f(\textbf{x})\) in the estimation of \(\mathbb {E}_{\bar{\pi }}[f(\textbf{x})]\) is known as target-aware Bayesian inference or TABI (Rainforth et al. 2020). TABI proposes to break down the estimation of the posterior expectation into several independent estimation tasks. Specifically, TABI requires estimating three marginal likelihoods (or normalizing constants) independently, and then recombining the estimates to form the approximation of the posterior expectation. The target function \(f(\textbf{x})\) features in two out of the three marginal likelihoods that have to be estimated. Hence, the TABI framework provides a means of improving the estimation of a posterior expectation by explicitly making use of \(f(\textbf{x})\) and leveraging algorithms for marginal likelihood computation.

The computation of marginal likelihoods is particularly complicated, especially with MCMC outputs (Newton and Raftery 1994; Llorente et al. 2020a, 2021a). IS techniques are the most popular for this task. The basic IS algorithm provides a straightforward estimator of the marginal likelihood. However, designing a good proposal pdf that approximates the target density is not easy (Llorente et al. 2020a). For this reason, sophisticated and powerful schemes have been specifically designed (Llorente et al. 2020a; Friel and Wyse 2012). The most powerful techniques involve the idea of the so-called tempering of the posterior (Neal 2001; Lartillot and Philippe 2006; Friel and Pettitt 2008). The tempering effect is commonly employed in order to foster the exploration and improve the efficiency of MCMC chains (Neal 1996; Martino et al. 2021). State-of-the-art methods for computing marginal likelihoods consider tempered transitions (i.e., sequences of tempered distributions), such as annealed IS (An-IS) (Neal 2001), sequential Monte Carlo (SMC) (Moral et al. 2006), thermodynamic integration (TI), a.k.a., path sampling (PS) or “power posteriors” (PP) in the statistics literature (Lartillot and Philippe 2006; Friel and Pettitt 2008; Gelman and Meng 1998), and stepping stones (SS) sampling (Xie et al. 2010). An-IS is a special case of the SMC framework, PP is a special case of TI/PS, and SS sampling presents similar features to An-IS and PP. For more details, see (Llorente et al. 2020a). It is worth mentioning that TI was originally introduced in the physics literature for computing free-energy differences (Frenkel 1986; Gelman and Meng 1998).

In this work, we extend the TI method, introducing the generalized thermodynamic integration (GTI) technique, for computing posterior expectations of a function \(f(\textbf{x})\). In this sense, GTI is a target-aware algorithm that incorporates information about \(f(\textbf{x})\) within the marginal likelihood estimation technique TI. The extension of TI for the computation of \(\mathbb {E}_{\bar{\pi }}\left[ f(\textbf{x})\right]\) is not straightforward, since it requires building a continuous path between densities with possibly different support. In the case of a geometric path (which is the default choice in practice; Friel and Pettitt 2008; Lartillot and Philippe 2006), the generalization of TI needs a careful look at the support of the negative and positive parts of \(f(\textbf{x})\). We discuss the application of GTI for the computation of posterior expectations of a generic real-valued function \(f(\textbf{x})\), and also describe the case of a vector-valued function \(\textbf{f}(\textbf{x})\). The benefits of GTI are clearly shown by illustrative numerical simulations.

The structure of the paper is the following. In Sect. 2, we introduce the Bayesian inference setting and describe the thermodynamic method for the computation of the marginal likelihood. In Sect. 3, we introduce the GTI procedure. More specifically, we discuss first the case when \(f(\textbf{x})\) is strictly positive or negative in Sect. 3.2, and then consider the general case of a real-valued \(f(\textbf{x})\) in Sect. 3.3. In Sect. 4, we discuss some computational details of the approach, and the application of GTI for vector-valued functions \(\textbf{f}(\textbf{x})\). We show the benefits of GTI in two numerical experiments in Sect. 5. Finally, Sect. 6 contains the conclusions.

2 Background

2.1 Bayesian inference

In many real-world applications, the goal is to infer a parameter of interest given a set of data (Robert and Casella 2004). Let us denote the parameter of interest by \(\textbf{x}\in \mathcal {X}\subseteq \mathbb {R}^{D}\), and let \(\textbf{y}\in \mathbb {R}^{d_y}\) be the observed data. In a Bayesian analysis, all the statistical information is contained in the posterior distribution, which is given by

$$\begin{aligned} \bar{\pi }(\textbf{x})= p(\textbf{x}|\textbf{y})= \frac{\ell (\textbf{y}|\textbf{x}) g(\textbf{x})}{Z(\textbf{y})}, \end{aligned}$$
(1)

where \(\ell (\textbf{y}|\textbf{x})\) is the likelihood function, \(g(\textbf{x})\) is the prior pdf, and \(Z(\textbf{y})\) is the Bayesian model evidence (a.k.a. marginal likelihood). Generally, \(Z(\textbf{y})\) is unknown, so we can only evaluate the unnormalized target function, \(\pi (\textbf{x})=\ell (\textbf{y}|\textbf{x}) g(\textbf{x})\). The analytical computation of the posterior density \(\bar{\pi }(\textbf{x}) \propto \pi (\textbf{x})\) is often infeasible, hence numerical approximations are needed. The interest lies in the approximation of integrals of the form

$$\begin{aligned} I = \mathbb {E}_{\bar{\pi }}\left[ f(\textbf{x})\right] = \int _\mathcal {X}f(\textbf{x})\bar{\pi }(\textbf{x})d\textbf{x}=\frac{1}{Z} \int _\mathcal {X}f(\textbf{x})\pi (\textbf{x})d\textbf{x}, \end{aligned}$$
(2)

where \(f(\textbf{x})\) is some integrable function, and

$$\begin{aligned} Z = \int _\mathcal {X}\pi (\textbf{x})d\textbf{x}. \end{aligned}$$
(3)

The quantity Z is called the marginal likelihood (a.k.a., Bayesian evidence) and is useful for model selection purposes (Llorente et al. 2020a). Generally, I and Z are analytically intractable and we need to resort to numerical algorithms such as Markov chain Monte Carlo (MCMC) and importance sampling (IS) algorithms. In this work, we consider that \(f(\textbf{x})\) is known in advance, and we aim at exploiting it in order to apply thermodynamic integration for computing the posterior expectation I, namely, to perform target-aware Bayesian inference (TABI).

2.2 Computation of marginal likelihoods for parameter estimation: The TABI framework

The focus of this work is on parameter estimation, namely, we are interested in the computation of the posterior expectation in Eq. (2) of some function \(f(\textbf{x})\). Recently, the authors in Rainforth et al. (2020) proposed a framework called target-aware Bayesian inference (TABI) that aims at improving the Monte Carlo estimation of I when the target \(f(\textbf{x})\) is known in advance. The TABI framework is based on decomposing I into several terms and estimating them separately, leveraging the information in \(f(\textbf{x})\). Let \(f_+(\textbf{x}) = \max \{0,f(\textbf{x})\}\) and \(f_-(\textbf{x}) = \max \{0,-f(\textbf{x})\}\), so \(f(\textbf{x}) = f_+(\textbf{x}) - f_-(\textbf{x})\). Hence, TABI rewrites the posterior expectation I as

$$\begin{aligned} I = \frac{c_+ - c_-}{Z}, \end{aligned}$$
(4)

where \(c_+ = \int f_+(\textbf{x})\pi (\textbf{x})d\textbf{x}\) and \(c_- = \int f_-(\textbf{x})\pi (\textbf{x})d\textbf{x}\). Note that \(c_+\), \(c_-\) and Z are integrals of non-negative functions, namely, they are marginal likelihoods (or normalizing constants). The three unnormalized densities of interest hence are \(\pi (\textbf{x})\), \(f_+(\textbf{x})\pi (\textbf{x})\) and \(f_-(\textbf{x})\pi (\textbf{x})\). Note that two out of the three (unnormalized) densities incorporate information about \(f(\textbf{x})\). The general TABI estimator is then

$$\begin{aligned} \widehat{I}_{\text {TABI}} = \frac{\widehat{c}_+ - \widehat{c}_-}{\widehat{Z}}, \end{aligned}$$
(5)

where \(\widehat{c}_+\), \(\widehat{c}_-\) and \(\widehat{Z}\) are estimates obtained independently. These estimates can be obtained by any marginal likelihood estimation method. The original TABI framework is motivated in the IS context. This is due to the fact that marginal likelihoods (i.e., integrals of non-negative functions) can be estimated arbitrarily well with IS (Llorente et al. 2020a, 2021a; Rainforth et al. 2020). Namely, using the optimal proposals the estimates \(\widehat{c}_+\), \(\widehat{c}_-\) and \(\widehat{Z}\) coincide with the exact values regardless of the sample size. Note that the direct estimation of I via MCMC or IS cannot produce zero-variance estimators for a finite sample size (Robert and Casella 2004).
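
For illustration, the following minimal Python sketch implements the TABI estimator in Eq. (5) on a one-dimensional toy problem, where the three normalizing constants are estimated via plain IS. The toy target, function and common Gaussian proposal are assumptions made only for this example; moreover, the same samples are reused for the three estimates, whereas TABI obtains them independently, ideally with a proposal tailored to each integrand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1D setup (assumptions): unnormalized target pi(x) and function f(x)
log_pi = lambda x: -0.5 * x**2                 # standard Gaussian, unnormalized
f = lambda x: x - 1.0                          # generic real-valued function
f_plus = lambda x: np.maximum(0.0, f(x))
f_minus = lambda x: np.maximum(0.0, -f(x))

# Common Gaussian proposal q(x) = N(0, 2^2) (an assumption; TABI would rather use a
# near-optimal proposal for each of the three constants)
mu_q, sig_q = 0.0, 2.0
M = 100_000
x = rng.normal(mu_q, sig_q, size=M)
log_q = -0.5 * ((x - mu_q) / sig_q) ** 2 - np.log(sig_q * np.sqrt(2 * np.pi))
w = np.exp(log_pi(x) - log_q)                  # importance weights pi(x)/q(x)

# IS estimates of the three marginal likelihoods
c_plus_hat = np.mean(w * f_plus(x))
c_minus_hat = np.mean(w * f_minus(x))
Z_hat = np.mean(w)

I_tabi = (c_plus_hat - c_minus_hat) / Z_hat    # Eq. (5)
print(I_tabi)                                  # ground truth for this toy example: -1
```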

The TABI framework improves the estimation of I by converting the initial task into that of computing three marginal likelihoods, \(c_+\), \(c_-\) and Z. In Rainforth et al. (2020), the authors test the application of two popular marginal likelihood estimators within TABI, namely, annealed IS (AnIS) (Neal 1996) and nested sampling (NS) (Skilling 2006), resulting in the target-aware algorithms called target-aware AnIS (TAAnIS) and target-aware NS (TANS). The use of AnIS for independently computing \(c_+\), \(c_-\) and Z represents an improvement over IS. Although the IS estimates of \(c_+\), \(c_-\) and Z can have virtually zero variance, this is only true when we employ the optimal proposals. In general, the performance of IS depends on how ‘close’ the proposal pdf is to the target density whose normalizing constant we aim to estimate. It can be shown that the variance of IS scales with the Pearson divergence between target and proposal (Llorente et al. 2020a). When this distance is large, it is more efficient to sample from another proposal that is ‘in between’, i.e., an ‘intermediate’ density. This is the motivation behind many state-of-the-art marginal likelihood estimation methods that employ a sequence of densities bridging an easy-to-work-with proposal and the target density (Llorente et al. 2020a). In this work, we introduce thermodynamic integration (TI) for performing target-aware inference, hence enabling the computation of the posterior expectation of a function \(f(\textbf{x})\). TI is a powerful marginal likelihood estimation technique that also leverages the use of a sequence of distributions, but has several advantages over other methods based on tempered transitions, such as improved stability thanks to working in logarithmic scale and applying deterministic quadrature (Friel and Pettitt 2008; Friel and Wyse 2012). TI for computing marginal likelihoods is reviewed in the next section. Then, in Sect. 3 we introduce generalized TI (GTI) for the computation of posterior expectations, which is based on rewriting I as the difference of two ratios of normalizing constants.

2.3 Thermodynamic integration for estimating Z

Thermodynamic integration (TI) is a powerful technique that has been proposed in the literature for computing ratios of constants (Frenkel 1986; Gelman and Meng 1998; Lartillot and Philippe 2006). Here, for simplicity, we focus on the approximation of just one constant, the marginal likelihood Z. More precisely, TI produces an estimate of \(\log Z\). Let us consider a family of (generally unnormalized) densities

$$\begin{aligned} \pi (\textbf{x}|\beta ), \quad \beta \in [0,1], \end{aligned}$$
(6)

such that \(\pi (\textbf{x}|0)=g(\textbf{x})\) is the prior and \(\pi (\textbf{x}|1)=\pi (\textbf{x})\) is the unnormalized posterior distribution. An example is the so-called geometric path \(\pi (\textbf{x}|\beta ) = g(\textbf{x})^{1-\beta }\pi (\textbf{x})^\beta\), with \(\beta \in [0,1]\) (Neal 1993). The corresponding normalized densities in the family are denoted as

$$\begin{aligned} \bar{\pi }(\textbf{x}|\beta ) = \frac{\pi (\textbf{x}|\beta )}{c(\beta )},\quad c(\beta ) = \int _{\mathcal {X}} \pi (\textbf{x}|\beta )d\textbf{x}. \end{aligned}$$
(7)

Then, the main TI identity is (Llorente et al. 2020a)

$$\begin{aligned} \log Z&=\int _0^1 \left[ \int _\mathcal {X} \frac{\partial \log \pi (\textbf{x}|\beta )}{\partial \beta }\bar{\pi }(\textbf{x}|\beta ) d\textbf{x}\right] d\beta \nonumber \\&= \int _0^1 \mathbb {E}_{\bar{\pi }(\textbf{x}|\beta )}\left[ \frac{\partial \log \pi (\textbf{x}|\beta )}{\partial \beta }\right] d\beta , \end{aligned}$$
(8)

where the expectation is with respect to (w.r.t.) \(\bar{\pi }(\textbf{x}|\beta ) = \frac{\pi (\textbf{x}|\beta )}{c(\beta )}\).

TI estimator Using an ordered sequence of discrete values \(\{\beta _i\}_{i=1}^N\) (e.g. \(\beta _i\)’s uniformly in [0, 1]), one can approximate the integral in Eq. (8) via quadrature w.r.t. \(\beta\), and then approximate the inner expectation with a Monte Carlo estimator using M samples from \(\bar{\pi }(\textbf{x}|\beta _i)\) for each \(i=1,\dots ,N\). Namely, defining \(U(\textbf{x}) = \frac{\partial \log \pi (\textbf{x}|\beta )}{\partial \beta }\) and \(E(\beta ) =\mathbb {E}_{\bar{\pi }(\textbf{x}|\beta )}\left[ U(\textbf{x})\right]\), the resulting estimator of Eq. (8) is given by

$$\begin{aligned} \log Z \approx \sum _{i=1}^{N-1}(\beta _{i+1} - \beta _{i})\widehat{E}_{i}, \end{aligned}$$
(9)

where

$$\begin{aligned} \widehat{E}_i = \frac{1}{M}\sum _{j=1}^M U(\textbf{x}_{i,j}),\quad \textbf{x}_{i,j} \sim \bar{\pi }(\textbf{x}|\beta _i). \end{aligned}$$
(10)

Note that we used the simplest quadrature rule in Eq. (9), but others can be used, such as the trapezoidal rule, Simpson’s rule, etc. (Friel and Pettitt 2008; Lartillot and Philippe 2006).
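
For concreteness, the combination step of Eqs. (9)–(10) can be written in a few lines of Python; the routine below is a sketch in which the sampler for \(\bar{\pi }(\textbf{x}|\beta _i)\) is a placeholder argument (in practice, an MCMC algorithm), which is an assumption of the example.

```python
import numpy as np

def ti_log_Z(betas, sample_from_tempered, U, M=1000):
    """Rectangle-rule TI estimate of log Z, cf. Eqs. (9)-(10).

    betas: increasing array of temperatures in [0, 1].
    sample_from_tempered(beta, M): returns M (approximate) samples from the
        normalized tempered density bar{pi}(x|beta), e.g. drawn via MCMC.
    U(x, beta): evaluates d log pi(x|beta) / d beta at the samples.
    """
    E_hat = np.array([np.mean(U(sample_from_tempered(b, M), b)) for b in betas])
    # Quadrature over beta (simplest rule; trapezoidal or Simpson's also possible)
    return np.sum(np.diff(betas) * E_hat[:-1])
```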

The power posteriors (PP) method Let us consider the specific case of a geometric path between the prior \(g(\textbf{x})\) and the unnormalized posterior \(\pi (\textbf{x})\),

$$\begin{aligned} \pi (\textbf{x}|\beta ) = g(\textbf{x})^{1-\beta }\pi (\textbf{x})^\beta&=g(\textbf{x}) \left[ \frac{\pi (\textbf{x})}{g(\textbf{x})}\right] ^\beta , \end{aligned}$$
(11)
$$\begin{aligned}&=g(\textbf{x}) \ell (\textbf{y}|\textbf{x})^\beta , \quad \beta \in [0,1], \end{aligned}$$
(12)

where we have used \(\pi (\textbf{x})=\ell (\textbf{y}|\textbf{x}) g(\textbf{x})\). Note that, in this scenario,

$$\begin{aligned} \frac{\partial \log \pi (\textbf{x}|\beta )}{\partial \beta } =\log \ell (\textbf{y}|\textbf{x}). \end{aligned}$$
(13)

Hence, the identity in Eq. (8) can also be written as

$$\begin{aligned} \log Z = \int _0^1 \int _\mathcal {X} \log \ell (\textbf{y}|\textbf{x}) \bar{\pi }(\textbf{x}|\beta )d\textbf{x}d\beta = \int _0^1\mathbb {E}_{\bar{\pi }(\textbf{x}|\beta )}[\log \ell (\textbf{y}|\textbf{x})]d\beta . \end{aligned}$$
(14)

The power posteriors (PP) method is a special case of TI which considers (a) the geometric path and (b) trapezoidal quadrature rule for integrating w.r.t. the variable \(\beta\) (Friel and Pettitt 2008). Namely, letting \(\beta _1=0< \cdots < \beta _N = 1\) denote a fixed temperature schedule, an approximation of Eq. (14) can be obtained via the trapezoidal rule

$$\begin{aligned} \log Z \approx \sum _{i=1}^{N-1} (\beta _{i+1}-\beta _i)\frac{\mathbb {E}_{\bar{\pi }(\textbf{x}|\beta _{i+1})}[\log \ell (\textbf{y}|\textbf{x})]+\mathbb {E}_{\bar{\pi }(\textbf{x}|\beta _{i})}[\log \ell (\textbf{y}|\textbf{x})]}{2}, \end{aligned}$$
(15)

where the expectations are generally replaced with MCMC estimates as in Eq. (10). TI and PP are popular methods for computing marginal likelihoods (even in high-dimensional spaces) due to their reliability. Theoretical properties are studied in Gelman and Meng (1998), Calderhead and Girolami (2009), and empirical validation is provided in several works, e.g., (Friel and Pettitt 2008; Lartillot and Philippe 2006). Different extensions and improvements of the method have also been proposed (Oates et al. 2016; Friel et al. 2014; Calderhead and Girolami 2009).
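
As an illustration of the PP estimator in Eq. (15), the sketch below uses a toy conjugate Gaussian model for which each power posterior is Gaussian in closed form, so that the tempered densities can be sampled exactly (an assumption made only to keep the example self-contained; in practice these samples come from MCMC).

```python
import numpy as np

rng = np.random.default_rng(1)
y, N, M = 1.0, 50, 5000

# Toy conjugate model (assumption): prior N(0,1) and likelihood N(y|x,1).
# The power posterior pi(x|beta) is then N(beta*y/(1+beta), 1/(1+beta)).
log_lik = lambda x: -0.5 * (y - x) ** 2 - 0.5 * np.log(2 * np.pi)

betas = np.linspace(0.0, 1.0, N)
E_hat = np.empty(N)
for i, b in enumerate(betas):
    xs = rng.normal(b * y / (1 + b), np.sqrt(1 / (1 + b)), size=M)
    E_hat[i] = np.mean(log_lik(xs))     # MC estimate of E[log lik] under pi(x|beta)

# Trapezoidal rule over beta, Eq. (15)
log_Z_pp = np.sum(np.diff(betas) * 0.5 * (E_hat[1:] + E_hat[:-1]))

log_Z_true = -0.5 * np.log(2 * np.pi * 2.0) - y**2 / 4.0   # closed form: N(y | 0, 2)
print(log_Z_pp, log_Z_true)
```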

Remark 1

Note that, in order to ensure that the integrand in Eq. (14) is finite, so that the estimator in Eq. (15) can be applied, we need either that (a) \(\ell (\textbf{y}|\textbf{x})\) is strictly positive everywhere, or that (b) \(\ell (\textbf{y}|\textbf{x})=0\) only where \(g(\textbf{x})=0\) (i.e., the likelihood and the prior have the same support).

Goal We have seen that the TI method has been proposed for computing \(\log Z\) (or log-ratios of constants). Our goal is to extend the TI scheme in order to perform target-aware Bayesian inference. Namely, we generalize the idea of these methods (thermodynamic integration, power posteriors, etc.) to the computation of posterior expectations for a given \(f(\textbf{x})\).

3 Generalized TI (GTI) for Bayesian inference

In this section, we extend the TI method for computing the posterior expectation of a given \(f(\textbf{x})\). As in TABI, the basic idea, as we show below, is the formulation of I in terms of ratios of normalizing constants. First, we consider the case \(f(\textbf{x})> 0\) for all \(\textbf{x}\), and then the case of a generic real-valued \(f(\textbf{x})\).

3.1 General approach

In order to apply TI, we need to formulate the posterior expectation I as a ratio of two constants. Since \(f(\textbf{x})\) can be positive or negative, let us consider the positive and negative parts, \(f_+(\textbf{x}) = \max (0,f(\textbf{x}))\) and \(f_-(\textbf{x}) = \max (0,-f(\textbf{x}))\), such that \(f(\textbf{x}) = f_+(\textbf{x}) - f_-(\textbf{x})\), where \(f_+(\textbf{x})\) and \(f_-(\textbf{x})\) are non-negative functions. Similarly to Eq. (4), we rewrite the integral I in terms of ratios of constants,

$$\begin{aligned} I = \dfrac{ \int _{\mathcal {X}} f_+(\textbf{x})\pi (\textbf{x})d\textbf{x}}{\int _{\mathcal {X}} \pi (\textbf{x})d\textbf{x}} - \dfrac{\int _{\mathcal {X}} f_-(\textbf{x})\pi (\textbf{x})d\textbf{x}}{\int _{\mathcal {X}} \pi (\textbf{x})d\textbf{x}} =\frac{c_+}{Z} - \frac{c_-}{Z}, \end{aligned}$$
(16)

where \(c_+= \int _{\mathcal {X}} \varphi _+(\textbf{x})d\textbf{x}\) and \(c_-= \int _{\mathcal {X}} \varphi _-(\textbf{x})d\textbf{x}\) are, respectively, the normalizing constants of \(\varphi _+(\textbf{x}) = f_+(\textbf{x})\pi (\textbf{x})\) and \(\varphi _-(\textbf{x}) = f_-(\textbf{x})\pi (\textbf{x})\).

Proposed scheme In the case of a generic \(f(\textbf{x})\), denoting \(\eta _+ = \log \frac{c_+}{Z}\) and \(\eta _- = \log \frac{c_-}{Z}\), we propose to obtain estimates of these quantities using thermodynamic integration. Then, we obtain the final estimator as

$$\begin{aligned} \widehat{I} = \exp \left( \widehat{\eta }_+\right) - \exp \left( \widehat{\eta }_-\right) . \end{aligned}$$
(17)

In the next section, we give details on how to compute \(\widehat{\eta }_+\), \(\widehat{\eta }_-\) by using a generalized TI method.

Remark 2

Note that in Eq. (16) we express I as the difference of two ratios, and we propose GTI to estimate them directly, as per Eq. (17). Hence, differently from Eq. (5), we do not aim at estimating each constant separately. This amounts to bridging the posterior with the function-scaled posterior, as we show below.

3.2 GTI for strictly positive or strictly negative \(f(\textbf{x})\)

Let us consider the scenario where \(f(\textbf{x})>0\) for all \(\textbf{x}\in \mathcal {X}\). In this scenario, we can set

$$\begin{aligned} f_+(\textbf{x})=f(\textbf{x})>0,\quad \text{ and } \quad \widehat{I} = \exp \left( \widehat{\eta }_+\right) . \end{aligned}$$

Note that, with respect to Eq. (17), we only consider the first term. We link the unnormalized pdfs \(\pi (\textbf{x})\) and \(\varphi _+(\textbf{x})=f_+(\textbf{x})\pi (\textbf{x})\) with a geometric path, by defining

$$\begin{aligned} \bar{\varphi }_+(\textbf{x}|\beta ) \propto \varphi _+(\textbf{x}|\beta )= f_+(\textbf{x})^{\beta }\pi (\textbf{x}),\quad \beta \in [0,1]. \end{aligned}$$
(18)

Hence, we have \(\bar{\varphi }_+(\textbf{x}|0)=\bar{\pi }(\textbf{x})\) and \(\bar{\varphi }_+(\textbf{x}|1) = \frac{1}{c_+}f_+(\textbf{x})\pi (\textbf{x})\), where \(c_+=\int _{\mathcal {X}} f_+(\textbf{x})\pi (\textbf{x}) d\textbf{x}\). The identity in Eq. (8) thus becomes

$$\begin{aligned} \eta _+ = \int _0^1 \mathbb {E}_{\bar{\varphi }_+(\textbf{x}|\beta )}[\log f_+(\textbf{x})]d\beta . \end{aligned}$$
(19)

Letting \(\beta _1=0< \cdots < \beta _N = 1\) denote a fixed temperature schedule, the estimator (using the Trapezoidal rule) is thus

$$\begin{aligned} \widehat{\eta }_+ = \sum _{i=1}^{N-1} (\beta _{i+1}-\beta _i)\frac{\mathbb {E}_{\bar{\varphi }_+(\textbf{x}|\beta _{i+1})}[\log f_+(\textbf{x})]+\mathbb {E}_{\bar{\varphi }_+(\textbf{x}|\beta _i)}[\log f_+(\textbf{x})]}{2}, \end{aligned}$$
(20)

where we use MCMC estimates for the terms

$$\begin{aligned} \mathbb {E}_{\bar{\varphi }_+(\textbf{x}|\beta _i)}[\log f_+(\textbf{x})] = \int _\mathcal {X}\log f_+(\textbf{x})\bar{\varphi }_+(\textbf{x}|\beta _i)d\textbf{x}\approx \frac{1}{M}\sum _{m=1}^M \log f_+(\textbf{x}_m),\quad \textbf{x}_m \sim \bar{\varphi }_+(\textbf{x}|\beta _i), \end{aligned}$$
(21)

for \(i=1,\dots ,N\). The case of a strictly negative \(f(\textbf{x})\), i.e., \(f_-(\textbf{x})=-f(\textbf{x})\), is equivalent.

Function \(f(\textbf{x})\) with zeros of null measure So far, we have considered strictly positive or strictly negative \(f(\textbf{x})\). This case can be extended to a positive (or negative) \(f(\textbf{x})\) with zeros in a set of null measure. Indeed, note that the identity in Eq. (19) requires that \(\mathbb {E}_{\bar{\varphi }(\textbf{x}|\beta )}[\log f(\textbf{x})]<\infty\) for all \(\beta \in [0,1]\). If the zeros of \(f(\textbf{x})\) have null measure and the improper integral converges, the procedure above is also suitable. Table 1 summarizes the generalized TI (GTI) steps for a strictly positive \(f(\textbf{x})\). We discuss other scenarios in the next section.

Table 1 GTI for strictly positive \(f(\textbf{x})\)
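
A minimal sketch of the procedure in Table 1 is given below, on a toy example where the tempered densities \(\bar{\varphi }_+(\textbf{x}|\beta _i)\) are Gaussian and can be sampled exactly (a simplifying assumption; in general this sampling step is carried out by MCMC, as described in Table 1).

```python
import numpy as np

rng = np.random.default_rng(2)
a, N, M = 1.5, 50, 5000

# Toy example (assumption): posterior pi(x) = N(0,1) and f(x) = exp(a*x) > 0.
# Then phi_+(x|beta) is proportional to exp(a*beta*x - x^2/2), i.e. N(a*beta, 1).
log_f = lambda x: a * x

betas = np.linspace(0.0, 1.0, N)
E_hat = np.array([np.mean(log_f(rng.normal(a * b, 1.0, size=M))) for b in betas])

# Trapezoidal quadrature as in Eq. (20), and final estimate I_hat = exp(eta_hat_+)
eta_plus_hat = np.sum(np.diff(betas) * 0.5 * (E_hat[1:] + E_hat[:-1]))
I_hat = np.exp(eta_plus_hat)
print(I_hat, np.exp(a**2 / 2))          # ground truth E[exp(a*x)] = exp(a^2/2)
```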

3.3 GTI for generic \(f(\textbf{x})\)

Using the results from the previous section, we now apply GTI to a generic real-valued function \(f(\textbf{x})\), namely, one that can be positive and negative, as well as have zero-valued regions of non-null measure. Here, we wish to connect the posterior \(\pi (\textbf{x})\) with \(f_+(\textbf{x})\pi (\textbf{x})\) and \(f_-(\textbf{x})\pi (\textbf{x})\) via two continuous paths. However, a requirement for the validity of the approach is that \(\pi (\textbf{x})\) is zero whenever \(f_+(\textbf{x})\pi (\textbf{x})\) or \(f_-(\textbf{x})\pi (\textbf{x})\) is zero, which does not generally hold, since \(f(\textbf{x})\) can have a smaller support than \(\pi (\textbf{x})\). This fact enforces the computation of correction factors to preserve the validity of the approach. More details can be found in Appendix B. Therefore, we need to define the unnormalized restricted posterior densities

$$\begin{aligned} \pi _+(\textbf{x}) = \pi (\textbf{x})\mathbbm {1}_{\mathcal {X}_+}(\textbf{x}),\quad \text {and} \quad \pi _-(\textbf{x}) = \pi (\textbf{x})\mathbbm {1}_{\mathcal {X}_-}(\textbf{x}), \end{aligned}$$
(24)

where \(\mathbbm {1}_{\mathcal {X}_+}(\textbf{x})\) is the indicator function over the set \(\mathcal {X}_+ = \{\textbf{x}\in \mathcal {X}: f_+(\textbf{x})>0\}\) and \(\mathbbm {1}_{\mathcal {X}_-}(\textbf{x})\) is the indicator function over the set \(\mathcal {X}_- = \{\textbf{x}\in \mathcal {X}: f_-(\textbf{x})>0\}\). The idea is to connect \(\pi _+(\textbf{x})\) with \(f_+(\textbf{x})\pi (\textbf{x})\), and \(\pi _-(\textbf{x})\) with \(f_-(\textbf{x})\pi (\textbf{x})\), by means of the path densities

$$\begin{aligned} \bar{\varphi }_+(\textbf{x}|\beta ) \propto f_+(\textbf{x})^{\beta }\pi _+(\textbf{x}), \qquad \bar{\varphi }_-(\textbf{x}|\beta ) \propto f_-(\textbf{x})^{\beta }\pi _-(\textbf{x}), \quad \beta \in [0,1]. \end{aligned}$$

Note that it is equivalent to write \(f_{\pm }(\textbf{x})\pi _{\pm }(\textbf{x}) = f_{\pm }(\textbf{x})\pi (\textbf{x})\), since \(\pi _{\pm }(\textbf{x}) = \pi (\textbf{x})\) whenever \(f_{\pm }(\textbf{x})>0\), and they only differ when \(f_{\pm }(\textbf{x}) = 0\), in which case we also have \(f_{\pm }(\textbf{x})\pi _{\pm }(\textbf{x}) = f_{\pm }(\textbf{x})\pi (\textbf{x}) = 0\). Defining also

$$\begin{aligned} Z_+=\int _{\mathcal {X}}\pi _+(\textbf{x}) d\textbf{x}, \quad Z_-=\int _{\mathcal {X}}\pi _-(\textbf{x}) d\textbf{x}, \end{aligned}$$
(25)

and recalling

$$\begin{aligned} c_+=\int _{\mathcal {X}} f_+(\textbf{x})\pi (\textbf{x})d\textbf{x}, \quad c_-=\int _{\mathcal {X}} f_-(\textbf{x})\pi (\textbf{x})d\textbf{x}, \end{aligned}$$
(26)

the idea is to apply TI separately for approximating \(\eta ^\text {res}_+=\log \frac{c_+}{Z_+}\) and \(\eta ^\text {res}_-=\log \frac{c_-}{Z_-}\), where the superscript res indicates that we consider the restricted constants \(Z_+\) and \(Z_-\). Hence, two correction factors \(R_+\) and \(R_{-}\) are also required, in order to obtain \(R_+\exp \left( {\eta ^\text {res}_+}\right) =\frac{c_+}{Z}\) and \(R_{-}\exp \left( {\eta ^\text {res}_-}\right) =\frac{c_-}{Z}\). Below, we also show how to estimate the correction factors at a final stage and combine them with the estimates of \(\eta ^\text {res}_+\) and \(\eta ^\text {res}_-\). We can approximate the quantities

$$\begin{aligned} \eta ^\text {res}_+&= \log \frac{c_+}{Z_+} = \int _0^1 \mathbb {E}_{\bar{\varphi }_+(\textbf{x}|\beta )}[\log f_+(\textbf{x})]d\beta , \\ \eta ^\text {res}_-&= \log \frac{c_-}{Z_-} = \int _0^1 \mathbb {E}_{\bar{\varphi }_-(\textbf{x}|\beta )}[\log f_-(\textbf{x})]d\beta , \end{aligned}$$

using the estimators

$$\begin{aligned} \widehat{\eta }^\text {res}_+&= \sum _{i=1}^{N-1}(\beta _{i+1} - \beta _{i})\frac{\widehat{E}^+_{i+1} + \widehat{E}^+_i}{2}, \end{aligned}$$
(27)
$$\begin{aligned} \widehat{\eta }^\text {res}_-&= \sum _{i=1}^{N-1}(\beta _{i+1} - \beta _{i})\frac{\widehat{E}^-_{i+1} + \widehat{E}^-_i}{2}, \end{aligned}$$
(28)

where

$$\begin{aligned} \widehat{E}^+_i&= \frac{1}{M}\sum _{m=1}^{M} \log f_+(\textbf{x}_{i,m}),\quad \textbf{x}_{i,m} \sim \bar{\varphi }_+(\textbf{x}|\beta _i), \end{aligned}$$
(29)
$$\begin{aligned} \widehat{E}^-_i&= \frac{1}{M}\sum _{m=1}^{M} \log f_-(\textbf{v}_{i,m}),\quad \textbf{v}_{i,m} \sim \bar{\varphi }_-(\textbf{x}|\beta _i). \end{aligned}$$
(30)

Comparing the estimators in Eqs. (27)–(28) with the GTI estimator in Eq. (20), the only difference here is that the expectation at \(\beta =0\) is approximated by using samples from the restricted posteriors, \(\pi _+(\textbf{x})\) and \(\pi _-(\textbf{x})\), instead of the posterior \(\pi (\textbf{x})\). To obtain an approximation of the true quantities of interest \(\eta _+\), \(\eta _-\) (instead of \(\eta ^\text {res}_+\) and \(\eta ^\text {res}_-\)), we compute two correction factors from a single set of K samples from \(\bar{\pi }(\textbf{x})\) as follows

$$\begin{aligned} \widehat{R}_+&= \frac{1}{K}\sum _{i=1}^K\mathbbm {1}_{\mathcal {X}_+}(\textbf{z}_i)&\approx \frac{Z_+}{Z}, \end{aligned}$$
(31)
$$\begin{aligned} \widehat{R}_-&= \frac{1}{K}\sum _{i=1}^K\mathbbm {1}_{\mathcal {X}_-}(\textbf{z}_i)&\approx \frac{Z_-}{Z}, \quad \textbf{z}_i \sim \bar{\pi }(\textbf{x}), \end{aligned}$$
(32)

where \(\mathbbm {1}_{\mathcal {X}_+}(\textbf{z}_i)=1\) if \(f_+(\textbf{z}_i)>0\), \(\mathbbm {1}_{\mathcal {X}_-}(\textbf{z}_i)=1\) if \(f_-(\textbf{z}_i)>0\), and both are zero otherwise. The final estimator of I is

$$\begin{aligned} \widehat{I} = \widehat{R}_+\exp \left( \widehat{\eta }^\text {res}_+\right) -\widehat{R}_-\exp \left( \widehat{\eta }^\text {res}_-\right) , \end{aligned}$$
(33)

including the two correction factors. Table 2 provides all the details of GTI in this scenario.

Table 2 GTI for generic functions \(f(\textbf{x})\)
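
Once the restricted estimates \(\widehat{\eta }^\text {res}_\pm\) and a set of posterior samples are available, the final combination step of Table 2 (Eqs. (31)–(33)) reduces to a few lines; the Python helper below is only an illustrative sketch, and its names are not part of the paper.

```python
import numpy as np

def gti_generic_estimate(eta_res_plus, eta_res_minus, z_post, f):
    """Combine restricted TI estimates with the correction factors, Eqs. (31)-(33).

    eta_res_plus, eta_res_minus: estimates of log(c_+/Z_+) and log(c_-/Z_-),
        obtained e.g. via the trapezoidal estimators in Eqs. (27)-(28).
    z_post: array of K samples drawn from the posterior (e.g. via MCMC).
    f: the target function, assumed vectorized over the sample array.
    """
    fz = np.asarray(f(z_post))
    R_plus = np.mean(fz > 0)    # Eq. (31): fraction of posterior samples with f > 0
    R_minus = np.mean(fz < 0)   # Eq. (32): fraction of posterior samples with f < 0
    return R_plus * np.exp(eta_res_plus) - R_minus * np.exp(eta_res_minus)  # Eq. (33)
```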

Remark 3

Standard TI as special case of GTI: Note that the GTI scheme contains TI as a special case if we set \(f(\textbf{x})=\ell (\textbf{y}|\textbf{x})\) (i.e., the likelihood function) and let the prior \(g(\textbf{x})\) play the role of \(\pi (\textbf{x})\). Since the likelihood \(\ell (\textbf{y}|\textbf{x})\) is non-negative we have \(\eta _-=-\infty\) (then, \(\exp \left( \eta _-\right) =0\)), hence we only have to consider the estimation of \(\eta _+\). Moreover, if \(\ell (\textbf{y}|\textbf{x})\) is strictly positive we do not need to compute the correction factor.

Remark 4

The GTI procedure, described above, also allows the application of the standard TI for computing marginal likelihoods when the likelihood function is not strictly positive, by applying a correction factor in the same fashion (in this case, considering a restricted prior pdf).

4 Computational considerations and other extensions

In this section, we discuss computational details, different scenarios and further extensions, which are listed below.

4.1 Acceleration schemes

In order to apply GTI, the user must set N and M, so that the total number of samples/evaluations of \(f(\textbf{x})\) in Table 1 is \(E=NM\). The evaluations of \(f(\textbf{x})\) in Table 2 amount to \(E=2NM + K\). We can reduce the cost of the algorithm in Table 2 to \(E=NM + K\) with an acceleration scheme. Instead of running separate MCMC algorithms for \(\bar{\varphi }_+(\textbf{x}|\beta ) \propto f_+(\textbf{x})^\beta \pi _+(\textbf{x})\) and \(\bar{\varphi }_-(\textbf{x}|\beta ) \propto f_-(\textbf{x})^\beta \pi _-(\textbf{x})\), we use a single run targeting

$$\begin{aligned} \bar{\varphi }_\text {abs}(\textbf{x}|\beta ) \propto |f(\textbf{x})|^\beta \pi (\textbf{x})\mathbbm {1}\left( f(\textbf{x})\ne 0\right) . \end{aligned}$$
(34)

We can obtain two sets of MCMC samples, one from \(\bar{\varphi }_+(\textbf{x}|\beta )\) and one from \(\bar{\varphi }_-(\textbf{x}|\beta )\), by splitting the output into two groups: the samples where \(f(\textbf{x})>0\) and the samples where \(f(\textbf{x})<0\), respectively. The procedure can be repeated until the desired number of samples from each density, \(\bar{\varphi }_+(\textbf{x}|\beta )\) and \(\bar{\varphi }_-(\textbf{x}|\beta )\), is obtained.

Moreover, note that in Table 2 we need to draw samples from \(\pi _+(\textbf{x})\), \(\pi _-(\textbf{x})\) and \(\pi (\textbf{x})\). Instead of sampling each one separately, we can use the following procedure: obtain a set of samples from \(\pi (\textbf{x})\) and then apply rejection (i.e., discard the samples with \(f_\pm (\textbf{x})=0\)) in order to obtain samples from \(\pi _\pm (\textbf{x})\). Combining this idea with the acceleration scheme above reduces the cost of Table 2 to \(E=MN\).
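
As a hypothetical illustration of these two ideas, the splitting step could be implemented as follows; the MCMC routine targeting Eq. (34) is left abstract (the name sample_phi_abs below is a placeholder, not an actual function of any library).

```python
import numpy as np

def split_by_sign(samples, f):
    """Split one MCMC run targeting |f(x)|^beta * pi(x) * 1(f(x) != 0), Eq. (34),
    into (approximate) samples from phi_+(x|beta) and phi_-(x|beta)."""
    samples = np.asarray(samples)
    fx = np.asarray(f(samples))
    return samples[fx > 0], samples[fx < 0]

# Usage sketch: keep sampling until both groups reach the desired size M.
# pos, neg = [], []
# while min(len(pos), len(neg)) < M:
#     chunk = sample_phi_abs(beta, n_iter)   # placeholder MCMC run targeting Eq. (34)
#     p, n = split_by_sign(chunk, f)
#     pos.extend(p); neg.extend(n)
```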

4.2 Parallelization

Note that steps 1 and 2 in Table 1 and Table 2 are amenable to parallelization. In other words, those steps need not be performed sequentially but can be carried out using embarrassingly parallel MCMC chains (i.e., with no communication among the N, or 2N, workers). Only step 3 requires communicating to a central node and combining the estimates. With this procedure, the number of evaluations E is the same but the computation time is reduced by a factor of \(\frac{1}{N}\) (or \(\frac{1}{2N}\)). On the other hand, population MCMC techniques can be used, but the parallelization speedups are lower since communication among workers occurs every so often, in order to foster the exploration of the chains (Martino et al. 2016; Calderhead and Girolami 2009).

4.3 Vector-valued functions \({f}(\textbf{x})\)

In Bayesian inference, one is often interested in computing moments of the posterior, i.e.,

$$\begin{aligned} \textbf{I}=\int _{\mathcal {X}} \textbf{x}^{\alpha } \bar{\pi }(\textbf{x}) d\textbf{x}, \qquad \alpha \ge 1. \end{aligned}$$
(35)

In this case \(\textbf{I}\) is a vector and \(\textbf{f}(\textbf{x})=\textbf{x}^{\alpha }\). When \(\alpha =1\), \(\textbf{I}\) represents the minimum mean square error (MMSE) estimator. More generally, we can have a vector-valued function,

$$\begin{aligned} \textbf{f}(\textbf{x}) = \left[ f_1(\textbf{x}),\dots ,f_{d_f}(\textbf{x})\right] ^\top : \mathcal {X}\rightarrow \mathbb {R}^{d_f}, \end{aligned}$$

hence the integral of interest is a vector \(\textbf{I}=[I_1,\dots ,I_{d_f}]^\top\) where \(I_i = \int _\mathcal {X}f_i(\textbf{x})\bar{\pi }(\textbf{x})d\textbf{x}\). In this scenario, we need to apply the GTI scheme to each component of \(\textbf{I}\) separately, obtaining estimates \(\widehat{I}_i\) of the form in Eq. (33).

4.4 TI within the TABI framework: TATI

We have seen that we can apply GTI to compute the posterior expectation of a generic \(f(\textbf{x})\), which can be positive, negative and have zero-valued regions. To do this, we connected \(\pi _+(\textbf{x})\) and \(\pi _-(\textbf{x})\) with \(f_+(\textbf{x})\pi (\textbf{x})\) and \(f_-(\textbf{x})\pi (\textbf{x})\), respectively, by tempered paths, and then applied correction factors.

An alternative procedure is to use the TABI identity in Eq. (4), rather than Eq. (16), and employ reference distributions for computing \(c_+\), \(c_-\) and Z in Eq. (16) separately. This target-aware TI (TATI) differs from GTI in that we need to apply TI three times, bridging three reference distributions to the target densities \(f_+(\textbf{x})\pi (\textbf{x})\), \(f_-(\textbf{x})\pi (\textbf{x})\) and \(\pi (\textbf{x})\). Let us denote by

$$\begin{aligned} p^\text {ref}_i(\textbf{x}), \qquad i=1,2,3, \end{aligned}$$

three unnormalized reference densities with normalizing constants,

$$\begin{aligned} Z^\text {ref}_i=\int _{\mathcal {X}} p^\text {ref}_i(\textbf{x}) d\textbf{x}, \qquad i=1,2,3. \end{aligned}$$

Then, the idea is to apply TI to obtain estimates of \(\log \frac{c_+}{Z^\text {ref}_1}\), \(\log \frac{c_-}{Z^\text {ref}_2}\) and \(\log \frac{Z}{Z^\text {ref}_3}\). A requirement is that \(p^\text {ref}_1(\textbf{x})\) is zero where \(f_+(\textbf{x})\pi (\textbf{x})\) is zero, \(p^\text {ref}_2(\textbf{x})\) is zero where \(f_-(\textbf{x})\pi (\textbf{x})\) is zero, and \(p^\text {ref}_3(\textbf{x})\) is zero where \(\pi (\textbf{x})\) is zero. Namely, we need to be able to build a continuous path between the reference distributions and the corresponding unnormalized pdfs of interest. With this procedure, we do not need to apply correction factors; we just need to apply the algorithm in Table 1 three times. The performance of TATI is expected to be better than that of GTI if we are able to choose three reference distributions that are ‘closer’ to the corresponding target densities than \(\pi (\textbf{x})\) is to \(f_+(\textbf{x})\pi (\textbf{x})\) or \(f_-(\textbf{x})\pi (\textbf{x})\) (Llorente et al. 2020a). For instance, we can obtain the reference pdfs by building nonparametric approximations to each target density (Llorente et al. 2021b).

5 Numerical experiments

In this section, we illustrate the performance of the proposed scheme in two numerical experiments, which consider densities \(\bar{\pi }\) with different features and dimensions, as well as different functions \(f(\textbf{x})\). In the first example, \(f(\textbf{x})\) is strictly positive, so we apply the algorithm described in Table 1. In the second example, we consider an \(f(\textbf{x})\) with zero-valued regions, and hence we apply the algorithm in Table 2. Note that we consider the same setup as in Rainforth et al. (2020) in order to compare against instances of TABI algorithms.

5.1 First numerical analysis

Let us consider the following Gaussian model (Rainforth et al. 2020)

$$\begin{aligned} g(\textbf{x}) = \mathcal {N}(\textbf{x}|\textbf{0}_D, \textbf{I}_D),\quad \ell (\textbf{y}|\textbf{x}) = \mathcal {N}\left( -\frac{y}{\sqrt{D}}\textbf{1}_D \Big | \textbf{x}, \textbf{I}_D\right) , \quad f(\textbf{x}) = \mathcal {N}\left( \textbf{x}\Big | \frac{y}{\sqrt{D}}\textbf{1}_D, \frac{1}{2}\textbf{I}_D\right) , \end{aligned}$$
(36)

where D is the dimensionality, \(\textbf{I}_D\) is the identity matrix, \(\textbf{0}_D\) and \(\textbf{1}_D\) are D-vectors containing only zeros or ones respectively, and y is a scalar value that represents the radial distance of the observation \(\textbf{y}=-\frac{y}{\sqrt{D}}\textbf{1}_D\) to the origin. We are interested in the estimation of \(I=\int _{\mathcal {X}} f(\textbf{x})\bar{\pi }(\textbf{x})d\textbf{x}\). Thus, this problem consists of computing the posterior predictive density, under the above model, at the point \(\frac{y}{\sqrt{D}}\textbf{1}_D\). In this toy example, the posterior and the function-scaled posteriors can be obtained in closed form, that is,

$$\begin{aligned} \bar{\varphi }(\textbf{x}|\beta )= \mathcal {N}\left( \textbf{x}\Big | \frac{2\beta - 1}{2\beta + 2}\frac{y}{\sqrt{D}}\textbf{1}_D, \frac{1}{2\beta +2}\textbf{I}_D\right) ,\quad \beta \in [0,1], \end{aligned}$$
(37)

and

$$\begin{aligned} \bar{\pi }(\textbf{x}) = \mathcal {N}\left( \textbf{x}\Big | -\frac{1}{2}\frac{y}{\sqrt{D}}\textbf{1}_D, \frac{1}{2}\textbf{I}_D\right) . \end{aligned}$$
(38)

The ground-truth is known, and can be written as a Gaussian density evaluated at \(\frac{y}{\sqrt{D}}\textbf{1}_D\), more specifically, \(I = \mathcal {N}\left( \frac{y}{\sqrt{D}}\textbf{1}_D \Big | -\frac{1}{2}\frac{y}{\sqrt{D}}\textbf{1}_D, \textbf{I}_D \right)\).

We test the values \(y \in \{2,3.5,5\}\) and \(D \in \{10,25,50\}\). Note that, as we increase y, the posterior \(\bar{\pi }(\textbf{x})\) and the density \(\bar{\varphi }(\textbf{x}|1)\propto f(\textbf{x})\pi (\textbf{x})\) become further apart.
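
For reference, the closed-form quantities of this toy model in Eqs. (36)–(38), together with the ground truth, can be evaluated directly; the sketch below simply codes the analytic expressions above (using scipy) and is not part of the estimation procedure.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def ground_truth_I(y, D):
    """Ground truth I for the model in Eq. (36): a Gaussian density evaluated at
    (y/sqrt(D)) 1_D, with mean -(1/2)(y/sqrt(D)) 1_D and identity covariance."""
    m = (y / np.sqrt(D)) * np.ones(D)
    return mvn.pdf(m, mean=-0.5 * m, cov=np.eye(D))

def tempered_density(x, beta, y, D):
    """Normalized path density of Eq. (37), bridging the posterior (beta = 0)
    with the function-scaled posterior (beta = 1)."""
    m = (y / np.sqrt(D)) * np.ones(D)
    return mvn.pdf(x, mean=(2 * beta - 1) / (2 * beta + 2) * m,
                   cov=np.eye(D) / (2 * beta + 2))

print(ground_truth_I(2.0, 10))
```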

5.1.1 Comparison with other target-aware approaches

We aim to compare GTI with an MCMC baseline and other target-aware algorithms which make use of \(f(\textbf{x})\). Specifically, we compare against two extreme cases of self-normalized IS (SNIS) estimators \(\widehat{I}_{\text {SNIS}} =\frac{1}{\sum _{j=1}^{M_\text {tot}}w_j} \sum _{i=1}^{M_\text {tot}}w_i f(\textbf{x}_i)\), where \(\textbf{x}_i\sim q(\textbf{x})\) and \(w_i = \frac{\pi (\textbf{x}_i)}{q(\textbf{x}_i)}\) is the IS weight. Namely, (1) SNIS using samples from the posterior (SNIS1), i.e., \(q(\textbf{x}) = \bar{\pi }(\textbf{x})\) (hence, SNIS1 coincides with MCMC), and (2) SNIS using samples from \(q(\textbf{x}) = \bar{\varphi }(\textbf{x}|1)\propto f(\textbf{x})\pi (\textbf{x})\) (SNIS2), which corresponds to setting \(\beta =1\) in Eq. (37). These choices are optimal for estimating, respectively, the denominator and the numerator of the right-hand side of Eq. (2) (Robert and Casella 2004). Note that SNIS2 can be considered a first “primitive” target-aware algorithm, since it employs samples from \(\bar{\varphi }(\textbf{x}|1) \propto f(\textbf{x})\pi (\textbf{x})\).

A second target-aware approach can be obtained by recycling the samples generated in SNIS1 and SNIS2 (that is, from \(\bar{\pi }(\textbf{x})\) and \(\bar{\varphi }(\textbf{x}|1)\)), and is called (3) bridge sampling (BS) (Llorente et al. 2020a). This estimator can be viewed as if we use the mixture of \(\bar{\pi }(\textbf{x})\) and \(\bar{\varphi }(\textbf{x}|1)\) as proposal pdf. More details about (optimal) BS can be found in Appendix A. Finally, we also aim to compare against the target-aware versions of two popular marginal likelihood estimators, namely, (4) target-aware annealed IS (TAAnIS) and (5) target-aware nested sampling (TANS), that also make use of the \(f(\textbf{x})\) (Rainforth et al. 2020).

In order to keep the comparisons fair, we consider the same number of likelihood evaluations E in all of the methods. Note that evaluating the likelihood is usually the most costly step in many real-world scenarios. Hence, in SNIS1 and SNIS2 we draw \(M_\text {tot} = E\) samples from \(\bar{\pi }(\textbf{x})\) and \(\bar{\varphi }(\textbf{x}|1)\), respectively, via MCMC; BS employs \(\frac{E}{2}\) samples from \(\bar{\pi }(\textbf{x})\) and \(\frac{E}{2}\) from \(\bar{\varphi }(\textbf{x}|1)\). For TAAnIS and TANS we use the same parameters as in Rainforth et al. (2020). Namely, for TAAnIS we employ \(N=200\) intermediate distributions, with \(n_\text {MCMC}= 5\) iterations of the Metropolis-Hastings (MH) algorithm, which allows for a total number of particles \(n_\text {par} = \lfloor \frac{E}{(N-1)n_\text {MCMC} + N-1} \rfloor\), where half of the particles are used to estimate the numerator, and the other half to estimate the denominator on the right-hand side of Eq. (2). For TANS, we employ \(n_\text {MCMC}= 20\) iterations of MH and \(n_\text {par} = \lfloor \frac{E}{1 + \lambda n_\text {MCMC}} \rfloor\) particles, where \(\lambda = 250\) and \(T = \lambda n_\text {par}\) iterations. Again, TANS employs one half of the particles for estimating the numerator and the other half for the denominator. Finally, in GTI we also set \(N=200\), hence we draw \(M = \lfloor \frac{E}{N}\rfloor\) samples from each \(\bar{\varphi }(\textbf{x}|\beta _i),\ i=1,\dots ,N\). Note that we are setting the same number of intermediate distributions in GTI and TAAnIS; however, the paths are not identical, since TAAnIS aims at bridging the prior with \(\pi (\textbf{x})\) and \(\varphi (\textbf{x}|1)\), while GTI directly bridges \(\pi (\textbf{x})\) with \(\varphi (\textbf{x}|1)\). All the iterations of the MH algorithm use a Gaussian random-walk proposal with covariance matrix equal to \(\Sigma = 0.1225 \textbf{I}\), \(\Sigma = 0.04 \textbf{I}\) and \(\Sigma = 0.01 \textbf{I}\), for \(D=10,25,50\) respectively. Following (Rainforth et al. 2020), for TANS we use instead \(\Sigma = \textbf{I}\), \(\Sigma = 0.09 \textbf{I}\) and \(\Sigma = 0.01 \textbf{I}\). For choosing \(\beta _i\) in TAAnIS and GTI, we use the powered fraction schedule, \(\beta _i = \left( \frac{i-1}{N-1}\right) ^5\) for \(i=1,\dots ,N\) (Friel and Pettitt 2008; Xie et al. 2010).
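
For concreteness, the powered fraction schedule used for TAAnIS and GTI can be generated as follows (a one-line sketch of \(\beta _i = \left( \frac{i-1}{N-1}\right) ^5\)).

```python
import numpy as np

N = 200
betas = ((np.arange(1, N + 1) - 1) / (N - 1)) ** 5  # beta_i = ((i-1)/(N-1))^5, i = 1,...,N
# betas[0] == 0 and betas[-1] == 1, with nodes concentrated near beta = 0
```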

5.1.2 Results

The results are given in Fig. 1, which shows, for each pair (y, D), the median relative square error along with the 25% and 75% quantiles (over 100 simulations) versus the number of total likelihood evaluations E, up to \(E=10^7\). We see that GTI is the first or second best overall, in terms of relative squared error, for all (y, D). In fact, the performance of GTI seems rather insensitive to increasing the dimension D and y. We see that for low dimension and when the distance between \(\bar{\pi }(\textbf{x})\) and \(\bar{\varphi }(\textbf{x}|1)\) is small (i.e. \(y=2\) and \(D=10\)), the target-aware algorithms do not produce large gains with respect to the MCMC baseline (SNIS1). On the contrary, for \(y=3.5,5\) (second and third row), we see that the target-aware algorithms, GTI, TAAnIS and BS, outperform the MCMC baseline. This performance gain with larger y is expected, since this represents a larger mismatch between \(\bar{\pi }(\textbf{x})\) and \(\bar{\varphi }(\textbf{x}|1)\propto f(\textbf{x})\pi (\textbf{x})\), which is a scenario where the target-aware approaches are well suited. Comparing the target-aware algorithms, we see that TAAnIS also performs as well as our GTI in low dimensions (\(D=10\)), but it breaks down as we increase the dimension, being outperformed by TANS in \(D=25,50\), confirming the results of Rainforth et al. (2020) where TANS is preferable over TAAnIS in high dimension. It is worth noticing the very good performance of BS, given its simplicity and the fact that it can be computed at almost no extra cost once we have computed SNIS1 and SNIS2. Indeed, its performance matches that of GTI, and actually outperforms GTI when the separation is not too high. This is also expected since, when \(y=2\), both \(\bar{\pi }(\textbf{x})\) and \(\bar{\varphi }(\textbf{x}|1)\) are good pdfs for estimating both the numerator and the denominator of the right-hand side of Eq. (2). In this sense, having only one “bridge” is better than having \(N=200\) intermediate distributions. However, GTI outperforms BS when \(y=3.5,5\), especially when the dimension is high.

Fig. 1

Relative squared error of the considered algorithms as a function of number of total likelihood evaluations E, for different y and D. The median, 25% and 75% quantiles (over 100 independent simulations) are depicted

In summary, our proposed GTI is able to produce good estimates in the range of values of (y, D) considered. The performance gains with respect to an MCMC baseline are higher when the discrepancy between \(\bar{\pi }(\textbf{x})\) and \(\bar{\varphi }(\textbf{x}|1)\) is large. As compared to other target-aware approaches, GTI produces better estimates (especially in high dimension) and is also able to perform well when the discrepancy is low, matching the performance of BS, which is a simpler and more direct target-aware algorithm.

5.2 Second numerical analysis

We consider the following two-dimensional banana-shaped density (which is a benchmark example Rainforth et al. 2020; Martino and Read 2013; Haario et al. 2001),

$$\begin{aligned} \pi (x_1,x_2) = \exp \left( -\frac{1}{2}\left( 0.03x_1^2 + \left( \frac{x_2}{2}+0.03\left( x_1^2-100\right) \right) ^2\right) \right) \cdot \mathbbm {1}\left( \textbf{x}\in \mathcal {B}\right) , \end{aligned}$$
(39)

where the prior is \(\mathbbm {1}\left( \textbf{x}\in \mathcal {B}\right)\), with \(\mathcal {B} = \{\textbf{x}:\ -25< x_1< 25,\ -40< x_2 <20 \}\), and the function is

$$\begin{aligned} f(x_1,x_2) = (x_2+10)\exp \left( -\frac{1}{4}\left( x_1+x_2+25\right) ^2\right) \mathbbm {1}\left( x_2>-10\right) . \end{aligned}$$
(40)

We compare GTI using \(N\in \{10,50, 100\}\) against TAAnIS and TANS, in the estimation of \(\mathbb {E}_{\bar{\pi }}[f(\textbf{x})]\) allowing a budget of \(E=10^6\). We also consider a baseline MCMC chain targeting \(\pi (\textbf{x})\) with the same number of likelihood evaluations.

The main difference with respect to the previous experiment is that \(f(\textbf{x})\) here has a zero-valued region, so, in order to apply GTI, we need to use the algorithm in Table 2. Hence, for GTI, we run \(N+1\) chains for \(M=\frac{E}{N+1}\) iterations. Each of the first N chains targets a different tempered distribution \(f(\textbf{x})^{\beta _i}\pi (\textbf{x})\), and the last chain is used to compute the correction factor. It is also important to notice here that TAAnIS, which also uses a geometric path to bridge \(f(\textbf{x})\pi (\textbf{x})\) and the prior, requires the computation of a correction factor as well, accounting for the fact that we connect to a prior restricted to the region where \(f(\textbf{x})\ne 0\). This amounts to multiplying the final estimate returned by TAAnIS by a factor of \(\frac{1}{2}\) (Fig. 2).

All the MCMC algorithms use a Gaussian random-walk proposal with covariance \(\Sigma =3\textbf{I}_2\). The budget of likelihood evaluations is \(E=10^6\), for all the compared schemes. We use again the powered fraction schedule: \(\beta _i = \left( \frac{i-1}{N-1}\right) ^5\) for \(i=1,\dots ,N\).
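
For completeness, the target of Eq. (39), the function of Eq. (40) and the random-walk MH step used by all the samplers can be coded directly; the sketch below is illustrative and omits the GTI/TABI loops themselves.

```python
import numpy as np

B = {"x1": (-25.0, 25.0), "x2": (-40.0, 20.0)}   # support of the uniform prior, Eq. (39)

def log_pi(x):
    """Unnormalized log-density of the banana target, Eq. (39); -inf outside B."""
    x1, x2 = x
    if not (B["x1"][0] < x1 < B["x1"][1] and B["x2"][0] < x2 < B["x2"][1]):
        return -np.inf
    return -0.5 * (0.03 * x1**2 + (x2 / 2 + 0.03 * (x1**2 - 100)) ** 2)

def f(x):
    """Target function of Eq. (40): non-negative, and zero for x2 <= -10."""
    x1, x2 = x
    return (x2 + 10) * np.exp(-0.25 * (x1 + x2 + 25) ** 2) if x2 > -10 else 0.0

def rw_mh_step(x, log_target, rng, cov=3.0 * np.eye(2)):
    """One Gaussian random-walk Metropolis-Hastings step (covariance 3*I_2, as in the text)."""
    prop = rng.multivariate_normal(x, cov)
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        return prop
    return x
```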

Results The results are shown in Table 3. We show the median relative square error of the methods over 100 independent simulations. For the sample size considered, GTI performs better than the MCMC baseline and the other target-aware algorithms. TAAnIS performs slightly better than the MCMC baseline, while TANS completely fails at estimating the posterior expectation in this example. For \(N=100\), the performance gains of GTI are of almost one order of magnitude over MCMC. However, note that GTI with the choice \(N=10\) is worse than the MCMC baseline due to the discretization error, i.e., there are not enough quadrature nodes, so the estimator in Eq. (20) has considerable bias. In that situation, increasing the sample size would not translate into a significant performance gain. This contrasts with TAAnIS, where increasing N produces only a small improvement in the final estimate, since TAAnIS is unbiased regardless of the choice of N.

Fig. 2

Plots of \(\pi (\textbf{x})\), \(f(\textbf{x})\pi (\textbf{x})\) and \(f(\textbf{x})^\beta \pi (\textbf{x})\) with \(\beta =0.0173\) for the banana example. We see that \(\pi (\textbf{x})\) and \(f(\textbf{x})\pi (\textbf{x})\) have little overlap and hence a direct MCMC estimate of \(\mathbb {E}_{\bar{\pi }}[f(\textbf{x})]\) is not efficient. The tempered distributions, \(f(\textbf{x})^\beta \pi (\textbf{x})\), lie in between those distributions, helping in the estimation of \(\mathbb {E}_{\bar{\pi }}[f(\textbf{x})]\)

Table 3 Median relative square error of GTI, TAAnIS and TANS for \(E=10^6\) likelihood evaluations and \(N\in \{10,50,100\}\). Best result is boldfaced

6 Conclusions

We have extended the powerful thermodynamic integration technique for performing target-aware Bayesian inference. Namely, GTI allows the computation of posterior expectations of real-valued functions \(f(\textbf{x})\), as well as vector-valued functions \(\textbf{f}(\textbf{x})\). GTI contains the standard TI as a special case. Even for the estimation of the marginal likelihood, this work provides a way of extending the application of standard TI, avoiding the assumption of strictly positive likelihood functions (see Remarks 1–3). Several computational considerations and variants are discussed. The advantages of GTI over other target-aware algorithms are shown in different numerical comparisons. As a future research line, we plan to study new continuous paths for linking densities with different support, avoiding the need for correction terms. Alternatively, as discussed in Sect. 4, another approach would be to design suitable approximations of \(\varphi _+(\textbf{x})\), \(\varphi _-(\textbf{x})\) and \(\pi (\textbf{x})\) (see the end of Sect. 4) using, e.g., regression techniques (Llorente et al. 2021b, 2020b).