Target-aware Bayesian inference via generalized thermodynamic integration

In Bayesian inference, we are usually interested in the numerical approximation of integrals that are posterior expectations or marginal likelihoods (a.k.a. Bayesian evidence). In this paper, we focus on the computation of the posterior expectation of a function f(x). We consider a target-aware scenario where f(x) is known in advance and can be exploited in order to improve the estimation of the posterior expectation. In this scenario, the task can be reduced to performing several independent marginal likelihood estimation tasks. The idea of using a path of tempered posterior distributions has been widely applied in the literature for the computation of marginal likelihoods. Thermodynamic integration, path sampling and annealing importance sampling are well-known examples of algorithms belonging to this family of methods. In this work, we introduce a generalized thermodynamic integration (GTI) scheme which is able to perform target-aware Bayesian inference, i.e., GTI can approximate the posterior expectation of a given function. Several scenarios of application of GTI are discussed and different numerical simulations are provided.


Introduction
Bayesian methods have become very popular in many domains of science and engineering in recent years, as they allow for obtaining estimates of parameters of interest as well as comparing competing models in a principled way (Robert and Casella 2004; Luengo et al. 2020). The Bayesian quantities of interest can generally be expressed as integrals involving the posterior density. They can be divided into two main categories: posterior expectations and marginal likelihoods (useful for model selection purposes).
Generally, computational methods are required for the approximation of these integrals, e.g., Monte Carlo algorithms such as Markov chain Monte Carlo (MCMC) and importance sampling (IS) (Robert and Casella 2004; Luengo et al. 2020; Rainforth et al. 2020). Typically, practitioners apply an MCMC or IS algorithm to approximate the posterior density π(x) by a set of samples, which is used in turn to estimate posterior expectations E_π[f(x)] of some function f(x). Although this is a sensible strategy when f(x) is not known in advance and/or we are interested in computing several posterior expectations with respect to different functions, it is suboptimal when the target function f(x) is known in advance, since it is completely agnostic to f(x). Incorporating knowledge of f(x) in the estimation of E_π[f(x)] is known as target-aware Bayesian inference or TABI (Rainforth et al. 2020). TABI proposes to break down the estimation of the posterior expectation into several independent estimation tasks. Specifically, in TABI we need to estimate three marginal likelihoods (or normalizing constants) independently, and then recombine the estimates in order to form the approximation of the posterior expectation. The target function f(x) features in two out of the three marginal likelihoods that have to be estimated. Hence, the TABI framework provides a means of improving the estimation of a posterior expectation by making explicit use of f(x) and leveraging algorithms for marginal likelihood computation.
The computation of marginal likelihoods is particularly complicated, especially with MCMC outputs (Newton and Raftery 1994; Llorente et al. 2020a, 2021a). IS techniques are the most popular for this task. The basic IS algorithm provides a straightforward estimator of the marginal likelihood. However, designing a good proposal pdf that approximates the target density is not easy (Llorente et al. 2020a). For this reason, sophisticated and powerful schemes have been specifically designed (Llorente et al. 2020a; Friel and Wyse 2012). The most powerful techniques involve the idea of so-called tempering of the posterior (Neal 2001; Lartillot and Philippe 2006; Friel and Pettitt 2008). The tempering effect is commonly employed in order to foster exploration and improve the efficiency of MCMC chains (Neal 1996; Martino et al. 2021). State-of-the-art methods for computing marginal likelihoods consider tempered transitions (i.e., a sequence of tempered distributions), such as annealed IS (An-IS) (Neal 2001), sequential Monte Carlo (SMC) (Moral et al. 2006), thermodynamic integration (TI), a.k.a. path sampling (PS) or "power posteriors" (PP) in the statistics literature (Lartillot and Philippe 2006; Friel and Pettitt 2008; Gelman and Meng 1998), and stepping-stone (SS) sampling (Xie et al. 2010). An-IS is a special case of the SMC framework, PP is a special case of TI/PS, and SS sampling presents similar features to An-IS and PP. For more details, see Llorente et al. (2020a). It is worth mentioning that TI was introduced in the physics literature for computing free-energy differences (Frenkel 1986; Gelman and Meng 1998).
In this work, we extend the TI method, introducing the generalized thermodynamic integration (GTI) technique, for computing posterior expectations of a function f(x). In this sense, GTI is a target-aware algorithm that incorporates information about f(x) within the marginal likelihood estimation technique TI. The extension of TI to the computation of E_π[f(x)] is not straightforward, since it requires building a continuous path between densities with possibly different supports. In the case of a geometric path (which is the default choice in practice; Friel and Pettitt 2008; Lartillot and Philippe 2006), the generalization of TI needs a careful look at the supports of the negative and positive parts of f(x). We discuss the application of GTI for the computation of posterior expectations of a generic real-valued function f(x), and also describe the case of a vector-valued function f(x). The benefits of GTI are clearly shown by illustrative numerical simulations.
The structure of the paper is the following. In Sect. 2, we introduce the Bayesian inference setting and describe the thermodynamic integration method for the computation of the marginal likelihood. In Sect. 3, we introduce the GTI procedure. More specifically, we first discuss the case when f(x) is strictly positive or negative in Sect. 3.2, and then consider the general case of a real-valued f(x) in Sect. 3.3. In Sect. 4, we discuss some computational details of the approach, and the application of GTI to vector-valued functions f(x). We show the benefits of GTI in two numerical experiments in Sect. 5. Finally, Sect. 6 contains the conclusions.

Bayesian inference
In many real-world applications, the goal is to infer a parameter of interest given a set of data (Robert and Casella 2004). Let us denote the parameter of interest by x ∈ X ⊆ ℝ^D, and let y ∈ ℝ^{d_y} be the observed data. In a Bayesian analysis, all the statistical information is contained in the posterior distribution, which is given by

π(x) = ℓ(y|x) g(x) / Z(y),   (1)

where ℓ(y|x) is the likelihood function, g(x) is the prior pdf, and Z(y) is the Bayesian model evidence (a.k.a. marginal likelihood). Generally, Z(y) is unknown, but we are able to evaluate the unnormalized target function, π̃(x) = ℓ(y|x)g(x). The analytical computation of the posterior density π(x) ∝ π̃(x) is often unfeasible, hence numerical approximations are needed. The interest lies in the approximation of integrals of the form

I = ∫_X f(x) π(x) dx,   (2)

where f(x) is some integrable function, and

Z = ∫_X ℓ(y|x) g(x) dx.   (3)

The quantity Z is called the marginal likelihood (a.k.a. Bayesian evidence) and is useful for model selection purposes (Llorente et al. 2020a). Generally, I and Z are analytically intractable and we need to resort to numerical algorithms such as Markov chain Monte Carlo (MCMC) and importance sampling (IS) algorithms. In this work, we consider that f(x) is known in advance, and we aim at exploiting it in order to apply thermodynamic integration for computing the posterior expectation I, namely, to perform target-aware Bayesian inference (TABI).

Computation of marginal likelihoods for parameter estimation: The TABI framework
The focus of this work is on parameter estimation, namely, we are interested in the computation of the posterior expectation in Eq. (2) of some function f(x).
Recently, the authors in Rainforth et al. (2020) proposed a framework called target-aware Bayesian inference (TABI) that aims at improving the Monte Carlo estimation of I when the target f(x) is known in advance. The TABI framework is based on decomposing I into several terms and estimating them separately, leveraging the information in f(x). Hence, TABI rewrites the posterior expectation I as

I = c+/Z − c−/Z,   (4)

where c+ = ∫ f+(x) π̃(x) dx and c− = ∫ f−(x) π̃(x) dx, with f+(x) = max(f(x), 0) and f−(x) = max(−f(x), 0). Note that c+, c− and Z are integrals of non-negative functions, namely, they are marginal likelihoods (or normalizing constants). The three unnormalized densities of interest hence are π̃(x), f+(x)π̃(x) and f−(x)π̃(x). Note that two out of the three (unnormalized) densities incorporate information about f(x). The general TABI estimator is then

Î = ĉ+/Ẑ − ĉ−/Ẑ,   (5)

where ĉ+, ĉ− and Ẑ are estimates obtained independently. These estimates can be obtained by any marginal likelihood estimation method. The original TABI framework is motivated in the IS context. This is due to the fact that marginal likelihoods (i.e., integrals of non-negative functions) can be estimated arbitrarily well with IS (Llorente et al. 2020a, 2021a; Rainforth et al. 2020). Namely, using the optimal proposals, the estimates ĉ+, ĉ− and Ẑ coincide with the exact values regardless of the sample size. Note that the direct estimation of I via MCMC or IS cannot produce zero-variance estimators for a finite sample size (Robert and Casella 2004).
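As a toy illustration of this decomposition (the one-dimensional model, the function f(x) = x − 2 and the Gaussian proposals below are our own illustrative assumptions, not the setup used later in the paper), each of the three constants can be estimated by plain IS with its own proposal and then recombined:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setting: unnormalized posterior pi_u(x) = exp(-x^2/2),
# i.e., N(0,1) up to Z = sqrt(2*pi), and target function f(x) = x - 2,
# so the true posterior expectation is E[f] = -2.
pi_u = lambda x: np.exp(-0.5 * x**2)
f_plus = lambda x: np.maximum(x - 2.0, 0.0)   # positive part of f
f_minus = lambda x: np.maximum(2.0 - x, 0.0)  # negative part of f

def is_constant(target_u, mu, sig, n=200_000):
    """Plain IS estimate of the normalizing constant int target_u(x) dx,
    using a Gaussian proposal N(mu, sig^2)."""
    x = rng.normal(mu, sig, n)
    q = np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
    return np.mean(target_u(x) / q)

# Three independent marginal likelihood estimates, each with a proposal
# roughly matched to its own integrand.
Z_hat = is_constant(pi_u, 0.0, 1.0)
c_plus_hat = is_constant(lambda x: f_plus(x) * pi_u(x), 2.5, 1.0)
c_minus_hat = is_constant(lambda x: f_minus(x) * pi_u(x), -0.5, 1.5)

# TABI recombination of the three estimates.
I_hat = c_plus_hat / Z_hat - c_minus_hat / Z_hat
print(I_hat)  # close to -2
```

Note how each integrand gets its own proposal: tailoring the proposals to f+(x)π̃(x) and f−(x)π̃(x), rather than only to π̃(x), is precisely what makes the scheme target-aware.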
The TABI framework improves the estimation of I by converting the initial task into that of computing three marginal likelihoods, c+, c− and Z. In Rainforth et al. (2020), the authors test the application of two popular marginal likelihood estimators within TABI, namely, annealed IS (AnIS) (Neal 1996) and nested sampling (NS) (Skilling 2006), resulting in the target-aware algorithms called target-aware AnIS (TAAnIS) and target-aware NS (TANS). The use of AnIS for independently computing c+, c− and Z represents an improvement over IS. Although the IS estimation of c+, c− and Z can have virtually zero variance, this is only true when we employ the optimal proposals. In general, the performance of IS depends on how close the proposal pdf is to the target density whose normalizing constant we aim to estimate. It can be shown that the variance of IS scales with the Pearson divergence between target and proposal (Llorente et al. 2020a). When this distance is large, it is more efficient to sample from another proposal that is 'in between', i.e., an 'intermediate' density. This is the motivation behind many state-of-the-art marginal likelihood estimation methods that employ a sequence of densities bridging an easy-to-work-with proposal and the target density (Llorente et al. 2020a). In this work, we introduce thermodynamic integration (TI) for performing target-aware inference, hence enabling the computation of the posterior expectation of a function f(x). TI is a powerful marginal likelihood estimation technique that also leverages a sequence of distributions, but has several advantages over other methods based on tempered transitions, such as improved stability thanks to working in logarithmic scale and applying deterministic quadrature (Friel and Pettitt 2008; Friel and Wyse 2012). TI for computing marginal likelihoods is reviewed in the next section.
Then, in Sect. 3 we introduce generalized TI (GTI) for the computation of posterior expectations, which is based on rewriting I as the difference of two ratios of normalizing constants.

Thermodynamic integration for estimating Z
Thermodynamic integration (TI) is a powerful technique that has been proposed in the literature for computing ratios of constants (Frenkel 1986; Gelman and Meng 1998; Lartillot and Philippe 2006). Here, for simplicity, we focus on the approximation of just one constant, the marginal likelihood Z. More precisely, TI produces an estimate of log Z. Let us consider a family of (generally unnormalized) densities π̃(x|β), with β ∈ [0, 1], such that π̃(x|0) = g(x) is the prior and π̃(x|1) = π̃(x) is the unnormalized posterior distribution. An example is the so-called geometric path,

π̃(x|β) = g(x)^{1−β} π̃(x)^β.

The corresponding normalized densities in the family are denoted as φ(x|β) = π̃(x|β)/c(β), with c(β) = ∫_X π̃(x|β) dx. Then, the main TI identity is (Llorente et al. 2020a)

log Z = ∫_0^1 E_{φ(x|β)}[ ∂/∂β log π̃(x|β) ] dβ,   (8)

where the expectation is with respect to (w.r.t.) φ(x|β).

TI estimator. Using an ordered sequence of discrete values {β_i}_{i=1}^N (e.g., β_i's uniformly spread in [0, 1]), one can approximate the integral in Eq. (8) via quadrature w.r.t. β, and then approximate the inner expectation with a Monte Carlo estimator using samples from φ(x|β_i) for i = 1, …, N. Namely, defining U(x, β) = ∂/∂β log π̃(x|β), the resulting estimator of Eq. (8) is given by

log Ẑ = Σ_{i=2}^N (β_i − β_{i−1}) Û_i,   (9)

where

Û_i = (1/M) Σ_{m=1}^M U(x_m^{(i)}, β_i), with x_m^{(i)} ∼ φ(x|β_i).   (10)

Note that we used the simplest quadrature rule in Eq. (9), but others can be used, such as the trapezoidal rule, Simpson's rule, etc. (Friel and Pettitt 2008; Lartillot and Philippe 2006).
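The identity in Eq. (8) and its quadrature estimator can be sketched on a conjugate toy model where log Z is known in closed form. In this sketch (our own illustrative setup), the tempered posteriors are Gaussian, so we sample them exactly as a stand-in for the MCMC step:

```python
import numpy as np

rng = np.random.default_rng(1)

# Conjugate toy model: prior g(x) = N(0,1), likelihood l(y|x) = N(y; x, 1),
# observation y = 1. The evidence is Z = N(y; 0, 2), so log Z is known.
y = 1.0
log_Z_true = -0.5 * np.log(2 * np.pi * 2.0) - y**2 / 4.0

def log_lik(x):
    return -0.5 * np.log(2 * np.pi) - 0.5 * (y - x) ** 2

# Geometric path: phi(x|b) is proportional to g(x) l(y|x)^b, which here is
# the Gaussian N(b*y/(1+b), 1/(1+b)); sampled exactly as a stand-in for MCMC.
def sample_tempered(b, m):
    return rng.normal(b * y / (1 + b), np.sqrt(1.0 / (1 + b)), m)

N, M = 50, 5000
betas = np.linspace(0.0, 1.0, N)
E_b = np.array([np.mean(log_lik(sample_tempered(b, M))) for b in betas])

# TI identity for the geometric path: log Z is the integral over b in [0,1]
# of E_{phi(x|b)}[log l(y|x)]; approximated with the trapezoidal rule.
log_Z_hat = np.sum(0.5 * (E_b[1:] + E_b[:-1]) * np.diff(betas))
print(log_Z_hat, log_Z_true)  # the two values nearly coincide
```

In a realistic model, each `sample_tempered(b, M)` call would be replaced by an MCMC chain targeting the tempered posterior at temperature b.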
The power posteriors (PP) method. Let us consider the specific case of a geometric path between the prior g(x) and the unnormalized posterior π̃(x),

π̃(x|β) = g(x)^{1−β} π̃(x)^β = g(x) ℓ(y|x)^β,

where we have used π̃(x) = ℓ(y|x)g(x). Note that, in this scenario, c(0) = 1 and c(1) = Z. Hence, the identity in Eq. (8) can also be written as

log Z = ∫_0^1 E_{φ(x|β)}[ log ℓ(y|x) ] dβ.   (14)

The power posteriors (PP) method is a special case of TI which considers (a) the geometric path and (b) the trapezoidal quadrature rule for integrating w.r.t. the variable β (Friel and Pettitt 2008). Namely, letting β_1 = 0 < ⋯ < β_N = 1 denote a fixed temperature schedule, an approximation of Eq. (14) can be obtained via the trapezoidal rule,

log Ẑ = Σ_{i=1}^{N−1} ((β_{i+1} − β_i)/2) ( E_{φ(x|β_{i+1})}[log ℓ(y|x)] + E_{φ(x|β_i)}[log ℓ(y|x)] ),   (15)

where the expectations are generally substituted with MCMC estimates as in Eq. (10). TI and PP are popular methods for computing marginal likelihoods (even in high-dimensional spaces) due to their reliability. Theoretical properties are studied in Gelman and Meng (1998) and Calderhead and Girolami (2009), and empirical validation is provided in several works, e.g., Friel and Pettitt (2008), Lartillot and Philippe (2006). Different extensions and improvements of the method have also been proposed (Oates et al. 2016; Friel et al. 2014; Calderhead and Girolami 2009).
Remark 1 Note that, in order to ensure that the integrand in Eq. (14) is finite, so that the estimator in Eq. (15) can be applied, we need that (a) ℓ(y|x) is strictly positive everywhere, or (b) ℓ(y|x) = 0 only whenever g(x) = 0 (i.e., they have the same support).

Goal
We have seen that the TI method has been proposed for computing log Z (or log-ratios of constants). Our goal is to extend the TI scheme in order to perform target-aware Bayesian inference. Namely, we generalize the idea of these methods (thermodynamic integration, power posteriors, etc.) to the computation of posterior expectations of a given f(x).

Generalized TI (GTI) for Bayesian inference
In this section, we extend the TI method for computing the posterior expectation of a given f(x). As in TABI, the basic idea, as we show below, is the formulation of I in terms of ratios of normalizing constants. First, we consider the case f(x) > 0 for all x, and then the case of a generic real-valued f(x).

General approach
In order to apply TI, we need to formulate the posterior expectation I as a ratio of two constants. Since f(x) can be positive or negative, let us consider the positive and negative parts,

f(x) = f+(x) − f−(x),

where f+(x) and f−(x) are non-negative functions. Similarly to Eq. (4), we rewrite the integral I in terms of ratios of constants,

I = c+/Z − c−/Z,   (16)

where c+ = ∫_X π̃+(x) dx and c− = ∫_X π̃−(x) dx are, respectively, the normalizing constants of π̃+(x) = f+(x)π̃(x) and π̃−(x) = f−(x)π̃(x). Instead of estimating c+, c− and Z separately, in the case of a generic f(x) we propose to obtain estimates of the log-ratios η+ = log(c+/Z) and η− = log(c−/Z) using thermodynamic integration. Then, we can obtain the final estimator as

Î = exp(η̂+) − exp(η̂−).   (17)

In the next section, we give details on how to compute η̂+ and η̂− by using a generalized TI method.
Remark 2 Note that in Eq. (16) we express I as the difference of two ratios, and we propose GTI to estimate these ratios directly as per Eq. (17). Hence, differently from Eq. (5), we do not aim at estimating each constant separately. This amounts to bridging the posterior with the function-scaled posterior, as we show below.

GTI for strictly positive or strictly negative f(x)
Let us consider the scenario where f(x) > 0 for all x ∈ X. In this scenario, we can set I = c+/Z = exp(η+), that is, with respect to Eq. (17), we only consider the first term. We link the unnormalized pdfs π̃(x) and π̃+(x) = f+(x)π̃(x) with a geometric path, by defining

π̃(x|β) = π̃(x)^{1−β} π̃+(x)^β = f(x)^β π̃(x), β ∈ [0, 1].

Hence, we have c(β) = ∫_X π̃(x|β) dx, with c(0) = Z and c(1) = c+. The identity in Eq. (8) thus becomes

η+ = log(c+/Z) = ∫_0^1 E_{φ(x|β)}[ log f(x) ] dβ.   (19)

Letting β_1 = 0 < ⋯ < β_N = 1 denote a fixed temperature schedule, the estimator (using the trapezoidal rule) is thus

η̂+ = Σ_{i=1}^{N−1} ((β_{i+1} − β_i)/2) ( Ê_{i+1} + Ê_i ),   (20)

where we use the MCMC estimates

Ê_i = (1/M) Σ_{m=1}^M log f(x_m^{(i)}), with x_m^{(i)} ∼ φ(x|β_i),

for the terms E_{φ(x|β_i)}[log f(x)].

Function f(x) with zeros of null measure. So far, we have considered strictly positive or strictly negative f(x). This case can be extended to a positive (or negative) f(x) with zeros in a set of null measure. Indeed, note that the identity in Eq. (19) requires that E_{φ(x|β)}[log f(x)] < ∞ for all β ∈ [0, 1]. If the zeros of f(x) have null measure and the improper integral converges, the procedure above is also suitable. Table 1 summarizes the generalized TI (GTI) steps for strictly positive f(x). We discuss other scenarios in the next section.
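This case of GTI can be sketched in a toy setting where the answer is available in closed form. The assumptions below are our own illustrative choices (unnormalized posterior exp(−x²/2) and strictly positive f(x) = exp(−(x−2)²/2), so that I = c+/Z = e^{−1}/√2); the tempered densities happen to be Gaussian, so we sample them exactly instead of via MCMC:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative setup: pi_u(x) = exp(-x^2/2) (N(0,1) up to Z) and the strictly
# positive f(x) = exp(-(x-2)^2/2). Then I = c+/Z = E_pi[f] = exp(-1)/sqrt(2)
# in closed form, so the GTI estimate can be checked.
log_f = lambda x: -0.5 * (x - 2.0) ** 2
I_true = np.exp(-1.0) / np.sqrt(2.0)

# Geometric path phi(x|b) proportional to f(x)^b pi_u(x): here a Gaussian with
# precision (1+b) and mean 2b/(1+b), sampled exactly as a stand-in for MCMC.
def sample_path(b, m):
    return rng.normal(2 * b / (1 + b), np.sqrt(1.0 / (1 + b)), m)

N, M = 100, 5000
betas = np.linspace(0.0, 1.0, N)
E_b = np.array([np.mean(log_f(sample_path(b, M))) for b in betas])

# GTI identity: eta+ = log(c+/Z) equals the integral over b in [0,1] of
# E_{phi(x|b)}[log f(x)]; approximated with the trapezoidal rule.
eta_plus = np.sum(0.5 * (E_b[1:] + E_b[:-1]) * np.diff(betas))
I_hat = np.exp(eta_plus)
print(I_hat, I_true)  # the two values nearly coincide
```

The only change with respect to standard TI is the path: instead of bridging prior and posterior, the chain of tempered densities bridges the posterior and the function-scaled posterior, and log f(x) replaces log ℓ(y|x) inside the expectations.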

GTI for generic f(x)
Using the results from the previous section, we now apply GTI to a generic real-valued function f(x), namely, one that can be positive and negative, as well as have zero-valued regions with non-null measure. Here, we wish to connect the posterior π̃(x) with f+(x)π̃(x) and with f−(x)π̃(x) by two continuous paths. However, a requirement for the validity of the approach is that π̃(x) is zero whenever f+(x)π̃(x) or f−(x)π̃(x) is zero, which does not generally hold, since f(x) can have a smaller support than π̃(x). This fact enforces the computation of correction factors to keep the validity of the approach. More details can be found in Appendix B. Therefore, we need to define the unnormalized restricted posterior densities

π̃+(x) = π̃(x) 1_{X+}(x), and π̃−(x) = π̃(x) 1_{X−}(x),   (24)

where 1_{X+}(x) is the indicator function over the set X+ = {x : f(x) > 0}, and similarly for X− = {x : f(x) < 0}. The idea is to connect, with a geometric path, π̃+(x) with f+(x)π̃(x), and π̃−(x) with f−(x)π̃(x). Note that it is equivalent to write f±(x)π̃±(x) = f±(x)π̃(x), since π̃±(x) = π̃(x) whenever f±(x) > 0, and they only differ when f±(x) = 0, in which case we also have f±(x)π̃±(x) = f±(x)π̃(x) = 0. Defining also the restricted normalizing constants Z+ = ∫_X π̃+(x) dx and Z− = ∫_X π̃−(x) dx, the idea is to apply TI separately for approximating

η+_res = log(c+/Z+) and η−_res = log(c−/Z−),   (25)

where the subscript 'res' accounts for the fact that we consider the restricted constants Z+ and Z−. Hence, two correction factors, R+ = Z+/Z and R− = Z−/Z, are also required, in order to obtain

I = R+ exp(η+_res) − R− exp(η−_res).   (26)

Below, we also show how to estimate the correction factors at a final stage and combine them with the estimates of η+_res and η−_res. We can approximate η+_res and η−_res using estimators of the same form as Eq. (20), denoted Eqs. (27)-(28), which employ samples from the tempered paths φ±(x|β) ∝ f±(x)^β π̃±(x). When comparing the estimators in Eqs. (27)-(28) with the GTI estimator in Eq.
(20), the only difference here is that the expectation at β = 0 is approximated by using samples from the restricted posteriors, π̃+(x) and π̃−(x), instead of the posterior π̃(x). To obtain an approximation of the true quantities of interest, η+ and η− (instead of η+_res and η−_res), we compute the two correction factors from a single set of K samples from π(x) as follows,

R̂+ = (1/K) Σ_{k=1}^K 1_{X+}(x_k), and R̂− = (1/K) Σ_{k=1}^K 1_{X−}(x_k), with x_k ∼ π(x).

Table 2 provides all the details of GTI in this scenario.
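The correction-factor step reduces to counting the fraction of posterior samples falling in each restricted support. A minimal sketch, under our own illustrative assumptions (posterior N(0,1) sampled exactly rather than by MCMC, and f(x) = x, so that R+ = R− = 1/2):

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative correction-factor estimation: posterior N(0,1) (sampled
# exactly here, as a stand-in for MCMC) and f(x) = x, so the restricted
# supports are X+ = {x > 0} and X- = {x < 0}, with R+ = R- = 1/2.
K = 100_000
x = rng.normal(0.0, 1.0, K)
fx = x  # f(x) = x evaluated at the posterior samples

R_plus = np.mean(fx > 0)   # (1/K) * sum_k of 1_{X+}(x_k)
R_minus = np.mean(fx < 0)  # (1/K) * sum_k of 1_{X-}(x_k)
print(R_plus, R_minus)     # both close to 0.5
```

In other words, R± is simply the posterior probability of the set X±, which is why a single extra chain targeting π(x) suffices to estimate both factors.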
Remark 3 Standard TI as a special case of GTI: Note that the GTI scheme contains TI as a special case if we set f(x) = ℓ(y|x) (i.e., the likelihood function) and let the prior g(x) play the role of π̃(x). Since the likelihood ℓ(y|x) is non-negative, we have η− = −∞ (hence exp(η−) = 0), so we only have to consider the estimation of η+. Moreover, if ℓ(y|x) is strictly positive, we do not need to compute the correction factor.
Remark 4 The GTI procedure described above also allows the application of standard TI for computing marginal likelihoods when the likelihood function is not strictly positive, by applying a correction factor in the same fashion (in this case, considering a restricted prior pdf).

Computational considerations and other extensions
In this section, we discuss computational details, different scenarios and further extensions, which are listed below.

Acceleration schemes
In order to apply GTI, the user must set N and M, so that the total number of samples/evaluations of f(x) in Table 1 is E = NM. The evaluations of f(x) in Table 2 are E = 2NM + K. We can reduce the cost of the algorithm in Table 2 to E = NM + K with an acceleration scheme. Instead of running separate MCMC algorithms for φ+(x|β) ∝ f+(x)^β π̃+(x) and φ−(x|β) ∝ f−(x)^β π̃−(x), we use a single run targeting a density proportional to

f+(x)^β π̃+(x) + f−(x)^β π̃−(x) = |f(x)|^β π̃(x) 1_{X+ ∪ X−}(x).

We can then obtain two sets of MCMC samples, one from φ+(x|β) and one from φ−(x|β), by separating the samples into two groups: samples with positive value of f(x), and samples with negative value of f(x), respectively. The procedure can be repeated until obtaining the desired number of samples from each density, φ+(x|β) and φ−(x|β). Moreover, note that in Table 2 we need to draw samples from π̃+(x), π̃−(x) and π̃(x). Instead of sampling each one separately, we can use the following procedure: obtain a set of samples from π̃(x) and then apply rejection (i.e., discard samples with f±(x) = 0) in order to obtain samples from π̃±(x). Combining this idea with the acceleration scheme above reduces the cost of Table 2 to E = MN.
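The acceleration scheme can be sketched as follows. All the modeling choices here are our own illustrative assumptions (one-dimensional π̃(x) ∝ exp(−x²/2), f(x) = x, a single temperature β = 0.5, and a combined target proportional to |f(x)|^β π̃(x), consistent with the splitting-by-sign described above):

```python
import numpy as np

rng = np.random.default_rng(4)

# One random-walk Metropolis chain targeting the combined density
# phi(x|b), proportional to |f(x)|^b * pi_u(x); its samples are then split
# by the sign of f to obtain draws from phi+(x|b) and phi-(x|b) in one run.
f = lambda x: x  # positive on x > 0, negative on x < 0 (illustrative)
b = 0.5

def log_target(x):
    fx = f(x)
    if fx == 0.0:
        return -np.inf  # outside the support X+ union X-
    return b * np.log(abs(fx)) - 0.5 * x**2  # log |f|^b + log pi_u

T = 20_000
x, chain = 1.0, []
for _ in range(T):
    prop = x + rng.normal(0.0, 1.0)  # random-walk proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        x = prop
    chain.append(x)

chain = np.array(chain)
pos = chain[chain > 0]  # samples attributed to phi+(x|b)
neg = chain[chain < 0]  # samples attributed to phi-(x|b)
print(len(pos), len(neg))
```

One chain thus feeds both tempered paths at temperature β, which is the source of the cost reduction from 2NM to NM evaluations.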

Parallelization
Note that steps 1 and 2 in Tables 1 and 2 are amenable to parallelization. In other words, those steps need not be performed sequentially but can be done using embarrassingly parallel MCMC chains (i.e., with no communication among the N, or 2N, workers). Only step 3 requires communicating with a central node and combining the estimates. With this procedure, the number of evaluations E is the same, but the computation time is reduced by a factor of 1/N (or 1/2N). On the other hand, population MCMC techniques can be used, but parallelization speedups are lower since communication among workers occurs every so often, in order to foster the exploration of the chains (Martino et al. 2016; Calderhead and Girolami 2009).

Vector-valued functions f(x)
In Bayesian inference, one is often interested in computing moments of the posterior, i.e., integrals of the form I = ∫_X x^α π(x) dx, where f(x) = x^α and I is a vector. When α = 1, I represents the minimum mean square error (MMSE) estimator. More generally, we can have a vector-valued function f(x) = [f_1(x), …, f_S(x)]^⊤, hence the integral of interest is a vector, I = [I_1, …, I_S]^⊤. In this scenario, we need to apply the GTI scheme to each component of I separately, obtaining estimates Î_i of the form in Eq. (33).

TI within the TABI framework: TATI
We have seen that we can apply GTI to compute the posterior expectation of a generic f(x), which can be positive, negative, and have zero-valued regions. To do so, we connected the restricted posteriors π̃+(x) and π̃−(x), with tempered paths, to f+(x)π̃(x) and f−(x)π̃(x), respectively, and then applied correction factors.
An alternative procedure is to use the TABI identity in Eq. (4), rather than Eq. (16), and use reference distributions for computing separately c+, c− and Z. This target-aware TI (TATI) differs from GTI in that we need to apply TI three times, and bridge three reference distributions to the target densities f+(x)π̃(x), f−(x)π̃(x) and π̃(x). Let us define p̃_ref,1(x), p̃_ref,2(x) and p̃_ref,3(x) as three unnormalized reference densities with known normalizing constants c_ref,1, c_ref,2 and c_ref,3. Then, the idea is to apply TI for obtaining estimates of log(c+/c_ref,1), log(c−/c_ref,2) and log(Z/c_ref,3). For the paths to be valid, p̃_ref,1(x) must be zero where f+(x)π̃(x) is zero, p̃_ref,2(x) must be zero where f−(x)π̃(x) is zero, and p̃_ref,3(x) must be zero where π̃(x) is zero. Namely, we need to be able to build a continuous path between the reference distributions and the corresponding unnormalized pdfs of interest. With this procedure, we do not need to apply correction factors; we just need to apply the algorithm in Table 1 three times. The performance of TATI is expected to be better than that of GTI if we are able to choose three reference distributions that are 'closer' to the corresponding target densities than π̃(x) is to f+(x)π̃(x) or f−(x)π̃(x) (Llorente et al. 2020a). For instance, we can obtain the reference pdfs by building nonparametric approximations to each target density (Llorente et al. 2021b).

Numerical experiments
In this section, we illustrate the performance of the proposed scheme in two numerical experiments, which consider densities π with different features and different dimensions, as well as different functions f(x). In the first example, f(x) is strictly positive, so we apply the algorithm described in Table 1. In the second example, we consider an f(x) with zero-valued regions, and hence we apply the algorithm in Table 2. Notice that we consider the same setup as in Rainforth et al. (2020) in order to compare with respect to instances of TABI algorithms.

First numerical analysis
Let us consider the following Gaussian model (Rainforth et al. 2020), with prior g(x) = N(x; 0_D, I_D) and likelihood ℓ(y|x) = N(y; x, I_D), where D is the dimensionality, I_D is the identity matrix, 0_D and 1_D are D-vectors containing only zeros or ones, respectively, and ȳ is a scalar value that represents the radial distance of the observation y = −(ȳ/√D) 1_D to the origin. We are interested in the estimation of I = ∫_X f(x)π(x) dx with f(x) the posterior predictive density, under the above model, evaluated at the point (ȳ/√D) 1_D. In this toy example, the posterior and the function-scaled posterior can be obtained in closed form, and the ground truth is known, since I can be written as a Gaussian density evaluated at a given point. We test the values ȳ ∈ {2, 3.5, 5} and D ∈ {10, 25, 50}. Note that, as we increase ȳ, the posterior π(x) and the density φ(x|1) ∝ f(x)π̃(x) become further apart.

Comparison with other target-aware approaches
We aim to compare GTI with an MCMC baseline and other target-aware algorithms that make use of f(x). Specifically, we compare against two extreme cases of the self-normalized IS (SNIS) estimator,

Î_SNIS = Σ_{i=1}^{M_tot} w_i f(x_i) / Σ_{i=1}^{M_tot} w_i,   (37)

where x_i ∼ q(x) and w_i = π̃(x_i)/q(x_i) is the IS weight. Namely, (1) SNIS using samples from the posterior (SNIS1), i.e., q(x) = π(x) (hence, SNIS1 coincides with MCMC), and (2) SNIS using samples from q(x) = φ(x|1) ∝ f(x)π̃(x) (SNIS2). These choices are optimal for estimating, respectively, the denominator and the numerator of the right-hand side of Eq. (2) (Robert and Casella 2004). Note that SNIS2 can be considered a first, "primitive" target-aware algorithm, since it employs samples from φ(x|1) ∝ f(x)π̃(x).
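The generic SNIS estimator in Eq. (37) can be sketched as follows (an illustrative one-dimensional setting of our own, not the Gaussian model of this section):

```python
import numpy as np

rng = np.random.default_rng(5)

# Generic SNIS estimator of E_pi[f], as used by the SNIS1/SNIS2 baselines
# (illustrative setting: pi is N(0,1) and f(x) = x + 3, so E_pi[f] = 3).
pi_u = lambda x: np.exp(-0.5 * x**2)  # unnormalized posterior
f = lambda x: x + 3.0

def snis(x, q_pdf):
    """Self-normalized IS with samples x from proposal q."""
    w = pi_u(x) / q_pdf(x)             # unnormalized IS weights
    return np.sum(w * f(x)) / np.sum(w)

# SNIS1: proposal equal to the posterior itself (the MCMC baseline,
# sampled exactly here for simplicity).
x = rng.normal(0.0, 1.0, 200_000)
est1 = snis(x, lambda t: np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi))
print(est1)  # close to 3
```

SNIS2 would reuse the same `snis` function with samples from, and the pdf of, the function-scaled posterior φ(x|1).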
A second target-aware approach can be obtained by recycling the samples generated in SNIS1 and SNIS2 (that is, from π(x) and φ(x|1)), and is called (3) bridge sampling (BS) (Llorente et al. 2020a). This estimator can be viewed as using the mixture of π(x) and φ(x|1) as the proposal pdf. More details about (optimal) BS can be found in Appendix A. Finally, we also compare against the target-aware versions of two popular marginal likelihood estimators, namely, (4) target-aware annealed IS (TAAnIS) and (5) target-aware nested sampling (TANS), which also make use of f(x) (Rainforth et al. 2020).
In order to keep the comparisons fair, we consider the same number of likelihood evaluations E in all of the methods. Note that evaluating the likelihood is usually the most costly step in many real-world scenarios. Hence, in SNIS1 and SNIS2 we draw M_tot = E samples from π(x) and φ(x|1), respectively, via MCMC; BS employs E/2 samples from π(x) and E/2 from φ(x|1). For TAAnIS and TANS we use the same parameters as in Rainforth et al. (2020). Namely, for TAAnIS we employ N = 200 intermediate distributions, with n_MCMC = 5 iterations of the Metropolis-Hastings (MH) algorithm, which allows for a total number of particles n_par = ⌊E/((N−1)n_MCMC + N−1)⌋, where half of the particles are used to estimate the numerator, and the other half to estimate the denominator, of the right-hand side of Eq. (2). For TANS, we employ n_MCMC = 20 iterations of MH and n_par = ⌊E/(1 + 250·n_MCMC)⌋ particles, with T = n_par iterations. Again, TANS employs one half of the particles for estimating the numerator and the other half for the denominator. Finally, in GTI we also set N = 200, hence we draw M = ⌊E/N⌋ samples from each φ(x|β_i), i = 1, …, N. Note that we set the same number of intermediate distributions in GTI and TAAnIS; however, the paths are not identical, since TAAnIS bridges the prior with π̃(x) and with π̃(x|1) ∝ f(x)π̃(x), while GTI directly bridges π̃(x) with π̃(x|1). All the iterations of the MH algorithm use a Gaussian random-walk proposal with covariance matrix Σ = 0.1225I, Σ = 0.04I and Σ = 0.01I, for D = 10, 25, 50, respectively, except for TANS where, following Rainforth et al. (2020), we use instead Σ = I and Σ = 0.09I. The temperature schedule follows the powered fraction rule, β_i = (i/N)^5 for i = 1, …, N (Friel and Pettitt 2008; Xie et al. 2010).

Results
The results are given in Fig. 1, which shows, for each pair (ȳ, D), the median relative squared error along with the 25% and 75% quantiles (over 100 simulations) versus the total number of likelihood evaluations E, up to E = 10^7. We see that GTI is the first or second best overall, in terms of relative squared error, for all (ȳ, D). In fact, the performance of GTI seems rather insensitive to increases in the dimension D and in ȳ. We see that, when the distance between π(x) and φ(x|1) is small (i.e., ȳ = 2 and D = 10), the target-aware algorithms do not produce large gains with respect to the MCMC baseline (SNIS1). On the contrary, for ȳ = 3.5, 5 (second and third rows), we see that the target-aware algorithms GTI, TAAnIS and BS outperform the MCMC baseline. This performance gain with larger ȳ is expected, since larger ȳ represents a larger mismatch between π(x) and φ(x|1) ∝ f(x)π̃(x), which is a scenario where the target-aware approaches are well suited. Comparing the target-aware algorithms, we see that TAAnIS performs as well as our GTI in low dimensions (D = 10), but it breaks down as we increase the dimension, being outperformed by TANS for D = 25, 50, confirming the results of Rainforth et al. (2020), where TANS is preferable over TAAnIS in high dimensions. It is worth noticing the very good performance of BS, given its simplicity and the fact that it can be computed at almost no extra cost once we have computed SNIS1 and SNIS2. Indeed, its performance matches that of GTI, and actually outperforms GTI when the separation is not too high. This is also expected since, when ȳ = 2, both π(x) and φ(x|1) are good pdfs for estimating both the numerator and the denominator of the right-hand side of Eq.
(2). In this sense, having only one "bridge" is better than having N = 200 intermediate distributions. However, GTI outperforms BS when ȳ = 3.5, 5, especially when the dimension is high. In summary, our proposed GTI is able to produce good estimates in the range of values of (ȳ, D) considered. The performance gains with respect to an MCMC baseline are higher when the discrepancy between π(x) and φ(x|1) is large. Compared to other target-aware approaches, GTI produces better estimates (especially in high dimensions) and is also able to perform well when the discrepancy is low, matching the performance of BS, which is a simpler and more direct target-aware algorithm.

Second numerical analysis
We consider the following two-dimensional banana-shaped density (a benchmark example; Rainforth et al. 2020; Martino and Read 2013; Haario et al. 2001), where 1(x ∈ B) is the prior, with B = {x : −25 < x₁ < 25, −40 < x₂ < 20}, and f(x) is the target function. We compare GTI using N ∈ {10, 50, 100} against TAAnIS and TANS in the estimation of π[f(x)], allowing a budget of E = 10^6. We also consider a baseline MCMC chain targeting π(x) with the same number of likelihood evaluations.
The main difference with respect to the previous experiment is that f(x) here has a zero-valued region, so, in order to apply GTI, we need to use the algorithm in Table 2. Hence, for GTI, we run N + 1 chains for M = E/(N+1) iterations each. Each of the first N chains addresses a different tempered distribution f(x)^{β_i} π(x), and the last chain is used to compute the correction factor. It is also important to notice here that TAAnIS, which also uses a geometric path to bridge f(x)π(x) and the prior, requires the computation of a correction factor as well, accounting for the fact that we connect a prior restricted to the region where f(x) ≠ 0. This amounts to multiplying the final estimate returned by TAAnIS by a factor of 1/2 (Fig. 2). All the MCMC algorithms use a Gaussian random-walk proposal with covariance Σ = 3I₂. The budget of likelihood evaluations is E = 10^6 for all the compared schemes. We use again the powered fraction schedule.

Results
The results are shown in Table 3. We show the median relative squared error of the methods over 100 independent simulations. For the sample size considered, GTI performs better than the MCMC baseline and the other target-aware algorithms. TAAnIS performs slightly better than the MCMC baseline, while TANS completely fails at estimating the posterior expectation in this example. For N = 100, the performance gains of GTI are almost one order of magnitude over MCMC. However, note that GTI with the choice N = 10 is worse than the MCMC baseline due to the discretization error, i.e., there are not enough quadrature nodes, so the estimate in Eq. (20) has considerable bias. In that situation, increasing the sample size would not translate into a significant performance gain. This contrasts with TAAnIS, where increasing N produces only small improvements on the final estimate, since TAAnIS is unbiased regardless of the choice of N.
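A tempered-path estimator of this kind combines per-temperature expectations by quadrature over the schedule {β_i}. The sketch below illustrates the general mechanics (the powered-fraction form β_i = (i/N)^p, the exponent p, and the constant toy expectations are assumptions for illustration; in practice each expectation would be estimated by its own MCMC chain):

```python
import numpy as np

def powered_fraction_schedule(n, p=4.0):
    """beta_i = (i/n)^p for i = 0..n: concentrates nodes near beta = 0,
    where the integrand of the thermodynamic identity varies most."""
    return (np.arange(n + 1) / n) ** p

def ti_log_ratio(betas, expectations):
    """Trapezoidal quadrature of log(c/Z) = int_0^1 E_{p_beta}[log f(x)] dbeta,
    where expectations[i] approximates E_{p_{beta_i}}[log f(x)]."""
    betas = np.asarray(betas)
    expectations = np.asarray(expectations)
    return np.sum(0.5 * (expectations[1:] + expectations[:-1]) * np.diff(betas))

# Toy check: if E_{p_beta}[log f] were constant and equal to log(2),
# the quadrature returns log(2), i.e. c/Z = 2 exactly.
betas = powered_fraction_schedule(200)
log_ratio = ti_log_ratio(betas, np.full_like(betas, np.log(2.0)))
```

With too few nodes the trapezoidal rule incurs exactly the discretization bias discussed above for N = 10.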

Conclusions
We have extended the powerful thermodynamic integration technique for performing target-aware Bayesian inference. Namely, GTI allows the computation of posterior expectations of real-valued functions f(x), and also of vector-valued functions f(x). GTI contains the standard TI as a special case. Even for the estimation of the marginal likelihood, this work provides a way of extending the application of the standard TI, avoiding the assumption of strictly positive likelihood functions (see Remarks 1-3). Several computational considerations and variants are discussed. The advantages of GTI over other target-aware algorithms are shown in different numerical comparisons. As a future research line, we plan to study new continuous paths for linking densities with different supports, avoiding the need for the correction terms. Alternatively, as discussed in Sect. 4, another approach would be to design suitable approximations of f₊(x), f₋(x) and π(x) (see the end of Sect. 4) using, e.g., regression techniques (Llorente et al. 2020b, 2021b).

Appendices

Appendix A: Bridge sampling
The estimator tested in Sect. 5.1 is an instance of bridge sampling. Bridge sampling (BS) is an importance sampling approach for computing the ratio of normalizing constants of two unnormalized pdfs using samples from both densities (Llorente et al. 2021a). Here, the two unnormalized pdfs of interest are the unnormalized posterior, with normalizing constant Z, and its product with f(x), with normalizing constant c; the ratio c/Z corresponds to the posterior expectation of interest, namely, I = c/Z. Hence, BS can be viewed as a target-aware approach. In order to implement the optimal bridge sampling estimator, an iterative scheme is required. Consider sets of MCMC samples from π(x) and from the normalized density φ(x) ∝ f(x)π(x). Let Î(0) be an initial estimate of I; the optimal BS estimator is computed by refining this estimate through the following loop, for t = 1, …, T. In the experiments, just a couple of iterations were needed for Î(t) to converge.

If this limit exists, the integral is convergent and it is safe to apply quadrature (Riemann sums) to calculate it, taking a very small β₀.
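The iterative refinement of the optimal bridge sampling estimator can be sketched as follows. This is a schematic in the style of the classical Meng-Wong iteration (the function name, sample sizes, and toy densities are illustrative assumptions); since the two unnormalized pdfs differ exactly by the factor f(x), the bridge ratios reduce to evaluations of f at the samples:

```python
import numpy as np

def optimal_bridge(f_pi, f_phi, r0=1.0, n_iter=20):
    """Iterative optimal bridge sampling estimate of I = c/Z.
    f_pi:  f evaluated at samples from the posterior pi(x)
    f_phi: f evaluated at samples from phi(x) propto f(x)*pi(x)
    (requires f > 0 at the sampled points)."""
    n1, n2 = len(f_phi), len(f_pi)
    s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)
    r = r0
    for _ in range(n_iter):
        num = np.mean(f_pi / (s1 * f_pi + s2 * r))    # uses samples from pi
        den = np.mean(1.0 / (s1 * f_phi + s2 * r))    # uses samples from phi
        r = num / den
    return r

# Toy check with pi = N(0,1) and f(x) = exp(x): then phi = N(1,1),
# and the true value is I = E_pi[exp(x)] = exp(1/2), roughly 1.65.
rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 20_000)   # samples from pi
z = rng.normal(1.0, 1.0, 20_000)   # samples from phi
I_hat = optimal_bridge(np.exp(x), np.exp(z))
```

In line with the text, a handful of iterations of the loop are typically enough for the estimate to stabilize.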
Behavior near β = 0
Paying attention to the behavior of E_{φ(x|β)}[log f(x)] near β = 0, we should notice that E_{φ(x|β)}[log f(x)] does not diverge to −∞ as we get close to β = 0. On the contrary, there is a lower limit on its value as we approach β = 0. Consider, for an infinitesimal ε, the integral where p_ε(x) ∝ f(x)^ε π(x) coincides with π(x) in X \ {x : f(x) = 0}, and is different only in that p_ε(x) = 0 whenever f(x) = 0. This integral effectively corresponds to an expectation w.r.t. π_res(x), the posterior restricted to X₀ = X \ {x : f(x) = 0}, i.e., the regions where f(x) > 0. In summary, the integrand has a jump at β = 0. Then, by using Eq. (45), we are actually estimating the restricted expectation instead of the integral of interest I = c/Z. We need to apply a correction factor to our estimator, where the last term can be approximated from a posterior sample. The final estimator of I includes the two correction factors.
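The correction factor admits a simple Monte Carlo approximation: since the restricted expectation satisfies π[f] = π_res[f] · P_π(f(x) ≠ 0), the missing probability can be estimated by the fraction of posterior samples landing where f is nonzero. A minimal sketch, assuming samples from π(x) are available (the function name and toy posterior are illustrative):

```python
import numpy as np

def correction_factor(f_vals, tol=0.0):
    """Estimate P_pi(f(x) != 0) from f evaluated at posterior samples.
    The restricted-posterior estimate is multiplied by this probability
    to recover the expectation under the full posterior."""
    f_vals = np.asarray(f_vals)
    return np.mean(np.abs(f_vals) > tol)

# Toy check: if f vanishes on half of the posterior mass
# (here pi = N(0,1) and f(x) = x on x > 0, zero otherwise),
# the factor is approximately 0.5.
rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
factor = correction_factor(np.where(x > 0, x, 0.0))
```

This reuses the baseline posterior chain, so the correction adds essentially no extra likelihood evaluations.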

Fig. 1 Relative squared error of the considered algorithms as a function of number of total likelihood evaluations E, for different y and D. The median, 25% and 75% quantiles (over 100 independent simulations) are depicted

Table 1
GTI for strictly positive f (x)

Table 2
GTI for generic functions f (x)