1 Introduction

Nested sampling (Skilling 2006) is a numerical method for Bayesian computation which simultaneously provides posterior samples and Bayesian evidence estimates. The approach is closely related to Sequential Monte Carlo (SMC) (Salomone et al. 2018) and rare event simulation (Walter 2017). The original development of the nested sampling algorithm was motivated by evidence calculation, but the MultiNest (Feroz and Hobson 2008; Feroz et al. 2008, 2013) and PolyChord (Handley et al. 2015a, b) software packages are now extensively used for parameter estimation from posterior samples (such as in DES Collaboration 2018). Nested sampling performs well compared to Markov chain Monte Carlo (MCMC)-based parameter estimation for multimodal and degenerate posteriors due to its lack of a thermal transition property and the relatively small amount of problem-specific tuning required; for example, there is no need to specify a proposal function. Furthermore, PolyChord is well suited to high-dimensional parameter estimation problems due to its slice sampling-based implementation.

Nested sampling explores the posterior distribution by maintaining a set of samples from the prior, called live points, and iteratively updating them subject to the constraint that new samples have increasing likelihoods. Conventionally a fixed number of live points is used; we term this standard nested sampling. In this case the expected fractional shrinkage of the prior volume remaining is the same at each step, and as a result many samples are typically taken from regions of the prior that are remote from the bulk of the posterior. The allocation of samples in standard nested sampling is set by the likelihood and the prior, and cannot be changed depending on whether calculating the evidence or obtaining posterior samples is the primary goal.

We propose modifying the nested sampling algorithm by dynamically varying the number of live points in order to maximise the accuracy of a calculation for some number of posterior samples, subject to practical constraints. We term this more general approach dynamic nested sampling, with standard nested sampling representing the special case where the number of live points is constant. Compared to standard nested sampling, dynamic nested sampling is particularly effective for parameter estimation because standard nested sampling typically spends most of its computational effort iterating towards the posterior peak. This produces posterior samples with negligible weights which make little contribution to parameter estimation calculations, as discussed in our previous analysis of sampling errors in nested sampling parameter estimation (Higson et al. 2018c). We also achieve significant improvements in the accuracy of evidence calculations, and show that both evidence and parameter estimation can be improved simultaneously. Our approach can easily be incorporated into existing standard nested sampling software; we have created the dyPolyChord package (Higson 2018a) for performing dynamic nested sampling using PolyChord.

In this paper we demonstrate the advantages of dynamic nested sampling relative to the popular standard nested sampling algorithm in a range of empirical tests. A detailed comparison of nested sampling with alternative methods such as MCMC-based parameter estimation and thermodynamic integration is beyond the current scope — for this we refer the reader to Allison and Dunkley (2014), Murray (2007) and Feroz (2008).

The paper proceeds as follows: Sect. 2 contains background on nested sampling, and Sect. 3 establishes useful results about the effects of varying the number of live points. Our dynamic nested sampling algorithm for increasing efficiency in general nested sampling calculations is presented in Sect. 4; its accurate allocation of live points for a priori unknown posterior distributions is illustrated in Fig. 4. We first test dynamic nested sampling in the manner described by Keeton (2011), using analytical cases where one can obtain uncorrelated samples from the prior space within some likelihood contour using standard techniques. We term the resulting procedure perfect nested sampling (in both standard and dynamic versions), and use it to compare the performance of dynamic and standard nested sampling in a variety of cases without software-specific effects from correlated samples or prohibitive computational costs. These tests were performed with our perfectns package (Higson 2018c) and are described in Sect. 5, which includes a discussion of the effects of likelihood, priors and dimensionality on the improvements from dynamic nested sampling. In particular we find large efficiency gains for high-dimensional parameter estimation problems.

Section 6 discusses applying dynamic nested sampling to challenging posteriors, in which results from nested sampling software may include implementation-specific effects from correlations between samples (see Higson et al. 2018b, for a detailed discussion). We describe the strengths and weaknesses of dynamic nested sampling compared to standard nested sampling in such cases. This section includes numerical tests with a multimodal Gaussian mixture model and a practical signal reconstruction problem using dyPolyChord. We find that dynamic nested sampling also produces significant accuracy gains for these more challenging posteriors, and that it is able to reduce implementation-specific effects compared to standard nested sampling.

1.1 Other related work

Other variants of nested sampling include diffusive nested sampling (Brewer et al. 2011) and superposition enhanced nested sampling (Martiniani et al. 2014), which have been implemented as stand-alone software packages. In particular, dynamic nested sampling shares some similarities with DNest4 (Brewer and Foreman-Mackey 2016), in which diffusive nested sampling is followed by additional sampling targeting regions of high posterior mass. However dynamic nested sampling differs from these alternatives as, like standard nested sampling, it only requires drawing samples within hard likelihood constraints. As a result dynamic nested sampling can be used to improve the efficiency of popular standard nested sampling implementations such as MultiNest (rejection sampling), PolyChord (slice sampling) and constrained Hamiltonian nested sampling (Betancourt 2011) while maintaining their strengths in sampling degenerate and multimodal distributions.

It has been shown that efficiency can be greatly increased using nested importance sampling (Chopin and Robert 2010) or by performing nested sampling using an auxiliary prior which approximates the posterior as described in Cameron and Pettitt (2014). However, the efficacy of these approaches is contingent on having adequate knowledge of the posterior (either before the algorithm is run, or by using the results of previous runs). As such, the speed increase on a priori unknown problems is generally lower than might be suggested by toy examples.

Dynamic nested sampling is similar in spirit to the adaptive schemes for thermodynamic integration introduced by Hug et al. (2016) and Friel et al. (2014), as each involves an initial run followed by additional targeted sampling using an estimated error criterion. Furthermore, dynamically weighting sampling in order to target regions of higher posterior mass has also been used in the statistical physics literature, for example in multicanonical sampling (see Okamoto 2004).

2 Background: the nested sampling algorithm

We now give a brief description of the nested sampling algorithm following Higson et al. (2018c) and set out our notation; for more details see Higson et al. (2018c) and Skilling (2006). For theoretical treatments of nested sampling’s convergence properties see Keeton (2011), Skilling (2009), Walter (2017) and Evans (2007).

For a given likelihood \(\mathcal {L}(\theta )\) and prior \(\pi (\theta )\), nested sampling is a method for simultaneously computing the Bayesian evidence

$$\begin{aligned} \mathcal {Z} = \int \mathcal {L}(\theta ) \pi (\theta ){\text {d}}{\theta } \end{aligned}$$
(1)

and sampling from the posterior distribution

$$\begin{aligned} \mathcal {P}(\theta ) = \frac{\mathcal {L}(\theta ) \pi (\theta )}{\mathcal {Z}}. \end{aligned}$$
(2)

The algorithm begins by sampling some number of live points randomly from the prior \(\pi (\theta )\). In standard nested sampling, at each iteration i the point with the lowest likelihood \(\mathcal {L}_i\) is replaced by a new point sampled from the region of prior with likelihood \(\mathcal {L}(\theta )>\mathcal {L}_i\) and the number of live points remains constant throughout. This process is continued until some termination condition is met, producing a list of samples (referred to as dead points) which—along with any remaining live points—can then be used for evidence and parameter estimation. We term the finished nested sampling process a run.
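To make the iteration concrete, the following is a minimal Python sketch of a standard nested sampling run (not any particular software implementation); `loglike`, `sample_prior` and `sample_within_contour` are hypothetical stand-ins for the problem-specific likelihood, prior sampler and constrained sampler, and termination is simplified to a fixed number of iterations.

```python
import numpy as np

def standard_ns(loglike, sample_prior, sample_within_contour,
                n_live=500, n_iter=10_000):
    """Sketch of a standard nested sampling run.

    sample_prior(n) should return an (n, d) array of prior draws, and
    sample_within_contour(logl_min) a single prior draw subject to
    loglike(theta) > logl_min; both samplers are problem-specific.
    """
    live = sample_prior(n_live)                      # initial live points
    live_logl = np.array([loglike(theta) for theta in live])
    dead, dead_logl = [], []
    for _ in range(n_iter):                          # simplified termination
        i_min = np.argmin(live_logl)                 # lowest-likelihood live point
        dead.append(live[i_min].copy())              # record it as a dead point
        dead_logl.append(live_logl[i_min])
        # replace it with a new sample from the constrained prior
        live[i_min] = sample_within_contour(live_logl[i_min])
        live_logl[i_min] = loglike(live[i_min])
    return np.array(dead), np.array(dead_logl), live, live_logl
```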

Nested sampling calculates the evidence (1) as a one-dimensional integral

$$\begin{aligned} \mathcal {Z}=\int _0^1 \mathcal {L}(X) {\text {d}}{X}, \end{aligned}$$
(3)

where \(X(\mathcal {L})\) is the fraction of the prior with likelihood greater than \(\mathcal {L}\) and \(\mathcal {L}(X)\equiv X^{-1}(\mathcal {L})\). The prior volumes \(X_i\) corresponding to the dead points i are unknown but can be modelled statistically as \(X_i = t_i X_{i-1}\), where \(X_0 = 1\). For a given number of live points n, each shrinkage ratio \(t_i\) is independently distributed as the largest of n random variables sampled uniformly from the interval [0, 1], and so (Skilling 2006):

$$\begin{aligned} P(t_i) = n t_i^{n-1}, \qquad \mathrm {E}[\log t_i ] = -\,\frac{1}{n}, \qquad \mathrm {Var}[\log t_i ] = \frac{1}{n^2}. \end{aligned}$$
(4)

In standard nested sampling the number of live points n is some constant value for all \(t_i\)—the iteration of the algorithm in this case is illustrated schematically in Fig. 1.

Fig. 1 A schematic illustration of standard nested sampling with a constant number of live points n, reproduced from Higson et al. (2018c). \(\mathcal {L}(X)X\) shows the relative posterior mass, the bulk of which is contained in some small fraction of the prior. Most of the samples in the diagram are in \(\log X\) regions with negligible posterior mass, as is typically the case in standard nested sampling
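Since (4) is the density of a Beta(n, 1) random variable (the largest of n independent uniforms), the statistical model for the shrinkages is straightforward to simulate; as a quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                                  # constant number of live points
t = rng.beta(n, 1, size=100_000)         # t_i ~ Beta(n, 1), density n t^(n-1)
print(np.log(t).mean(), -1 / n)          # E[log t_i] = -1/n
print(np.log(t).var(), 1 / n ** 2)       # Var[log t_i] = 1/n^2
```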

2.1 Evidence estimation

Nested sampling calculates the evidence (3) as a quadrature sum over the dead points

$$\begin{aligned} \mathcal {Z}(\mathbf {t}) \approx \sum _{i \in \mathrm {dead}} \mathcal {L}_i w_i(\mathbf {t}), \end{aligned}$$
(5)

where \(\mathbf {t}=\{t_1,t_2,\ldots ,t_{n_{\mathrm {dead}}}\}\) is the unknown set of shrinkage ratios for each dead point and each \(t_i\) is an independent random variable with distribution (4). If required, any live points remaining at termination can also be included. The \(w_i\) are appropriately chosen quadrature weights; we use the trapezium rule such that \(w_i(\mathbf {t})=\frac{1}{2}(X_{i-1}(\mathbf {t})-X_{i+1}(\mathbf {t}))\), where \(X_i(\mathbf {t}) = \prod ^i_{k=0} t_k\). Given that the shrinkage ratios \(\mathbf {t}\) are a priori unknown, one typically calculates an expected value and error for the evidence (5) using (4). The dominant source of error in evidence estimates from perfect nested sampling is the statistical variation in the unknown volumes of the prior “shells” \(w_i(\mathbf {t})\).
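For illustration, a minimal sketch of one realisation of the evidence sum (5) for a constant number of live points, with the unknown shrinkages simulated from (4); padding the final volume with \(X=0\) is an assumption made here to handle the last trapezium weight. Repeating the calculation over many simulated shrinkage vectors gives the expected value and error.

```python
import numpy as np
from scipy.special import logsumexp

def simulate_logz(logl_dead, n_live, rng):
    """One realisation of log Z from (5), simulating the t_i from (4)."""
    logl_dead = np.asarray(logl_dead)
    t = rng.beta(n_live, 1, size=logl_dead.size)   # shrinkage ratios
    x = np.exp(np.cumsum(np.log(t)))               # X_i = prod_{k<=i} t_k
    x = np.concatenate(([1.0], x, [0.0]))          # pad with X_0 = 1 and X = 0
    log_w = np.log(0.5 * (x[:-2] - x[2:]))         # trapezium rule weights
    return logsumexp(log_w + logl_dead)

# rng = np.random.default_rng(0)
# logzs = [simulate_logz(logl_dead, 500, rng) for _ in range(100)]
# print(np.mean(logzs), np.std(logzs))             # expected value and error
```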

2.2 Parameter estimation

Nested sampling parameter estimation uses the dead points, and if required the remaining live points at termination, to construct a set of posterior samples with weights proportional to their share of the posterior mass:

$$\begin{aligned} p_i(\mathbf {t})=\frac{w_i(\mathbf {t})\mathcal {L}_i}{\sum _i w_i(\mathbf {t})\mathcal {L}_i}=\frac{w_i(\mathbf {t})\mathcal {L}_i}{\mathcal {Z}(\mathbf {t})}. \end{aligned}$$
(6)

Neglecting any implementation-specific effects, which are not present in perfect nested sampling, the dominant sampling errors in estimating some parameter or function of parameters \(f(\theta )\) come from two sources (Higson et al. 2018c):

  (i) approximating the relative point weights \(p_i(\mathbf {t})\) with their expectation \(\mathrm {E}[p_i(\mathbf {t})]\) using (4);

  (ii) approximating the mean value of a function of parameters over an entire iso-likelihood contour with its value at a single point \(f(\theta _i)\).
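A minimal sketch of the weight calculation (6), taking the dead points' log-likelihoods and log quadrature weights (for example their expected values) as hypothetical input arrays:

```python
import numpy as np

def posterior_weights(logl_dead, log_w):
    """Normalised posterior weights p_i from (6)."""
    log_p = np.asarray(logl_dead) + np.asarray(log_w)
    log_p -= log_p.max()             # subtract the maximum for numerical stability
    p = np.exp(log_p)
    return p / p.sum()

# a posterior expectation is then a weighted sample mean, e.g.
# p = posterior_weights(logl_dead, log_w)
# mean_theta1 = np.sum(p * theta_samples[:, 0])
```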

2.3 Combining and dividing nested sampling runs

Skilling (2006) describes how several standard nested sampling runs \(r=1,2,\ldots \) with constant live points \(n^{(r)}\) may be combined simply by merging the dead points and sorting by likelihood value. The combined sequence of dead points is equivalent to a single nested sampling run with \(n_\mathrm {combined}=\sum _r n^{(r)}\) live points.

Higson et al. (2018c) gives an algorithm for the reverse procedure: decomposing a nested sampling run with n live points into a set of n valid nested sampling runs, each with one live point. These single live point runs, which we term threads, are the smallest unit from which valid nested sampling runs can be constructed and will prove useful in developing dynamic nested sampling.
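In code, combining constant live point runs is just a likelihood-ordered merge of their dead points; a minimal sketch:

```python
import numpy as np

def combine_runs(logl_runs):
    """Merge the dead points of several runs, sorted by likelihood.

    For runs with constant numbers of live points n^(r), the merged
    sequence behaves as a single run with sum_r n^(r) live points.
    """
    logl = np.concatenate(logl_runs)
    order = np.argsort(logl)         # order can also be used to sort the samples
    return logl[order], order
```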

3 Variable numbers of live points

Before presenting our dynamic nested sampling algorithm in Sect. 4, we first establish some basic results for a nested sampling run in which the number of live points varies. Such runs are valid as successive shrinkage ratios \(t_i\) are independently distributed (Skilling 2006). For now we assume the manner in which the number of live points changes is specified in advance; adaptive allocation of samples is considered in Sect. 4.

Let us define \(n_i\) as the number of live points present for the prior shrinkage ratio \(t_i\) between dead points \(i-1\) and i. In this notation all information about the number of live points for a nested sampling run can be expressed as a list of numbers \(\mathbf {n} = \{n_1, n_2, \ldots , n_{n_{\mathrm {dead}}}\}\) which correspond to the shrinkage ratios \(\mathbf {t} = \{t_1,t_2,\ldots ,t_{n_{\mathrm {dead}}}\}\). Nested sampling calculations for variable numbers of live points differ from the constant live point case only in the use of different \(n_i\) in calculating the distribution of each \(t_i\) from (4).
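Under this notation the expected log prior volumes follow directly from (4), applied with the appropriate \(n_i\) for each shrinkage; a minimal sketch:

```python
import numpy as np

def expected_log_x(n_live):
    """E[log X_i] = -sum_{k<=i} 1/n_k for a run with per-shrinkage live
    point counts n_live = [n_1, n_2, ...], following (4) with variable n_i."""
    return -np.cumsum(1.0 / np.asarray(n_live, dtype=float))
```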

Skilling (2006)’s method for combining constant live point runs, mentioned in Sect. 2, can be extended to accommodate variable numbers of live points by requiring that at any likelihood the number of live points of the combined run equals the sum of the numbers of live points of the constituent runs at that likelihood (this is illustrated in Fig. 2). Variable live point runs can also be divided into their constituent threads using the algorithm in Higson et al. (2018c). However, unlike for constant live point runs, the threads produced may start and finish part way through the run, and there is no longer a single unique division into threads on iso-likelihood contours where the number of live points increases. The technique for estimating sampling errors by resampling threads introduced in Higson et al. (2018c) can also be applied to nested sampling runs with variable numbers of live points (see “Appendix B” for more details), as can the diagnostic tests for correlated samples and missed modes described in Higson et al. (2018b).

Fig. 2 Combining nested sampling runs a and b with variable numbers of live points \(\mathbf {n}^{(a)}\) and \(\mathbf {n}^{(b)}\) into a single nested sampling run c; black dots show dead points arranged in order of increasing likelihood. The number of live points in run c at some likelihood equals the sum of the live points of runs a and b at that likelihood

In addition, the variable live point framework provides a natural way to include the final set of live points remaining when a standard nested sampling run terminates in a calculation. These are uniformly distributed in the region of the prior with \(\mathcal {L}(\theta ) > \mathcal {L}_\mathrm {terminate}\), and can be treated as samples from a dynamic nested sampling run with the number of live points reducing by 1 as each of the points remaining after termination is passed until the final point i has \(n_i = 1\). This allows the final live points of standard nested sampling runs to be combined with variable live point runs.
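In the \(\mathbf {n}\) bookkeeping this amounts to appending a decreasing tail of live point counts; a minimal sketch, where `n_final` is the number of live points remaining at termination:

```python
import numpy as np

def include_final_live_points(n_live, n_final):
    """Append the final live points at termination as a tail with the
    live point count decreasing from n_final down to 1."""
    return np.concatenate([np.asarray(n_live), np.arange(n_final, 0, -1)])
```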

The remainder of this section analyses the effects of local variations in the number of live points on the accuracy of nested sampling evidence calculation and parameter estimation. The dynamic nested sampling algorithm in Sect. 4 uses these results to allocate additional live points.

3.1 Effects on calculation accuracy

Nested sampling calculates the evidence \(\mathcal {Z}\) as the sum of sample weights (5); the dominant sampling errors are from statistically estimating shrinkage ratios \(t_i\) which affect the weights of all subsequent points. In “Appendix C” we show analytically that the reduction in evidence errors achieved by taking additional samples to increase the local number of live points \(n_i\) is inversely proportional to \(n_i\), and is approximately proportional to the evidence contained in point i and all subsequent points. This makes sense as the dominant evidence errors are from statistically estimating shrinkages \(t_i\) which affect all points \(j \ge i\).

In nested sampling parameter estimation, sampling errors come both from taking a finite number of samples in any region of the prior and from the stochastic estimation of their normalised weights \(p_i\) from (6). Typically standard nested sampling takes many samples with negligible posterior mass, as illustrated in Fig. 1; these make little contribution to estimates of parameters or to the accuracy of samples’ normalised weights. From (4) the expected separation between points in \(\log X\) (approximately proportional to the posterior mass they each represent) is \(1/n_i\). As a result, increasing the number of live points wherever the dead points’ posterior weights \(p_i \propto \mathcal {L}_i w_i\) are greatest distributes posterior mass more evenly among the samples. This improves the accuracy of the statistically estimated weights \(p_i\), and can dramatically increase the information content (the exponential of the Shannon entropy of the samples)

$$\begin{aligned} H = \exp \left( - \sum _i p_i \log p_i \right) , \end{aligned}$$
(7)

which is maximised for a given number of samples when the sample weights are equal. Empirical tests of dynamic nested sampling show that increasing the number of live points wherever points have the highest \(p_i \propto \mathcal {L}_i w_i\) works well for increasing parameter estimation accuracy in most calculations.
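A minimal sketch of (7):

```python
import numpy as np

def information_content(p):
    """H from (7): the exponential of the Shannon entropy of the
    normalised sample weights p; H = len(p) when all weights are equal."""
    p = np.asarray(p)
    p = p[p > 0]                     # 0 log 0 = 0 by convention
    return np.exp(-np.sum(p * np.log(p)))
```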

As the contribution of each sample i to a parameter estimation problem for some quantity \(f(\theta )\) is dependent on \(f(\theta _i)\), the precise optimum allocation of live points is different for different quantities. In most cases the relative weight \(p_i\) of samples is a good approximation for their influence on a calculation, but for some problems much of the error may come from sampling \(\log X\) regions containing a small fraction of the posterior mass but with extreme parameter values (see Section 3.1 of Higson et al. 2018c, for diagrams illustrating this). “Appendix D” discusses estimating the importance of points to a specific parameter estimation calculation and using dynamic nested sampling to allocate live points accordingly.

4 The dynamic nested sampling algorithm

This section presents our algorithm for performing nested sampling calculations with a dynamically varying number of live points to optimise the allocation of samples.

Since the distribution of posterior mass as a function of the likelihood is a priori unknown, we first approximate it by performing a standard nested sampling run with some small constant number of live points \(n_{\mathrm {init}}\). The algorithm then proceeds by iteratively calculating the range of likelihoods where increasing the number of live points will have the greatest effect on calculation accuracy, and generating an additional thread running over these likelihoods. If required, some \(n_{\mathrm {batch}}\) additional threads can be generated at each step to reduce the number of times the importance must be calculated and the sampler restarted. We find in empirical tests that using \(n_{\mathrm {batch}}> 1\) has little effect on efficiency gains from dynamic nested sampling when the number of samples taken in each batch is small compared to the total number of samples in the run.

From the discussion in Sect. 3.1 we define functions to measure the relative importance of a sample i for evidence calculation and parameter estimation respectively as

$$\begin{aligned} I_{\mathcal {Z}}(i)&\propto \frac{\mathrm {E}[\mathcal {Z}_{\ge i}]}{n_i}, \quad \text {where} \, \mathcal {Z}_{\ge i} \equiv \sum _{k \ge i} \mathcal {L}_k w_k(\mathbf {t}), \end{aligned}$$
(8)
$$\begin{aligned} I_{\mathrm {param}}(i)&\propto \mathcal {L}_i \,\, \mathrm {E}[w_i(\mathbf {t})]. \end{aligned}$$
(9)

Alternatively (8) can be replaced with the more complex expression (34) derived in “Appendix C”, although we find this typically makes little difference to results. Modifying (9) to optimise for estimation of a specific parameter or function of parameters is discussed in “Appendix D”.

The user specifies how to divide computational resources between evidence calculation and parameter estimation through an input goal \(G \in [0,1]\), where \(G=0\) corresponds to optimising for evidence calculation and \(G=1\) optimises for parameter estimation. The dynamic nested sampling algorithm calculates importance as a weighted sum of the points’ normalised evidence and parameter estimation importances

$$\begin{aligned} I(G, i) = (1-G) \frac{I_{\mathcal {Z}}(i)}{\sum _j I_{\mathcal {Z}}(j)} + G \frac{I_{\mathrm {param}}(i)}{\sum _j I_{\mathrm {param}}(j)}. \end{aligned}$$
(10)
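As a minimal sketch, the importances (8)–(10) can be computed from the dead points' log-likelihoods `logl`, expected log-weights `log_w` and live point counts `n_live` (hypothetical input arrays, sorted in order of increasing likelihood):

```python
import numpy as np

def importance(logl, log_w, n_live, G):
    """Relative point importances I(G, i) from (8)-(10), normalised so the
    maximum importance is 1."""
    log_lw = np.asarray(logl) + np.asarray(log_w)
    lw = np.exp(log_lw - log_lw.max())              # proportional to L_i w_i
    i_param = lw                                    # (9)
    z_geq = np.cumsum(lw[::-1])[::-1]               # Z_{>=i} = sum_{k>=i} L_k w_k
    i_z = z_geq / np.asarray(n_live, dtype=float)   # (8)
    imp = (1 - G) * i_z / i_z.sum() + G * i_param / i_param.sum()   # (10)
    return imp / imp.max()
```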

The likelihood range in which to run an additional thread is chosen by finding all points with importance greater than some fraction f of the largest importance. Choosing a smaller fraction makes the added threads longer and reduces the number of times the importance must be recalculated, but can also cause the number of live points to plateau for regions with importance greater than that fraction of the maximum importance (see the discussion of Fig. 4 in the next section for more details). We use \(f = 0.9\) for results in this paper, but find empirically that slightly higher or lower values make little difference to results. To ensure any steep or discontinuous increases in the likelihood \(\mathcal {L}(X)\) are captured, we find the first point j and last point k which meet this condition, then generate an additional thread starting at \(\mathcal {L}_{j-1}\) and ending when a point is sampled with likelihood greater than \(\mathcal {L}_{k+1}\). If j is the first dead point, threads which initially sample the whole prior are generated. If k is the final dead point then the thread will stop when a sample with likelihood greater than \(\mathcal {L}_k\) is found. This allows the new thread to continue beyond \(\mathcal {L}_k\), meaning dynamic nested sampling iteratively explores higher likelihoods when this is the most effective use of samples.
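A minimal sketch of this selection step, under the same assumptions as the previous sketch:

```python
import numpy as np

def thread_range(logl, imp, f=0.9):
    """Likelihood range for the next thread: bracket all points with
    importance above a fraction f of the maximum."""
    high = np.nonzero(imp >= f * imp.max())[0]
    j, k = high[0], high[-1]
    # if j is the first dead point, the thread samples the whole prior
    logl_start = -np.inf if j == 0 else logl[j - 1]
    # if k is the final dead point, the thread may run past the current end
    logl_end = logl[k] if k == len(logl) - 1 else logl[k + 1]
    return logl_start, logl_end
```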

Unlike in standard nested sampling, more accurate dynamic nested sampling results can be obtained simply by continuing the calculation for longer. The user must specify a condition at which to stop dynamically adding threads, such as when a fixed number of samples has been taken or some desired level of accuracy has been achieved. Sampling errors on evidence and parameter estimation calculations can be estimated from the dead points at any stage using the method described in Higson et al. (2018c). We term these dynamic termination conditions to distinguish them from the type of termination conditions used in standard nested sampling. Our dynamic nested sampling algorithm is presented more formally in Algorithm 1.

Algorithm 1 Dynamic nested sampling

4.1 Software implementation

Since dynamic nested sampling only requires the ability to sample from the prior within a hard likelihood constraint, implementations and software packages developed for standard nested sampling can be easily adapted to perform dynamic nested sampling. We demonstrate this with the dyPolyChord package, which performs dynamic nested sampling using PolyChord and is compatible with Python, C++ and Fortran likelihoods.

PolyChord was designed before the creation of the dynamic nested sampling algorithm, and is not optimised to quickly resume the nested sampling process at an arbitrary point to add more threads. dyPolyChord minimises the computational overhead from saving and resuming by using Algorithm 2—a modified version of Algorithm 1 described in “Appendix F”. After the initial exploratory run with \(n_{\mathrm {init}}\) live points, Algorithm 2 calculates a dynamic allocation of live points and then generates more samples in a single run without recalculating point importances. This means only the initial run provides information on where to place samples; as a result the allocation of live points is slightly less accurate and a higher value of \(n_{\mathrm {init}}\) is typically needed.

Dynamic nested sampling will be incorporated in the forthcoming PolyChord 2 software package, which is currently in development and is designed for problems of up to \(\sim 1000\) dimensions — dynamic nested sampling can provide very large improvements in the accuracy of such high-dimensional problems, as shown by the numerical tests in the next section. Furthermore, we anticipate that reloading a past iteration i of a PolyChord 2 nested sampling run in order to add additional threads will be less computationally expensive than a single likelihood call for many problems. Nevertheless, it is often more efficient for dynamic nested sampling software to generate additional threads in selected likelihood regions in batches rather than one at a time; this approach is used in the dynesty dynamic nested sampling package.

Fig. 3 Relative posterior mass (\(\propto \mathcal {L}(X)X\)) as a function of \(\log X\) for Gaussian likelihoods (11) and exponential power likelihoods (12) with \(b=2\) and \(b=\frac{3}{4}\). Each has a Gaussian prior (13) with \(\sigma _\pi =10\). The lines are scaled so that the area under each of them is equal

5 Numerical tests with perfect nested sampling

In the manner described by Keeton (2011) we first consider spherically symmetric test cases; here one can perform perfect nested sampling, as perfectly uncorrelated samples from the prior space within some iso-likelihood contour can be found using standard techniques. Results from nested sampling software used for practical problems may include additional uncertainties from imperfect sampling within a likelihood contour that are specific to a given implementation—we discuss these in Sect. 6. The tests in this section were run using our perfectns package.

Perfect nested sampling calculations depend on the likelihood \(\mathcal {L}(\theta )\) and prior \(\pi (\theta )\) only through the distribution of posterior mass \(\mathcal {L}(X)\) and the distribution of parameters on iso-likelihood contours \(P(f(\theta )|\mathcal {L}(\theta )=\mathcal {L}(X))\), each of which is a function of both \(\mathcal {L}(\theta )\) and \(\pi (\theta )\) (Higson et al. 2018c). We therefore empirically test dynamic nested sampling using likelihoods and priors with a wide range of distributions of posterior mass, and consider a variety of functions of parameters \(f(\theta )\) in each case.

We first examine perfect nested sampling of d-dimensional spherical unit Gaussian likelihoods centred on the origin

$$\begin{aligned} \mathcal {L}(\theta ) = {(2 \pi )}^{-d/2} \mathrm {e}^{-{|\theta |}^2 / 2}. \end{aligned}$$
(11)

For additional tests using distributions with lighter and heavier tails we use d-dimensional exponential power likelihoods

$$\begin{aligned} \mathcal {L}(\theta ) = \frac{d\, \varGamma (\frac{d}{2})}{{\pi }^{\frac{d}{2}} 2^{1+\frac{d}{2b}} \varGamma (1+\frac{d}{2b})} \mathrm {e}^{-{|\theta |}^{2b} / 2}, \end{aligned}$$
(12)

where \(b=1\) corresponds to a d-dimensional Gaussian (11). All tests use d-dimensional co-centred spherical Gaussian priors

$$\begin{aligned} \pi (\theta ) = {\left( 2 \pi \sigma _\pi ^2\right) }^{-d/2} \mathrm {e}^{-{|\theta |}^2 / 2 \sigma _\pi ^2}. \end{aligned}$$
(13)

The different distributions of posterior mass in \(\log X\) for (11) and (12) with dimensions d are illustrated in Fig. 3.

In tests of parameter estimation we denote the first component of the \(\theta \) vector as \(\theta _{\hat{1}}\), although by symmetry the results will be the same for any component. \(\overline{\theta _{\hat{1}}}\) is the mean of the posterior distribution of \(\theta _{\hat{1}}\), and the one-tailed \(Y\%\) upper credible interval \(\mathrm {C.I.}_{Y\%}(\theta _{\hat{1}})\) is the value \(\theta _{\hat{1}}^*\) for which \(P(\theta _{\hat{1}}<\theta _{\hat{1}}^*|\mathcal {L},\pi )=Y/100\).

Tests of dynamic nested sampling terminate after a fixed number of samples, which is set such that they use similar or slightly smaller numbers of samples than the standard nested sampling runs we compare them to. Dynamic runs have \(n_{\mathrm {init}}\) set to 10% of the number of live points used for the standard runs. Standard nested sampling runs use the termination conditions described by Handley et al. (2015b, Section 3.4), stopping when the estimated evidence contained in the live points is less than \(10^{-3}\) times the evidence contained in dead points (the default value used in PolyChord). This is an appropriate termination condition for nested sampling parameter estimation (Higson et al. 2018c), but if only the evidence is of interest then stopping with a larger fraction of the posterior mass remaining will have little effect on calculation accuracy.

The increase in computational efficiency from our method can be calculated by observing that nested sampling calculation errors are typically inversely proportional to the square root of the computational effort applied (Skilling 2006; Higson et al. 2018c), and that the number of samples produced is approximately proportional to the computational effort. The increase in efficiency (computational speedup) from dynamic nested sampling over standard nested sampling for runs containing approximately the same number of samples on average can therefore be estimated from the variation of results as

$$\begin{aligned} \mathrm {Efficiency\,gain} = \frac{\mathrm {Var}\left[ \mathrm {standard\,NS\,results}\right] }{\mathrm {Var}\left[ \mathrm {dynamic\,NS\,results}\right] }. \end{aligned}$$
(14)

Here the numerator is the variance of the calculated values of some quantity (such as the evidence or the mean of a parameter) from a number of standard nested sampling runs, and the denominator is the variance of the calculated values of the same quantity from a number of dynamic nested sampling runs. When the two methods use different numbers of samples on average, (14) can be replaced with

$$\begin{aligned} \mathrm {Efficiency\,gain} = \frac{\mathrm {Var}\left[ \mathrm {standard\,NS\,results}\right] }{\mathrm {Var}\left[ \mathrm {dynamic\,NS\,results}\right] } \times \frac{\overline{N_\mathrm {samp,sta}}}{\overline{N_\mathrm {samp,dyn}}}, \end{aligned}$$
(15)

where the additional term is the ratio of the mean number of samples produced by the standard and dynamic nested sampling runs.
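A minimal sketch of (14) and (15):

```python
import numpy as np

def efficiency_gain(standard_results, dynamic_results,
                    mean_nsamp_standard=None, mean_nsamp_dynamic=None):
    """Efficiency gain (14); supplying mean sample counts applies the
    correction factor from (15)."""
    gain = (np.var(standard_results, ddof=1)
            / np.var(dynamic_results, ddof=1))
    if mean_nsamp_standard is not None and mean_nsamp_dynamic is not None:
        gain *= mean_nsamp_standard / mean_nsamp_dynamic
    return gain
```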

Fig. 4 Live point allocation for a 10-dimensional Gaussian likelihood (11) with a Gaussian prior (13) and \(\sigma _\pi = 10\). Solid lines show the number of live points as a function of \(\log X\) for 10 standard nested sampling runs with \(n=500\), and 10 dynamic nested sampling runs with \(n_{\mathrm {init}}=50\), a similar number of samples and different values of G. The dotted and dashed lines show the relative posterior mass \(\propto \mathcal {L}(X)X\) and the posterior mass remaining \(\propto \int _{-\infty }^X \mathcal {L}(X')X' {\text {d}}{X'}\) at each point in \(\log X\); for comparison these lines are scaled to have the same area under them as the average of the number of live point lines. Standard nested sampling runs include the final set of live points at termination, which are modelled using a decreasing number of live points as discussed in Sect. 3. Similar diagrams for exponential power likelihoods (12) with \(b=2\) and \(b=\frac{3}{4}\) are presented in Figs. 15 and 16 in “Appendix E.1”

Table 1 Test of dynamic nested sampling for a 10-dimensional Gaussian likelihood (11) and a Gaussian prior (13) with \(\sigma _\pi = 10\)

5.1 10-dimensional Gaussian example

We begin by testing dynamic nested sampling on a 10-dimensional Gaussian likelihood (11) with a Gaussian prior (13) and \(\sigma _\pi = 10\). Figure 4 shows the relative allocation of live points as a function of \(\log X\) for standard and dynamic nested sampling runs. The dynamic nested sampling algorithm (Algorithm 1) allocates live points accurately and consistently, as can be seen by comparison with the analytically calculated distributions of posterior mass and posterior mass remaining. Dynamic nested sampling live point allocations do not precisely match these distributions in the \(G=1\) and \(G=0\) cases because they include the initial exploratory run with a constant \(n_{\mathrm {init}}\) live points. Furthermore, as additional live points are added where the importance is more than \(90\%\) of the maximum importance, the number of live points allocated by dynamic nested sampling is approximately constant for regions with importance greater than \(\sim 90\%\) of the maximum—this can be clearly seen near the peak number of live points in the \(G=1\) case in Fig. 4. Similar diagrams for exponential power likelihoods (12) with \(b=2\) and \(b=\frac{3}{4}\) are provided in “Appendix E.1” (Figs. 15 and 16), and show that the allocation of live points is also accurate in these cases.

Fig. 5 Distributions of results for the dynamic and standard nested sampling calculations shown in Table 1, plotted using kernel density estimation. Black dotted lines show the correct value of each quantity for the likelihood and prior used. Compared to standard nested sampling (blue lines), the distributions of results of dynamic nested sampling with \(G=1\) (red lines) for parameter estimation problems show much less variation around the correct value. Results for dynamic nested sampling with \(G=0\) (orange lines) are on average closer to the correct value than standard nested sampling for calculating \(\log \mathcal {Z}\), and results with \(G=0.25\) (green lines) show improvements over standard nested sampling for both evidence and parameter estimation calculations

The variation of results from repeated standard and dynamic nested sampling calculations with a similar number of samples is shown in Table 1 and Fig. 5. Dynamic nested sampling optimised for evidence calculation (\(G=0\)) and parameter estimation (\(G=1\)) produce significantly more accurate results than standard nested sampling. In addition, results for dynamic nested sampling with \(G=0.25\) show that both evidence calculation and parameter estimation accuracy can be improved simultaneously. Equivalent results for 10-dimensional exponential power likelihoods (12) with \(b=2\) and \(b=\frac{3}{4}\) are shown in Tables 8 and 9 in “Appendix E.1”. The reduction in evidence errors for \(G=0\) and parameter estimation errors for \(G=1\) in Table 1 correspond to increasing efficiency by factors of \(1.40 \pm 0.04\) and up to \(4.4 \pm 0.1\) respectively.

5.2 Efficiency gains for different distributions of posterior mass

Efficiency gains (14) from dynamic nested sampling depend on the fraction of the \(\log X\) range explored which contains samples that make a significant contribution to calculation accuracy. If this fraction is small most samples taken by standard nested sampling contain little information, and dynamic nested sampling can greatly improve performance. For parameter estimation (\(G=1\)), only \(\log X\) regions containing significant posterior mass (\(\propto \mathcal {L}(X)X\)) are important, whereas for evidence calculation (\(G=0\)) all samples taken before the bulk of the posterior is reached are valuable. Both cases benefit from dynamic nested sampling using fewer samples to explore the region after most of the posterior mass has been passed but before termination.

Fig. 6 Efficiency gain (14) from dynamic nested sampling compared to standard nested sampling for likelihoods of different dimensions; each has a Gaussian prior (13) with \(\sigma _\pi = 10\). Results are shown for calculations of the log evidence, the mean, median and \(84\%\) one-tailed credible interval of a parameter \(\theta _{\hat{1}}\), and the mean and median of the radial coordinate \(|\theta |\). Each efficiency gain is calculated using 1000 standard nested sampling calculations with \(n=200\) and 1000 dynamic nested sampling calculations with \(n_{\mathrm {init}}=20\) using a similar or slightly smaller number of samples

Fig. 7 Efficiency gain (14) from dynamic nested sampling for Gaussian priors (13) of different sizes \(\sigma _\pi \). Results are shown for calculations of the log evidence and the mean of a parameter \(\theta _{\hat{1}}\) for 2-dimensional Gaussian likelihoods (11) and 2-dimensional exponential power likelihoods (12) with \(b=2\) and \(b=\frac{3}{4}\). Each efficiency gain is calculated using 1000 standard nested sampling calculations with \(n=200\) and 1000 dynamic nested sampling calculations with \(n_{\mathrm {init}}=20\) using a similar or slightly smaller number of samples

We now test the efficiency gains (14) of dynamic nested sampling empirically for a wide range of distributions of posterior mass by considering Gaussian likelihoods (11) and exponential power likelihoods (12) of different dimensions d and prior sizes \(\sigma _\pi \). The results are presented in Figs. 6 and 7, and show large efficiency gains from dynamic nested sampling for parameter estimation in all of these cases.

Increasing the dimension d typically means the posterior mass is contained in a smaller fraction of the prior volume (Higson et al. 2018c), as shown in Fig. 3. In the spherically symmetric cases we consider, the range of \(\log X\) to be explored before significant posterior mass is reached increases approximately linearly with d. This increases the efficiency gain (14) from dynamic nested sampling for parameter estimation (\(G=1\)) but reduces it for evidence calculation (\(G=0\)). In high-dimensional problems the vast majority of the \(\log X\) range explored is usually covered before any significant posterior mass is reached, resulting in very large efficiency gains for parameter estimation but almost no gains for evidence calculation—as can be seen in Fig. 6. For the 1000-dimensional exponential power likelihood with \(b=2\), dynamic nested sampling with \(G=1\) improves parameter estimation efficiency by a factor of up to \(72\pm 5\), with the largest improvement for estimates of the median of the posterior distribution of \(|\theta |\).

Increasing the size of the prior \(\sigma _\pi \) increases the fraction of the \(\log X\) range explored before any significant posterior mass is reached, resulting in larger efficiency gains (14) from dynamic nested sampling for parameter estimation (\(G=1\)) but smaller gains for evidence calculation (\(G=0\)). However when \(\sigma _\pi \) is small the bulk of the posterior mass is reached after a small number of steps, and most of the \(\log X\) range explored is after the majority of the posterior mass but before termination. Dynamic nested sampling places fewer samples in this region than standard nested sampling, leading to large efficiency gains for both parameter estimation and evidence calculation. This is shown in Fig. 7; when \(\sigma _\pi = 0.1\), dynamic nested sampling evidence calculations with \(G=0\) improve efficiency over standard nested sampling by a factor of approximately 7 for all 3 likelihoods considered. However we note that if only the evidence estimate is of interest then standard nested sampling can safely terminate with a higher fraction of the posterior mass remaining than \(10^{-3}\), in which case efficiency gains would be lower.

6 Dynamic nested sampling with challenging posteriors

Nested sampling software such as MultiNest and PolyChord use numerical techniques to perform the sampling within hard likelihood constraints required by the nested sampling algorithm; see Feroz et al. (2013) and Handley et al. (2015b) for more details. For challenging problems, such as those involving degenerate or multimodal posteriors, samples produced may not be drawn uniformly from the region of the prior within the desired iso-likelihood contour—for example if this software misses a mode in a multimodal posterior. This introduces additional uncertainties which are specific to a given software package and are not present in perfect nested sampling; we term these implementation-specific effects (see Higson et al. 2018b, for a detailed discussion).

Nested sampling software generally uses the population of dead and live points to sample within iso-likelihood contours, and so taking more samples in the region of an iso-likelihood contour will reduce the sampler’s implementation-specific effects. As a result dynamic nested sampling typically has smaller implementation-specific effects than standard nested sampling in the regions of the posterior where it has a higher number of live points, but conversely may perform worse in regions with fewer live points. For highly multimodal or degenerate likelihoods it is important that all modes or other regions of significant posterior mass are found by the sampler—dynamic nested sampling performs better than standard nested sampling at finding hard to locate modes which become separated from the remainder of the posterior at likelihood values where it has more live points, as illustrated schematically in Fig. 8.

Fig. 8 Dynamic and standard nested sampling’s relative ability to discover hard to locate modes is determined by the number of live points present at the likelihood \(\mathcal {L}(X_\mathrm {split})\) at which a mode splits from the remainder of the posterior (illustrated on the left). In the schematic graph on the right we would expect dynamic nested sampling to be better at finding modes than standard nested sampling in region B (where it has a higher number of live points) but worse in regions A and C

Provided no significant modes are lost we expect dynamic nested sampling to have lower implementation-specific effects than standard nested sampling, as it has more live points—and therefore lower implementation-specific effects—in the regions which have the largest effect on calculation accuracy. If modes separate at likelihood values where dynamic nested sampling assigns few samples, \(n_{\mathrm {init}}\) must be made large enough to ensure no significant modes are lost. For highly multimodal posteriors, a safe approach is to set \(n_{\mathrm {init}}\) high enough to find all significant modes, in which case dynamic nested sampling will use the remaining computational budget to minimise calculation errors. Even if, for example, half of the computational budget is used on the initial exploratory run, dynamic nested sampling will still achieve over half of the efficiency gain compared to standard nested sampling that it could with a very small \(n_{\mathrm {init}}\).

The remainder of this section presents empirical tests of dynamic nested sampling for two challenging problems in which significant implementation-specific effects are present. Additional examples of dynamic nested sampling’s application to practical problems in scientific research can be found in Orazio et al. (2018), Zucker et al. (2018), Higson et al. (2018a) and Guillochon et al. (2018).

Fig. 9 Posterior distributions for the 4-component 10-dimensional Gaussian mixture model (16) with component weights and means given by (17), and a Gaussian prior (13). By symmetry the distributions of \(\theta _{\hat{k}}\) are the same for \(k \in (3,\ldots ,d)\), so we show only the first 4 components of \(\theta \); 1- and 2-dimensional plots of other parameters are the same as those of \(\theta _{\hat{3}}\) and \(\theta _{\hat{4}}\)

Fig. 10 Live point allocation as in Fig. 4 but with a 10-dimensional Gaussian mixture likelihood (16), with component weights and means given by (17) and a Gaussian prior (13) with \(\sigma _\pi = 10\). The 10 standard nested sampling runs shown were generated using PolyChord with \(n=500\), and 10 dynamic nested sampling runs with each G value were generated using dyPolyChord with a similar number of samples and \(n_{\mathrm {init}}=100\). The dotted and dashed lines show the relative posterior mass \(\propto \mathcal {L}(X)X\) and the posterior mass remaining \(\propto \int _{-\infty }^X \mathcal {L}(X')X' {\text {d}}{X'}\) at each point in \(\log X\); for comparison these lines are scaled to have the same area under them as the average of the number of live point lines

6.1 Numerical tests with a multimodal posterior

We now use dyPolyChord to numerically test dynamic nested sampling on a challenging multimodal d-dimensional, M-component Gaussian mixture likelihood

$$\begin{aligned} \mathcal {L}(\theta ) = \sum _{m=1}^M W^{(m)} {\left( 2 \pi {\sigma ^{(m)}}^2\right) }^{-d/2} \exp \left( -\frac{{|\theta - \mu ^{(m)}|}^2}{2 {\sigma ^{(m)}}^2}\right) . \end{aligned}$$
(16)

Here each component m is centred on a mean \(\mu ^{(m)}\) with standard deviation \(\sigma ^{(m)}\) in all dimensions, and the component weights \(W^{(m)}\) satisfy \(\sum _{m=1}^M W^{(m)} = 1\). For comparison with the perfect nested sampling results using a Gaussian likelihood (11) in Sect. 5, we use \(d=10\), \(\sigma ^{(m)}=1\) for all m and a Gaussian prior (13) with \(\sigma _\pi = 10\). We consider a Gaussian mixture (16) of \(M=4\) components with means and weights

$$\begin{aligned}&W^{(1)}= 0.4, \qquad \mu ^{(1)}_{\hat{1}} = 0, \qquad \mu ^{(1)}_{\hat{2}} = 4,\nonumber \\&W^{(2)} = 0.3, \qquad \mu ^{(2)}_{\hat{1}} = 0, \qquad \mu ^{(2)}_{\hat{2}} = -\,4,\nonumber \\&W^{(3)} = 0.2, \qquad \mu ^{(3)}_{\hat{1}} = 4, \qquad \mu ^{(3)}_{\hat{2}} = 0,\nonumber \\&W^{(4)} = 0.1, \qquad \mu ^{(4)}_{\hat{1}} = -\,4, \quad \mu ^{(4)}_{\hat{2}} = 0,\nonumber \\&\text {and} \,\, \mu ^{(m)}_{\hat{k}} = 0 \quad \text {for all} \,\, k \, \in (3,\ldots ,d), \, m \in (1,\ldots ,M).\nonumber \\ \end{aligned}$$
(17)

The posterior distribution for this case is shown in Fig. 9.
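For reference, a minimal sketch of the log of the mixture likelihood (16), together with the component weights and means (17):

```python
import numpy as np
from scipy.special import logsumexp

def log_mixture_like(theta, weights, means, sigmas):
    """Log of the Gaussian mixture likelihood (16); means has shape (M, d),
    weights and sigmas have shape (M,)."""
    d = np.size(theta)
    r2 = np.sum((np.asarray(theta) - means) ** 2, axis=1)   # |theta - mu^(m)|^2
    return logsumexp(np.log(weights)
                     - 0.5 * d * np.log(2 * np.pi * sigmas ** 2)
                     - r2 / (2 * sigmas ** 2))

# components from (17), with d = 10:
weights = np.array([0.4, 0.3, 0.2, 0.1])
sigmas = np.ones(4)
means = np.zeros((4, 10))
means[:, 0] = [0, 0, 4, -4]      # first coordinate of each component mean
means[:, 1] = [4, -4, 0, 0]      # second coordinate of each component mean
```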

Table 2 Tests of dynamic nested sampling as in Table 1 but with a 10-dimensional Gaussian mixture likelihood (16), with component weights and means given by (17) and a Gaussian prior (13) with \(\sigma _\pi = 10\)

As in Sect. 5, we compare standard nested sampling runs to dynamic nested sampling runs which use a similar or slightly smaller number of samples. dyPolyChord uses Algorithm 2, meaning only the initial run provides information on where to place samples, so we set \(n_{\mathrm {init}}\) to 20% of the number of live points used in the standard nested sampling runs they are compared to, instead of the 10% used in the perfect nested sampling tests in Sect. 5.

The allocation of live points from dyPolyChord runs with the Gaussian mixture likelihood (16) is shown in Fig. 10. As in the tests with perfect nested sampling, the numbers of live points with settings \(G=1\) and \(G=0\) match the posterior mass and posterior mass remaining respectively, despite the more challenging likelihood. The live point allocation is not as precise as in Fig. 4 because dyPolyChord only uses information from the initial exploratory run to calculate all the point importances. Another difference is that the truncation of the peak number of live points seen in the \(G=1\) case in Fig. 4 is not present for dyPolyChord runs, as that truncation is due to Algorithm 1 adding new points where the importance is within 90% of the maximum.

Table 3 Estimated errors due to implementation-specific effects for the Gaussian mixture likelihood results shown in Table 2, calculated using the method described in Higson et al. (2018b, Section 5)

Table 2 shows the variation of repeated calculations for dynamic nested sampling for the 10-dimensional Gaussian mixture model (16) with dyPolyChord. This shows significant efficiency gains (14) from dynamic nested sampling of \(1.3 \pm 0.1\) for evidence calculation with \(G=0\) and up to \(4.0 \pm 0.4\) for parameter estimation with \(G=1\), demonstrating how dynamic nested sampling can be readily applied to more challenging multimodal cases. In “Appendix E.2” we empirically verify that dynamic nested sampling does not introduce any errors from sampling bias (which would not be captured by efficiency gains (14) based on the variation of results) using analytically calculated true values of the log evidence and posterior means. Table 10 shows that the mean calculation results are very close to the correct values, and hence the standard deviation of the results is almost identical to their root-mean-squared-error, meaning efficiency gains (14) accurately reflect reductions in calculation errors (as for perfect nested sampling).

Table 3 shows estimated implementation-specific effects for the results in Table 2; these are calculated using the procedure described in Higson et al. (2018b, Section 5), which estimates the part of the variation of results which is not explained by the intrinsic stochasticity of perfect nested sampling. Dynamic nested sampling with \(G=1\) and \(G=0.25\) both reduce implementation-specific effects in all of the parameter estimation calculations as expected. However we are not able to measure a statistically significant difference in implementation-specific effects for \(\log \mathcal {Z}\) with \(G=0\); this is because for evidence calculations implementation-specific effects represent a much smaller fraction of the total error (see Higson et al. 2018b, for more details).

The efficiency gains in Table 2 are slightly lower than those for the similar unimodal Gaussian likelihood (11) used in Table 1; this is because of the higher \(n_{\mathrm {init}}\) value used, and because while implementation-specific effects are reduced by dynamic nested sampling they are not reduced by as large a factor as errors from the stochasticity of the nested sampling algorithm.

6.2 Numerical tests with signal reconstruction from noisy data

We now test dynamic nested sampling on a challenging signal reconstruction likelihood, which fits a 1-dimensional function \(y = f(x,\theta )\) using a sum of basis functions. Similar signal reconstruction problems are common in scientific research and are of great practical importance; for a detailed discussion see Higson et al. (2018a).

Fig. 11 Signal reconstruction with generalised Gaussian basis functions. The first plot shows the true signal; this is composed of 4 generalised Gaussians (19), with the individual components shown by dashed lines. The 120 data points, which have added normally distributed x- and y-errors with \(\sigma _x=\sigma _y=0.05\), are shown in the second plot. The third plot shows the fit calculated from a single dyPolyChord dynamic nested sampling run with \(G=1\), \(n_{\mathrm {init}}=400\), \(\texttt {num\_repeats}=400\) and 101,457 samples; coloured contours represent posterior iso-probability credible intervals on y(x)

Fig. 12 Live point allocation as in Figs. 4 and 10 but for fitting 4 generalised Gaussians to the data shown in Fig. 11. In this case the likelihood (18) is 16-dimensional, and the priors are given in Table 11 in “Appendix G”. The 10 standard nested sampling runs shown were generated using PolyChord with \(n=2000\), and 10 dynamic nested sampling runs with each G value were generated using dyPolyChord with a similar number of samples and \(n_{\mathrm {init}}=400\). All runs use the setting \(\texttt {num\_repeats}=400\). The dotted and dashed lines show the relative posterior mass \(\propto \mathcal {L}(X)X\) and the posterior mass remaining \(\propto \int _{-\infty }^X \mathcal {L}(X')X' {\text {d}}{X'}\) at each point in \(\log X\); for comparison these lines are scaled to have the same area under them as the average of the number of live point lines

We consider reconstructing a signal y(x) given D data points \(\{x_d,y_d\}\), each of which has independent Gaussian x- and y-errors of size \(\sigma _x = \sigma _y = 0.05\) around their unknown true values \(\{X_d,Y_d\}\). In our example, the data points’ true x-coordinates \(X_d\) were randomly sampled with uniform probability in the range \(0< X_d < 1\). In this case the likelihood is (Hee et al. 2016)

$$\begin{aligned} \mathcal {L}(\theta ) = \prod _{d=1}^D \int _{0}^{1} \frac{\exp \left[ -\frac{{(x_d-X_d)}^2}{2\sigma _x^2}-\frac{{(y_d-f(X_d,\theta ))}^2}{2\sigma _y^2}\right] }{2\pi \sigma _x\sigma _y} {\text {d}}{X_d}, \end{aligned}$$
(18)

where the integrals are over the unknown true values of the data points’ x-coordinates, and each likelihood calculation involves an integral for each of the D data points. We reconstruct the signal using generalised Gaussian basis functions

$$\begin{aligned} \phi (x,a,\mu ,\sigma ,\beta ) = a \mathrm {e}^{-{(|x - \mu |/\sigma )}^{\beta }}, \end{aligned}$$
(19)

where \(\beta =2\) makes the basis function proportional to a Gaussian. Our reconstruction uses 4 such basis functions, giving 16 parameters

$$\begin{aligned} \theta = (a_1,a_2,a_3,a_4,\mu _1,\mu _2,\mu _3,\mu _4,\sigma _1,\sigma _2,\sigma _3,\sigma _4,\beta _1,\beta _2,\beta _3,\beta _4), \end{aligned}$$
(20)

and

$$\begin{aligned} y(x,\theta ) = \sum _{j=1}^4 \phi (x, a_j,\mu _j,\sigma _j,\beta _j). \end{aligned}$$
(21)

The priors used are given in Table 11 in “Appendix G”.
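A minimal sketch of the signal model (19)–(21); the likelihood (18) additionally requires a numerical integral over each \(X_d\), which is omitted here:

```python
import numpy as np

def phi(x, a, mu, sigma, beta):
    """Generalised Gaussian basis function (19)."""
    return a * np.exp(-(np.abs(x - mu) / sigma) ** beta)

def signal(x, theta):
    """y(x, theta) from (21), with theta ordered as in (20):
    (a_1..a_4, mu_1..mu_4, sigma_1..sigma_4, beta_1..beta_4)."""
    a, mu, sigma, beta = np.split(np.asarray(theta), 4)
    return sum(phi(x, a[j], mu[j], sigma[j], beta[j]) for j in range(4))
```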

Table 4 Tests of dynamic nested sampling as in Tables 1 and 2 but for fitting 4 generalised Gaussians to the data shown in Fig. 11; the likelihood is given by (18) and the priors are shown in Table 11 in “Appendix G”
Table 5 Estimated error due to implementation-specific effects for the basis function fitting likelihood results shown in Table 4, calculated using the method described in Higson et al. (2018b, Section 5)

We use 120 data points, sampled from a true signal composed of the sum of 4 generalised Gaussian basis functions with parameters shown in Table 12 in “Appendix G”. The true signal, the noisy data and the posterior distribution of the signal calculated with dynamic nested sampling are shown in Fig. 11; this was plotted using the fgivenx package (Handley 2018). dyPolyChord’s allocation of live points for the basis function fitting likelihood and priors is shown in Fig. 12; as before, the software is able to accurately allocate live points in this case.

Table 4 shows efficiency gains from dynamic nested sampling over standard nested sampling for the signal reconstruction problem. Due to the computational expense of this likelihood, we use only 20 runs for each of standard nested sampling and dynamic nested sampling with \(G=0\), \(G=0.25\) and \(G=1\). Consequently the results are less precise than those for previous examples, but the improvements over standard nested sampling are similar to the other tests and include large efficiency gains in estimates of the mean value of the fitted signal (of up to \(9.0\pm 4.1\)). Furthermore, dynamic nested sampling is also able to reduce errors due to implementation-specific effects in this case — as can be seen in Table 5.


7 Conclusion

This paper began with an analysis of the effects of changing the number of live points on the accuracy of nested sampling parameter estimation and evidence calculations. We then presented dynamic nested sampling (Algorithm 1), which varies the number of live points to allocate posterior samples efficiently for a priori unknown likelihoods and priors.

Dynamic nested sampling can be optimised specifically for parameter estimation, showing increases in computational efficiency over standard nested sampling (14) by factors of up to \(72\pm 5\) in numerical tests. The algorithm can also increase evidence calculation accuracy, and can improve both evidence calculation and parameter estimation simultaneously. We discussed factors affecting the efficiency gain from dynamic nested sampling, showing that large improvements in parameter estimation are possible when the posterior mass is contained in a small region of the prior (as is typically the case in high-dimensional problems). Empirical tests show significant efficiency gains from dynamic nested sampling for the wide range of likelihoods, priors, dimensions and estimators considered. Another advantage of dynamic nested sampling is that more accurate results can be obtained by continuing the run for longer, unlike in standard nested sampling.

We applied dynamic nested sampling to problems with challenging posteriors using dyPolyChord, and found the technique is able to reduce errors due to implementation-specific effects compared to standard nested sampling. This included tests with a practical signal reconstruction calculation, and a multimodal posterior in which the new method gave similar performance gains to the unimodal test cases. Dynamic nested sampling has also been applied to a number of problems in scientific research; see for example Orazio et al. (2018), Zucker et al. (2018), Higson et al. (2018a) and Guillochon et al. (2018).

The many popular approaches and software implementations for standard nested sampling can be easily adapted for dynamic nested sampling, since it too only requires samples to be drawn randomly from the prior within some hard likelihood constraint. As a result, our new method can be used to increase computational efficiency while maintaining the strengths of standard nested sampling. Publicly available dynamic nested sampling packages include dyPolyChord, dynesty and perfectns.