Dynamic nested sampling: an improved algorithm for parameter estimation and evidence calculation

We introduce dynamic nested sampling: a generalisation of the nested sampling algorithm in which the number of "live points" varies to allocate samples more efficiently. In empirical tests the new method significantly improves calculation accuracy compared to standard nested sampling with the same number of samples; this increase in accuracy is equivalent to speeding up the computation by factors of up to ∼72 for parameter estimation and ∼7 for evidence calculations. We also show that the accuracy of both parameter estimation and evidence calculations can be improved simultaneously. In addition, unlike in standard nested sampling, more accurate results can be obtained by continuing the calculation for longer. Popular standard nested sampling implementations can be easily adapted to perform dynamic nested sampling, and several dynamic nested sampling software packages are now publicly available.


Introduction
Nested sampling (Skilling, 2006) is a numerical method for Bayesian computation which simultaneously provides both posterior samples and Bayesian evidence estimates. The original development of the algorithm was motivated by evidence calculation, but the MultiNest (Feroz and Hobson, 2008; Feroz et al., 2008, 2013) and PolyChord (Handley et al., 2015a,b) software packages are now extensively used for parameter estimation from posterior samples (for example in DES Collaboration, 2017). Nested sampling performs well compared to Markov chain Monte Carlo (MCMC)-based parameter estimation for multimodal and degenerate posteriors, and the PolyChord implementation is well suited to high-dimensional parameter estimation problems.
Nested sampling maintains a set of samples from the prior, called live points, and iteratively updates them subject to the constraint that new samples have increasing likelihoods. Conventionally a fixed number of live points is used; we term this standard nested sampling. In this case the expected fractional shrinkage of the prior volume remaining is the same at each step, and as a result many samples are typically taken from regions of the prior that are remote from the bulk of the posterior. The allocation of samples in standard nested sampling is set by the likelihood and the prior, and cannot be changed depending on whether calculating the evidence or obtaining posterior samples is the primary goal.
We propose modifying the nested sampling algorithm by dynamically varying the number of live points in order to maximise the accuracy of a calculation for some number of posterior samples, subject to practical constraints. We term this more general approach dynamic nested sampling, with standard nested sampling representing the special case where the number of live points is constant. Dynamic nested sampling is particularly effective for parameter estimation, as standard nested sampling typically spends most of its computational effort iterating towards the posterior peak. This produces posterior samples with negligible weights which make little contribution to the calculation's accuracy, as discussed in our previous analysis of sampling errors in nested sampling parameter estimation (Higson et al., 2017). We also achieve significant improvements in the accuracy of evidence calculations, and show that both evidence and parameter estimates can be improved simultaneously. Our approach can be easily incorporated into existing standard nested sampling software such as MultiNest and PolyChord; we are currently working on its inclusion in the forthcoming PolyChord 2 package.
Dynamically weighting sampling in order to target regions of higher posterior mass has been used in the statistical physics literature, such as in multi-canonical sampling (see for example Okamoto, 2004). Other variants of nested sampling include diffusive nested sampling (Brewer et al., 2011) and superposition enhanced nested sampling (Martiniani et al., 2014), which have been implemented as stand-alone software packages. In particular, dynamic nested sampling shares some similarities with DNest4 (Brewer and Foreman-Mackey, 2016), in which diffusive nested sampling is followed by additional sampling targeting regions of high posterior mass. In the case of nested sampling on unimodal posteriors, efficiency can be greatly increased using nested importance sampling (Chopin and Robert, 2010) or by performing nested sampling using an auxiliary prior which approximates the posterior, as described in Cameron and Pettitt (2014). Dynamic nested sampling differs from these alternatives in that it can be used to improve the efficiency of popular standard nested sampling software such as MultiNest and PolyChord, in particular in high-dimensional problems, while maintaining their strengths in sampling degenerate and multimodal distributions.
The paper proceeds as follows: Section 2 contains background on nested sampling, and Section 3 establishes useful results about the effects of varying the number of live points. Our dynamic nested sampling algorithm for increasing the efficiency of general nested sampling calculations is presented in Section 4; its accurate allocation of live points for a priori unknown posterior distributions is illustrated in Figure 4. In the manner described by Keeton (2011), we test dynamic nested sampling using analytical cases where one can obtain uncorrelated samples from the prior space within some likelihood contour using standard techniques; we term the resulting procedure perfect nested sampling (in both standard and dynamic versions). Numerical tests of dynamic nested sampling's performance compared to standard nested sampling use our publicly available PerfectNS package, and are described in Section 5. This includes a discussion of the effects of likelihoods, priors and dimensionality on the improvements from dynamic nested sampling; in particular we find large efficiency gains for high-dimensional parameter estimation problems. We discuss including dynamic nested sampling in nested sampling software for use on practical problems such as multimodal posteriors in Section 6.

Background: the nested sampling algorithm
This section provides background on the nested sampling algorithm based on Higson et al. (2017), and sets out our notation. For a more detailed discussion see Higson et al. (2017) and Skilling (2006).
For a given likelihood L(θ) and prior π(θ), nested sampling simultaneously computes the Bayesian evidence

Z = ∫ L(θ) π(θ) dθ, (1)

and samples from the posterior distribution

P(θ) = L(θ) π(θ) / Z. (2)

The algorithm begins by sampling some number of live points randomly from the prior π(θ). In standard nested sampling, at each iteration i the point with the lowest likelihood L_i is replaced by a new point sampled from the region of the prior with likelihood L(θ) > L_i, and the number of live points remains constant throughout the process. This process is continued until some termination condition is met, producing a list of samples (referred to as dead points) which are then used for evidence calculation and parameter estimation. We term the finished nested sampling process a run.
Nested sampling calculates the evidence (1) as a one-dimensional integral

Z = ∫₀¹ L(X) dX, (3)

where X(L) is the fraction of the prior with likelihood greater than L, and L(X) ≡ X⁻¹(L). The prior volumes X_i corresponding to the dead points i are unknown, but can be modelled statistically as X_i = t_i X_{i−1}, where X_0 = 1. For a given number of live points n, each shrinkage ratio t_i is independently distributed as the largest of n random variables drawn uniformly from the interval [0, 1], and so (Higson et al., 2017):

P(t_i) = n t_i^{n−1}, E[log t_i] = −1/n, Var[log t_i] = 1/n². (4)

In standard nested sampling the number of live points n is some constant value for all t_i; the iteration of the algorithm in this case is illustrated schematically in Figure 1.
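As a concrete illustration of the statistical model above, the shrinkage ratios t_i can be simulated directly; a minimal sketch (the function name and run settings below are our own, not from any package):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_log_x(n_live, n_iter):
    """Simulate log prior volumes log X_i for a standard nested sampling
    run, using X_i = t_i * X_{i-1} where each t_i is distributed as the
    largest of n_live draws from Uniform(0, 1)."""
    # Inverse-CDF sampling: the CDF of the maximum of n_live uniforms is
    # t**n_live, so t = u**(1 / n_live) for u ~ Uniform(0, 1).
    t = rng.uniform(size=n_iter) ** (1.0 / n_live)
    return np.cumsum(np.log(t))

log_x = simulate_log_x(n_live=100, n_iter=5000)
# E[log t_i] = -1/n, so after 5000 steps with n = 100, E[log X] = -50.
print(log_x[-1])
```

The final value scatters around −50 with standard deviation √(5000)/100 ≈ 0.7, consistent with Var[log t_i] = 1/n².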

Evidence estimation
Nested sampling calculates the evidence (3) as a quadrature sum over the dead points

Z(t) ≈ Σ_i L_i w_i(t), (5)

where t = {t_1, t_2, ..., t_ndead} are the unknown set of shrinkage ratios for each dead point and each t_i is an independent random variable with distribution (4). The w_i are appropriately chosen quadrature weights; we use the trapezium rule, such that w_i(t) = (X_{i−1}(t) − X_{i+1}(t))/2, where X_i(t) = ∏_{k=0}^{i} t_k.

Fig. 1. A schematic illustration of standard nested sampling with a constant number of live points n, reproduced from Higson et al. (2017). L(X)X shows the relative posterior mass, the bulk of which is contained in some small fraction of the prior. Most of the samples in the diagram are in log X regions with negligible posterior mass, as is typically the case in standard nested sampling. [Axes: L(X) against log X; iteration proceeds towards termination at small X with mean step size ≈ 1/n.]

Given that the shrinkage ratios t are a priori unknown, one typically calculates an expected value and error on the evidence (5) using the distribution (4). The dominant source of error in evidence estimates from perfect nested sampling is the statistical variation in the unknown volumes of the prior "shells" w_i(t).
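A sketch of the expected-value version of this quadrature, replacing each unknown t_i by its expectation via E[log t_i] = −1/n_i (the function name is our own):

```python
import numpy as np

def evidence_estimate(logl, nlive):
    """Estimate Z = sum_i L_i w_i from dead point log-likelihoods,
    fixing the unknown prior volumes at their expected values via
    E[log t_i] = -1/n_i, with trapezium-rule weights
    w_i = (X_{i-1} - X_{i+1}) / 2."""
    logl = np.asarray(logl, dtype=float)
    logx = np.cumsum(-1.0 / np.asarray(nlive, dtype=float))
    x = np.exp(logx)
    x_prev = np.concatenate(([1.0], x[:-1]))  # X_{i-1}, with X_0 = 1
    x_next = np.concatenate((x[1:], [0.0]))   # X_{i+1}, taking X -> 0 at the end
    w = 0.5 * (x_prev - x_next)
    return float(np.sum(np.exp(logl) * w))

# Sanity check: with L = 1 everywhere the evidence is the prior volume, 1.
z = evidence_estimate(np.zeros(2000), np.full(2000, 100))
print(z)  # close to 1
```

In a full analysis the t_i would instead be sampled from (4) to propagate the "shell" volume uncertainty into an error estimate.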

Parameter estimation
Nested sampling parameter estimation uses the dead points to construct a set of posterior samples with weights proportional to their share of the posterior mass:

p_i(t) = L_i w_i(t) / Z(t). (6)

Neglecting any implementation-specific errors not present in perfect nested sampling, the dominant sampling errors in estimating some parameter or function of parameters f(θ) come from two sources (Higson et al., 2017): (i) approximating the relative point weights p_i(t) with their expectation E[p_i(t)] using (4); (ii) approximating the mean value of a function of parameters over an entire iso-likelihood contour with its value at a single point f(θ_i).
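These normalised weights can be computed from a run's dead points in the same expected-volume approximation used above; a minimal sketch (helper name is our own):

```python
import numpy as np

def posterior_weights(logl, nlive):
    """Normalised posterior weights p_i = L_i w_i / Z, using the expected
    prior volumes E[log X_i] = -sum_{k<=i} 1/n_k and trapezium-rule w_i."""
    logl = np.asarray(logl, dtype=float)
    logx = np.cumsum(-1.0 / np.asarray(nlive, dtype=float))
    x = np.exp(logx)
    w = 0.5 * (np.concatenate(([1.0], x[:-1])) - np.concatenate((x[1:], [0.0])))
    logp = logl + np.log(w)
    p = np.exp(logp - logp.max())  # subtract the max for numerical stability
    return p / p.sum()

p = posterior_weights(np.linspace(0.0, 10.0, 500), np.full(500, 50))
# A posterior expectation is then the weighted sum over the samples:
# E[f] ~= sum_i p_i * f(theta_i)
print(p.sum())  # 1.0
```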
Combining and dividing nested sampling runs

Skilling (2006) describes how several standard nested sampling runs r = 1, 2, ... with constant numbers of live points n^(r) may be combined simply by merging the dead points and sorting by likelihood value. The combined sequence of dead points is equivalent to a single nested sampling run with n_combined = Σ_r n^(r) live points. Higson et al. (2017) gives an algorithm for the reverse procedure: decomposing a nested sampling run with n live points into a set of n valid nested sampling runs, each with 1 live point. These single live point runs, which we term threads, are the smallest unit from which valid nested sampling runs can be constructed, and will prove useful in developing dynamic nested sampling.
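Combining constant live point runs in this way is just a merge of the dead points sorted by likelihood; a minimal sketch (the reverse decomposition into threads requires tracking which point replaced which, and is not shown):

```python
import numpy as np

def combine_runs(runs_logl):
    """Combine standard nested sampling runs by merging their dead points
    and sorting by likelihood; if run r used n_r live points, the result
    behaves as a single run with sum_r n_r live points (Skilling, 2006)."""
    return np.sort(np.concatenate([np.asarray(r, dtype=float) for r in runs_logl]))

combined = combine_runs([[1.0, 3.0, 5.0], [2.0, 4.0]])
print(combined)  # [1. 2. 3. 4. 5.]
```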

Variable numbers of live points
Before presenting our algorithm for dynamic nested sampling in Section 4, we first establish some basic results for a nested sampling run in which the number of live points varies. Such runs are valid as successive shrinkage ratios t_i are independently distributed (Skilling, 2006).
Let us define n_i as the number of live points present for the prior shrinkage ratio t_i between dead points i − 1 and i.† In this notation, all information about the number of live points for a nested sampling run can be expressed as a list of numbers n = {n_1, n_2, ..., n_ndead} which correspond to the shrinkage ratios t = {t_1, t_2, ..., t_ndead}. Nested sampling calculations with variable numbers of live points differ from the constant live point case only in the use of different n_i in calculating the distribution of each t_i from (4). Skilling (2006)'s method for combining constant live point runs, mentioned in Section 2.3, can be extended to accommodate variable numbers of live points by requiring that at any likelihood the number of live points in the combined run equals the sum of the numbers of live points of the constituent runs at that likelihood, as illustrated in Figure 2. Conversely, if variable live point runs are divided into their constituent threads, some threads will start and finish part way through the run. Furthermore, when the number of live points is increased on an iso-likelihood contour, there is no longer a unique division into threads as there is for standard nested sampling.
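The combination rule for variable live point runs can be sketched as follows; here each run is represented by its sorted dead point log-likelihoods together with the n_i in force between successive dead points (this representation is our own choice):

```python
import numpy as np

def combine_variable_runs(runs):
    """Merge variable-live-point runs. Each run is a (logl, nlive) pair of
    equal-length arrays, with nlive[i] the live point count between dead
    points i-1 and i. At any likelihood, the merged run's live point count
    is the sum of the constituent runs' counts at that likelihood."""
    logl = np.sort(np.concatenate([np.asarray(r[0], dtype=float) for r in runs]))
    nlive = np.zeros(logl.size)
    for run_logl, run_nlive in runs:
        run_logl = np.asarray(run_logl, dtype=float)
        run_nlive = np.asarray(run_nlive, dtype=float)
        # Index of this run's first dead point with likelihood >= each merged
        # likelihood; beyond the run's last dead point it contributes nothing.
        idx = np.searchsorted(run_logl, logl, side='left')
        inside = idx < run_logl.size
        nlive[inside] += run_nlive[idx[inside]]
    return logl, nlive

run_a = ([1.0, 2.0, 4.0], [2, 2, 2])
run_b = ([1.5, 3.0, 5.0], [3, 3, 3])
logl_c, nlive_c = combine_variable_runs([run_a, run_b])
print(nlive_c)  # [5. 5. 5. 5. 5. 3.]
```

Note how the merged count drops from 5 to 3 once run a's final dead point is passed, matching the rule that terminated constituents stop contributing live points.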
In addition, the variable live point framework provides a natural way to include in a calculation the final set of live points remaining when a standard nested sampling run terminates. These are uniformly distributed in the region of the prior with L(θ) > L_terminate, and can be treated as samples from a dynamic nested sampling run in which the number of live points reduces by 1 as each of the points remaining after termination is passed, until the final point i has n_i = 1. This allows the final live points of standard nested sampling runs to be combined with variable live point runs.
The remainder of this section analyses the effects of local variations in the number of live points on the accuracy of nested sampling evidence calculation and parameter estimation. The dynamic nested sampling algorithm in Section 4 uses these results to allocate additional live points.

†In order for (4) to be valid, the number of live points must remain constant across the shrinkage ratios t_i between successive dead points. We therefore only allow the number of live points to change on iso-likelihood contours L(θ) = L_i where a dead point i is present. This restriction has negligible effects for typical calculations, and is automatically satisfied by most nested sampling implementations.

Fig. 2. Combining two nested sampling runs a and b into a single nested sampling run c; black dots show dead points arranged in order of increasing likelihood. The number of live points in run c at some likelihood equals the sum of the live points of run a and run b at that likelihood.

Effects on calculation accuracy
Nested sampling calculates the evidence Z as the sum of sample weights (5); the dominant sampling errors come from statistically estimating the shrinkage ratios t_i, which affect the weights of all subsequent points. In Appendix B we show that the reduction in evidence errors achieved by taking additional samples to increase the local number of live points n_i is inversely proportional to n_i, and is approximately proportional to the evidence contained in subsequent points. This makes sense, as the dominant evidence errors come from statistically estimating the shrinkages t_i, which affect all subsequent points j ≥ i.
In nested sampling parameter estimation, sampling errors come both from taking a finite number of samples in any region of the prior and from the stochastic estimation of their normalised weights p_i from (6). Typically standard nested sampling takes many samples with negligible posterior mass, as illustrated in Figure 1; these make little contribution to estimates of parameters or to the accuracy of the samples' normalised weights. From (4), the expected separation between points in log X (approximately proportional to the posterior mass each represents) is 1/n_i. As a result, increasing the number of live points wherever the dead points' posterior weights p_i ∝ L_i w_i are greatest distributes posterior mass more evenly among the samples. This improves the accuracy of the statistically estimated weights p_i, and can dramatically increase the information content (Shannon entropy) of the samples, which is maximised for a given number of samples when the sample weights are equal.
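The Shannon information of a set of sample weights can be checked directly; a small sketch showing that equal weights maximise it:

```python
import numpy as np

def information_content(weights):
    """Shannon information H = -sum_i p_i log p_i of normalised sample
    weights; for N samples, H is maximised at log N when all p_i are equal."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    nonzero = p > 0  # terms with p_i = 0 contribute nothing to H
    return float(-np.sum(p[nonzero] * np.log(p[nonzero])))

equal = information_content(np.ones(1000))
skewed = information_content(np.exp(-np.arange(1000.0)))
print(equal, skewed)  # log(1000) ~= 6.91 versus a much smaller value
```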
Empirical tests of dynamic nested sampling show that increasing n wherever points have the highest p i ∝ L i w i works well at increasing parameter estimation accuracy for most calculations.
The optimum allocation of live points differs between parameter estimation problems involving different functions of parameters f(θ), as the contribution of each sample i depends on f(θ_i). In most cases the relative weight p_i of samples is a good approximation for their influence on a calculation, but for some problems much of the error may come from sampling log X regions containing a small fraction of the posterior mass but with extreme parameter values.‡ Estimating the importance of points to a specific parameter estimation calculation is discussed in Appendix C.

Dynamic nested sampling
We now present an algorithm for performing nested sampling calculations with a dynamically varying number of live points to optimise the allocation of samples.
Since the distribution of posterior mass as a function of the likelihood is a priori unknown, we first approximate it by performing a standard nested sampling run with some small constant number of live points n_init. The algorithm then proceeds by iteratively calculating the range of likelihoods in which increasing the number of live points will have the greatest effect on calculation accuracy, and generating an additional thread§ running over these likelihoods.
From the discussion in Section 3.1, we define functions to measure the relative importance of a sample i for evidence calculation¶ and parameter estimation respectively as

I_Z(i) ∝ E[Z_{≥i}] = E[Σ_{j≥i} L_j w_j(t)], (8)
I_param(i) ∝ L_i E[w_i(t)]. (9)

Modifying (9) to optimise for estimation of a specific parameter is discussed in Appendix C. The user specifies how to divide computational resources between evidence calculation and parameter estimation through an input goal G ∈ [0, 1], where G = 0 corresponds to optimising for evidence calculation and G = 1 optimises for parameter estimation. The dynamic nested sampling algorithm calculates points' importances as a weighted sum of (8) and (9), each normalised so that it sums to unity over the samples:

I(i) = (1 − G) I_Z(i)/Σ_j I_Z(j) + G I_param(i)/Σ_j I_param(j). (10)

The likelihood range in which to run an additional thread is chosen by finding all points with importance greater than some fraction (we use 90%) of the largest importance. To ensure any steep or discontinuous increases in the likelihood L(X) are captured, we find the first point j and last point k which meet this condition, then generate an additional thread starting at L_{j−1} and ending when a point is sampled with likelihood greater than L_{k+1}. If j is the first dead point, threads which initially sample the whole prior are generated. If k is the final dead point, then the thread will stop when a sample with likelihood greater than L_k is found. This allows the new thread to continue beyond L_k, meaning dynamic nested sampling iteratively explores higher likelihoods when this is the most effective use of samples.

‡See Higson et al. (2017, Section 3.1) for diagrams illustrating sampling errors and the distributions of parameter values as a function of log X for nested sampling runs.

§If required, some number n_batch of additional threads can be generated at each step to reduce the number of importance calculations. We find in empirical tests that this has little effect on the efficiency gains from dynamic nested sampling when the number of samples taken in each batch is small compared to the total number of samples in the run.

¶Alternatively, (8) can be replaced with a more complex expression using the result derived in Appendix B (23), although we find this typically makes little difference to results.
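A sketch of the importance calculation and thread-range selection described above, using expected prior volumes in place of samples of t (the normalisation of the weighted sum and the helper names are our reading of the text, not a definitive implementation):

```python
import numpy as np

def importance(logl, nlive, goal):
    """Point importances: a goal-weighted sum of the normalised evidence
    importance I_Z(i) ~ (evidence in points j >= i) and the normalised
    parameter estimation importance I_param(i) ~ L_i * w_i."""
    logl = np.asarray(logl, dtype=float)
    logx = np.cumsum(-1.0 / np.asarray(nlive, dtype=float))
    x = np.exp(logx)
    w = 0.5 * (np.concatenate(([1.0], x[:-1])) - np.concatenate((x[1:], [0.0])))
    zw = np.exp(logl - logl.max()) * w      # ~ L_i w_i (rescaled for stability)
    i_param = zw / zw.sum()
    z_geq = np.cumsum(zw[::-1])[::-1]       # sum over j >= i of L_j w_j
    i_z = z_geq / z_geq.sum()
    return (1.0 - goal) * i_z + goal * i_param

def thread_likelihood_range(logl, imp, frac=0.9):
    """Likelihoods of the first and last points whose importance exceeds
    frac times the maximum; the next thread runs over this range."""
    high = np.nonzero(imp >= frac * imp.max())[0]
    return logl[high[0]], logl[high[-1]]

logl = np.linspace(0.0, 20.0, 1000)
imp = importance(logl, np.full(1000, 100), goal=1.0)
lo, hi = thread_likelihood_range(logl, imp)
print(lo, hi)  # likelihood range containing the bulk of the posterior mass
```

With goal closer to 0, the selected range shifts towards lower likelihoods, since all points before the posterior bulk contribute to the evidence error.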
Unlike in standard nested sampling, more accurate dynamic nested sampling results can be obtained simply by continuing the calculation for longer. The user must specify a condition at which to stop dynamically adding threads, such as when a fixed number of samples has been taken or some desired level of accuracy has been achieved. Sampling errors on evidence and parameter estimation calculations can be estimated from the dead points at any stage using the method described in Higson et al. (2017). We term these dynamic termination conditions, to distinguish them from the type of termination conditions used in standard nested sampling.
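The overall procedure can be summarised in pseudocode; here `sample_thread` stands in for the implementation-specific step of drawing a new thread between two likelihoods, and is hypothetical:

```
# Pseudocode sketch of dynamic nested sampling (our paraphrase of Section 4)
run = standard_nested_sampling(n_init)          # initial exploratory run
while not dynamic_termination_condition(run):
    I = importance(run, G)                      # weighted sum of (8) and (9)
    j, k = first and last points with I >= 0.9 * max(I)
    thread = sample_thread(start=L[j-1], stop_above=L[k+1])
    run = merge(run, thread)                    # live point counts add at each likelihood
return run
```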

Numerical tests
In the manner described by Keeton (2011), we use spherically symmetric analytical cases for our empirical tests; here one can perform perfect nested sampling, as perfectly uncorrelated samples from the prior space within some iso-likelihood contour can be found using standard techniques. The tests in this paper were run using our PerfectNS package. Software used for practical problems may introduce additional effects from imperfect sampling within a likelihood contour that are specific to a given implementation; we discuss these in Section 6.
Perfect nested sampling calculations depend on the likelihood L(θ) and prior π(θ) only through the distribution of posterior mass L(X) and the distribution of parameters on iso-likelihood contours P(f(θ)|X), both of which are functions of L(θ) and π(θ) (Higson et al., 2017). We therefore empirically test our method using likelihoods and priors with a wide range of distributions of posterior mass, and consider several functions of parameters f(θ) in each case.

PerfectNS is a python package for performing perfect dynamic and standard nested sampling for spherically symmetric likelihoods and priors, and analysing the resulting runs. It can be downloaded at https://github.com/ejhigson/PerfectNS; see the documentation for more details.
We first examine perfect nested sampling of d-dimensional spherical unit Gaussian likelihoods

L(θ) = (2π)^{−d/2} exp(−|θ|²/2), (11)

with d-dimensional co-centred spherical Gaussian priors

π(θ) = (2πσ_π²)^{−d/2} exp(−|θ|²/2σ_π²). (12)

For additional tests using distributions with lighter and heavier tails than Gaussians, we use d-dimensional exponential power likelihoods

L(θ) ∝ exp(−|θ|^{2b}/2), (13)

where b = 1 corresponds to a d-dimensional Gaussian (11). The different distributions of posterior mass in log X for (11) and (13) with different dimensions d are illustrated in Figure 3.
In tests of parameter estimation we denote the first component of the θ vector as θ₁, although by symmetry the results will be the same for any component. We consider the mean of the posterior distribution of θ₁, and the one-tailed Y% upper credible interval C.I._Y%(θ₁), which is the value θ₁* for which P(θ₁ < θ₁* | L, π) = Y/100. Tests of dynamic nested sampling use n_init = 5 and terminate after a fixed number of samples, which is set such that they use similar or slightly fewer samples than the standard nested sampling runs we compare them to. Standard nested sampling runs use the termination conditions described by Handley et al. (2015b, Section 3.4), stopping when the estimated evidence contained in the live points is less than 10⁻³ times the evidence contained in the dead points (the default value used in PolyChord). This is an appropriate termination condition for nested sampling parameter estimation (Higson et al., 2017), but if only the evidence is of interest then stopping with a larger fraction of the posterior mass remaining will have little effect on evidence calculation accuracy.
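The one-tailed credible interval can be read off from weighted posterior samples by inverting the empirical weighted CDF; a minimal sketch (function name is our own):

```python
import numpy as np

def credible_interval(theta, weights, y=0.84):
    """One-tailed Y% upper credible interval: the smallest sampled value
    theta* whose weighted cumulative probability reaches Y, computed from
    weighted posterior samples."""
    theta = np.asarray(theta, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(theta)
    cdf = np.cumsum(weights[order]) / weights.sum()
    return float(theta[order][np.searchsorted(cdf, y)])

# With equal weights on the values 0..99, the 84% interval falls at 83.
print(credible_interval(np.arange(100.0), np.ones(100)))  # 83.0
```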
We begin by testing dynamic nested sampling on a 10-dimensional Gaussian likelihood (11) with a Gaussian prior (12) and σ_π = 10. Figure 4 shows the relative allocation of live points as a function of log X for standard and dynamic nested sampling runs. The dynamic nested sampling algorithm (Algorithm 1) can accurately and consistently allocate live points, as can be seen by comparison with the analytically calculated distribution of posterior mass and posterior mass remaining.** Similar diagrams for exponential power likelihoods (13) with b = 2 and b = 3/4 are presented in Appendix A (Figures 9 and 10), and show that the allocation of live points is also accurate in these cases.
The distribution of results from repeated standard and dynamic nested sampling calculations with a similar number of samples is shown in Table 1 and Figure 5. Dynamic nested sampling optimised for evidence calculation (G = 0) and for parameter estimation (G = 1) produces significantly more accurate results than standard nested sampling. In addition, results for dynamic nested sampling with G = 0.25 show that both evidence calculation and parameter estimation accuracy can be improved simultaneously. Equivalent results tables for 10-dimensional exponential power likelihoods (13) with b = 2 and b = 3/4 are presented in Appendix A (Tables 2 and 3).
As nested sampling calculation errors are typically proportional to the square root of the computational effort applied (Skilling, 2006; Higson et al., 2017), the increase in efficiency (computational speedup) from dynamic nested sampling over standard nested sampling can be estimated as

efficiency gain = Var[standard nested sampling calculation results] / Var[dynamic nested sampling calculation results], (14)

where the calculation results from dynamic and standard nested sampling use approximately the same number of samples. The reductions in evidence errors for G = 0 and parameter estimation errors for G = 1 in Table 1 correspond to increasing efficiency by factors of 1.45 and up to 3.9 respectively.
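Equation (14) is simply a ratio of repeated-run variances; a sketch (the array contents below are illustrative only, not results from the paper):

```python
import numpy as np

def efficiency_gain(standard_results, dynamic_results):
    """Efficiency gain (14): the estimated computational speedup of dynamic
    over standard nested sampling, as the ratio of the variances of repeated
    calculation results obtained with similar numbers of samples."""
    return float(np.var(standard_results) / np.var(dynamic_results))

# Halving the standard deviation corresponds to a 4x speedup:
print(efficiency_gain([0.0, 2.0, 0.0, 2.0], [0.0, 1.0, 0.0, 1.0]))  # 4.0
```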

Efficiency gains for different distributions of posterior mass
Efficiency gains (14) from dynamic nested sampling depend on the fraction of the log X range explored which makes a significant contribution to calculation accuracy; if this is small, then most samples taken by standard nested sampling contain little information, and dynamic nested sampling can greatly improve performance. For parameter estimation (G = 1), only log X regions containing significant posterior mass (∝ L(X)X) are important, whereas for evidence calculation (G = 0) all samples taken before the bulk of the posterior is reached are valuable. Both cases benefit from dynamic nested sampling using fewer samples to explore the region after most of the posterior mass has been passed but before termination.

**Dynamic nested sampling live point allocations do not precisely match the distribution of posterior mass and posterior mass remaining in the G = 1 and G = 0 cases because they include the initial exploratory run with a constant n_init live points, and because each thread includes one sample taken after its termination condition is met. Furthermore, as additional live points are added where samples' importance is more than 90% of the maximum importance, the number of live points allocated by dynamic nested sampling is approximately constant for regions with importance greater than ∼90% of the maximum; this can be clearly seen in Figure 4 near the peak number of live points in the G = 1 case.
We now test the efficiency gains (14) of dynamic nested sampling empirically for a wide range of distributions of posterior mass by considering Gaussian likelihoods (11) and exponential power likelihoods (13) of different dimensions d and prior sizes σ_π. The results are presented in Figures 6 and 7, and show large efficiency gains from dynamic nested sampling for parameter estimation in all of these cases.
Increasing the dimension d typically means the posterior mass is contained in a smaller fraction of the prior volume (Higson et al., 2017), as can be seen in Figure 3. In the spherically symmetric cases we consider, the range of log X to be explored before significant posterior mass is reached increases approximately linearly with d. This increases the efficiency gain (14) from dynamic nested sampling for parameter estimation (G = 1), but reduces it for evidence calculation (G = 0). High-dimensional problems often have the vast majority of the explored log X range lying before any significant posterior mass is reached; this results in very large efficiency gains for parameter estimation but almost no gains for evidence calculation, as shown in Figure 6. For the 1000-dimensional exponential power likelihood with b = 2, dynamic nested sampling with G = 1 improves parameter estimation efficiency by a factor of up to 70 ± 4.

Table 1. Test of dynamic nested sampling for a 10-dimensional Gaussian likelihood (11) and a Gaussian prior (12) with σ_π = 10. The first four rows show the standard deviation of 5000 calculations for standard nested sampling with a constant number of live points n = 200. The next three rows show the standard deviations of 5000 dynamic nested sampling calculations with a similar number of samples, optimised purely for evidence calculation (G = 0), for both evidence and parameter estimation (G = 0.25), and purely for parameter estimation (G = 1). The final three rows show the computational efficiency gain (14) from dynamic nested sampling over standard nested sampling in each case. The first column shows the mean number of samples for the 5000 runs; the remaining columns show calculations of the log evidence and of the mean, median and 84% one-tailed credible interval of a parameter θ₁. Numbers in brackets show the error on the final digit.
Increasing the size σ_π of the prior increases the fraction of the log X range explored before any significant posterior mass is reached, resulting in larger efficiency gains (14) from dynamic nested sampling for parameter estimation (G = 1) but smaller gains for evidence calculation (G = 0). However, when σ_π is small the bulk of the posterior mass is reached after a small number of steps, and most of the log X range explored lies after the majority of the posterior mass but before termination. Dynamic nested sampling places few samples in this region, leading to large efficiency gains for both parameter estimation and evidence calculation, as shown in Figure 7. When σ_π = 0.1, dynamic nested sampling evidence calculations with G = 0 improve efficiency over standard nested sampling by a factor of approximately 7. However, we note that if only the evidence estimate is of interest, then standard nested sampling can safely terminate with a higher fraction of the posterior mass remaining than 10⁻³, in which case the efficiency gains would be lower.

Dynamic nested sampling with existing nested sampling software
Existing standard nested sampling software such as MultiNest and PolyChord can be easily adapted to perform dynamic nested sampling. We are working on incorporating dynamic nested sampling into PolyChord 2, which is currently in production. PolyChord 2 will be able to perform nested sampling in up to approximately 1000 dimensions; dynamic nested sampling can offer significant efficiency gains in such calculations.
We anticipate that, for many problems, reloading a past iteration i of a PolyChord 2 nested sampling run in order to add additional threads will be less computationally expensive than a single likelihood call. Nevertheless, it may prove most efficient for software performing dynamic nested sampling to generate additional threads in selected likelihood regions in batches rather than one at a time; provided the number of samples taken in each batch is much less than the total number of samples in the run, this will have little effect on the efficiency gains from dynamic nested sampling.
Nested sampling software used for practical problems can only approximately sample randomly from the prior within iso-likelihood contours; this introduces additional complexities and errors which are specific to a given implementation and are not present in perfect nested sampling. MultiNest and PolyChord use the dead points to help them sample within iso-likelihood contours, so provided no significant modes are lost, we expect the additional samples from dynamic nested sampling in the regions of log X of most importance to the calculation to reduce implementation-specific errors.

Multimodal posteriors
The most important concern when implementing dynamic nested sampling is ensuring that no modes or other regions of significant posterior mass are lost. Dynamic nested sampling aids the location of modes which become separated from the remainder of the posterior at likelihood values where it places more samples than standard nested sampling, but will make it harder to find modes where there are few samples, as illustrated in Figure 8. However, if a mode is only discovered late in the dynamic nested sampling process, it may still be undersampled due to not being present in threads calculated before it was found. If modes separate at likelihood values where dynamic nested sampling assigns few samples, n_init must be made large enough to ensure no significant modes are lost.

For highly multimodal posteriors, a safe approach is to set n_init high enough to find all significant modes, in which case dynamic nested sampling will use the remaining computational budget to increase calculation accuracy as much as possible. Even if half of the computational budget is used on the initial exploratory run, dynamic nested sampling will achieve over half of the efficiency gain, compared to standard nested sampling, that it could with a very small n_init.

Conclusion
This paper began with an analysis of the effects of changing the number of live points on the accuracy of nested sampling parameter estimation and evidence calculations. We then presented dynamic nested sampling (Algorithm 1), which varies the number of live points to allocate posterior samples efficiently for a priori unknown likelihoods and priors.
Dynamic nested sampling can be optimised specifically for parameter estimation, showing increases in computational efficiency (14) over standard nested sampling by factors of up to 70 ± 4 in numerical tests. The algorithm can also increase evidence calculation accuracy, and can improve both evidence calculation and parameter estimation simultaneously. We discussed factors affecting the efficiency gain from dynamic nested sampling, including showing that large improvements in parameter estimation are possible when the posterior mass is contained in a small region of the prior (as is often the case in high-dimensional problems). Empirical tests show significant efficiency gains from dynamic nested sampling for the wide range of likelihoods, priors, dimensions and estimators considered. Another advantage of dynamic nested sampling is that more accurate results can be obtained by continuing the run for longer, unlike in standard nested sampling.

Fig. 8. Dynamic nested sampling's ability to discover modes is determined by the number of live points present at the likelihood L(X_split) at which a mode splits from the remainder of the posterior (illustrated on the left). In the schematic graph on the right we would expect dynamic nested sampling to be better at finding modes than standard nested sampling in region B (where it has a higher number of live points) but worse in regions A and C.
We have implemented dynamic nested sampling in our PerfectNS package, and it can easily be incorporated into existing standard nested sampling software for use on practical problems, including those with degenerate and multimodal posteriors. We are currently working on including dynamic nested sampling in the forthcoming PolyChord 2 software package.

B. Effect of varying n on evidence calculation accuracy
Nested sampling estimates the evidence as the expectation of (5), with the dominant sampling errors coming from the unknown shrinkage ratios t_i, which are independently distributed according to (4). We now investigate the effect of increasing the number of live points n_i across some shrinkage t_i by considering (5) with all t_{j≠i} marginalised out and conditioned on t_i. For simplicity, instead of using the trapezium rule we calculate the point weight as w_i = X_{i−1} − X_i. In this case uncertainty in t_i causes sampling errors in the weight of point i and of all subsequent points.††

†† If the trapezium rule is used then t_i also affects the weight of the previous point i − 1, but this has little effect on the results.
The terms in square brackets in (16) are independent of t_i. Substituting (16) into (15) and integrating gives (17), where we have defined Z_{>i} ≡ Σ_{k>i} L_k w_k and Z_{<i} ≡ Σ_{k<i} L_k w_k; using the distribution of the shrinkage ratios (4), (17) can be written with the terms in large brackets independent of t_i. The standard deviation of the evidence estimate then follows from the expression for St.Dev.[t_i] in (18). The expected number of samples (computational work) needed to increase the number of live points over some interval (L_a, L_b) is proportional to the log prior shrinkage log X(L_a) − log X(L_b); this quantity can be easily calculated for a set of dead points with little computational cost. Thus the accuracy gained from taking additional samples is approximately proportional to the evidence contained in subsequent dead points. This makes sense, as the dominant evidence errors come from statistically estimating the shrinkages t_i, which affect all subsequent points j ≥ i.
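The scaling of these evidence errors with the number of live points can be checked numerically using perfect nested sampling, sampling the shrinkage ratios directly from (4). Below is a minimal sketch (not from the paper) using an illustrative toy likelihood L(X) = exp(−X/λ) with λ = 0.01, for which the true evidence is Z = λ(1 − e^{−1/λ}), together with the simple weights w_i = X_{i−1} − X_i:

```python
import numpy as np

rng = np.random.default_rng(0)

def perfect_ns_logz(n_live, n_iter=2000, lam=0.01):
    """One 'perfect' nested sampling run for the toy likelihood
    L(X) = exp(-X / lam), whose true evidence is Z = lam * (1 - exp(-1/lam)).
    Each shrinkage ratio t_i is the largest of n_live uniforms, so
    t_i = u ** (1 / n_live) with u ~ Uniform(0, 1), per equation (4)."""
    t = rng.uniform(size=n_iter) ** (1.0 / n_live)
    x = np.cumprod(t)                          # prior volumes X_i
    logl = -x / lam                            # log-likelihood at each dead point
    w = np.concatenate(([1.0], x[:-1])) - x    # simple weights w_i = X_{i-1} - X_i
    return np.log(np.sum(w * np.exp(logl)))

true_logz = np.log(0.01 * (1 - np.exp(-100.0)))
for n in (50, 100, 200):
    errs = [perfect_ns_logz(n) - true_logz for _ in range(100)]
    print(n, np.std(errs))  # error shrinks roughly as 1 / sqrt(n)
```

Doubling the number of live points roughly halves the variance of log Z, consistent with the accuracy gain being spread over all subsequent points.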

C. Tuning for a specific parameter estimation problem
Dynamic nested sampling improves parameter estimation efficiency by placing more samples in log X regions with significant posterior mass and fewer in regions with little posterior mass. However, for some likelihoods and parameter estimation problems a large contribution may come from samples in log X regions containing extreme parameter values, or very high variability of parameter values, but little posterior weight. In this case the expression for sample importance (9) can be modified to favour points with parameter values which have a large effect on the calculation. For example, when estimating the global mean of some parameter or function of parameters, E[f(θ)] = Σ_i f(θ_i) L_i w_i, one could place additional weight on regions with parameter values that have a large effect on the calculation by calculating importance as in (25). This expression is highly variable, as each point i is a single sample from an iso-likelihood contour L(θ) = L_i which may cover a wide range of parameter values. However, dynamic nested sampling (Algorithm 1) uses only the first and last points of high importance when allocating new threads, so using (25) captures log X regions in which some samples have extreme or highly variable parameter values. When tuning dynamic nested sampling for calculating the mean of a parameter θ_1, (25) becomes (26), where θ̄_1 is the global mean of θ_1.
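As a concrete illustration of this kind of tuning, the sketch below computes an importance of the form L_i w_i |f(θ_i) − f̄|, normalised so its maximum is one. The exact functional form of (25) is not reproduced here; the function name and inputs are illustrative assumptions:

```python
import numpy as np

def tuned_importance(logl, logw, f_theta):
    """Sketch of a tuned parameter estimation importance: each dead point is
    weighted by its posterior mass L_i * w_i multiplied by |f(theta_i) - f_bar|,
    where f_bar is the current estimate of E[f(theta)]. This illustrates the
    idea of up-weighting regions whose parameter values drive the calculation;
    the paper's equation (25) may differ in detail."""
    logm = np.asarray(logl) + np.asarray(logw)   # log posterior mass per point
    p = np.exp(logm - logm.max())
    p /= p.sum()
    f_bar = np.sum(p * f_theta)                  # plug-in estimate of the mean
    imp = p * np.abs(f_theta - f_bar)
    return imp / imp.max()                       # normalise: max importance = 1
```

Because new threads span the first and last points with importance above 90% of the maximum, isolated high-importance points at extreme parameter values widen the targeted likelihood range rather than fragmenting it.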
We illustrate tuning for a specific parameter by using dynamic nested sampling with a d-dimensional spherical unit Cauchy likelihood (27). Cauchy likelihoods have extremely heavy tails and (except in high dimensions) have significant posterior mass present across almost the entire range of log X explored, as shown in Figure 11. We therefore expect relatively low efficiency gains for dynamic parameter estimation (G = 1) in this case, but use it as a proof of principle.
Figure 12 shows the allocation of live points by dynamic nested sampling with and without tuning. In the tuned case the live point allocation is consistent with the analytical expectation from (26)‡‡, showing that live points can be allocated correctly.
Table 4 shows the efficiency gain from dynamic nested sampling for a 10-dimensional Cauchy likelihood (27) with a Gaussian prior (12) and σ_π = 10. When estimating θ_1, the calculation is dominated by samples in the tails of the distribution at low likelihoods, and dynamic nested sampling therefore gives only a small efficiency gain (14) over standard nested sampling. Tuned dynamic nested sampling is able to improve the efficiency gain, as shown in the final row of Table 4. Using the tuned importance function also affects the performance gain for other calculations; for example, in this case it significantly improves estimates of the second moment of the distribution θ_1^2 in comparison to the G = 1 case without tuning.

‡‡ For a Cauchy likelihood (27) with a co-centred spherically symmetric uniform prior, the analytic value of E[θ_1] is 0 and each iso-likelihood contour L(θ) = L(X) is a spherically symmetric surface with radius |θ|. The expectation of |θ_i| on such an iso-likelihood contour is |θ|/√d, from which the analytical expectation of the importance (26) follows.

D. Estimating sampling errors in dynamic nested sampling
The technique for estimating sampling errors by resampling threads introduced in Higson et al. (2017) can be applied to dynamic nested sampling; Table 5 shows numerical tests.

Table 5. Sampling errors for dynamic nested sampling of a 3-dimensional Gaussian likelihood (11) with a Gaussian prior (12), G = 1 and n = 200. The first row shows the standard deviation of 5000 nested sampling calculations, and the second shows the mean of 5000 error estimates from resampling (using 200 replications) as a ratio to the error observed from repeated calculations. The third row shows the standard deviations of bootstrap sampling error estimates for single runs as a percentage of the mean estimate. The fourth row shows the mean of 500 bootstrap estimates of the one-tailed 95% credible interval on the calculation result given the sampling error, each using 1000 bootstrap replications. The final two rows show the empirical coverage of the bootstrap standard error and 95% credible interval estimates over the 5000 repeated calculations. Numbers in brackets show the error on the final digit.
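The thread-resampling error estimate can be sketched with perfect nested sampling, where each single-live-point thread is generated by uniform shrinkage. Everything below (the toy likelihood L(X) = exp(−X/λ), the function names, and the fixed thread lengths) is an illustrative assumption rather than the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
LAM = 0.01  # toy likelihood L(X) = exp(-X / LAM); true log Z is close to log(LAM)

def combine_threads(threads):
    """Combine single-live-point threads (each a sorted array of dead point
    log-likelihoods, started from the whole prior) into one run, returning
    sorted log-likelihoods and the local live point count, which at each
    dead point is the number of threads that have not yet terminated."""
    logl = np.sort(np.concatenate(threads))
    ends = np.array([t[-1] for t in threads])
    n_live = (ends[None, :] >= logl[:, None]).sum(axis=1)
    return logl, n_live

def logz_estimate(logl, n_live):
    # expected log shrinkage at each step is -1/n_i, from equation (4)
    logx = -np.cumsum(1.0 / n_live)
    w = np.exp(np.concatenate(([0.0], logx[:-1]))) - np.exp(logx)
    return np.log(np.sum(w * np.exp(logl - logl.max()))) + logl.max()

def bootstrap_std(threads, n_boot=100):
    """Resample whole threads with replacement and recompute the estimator,
    following the thread-resampling idea of Higson et al. (2017)."""
    vals = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(threads), size=len(threads))
        vals.append(logz_estimate(*combine_threads([threads[i] for i in idx])))
    return float(np.std(vals))

# toy single-live-point threads: each step shrinks X by a Uniform(0, 1) ratio
threads = [-np.cumprod(rng.uniform(size=20)) / LAM for _ in range(50)]
logl, n_live = combine_threads(threads)
print(logz_estimate(logl, n_live), '+/-', bootstrap_std(threads))
```

Resampling threads (rather than individual points) preserves the correlations between points within a thread, which is what makes the bootstrap error estimate valid for runs with variable numbers of live points.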

Fig. 2 .
Fig. 2. Combining nested sampling runs a and b, with variable numbers of live points n^(a) and n^(b), into a single nested sampling run c; black dots show dead points arranged in order of increasing likelihood. The number of live points in run c at some likelihood equals the sum of the live points of runs a and b at that likelihood.
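The combination rule in Fig. 2 can be written down directly: pool the dead points in likelihood order and, at each dead point, sum the two runs' local live point counts. A sketch assuming each run stores sorted log-likelihoods and a per-point live count (the data layout and function name are illustrative):

```python
import numpy as np

def combine_runs(logl_a, nlive_a, logl_b, nlive_b):
    """Merge two nested sampling runs into run c as in Fig. 2. Assumes each
    run stores, for every dead point, its log-likelihood (sorted ascending)
    and the number of live points at that iteration."""
    def n_at(logl_run, nlive_run, l):
        # live points of a run at likelihood l: the count stored at the first
        # dead point with log-likelihood >= l, or 0 if the run has ended
        j = np.searchsorted(logl_run, l)
        return int(nlive_run[j]) if j < len(nlive_run) else 0

    logl_c = np.sort(np.concatenate([logl_a, logl_b]))
    nlive_c = np.array([n_at(logl_a, nlive_a, l) + n_at(logl_b, nlive_b, l)
                        for l in logl_c])
    return logl_c, nlive_c

logl_c, nlive_c = combine_runs(
    np.array([1.0, 2.0, 3.0]), np.array([2, 2, 2]),   # run a: n = 2
    np.array([1.5, 2.5]), np.array([1, 1]))           # run b: n = 1
print(nlive_c)  # 3 wherever both runs are live, 2 after run b terminates
```

The merged run can then be analysed with the usual estimators, using the local live count to set the expected shrinkage at each step.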

Fig. 3 .
Fig. 3. Relative posterior mass (∝ L(X)X) as a function of log X for Gaussian likelihoods (11) and exponential power likelihoods (13) with b = 2 and b = 3/4. Each has a Gaussian prior (12) with σ_π = 10. The lines are scaled so that the area under each of them is equal.
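Curves of this kind can be computed analytically for a spherical Gaussian likelihood and Gaussian prior, since the prior volume fraction X(r) inside radius r is a regularised lower incomplete gamma function. A sketch (the parameter values d = 10, σ = 1, σ_π = 10 follow the paper's examples; the function itself is illustrative):

```python
import numpy as np
from scipy.special import gammainc

def relative_posterior_mass(d=10, sigma=1.0, sigma_pi=10.0, n_grid=400):
    """Relative posterior mass ~ L(X) X as a function of log X for a
    d-dimensional spherical Gaussian likelihood (11) of width sigma and a
    Gaussian prior (12) of width sigma_pi. X(r), the prior mass inside
    radius r, is the regularised lower incomplete gamma function."""
    r = np.linspace(1e-3, 12.0 * sigma, n_grid)
    logx = np.log(gammainc(d / 2.0, r**2 / (2.0 * sigma_pi**2)))  # log X(r)
    logl = -r**2 / (2.0 * sigma**2)          # log L up to an additive constant
    log_mass = logl + logx                   # log of L(X) * X
    return logx, log_mass - log_mass.max()

logx, log_mass = relative_posterior_mass()
print(logx[np.argmax(log_mass)])  # posterior mass peaks around log X ~ -20
```

For these parameter values the posterior occupies a tiny fraction of the prior volume, which is why allocating live points by posterior mass gives such large parameter estimation gains in high dimensions.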

Fig. 4 .
Fig. 4. Live point allocation for a 10-dimensional Gaussian likelihood (11) with a Gaussian prior (12) and σ_π = 10. Solid lines show the number of live points as a function of log X for 10 standard nested sampling runs and 10 dynamic nested sampling runs with a similar number of samples and different values of G. The dotted and dashed lines show the relative posterior mass ∝ L(X)X and the posterior mass remaining ∝ ∫_{−∞}^{log X} L(X′)X′ d log X′ at each point in log X; for comparison these lines are scaled to have the same area under them as the average of the number-of-live-points lines. Standard nested sampling runs include the final set of live points at termination, which are modeled using a decreasing number of live points as discussed in Section 3. Similar diagrams for exponential power likelihoods (13) with b = 2 and b = 3/4 are presented in Appendix A (Figures 9 and 10).

Fig. 5.
Fig. 5. Distributions of results for the standard nested sampling and dynamic nested sampling calculations in Table 1. Dynamic nested sampling with G = 0 and G = 1 significantly reduces evidence and parameter estimation sampling errors respectively compared to standard nested sampling. Dynamic nested sampling with G = 0.25 reduces both evidence and parameter estimation sampling errors. The colour scale shows the fraction of the cumulative probability distribution lying between some region and the median.

Fig. 6. Fig. 7.
Fig. 6. Efficiency gain (14) from dynamic nested sampling compared to standard nested sampling for likelihoods of different dimensions; each has a Gaussian prior (12) with σ_π = 10. Results are shown for calculations of the log evidence and of the mean, second moment, median and 84% one-tailed credible interval of the posterior distribution of the parameter θ_1. Each efficiency gain is calculated using 1000 standard nested sampling calculations with n = 200 and 1000 dynamic nested sampling calculations using a similar or slightly smaller number of samples.
Input: goal G, n_init, dynamic termination condition.
Generate a nested sampling run with a constant number of live points n_init;
while dynamic termination condition not satisfied do
    recalculate the importance I(G, i) of all points;
    find the first point j and last point k with importance greater than 90% of the largest importance;
    generate an additional thread§ starting at L_{j−1} and ending with the first sample taken with likelihood greater than L_{k+1};
end
Algorithm 1: Dynamic nested sampling.
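Algorithm 1 can be demonstrated end-to-end with perfect nested sampling, where generating a new thread between two likelihood contours is analytic. The toy likelihood L(X) = exp(−X/λ), the simplified importance function, and the fixed thread budget used as a stand-in termination condition are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
LAM = 0.01  # toy likelihood L(X) = exp(-X / LAM); true log Z is close to log(LAM)

def sample_thread(logl_start, logl_end):
    """One single-live-point thread between two likelihood contours: for this
    toy problem, 'sampling within the contour' just shrinks the prior volume
    X by a Uniform(0, 1) ratio each step."""
    x = 1.0 if logl_start == -np.inf else -LAM * logl_start
    logls = []
    while True:
        x *= rng.uniform()
        logls.append(-x / LAM)
        if logls[-1] > logl_end:
            return logl_start, np.array(logls)

def combine(threads):
    """Sorted dead point log-likelihoods and local live point counts: at each
    dead point, count the threads that span its likelihood."""
    logl = np.sort(np.concatenate([t for _, t in threads]))
    starts = np.array([s for s, _ in threads])
    ends = np.array([t[-1] for _, t in threads])
    n_live = ((starts[None, :] < logl[:, None])
              & (ends[None, :] >= logl[:, None])).sum(axis=1)
    return logl, n_live

def importance(logl, n_live, G):
    """Sketch of I(G, i): a G-weighted mix of evidence importance (evidence in
    subsequent points) and parameter estimation importance (posterior mass
    L_i w_i); the paper's equation (9) may differ in detail."""
    logx = -np.cumsum(1.0 / n_live)        # expected log X_i, from equation (4)
    w = np.exp(np.concatenate(([0.0], logx[:-1]))) - np.exp(logx)
    mass = w * np.exp(logl - logl.max())
    imp_z = np.cumsum(mass[::-1])[::-1]    # evidence contained in points >= i
    imp = (1 - G) * imp_z / imp_z.max() + G * mass / mass.max()
    return imp / imp.max()

def dynamic_ns(G=1.0, n_init=20, n_max=100):
    """Algorithm 1 on the toy problem: an initial constant-n_init run, then
    repeatedly add a thread spanning the points with importance > 90% of the
    maximum, until a fixed thread budget (stand-in termination) is reached."""
    threads = [sample_thread(-np.inf, -1e-3) for _ in range(n_init)]
    while len(threads) < n_max:
        logl, n_live = combine(threads)
        idx = np.nonzero(importance(logl, n_live, G) > 0.9)[0]
        j, k = idx[0], idx[-1]
        start = logl[j - 1] if j > 0 else -np.inf
        end = logl[min(k + 1, len(logl) - 1)]
        threads.append(sample_thread(start, end))
    logl, n_live = combine(threads)
    logx = -np.cumsum(1.0 / n_live)
    w = np.exp(np.concatenate(([0.0], logx[:-1]))) - np.exp(logx)
    return np.log(np.sum(w * np.exp(logl - logl.max()))) + logl.max()

print(dynamic_ns(G=1.0))  # should be close to log(0.01), about -4.6
```

In a practical implementation the new thread would be generated by sampling from the prior within the contour L > L_{j−1}, and the termination condition would typically be based on the remaining evidence or a sample budget rather than a thread count.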

Table 3.
As in Table 1 but with a 10-dimensional exponential power likelihood (13).

Hence the expected number of extra samples ΔN_s required to increase the local number of live points n_i is proportional to the interval log t_i, which has an expected size of 1/n_i. The change in the error on the evidence with extra samples then follows.