Computing highly accurate confidence limits from discrete data using importance sampling

Statistics and Computing

Abstract

For discrete data, frequentist confidence limits based on a normal approximation to standard likelihood-based pivotal quantities can perform poorly, even for quite large sample sizes. Constructing exact limits requires the probability of a suitable tail set as a function of the unknown parameters. In this paper, importance sampling is used to estimate this surface and hence the confidence limits. The technology is simple and straightforward to implement. Unlike the recent methodology of Garthwaite and Jones (J. Comput. Graph. Stat. 18, 184–200, 2009), the new method allows for nuisance parameters; is an order of magnitude more efficient than the Robbins-Monro bound; does not require any simulation phases or tuning constants; gives a straightforward simulation standard error for the target limit; and includes a simple diagnostic for simulation breakdown.

References

  • Aleksandrov, V.M., Sysoyev, V.I., Shemeneva, V.V.: Stochastic optimization. Eng. Cybern. 5, 11–16 (1968)
  • Carpenter, J.: Test inversion bootstrap confidence intervals. J. R. Stat. Soc. B 61, 159–172 (1999)
  • Chung, K.L.: On a stochastic approximation method. Ann. Math. Stat. 25, 463–483 (1954)
  • Davison, A.C., Hinkley, D.V.: Bootstrap Methods and Their Application. Cambridge University Press, Cambridge (1997)
  • Garthwaite, P.H., Buckland, S.T.: Generating Monte Carlo confidence intervals by the Robbins Monro process. Appl. Stat. 41, 159–171 (1992)
  • Garthwaite, P.H., Jones, M.C.: A stochastic approximation method and its application to confidence intervals. J. Comput. Graph. Stat. 18, 184–200 (2009)
  • Geweke, J.: Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57, 1317–1339 (1989)
  • Hoogerheide, L.F., van Dijk, H.K.: Bayesian forecasting of value at risk and expected shortfall using adaptive importance sampling. Int. J. Forecast. 26, 231–247 (2010)
  • Kabaila, P.V.: Some properties of profile bootstrap confidence intervals. Aust. J. Stat. 35, 205–214 (1993)
  • Kabaila, P.V., Lloyd, C.J.: Profile upper confidence limits for discrete data. Aust. N. Z. J. Stat. 42, 67–80 (2001)
  • Kabaila, P.V., Lloyd, C.J.: Improved Buehler limits based on refined designated statistics. J. Stat. Plan. Inference 136, 3145–3155 (2006)
  • Kallianpur, G.: A note on the Robbins-Monro stochastic approximation method. Ann. Math. Stat. 25, 386–388 (1954)
  • Kahn, H., Marshall, A.: Methods of reducing sample size in Monte Carlo computations. J. Oper. Res. Soc. Am. 1, 263–278 (1953)
  • Kroese, D.P., Taimre, T., Botev, Z.I.: Handbook of Monte Carlo Methods. Wiley, New York (2011)
  • Lloyd, C.J.: Computing highly accurate confidence limits from discrete data using importance sampling. MBS working paper (2011). works.bepress.com/chris_lloyd/23/
  • Owen, A., Zhou, Y.: Safe and effective importance sampling. J. Am. Stat. Assoc. 95, 135–143 (2000)
  • Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30, 838–855 (1992)
  • Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
  • Ruppert, D.: A Newton-Raphson version of the multivariate Robbins-Monro procedure. Ann. Stat. 13, 236–245 (1985)
  • Ruppert, D.: Stochastic approximation. In: Ghosh, B.K., Sen, P.K. (eds.) Handbook of Sequential Analysis, pp. 503–529. Marcel Dekker, New York (1991)
  • Sadowsky, J.S., Bucklew, J.A.: On large deviations theory and asymptotically efficient Monte Carlo estimation. IEEE Trans. Inf. Theory 36, 579–588 (1990)
  • Smith, P.J., Shafi, M., Gao, H.: Quick simulation: a review of importance sampling techniques in communications systems. IEEE J. Sel. Areas Commun. 15, 597–612 (1997)
  • Weisberg, H.I., Derrig, R.A.: Quantitative methods for detecting fraudulent automobile bodily injury claims. Risques 35, 75–99 (1998)

Author information

Corresponding author

Correspondence to Chris J. Lloyd.

Appendix: Proofs of consistency

The notation was established in Sect. 4. The tail probability being simulated is

$$\widetilde{G}(\psi)=\sum_{y_j\in\mathcal {Y}}I_{R(t)}(y_j) \widetilde {p}(y_j;\psi). $$

The proposed estimator \(\widehat{G}(\psi)\) was given in (7) and is a sample average of the variables \(Z_{j}(\psi)=I_{R(t)}(Y)w_{s}(Y;\psi)\), where \(Y\) is generated from \(p_{S}\). The first proposition is that \(\widehat{G}(\psi)\) is consistent and asymptotically normal under some mild conditions. This result is well known and is proven in standard texts such as Kroese et al. (2011). The second proposition shows that the standard second moment condition in Proposition 1 can be satisfied in the generalised linear models case.

Proposition 1

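Let \(\widetilde{G}(\psi)\) and \(\widehat{G}(\psi)\) be defined as above, and suppose that \(\sigma^{2}(\psi):=\mathrm{Var}_{S}(Z_{j}(\psi))<\infty\). As \(B\rightarrow\infty\), we have

$$ \sqrt{B} \bigl\{\widehat{G}(\psi)-\widetilde{G}(\psi) \bigr\}\stackrel{d}{\longrightarrow}N \bigl(0,\sigma^{2}(\psi) \bigr) $$
(14)

for given \(\psi\).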
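The estimator and its simulation standard error can be illustrated with a small Python sketch. It is not the paper's implementation: it assumes a one-parameter binomial model with no nuisance parameter standing in for \(\widetilde{p}\), a lower tail set \(\{y\leq t\}\) standing in for \(R(t)\), and importance weights equal to the likelihood ratio \(\widetilde{p}(y;\psi)/p_{S}(y)\). All function names and parameter values are hypothetical.

```python
import math
import random

def binom_pmf(y, n, p):
    """Binomial(n, p) probability mass at y (stand-in for p-tilde)."""
    return math.comb(n, y) * p**y * (1.0 - p)**(n - y)

def exact_tail(psi, n, t):
    """Exact tail probability Pr(Y <= t; psi), used only as a check."""
    return sum(binom_pmf(y, n, psi) for y in range(t + 1))

def is_tail_estimate(psi, psi_s, n, t, B, seed=0):
    """Importance sampling estimate of Pr(Y <= t; psi) from B draws under psi_s.

    Returns the estimate and its simulation standard error.
    """
    rng = random.Random(seed)
    sampling_pmf = [binom_pmf(y, n, psi_s) for y in range(n + 1)]
    draws = rng.choices(range(n + 1), weights=sampling_pmf, k=B)
    # Z = indicator(Y in tail set) * likelihood ratio p(Y; psi) / p_S(Y)
    z = [binom_pmf(y, n, psi) / sampling_pmf[y] if y <= t else 0.0
         for y in draws]
    g_hat = sum(z) / B
    var_z = sum((v - g_hat) ** 2 for v in z) / (B - 1)
    return g_hat, math.sqrt(var_z / B)

g_hat, se = is_tail_estimate(psi=0.45, psi_s=0.35, n=20, t=5, B=20000, seed=1)
print(g_hat, se, exact_tail(0.45, 20, 5))
```

Sampling under a smaller value than \(\psi\) places more draws in the lower tail, which is the usual motivation for importance sampling here; the sample variance of the \(Z_{j}\) estimates \(\sigma^{2}(\psi)\) and gives the standard error.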

Proposition 2

Suppose that the model is defined by

$$p(y;\psi,\lambda)=c(\psi,\lambda){\rm exp} \bigl\{\eta^\tau(\psi , \lambda )s(y) \bigr\}h(y), $$

where \(c(\psi,\lambda)\) is a normalising constant. Suppose that \(p_{S}(\cdot)=\tilde{p}(\cdot;\psi_{S})\) is chosen from the same family. Then \(\mathrm{Var}_{S}(Z_{j}(\psi))<\infty\).

Proof

Let \(\widetilde{c}(\psi)=c(\psi,\hat {\lambda }_{\psi})\) and \(\widetilde{\eta}(\psi)=\eta(\psi,\hat{\lambda }_{\psi})\) so that the post-data one-dimensional model family becomes

$$\widetilde{p}(y;\psi)=\widetilde{c}(\psi){\rm exp} \bigl\{\widetilde { \eta}^\tau (\psi)s(y) \bigr\}h(y). $$

Denote by Ψ the set of parameter values ψ for which \(\widetilde{c}(\psi)\) exists.

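It is easy to show that

$$E_{S} \bigl\{Z_{j}^{2}(\psi) \bigr\}=\frac{\widetilde{c}^{2}(\psi)}{\widetilde{c}(\psi_{S})}\sum_{y_j\in\mathcal{Y}}I_{R(t)}(y_j){\rm exp} \bigl[ \bigl\{2\widetilde{\eta}(\psi)-\widetilde{\eta}(\psi_{S}) \bigr\}^\tau s(y_j) \bigr]h(y_j). $$

Note that for standard exponential family models, we can find \(\psi^{*}\in\Psi\) such that \(2\widetilde{\eta}(\psi)-\widetilde{\eta}(\psi _{S})=\widetilde{\eta}(\psi^{*})\), which indicates that \(E_{S}\{Z_{j}^{2}(\psi)\}\leq\widetilde{c}^{2}(\psi)/\{\widetilde{c}(\psi_{S})\widetilde{c}(\psi^{*})\}<\infty\). □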
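As a worked check of the identity \(2\widetilde{\eta}(\psi)-\widetilde{\eta}(\psi_{S})=\widetilde{\eta}(\psi^{*})\), the following sketch uses a binomial model, where \(\eta(\psi)=\log\{\psi/(1-\psi)\}\), \(s(y)=y\), \(h(y)=\binom{n}{y}\) and \(c(\psi)=(1-\psi)^{n}\). It assumes no nuisance parameter and likelihood-ratio weights, so it illustrates the argument rather than reproducing the paper's setting; the function names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def binom_pmf(y, n, p):
    return math.comb(n, y) * p**y * (1.0 - p)**(n - y)

def second_moment_direct(psi, psi_s, n, t):
    """Sum over the tail set {y <= t} of p(y; psi)^2 / p(y; psi_s)."""
    return sum(binom_pmf(y, n, psi) ** 2 / binom_pmf(y, n, psi_s)
               for y in range(t + 1))

def second_moment_bound(psi, psi_s, n):
    """Closed form of the same sum taken over the whole support."""
    # psi_star solves 2 * eta(psi) - eta(psi_s) = eta(psi_star)
    psi_star = expit(2.0 * logit(psi) - logit(psi_s))
    c = lambda p: (1.0 - p) ** n  # binomial normalising constant
    return c(psi) ** 2 / (c(psi_s) * c(psi_star)), psi_star

n, psi, psi_s = 20, 0.45, 0.35
bound, psi_star = second_moment_bound(psi, psi_s, n)
full = second_moment_direct(psi, psi_s, n, t=n)   # whole support
tail = second_moment_direct(psi, psi_s, n, t=5)   # tail set only
print(psi_star, bound, full, tail)
```

In this example \(\psi^{*}\) lies in \((0,1)\), the full-support sum matches the closed form, and the tail-set second moment is bounded by it.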

We next turn to the consistency of the solution \(\hat{u}\) of \(\hat{G}(\psi )=\alpha\) for \(\tilde{u}\). This involves conditions to ensure the uniform consistency of \(\hat{G}(\psi)\) and some further conditions on the regularity of the target function \(\tilde{G}(\psi)\). Let \({\mathcal {S}}\) be a compact set, and \({\mathcal{S}}_{j}\), \(j=1,\ldots,m_{0}\), be disjoint sets such that \(\bigcup_{j=1}^{m_{0}}{\mathcal{S}}_{j}={\mathcal {S}}\), where \(m_{0}\) is bounded.

Proposition 3

Suppose that the conditions of Proposition 1 are satisfied and \(\psi\) takes values in a compact support \({\mathcal{S}}\). Suppose that for all \(y_{j}\), \(\tilde{p}(y_{j};\psi)\) is continuous on each subset \({\mathcal{S}}_{j}\). Suppose also that \(\tilde{G}(\psi)\) is continuous on each subset \({\mathcal{S}}_{j}\). Then, we have

$$ \sup_{\psi\in{\mathcal{S}}} \bigl|\widehat{G}(\psi)-\widetilde {G}(\psi ) \bigr|=o_P(1). $$
(15)

Proof

For \(\epsilon>0\), let \({\mathcal{S}}_{j}(k)\), \(k=1,\ldots,N_{j}\), be small and disjoint intervals with center \(s_{j}(k)\) and length \(\epsilon\) such that \(\bigcup_{k=1}^{N_{j}}{\mathcal {S}}_{j}(k)={\mathcal{S}}_{j}\). Let \(N(\epsilon)=\sum_{j=1}^{m_{0}} N_{j}\), which is bounded for any \(\epsilon>0\). Now

$$\begin{aligned} &\sup_{\psi\in{\mathcal{S}}} \bigl|\widehat{G}(\psi)-\widetilde {G}(\psi ) \bigr| \\ &\quad {}\leq \max_{1\leq j\leq m_0}\max_{1\leq k\leq N_j} \bigl|\widehat {G} \bigl(s_j(k)\bigr)-\widetilde{G}\bigl(s_j(k)\bigr) \bigr| \\ &\qquad {}+\max_{1\leq j\leq m_0}\max_{1\leq k\leq N_j}\sup _{s\in{\mathcal {S}}_j(k)} \bigl|\widehat{G}(s)-\widehat{G}\bigl(s_j(k) \bigr) \bigr| \\ &\qquad {}+\max_{1\leq j\leq m_0}\max_{1\leq k\leq N_j}\sup _{s\in{\mathcal {S}}_j(k)} \bigl|\widetilde{G}(s)-\widetilde{G}\bigl(s_j(k) \bigr) \bigr| \\ &\quad =:I_{1}+I_{2}+I_{3}. \end{aligned}$$
(16)

By the Markov inequality and the second moment condition in Proposition 1, we have, for any \(\epsilon_{*}>0\),

$$\begin{aligned} {\Pr} (I_{1}>\epsilon_* ) =&{\Pr} \Bigl(\max_{1\leq j\leq m_0}\max _{1\leq k\leq N_j} \bigl|\widehat{G}\bigl(s_j(k)\bigr) \\ &{}-\widetilde{G} \bigl(s_j(k)\bigr) \bigr|>\epsilon_* \Bigr) \\ \leq&N(\epsilon)\max_{1\leq j\leq m_0}\max_{1\leq k\leq N_j}{\Pr} \bigl( \bigl|\widehat{G}\bigl(s_j(k)\bigr)- \widetilde {G}\bigl(s_j(k)\bigr) \bigr|>\epsilon_* \bigr) \\ =&O\bigl(B^{-1}\bigr)=o(1) \end{aligned}$$
(17)

for given \(\epsilon>0\).

Next, since \(\widetilde{p}(y;\cdot)\) and \(\widetilde{G}(\cdot)\) are continuous on the compact sets \({\mathcal{S}}_{j}\) and are also bounded, they are Lipschitz continuous. We can then show that

$$ I_2+I_3=O(\epsilon)=o(1) $$
(18)

by letting ϵ→0. We thus prove (15) by using (16)–(18). □
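The uniform statement in (15) can be explored numerically with a short Python sketch. It is only an illustration under stated assumptions: a binomial model without nuisance parameter, a lower tail set \(\{y\leq t\}\), likelihood-ratio weights, and a finite grid of \(\psi\) values as a proxy for the supremum over \({\mathcal{S}}\). The same simulated draws are reused for every \(\psi\), and the function names and settings are hypothetical.

```python
import math
import random

def binom_pmf(y, n, p):
    return math.comb(n, y) * p**y * (1.0 - p)**(n - y)

def sup_error(B, n=20, t=5, psi_s=0.35, seed=0):
    """Largest |G_hat(psi) - G_tilde(psi)| over a grid of psi in [0.30, 0.60]."""
    rng = random.Random(seed)
    grid = [0.30 + 0.01 * k for k in range(31)]
    sampling_pmf = [binom_pmf(y, n, psi_s) for y in range(n + 1)]
    draws = rng.choices(range(n + 1), weights=sampling_pmf, k=B)
    # Only draws in the tail set contribute, so tabulate them once.
    tail_counts = [draws.count(y) for y in range(t + 1)]
    worst = 0.0
    for psi in grid:
        g_hat = sum(c * binom_pmf(y, n, psi) / sampling_pmf[y]
                    for y, c in enumerate(tail_counts)) / B
        g_true = sum(binom_pmf(y, n, psi) for y in range(t + 1))
        worst = max(worst, abs(g_hat - g_true))
    return worst

for B in (500, 5000, 50000):
    print(B, sup_error(B, seed=3))
```

Because one set of draws serves the whole grid, the estimated curve is a smooth function of \(\psi\), which is what makes solving \(\widehat{G}(\psi)=\alpha\) practical.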

Remark

Note in particular that we do not require \(\tilde{G}(\psi)\) to be continuous, only piecewise continuous. For models with finite support, the piecewise continuity condition on \(\tilde{G}(\psi)\) is redundant since it follows from the condition on \(\tilde{p}(y_{j};\psi)\).

Consistency of \(\hat{u}\) requires additional regularity conditions on \(\tilde{G}(\psi)\). The next proposition requires that \(\tilde{G}(\psi)\) be locally monotone and bounded away from α. If \(\tilde{G}(\psi)\) is monotone and continuous, as it is in all applications in this article, then the second requirement that it be bounded away from α could be dropped.

Proposition 4

Suppose that the conditions of Proposition 3 are satisfied. Suppose in addition that there exist \(\kappa>0\) and \(\delta>0\) such that either

  (i) \(\widetilde{G}(\cdot)\) is strictly decreasing on \([\widetilde {u}-\kappa, \widetilde{u}+\kappa]\), and \(\widetilde{G}(\psi )<\alpha -\delta\) for all \(\psi>\widetilde{u}+\kappa\); or

  (ii) \(\widetilde{G}(\cdot)\) is strictly monotone on \([\widetilde {u}-\kappa,\widetilde{u}]\), and \(\widetilde{G}(\psi)<\alpha-\delta \) for all \(\psi>\widetilde{u}\).

Then \(\widehat{u}\) is a consistent estimator of \(\widetilde{u}\).

Proof

(i) For any ϵ>0, we have

$$ {\Pr} \bigl( |\widehat{u}-\widetilde{u} |>\epsilon \bigr)\leq {\Pr} ( \widehat{u}>\widetilde{u}+\epsilon )+ {\Pr} (\widehat{u}<\widetilde{u}-\epsilon ). $$
(19)

We only need to prove that the two probabilities on the right-hand side of (19) tend to zero.

If \(\widetilde{u}+\epsilon<\widehat{u}\leq\widetilde{u}+\kappa\) with ϵ<κ, we have

$$\widetilde{G}(\widehat{u})<\widetilde{G}(\widetilde{u}+\epsilon )<\widetilde{G}( \widetilde{u})=\alpha. $$

Note that \(\widehat{G}(\widehat{u})\rightarrow\widetilde {G}(\widehat {u})\) in probability by Proposition 3, which indicates that \(\widehat {G}(\widehat{u})<\alpha\) in probability. However, according to the definition of \(\widehat{u}\), we have \(\widehat {G}(\widehat {u})=\alpha\), which is a contradiction. Meanwhile, by Proposition 2, as \(\widehat{G}(\cdot)\) is piecewise continuous, it is not possible that \(\widehat{u}>\widetilde{u}+\kappa\) in probability. Hence, we have

$$ {\Pr} (\widehat{u}>\widetilde{u}+\epsilon )\rightarrow0. $$
(20)

If \(\widehat{u}<\widetilde{u}-\epsilon\) with ϵ<κ, we have

$$\widetilde{G}(\widehat{u})>\widetilde{G}(\widetilde{u}-\epsilon )>\widetilde{G}( \widetilde{u})=\alpha. $$

Note that \(\widehat{G}(\widehat{u})\rightarrow\widetilde {G}(\widehat {u})\) in probability by Proposition 3 again, which indicates that \(\widehat{G}(\widehat{u})>\alpha\) in probability. However, according to the definition of \(\widehat{u}\), we have \(\widehat {G}(\widehat{u})=\alpha\), which is again a contradiction. Hence, we have

$$ {\Pr} (\widehat{u}<\widetilde{u}-\epsilon )\rightarrow0. $$
(21)

By (19)–(21), this completes the proof of Proposition 4 for case (i).

(ii) As in case (i), we need to prove that the two probabilities on the right-hand side of (19) tend to zero. Following a similar argument to that above, we can prove (20). If \(\widehat{u}<\widetilde{u}-\epsilon\) for any \(\epsilon>0\), we have

$$\widetilde{G}(\widehat{u})<\alpha-\delta<\alpha, $$

which, together with Proposition 3, indicates that it is not possible to obtain \(\widehat{G}(\widehat{u})=\alpha\) in probability for \(\widehat {u}<\widetilde{u}-\epsilon\). Hence, we have proven (21). By (19)–(21), this completes the proof of Proposition 4 for case (ii). □
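The consistency of \(\widehat{u}\) can be illustrated by solving \(\widehat{G}(\psi)=\alpha\) numerically and comparing with the exact solution \(\widetilde{u}\). The sketch below assumes a binomial model with no nuisance parameter, a lower tail set \(\{y\leq t\}\) (for which the tail probability is strictly decreasing in \(\psi\) on the bracketing interval used), likelihood-ratio weights, and plain bisection as the root finder. None of these choices is taken from the paper, and the names are hypothetical.

```python
import math
import random

def binom_pmf(y, n, p):
    return math.comb(n, y) * p**y * (1.0 - p)**(n - y)

def exact_tail(psi, n, t):
    return sum(binom_pmf(y, n, psi) for y in range(t + 1))

def make_g_hat(n, t, psi_s, B, seed=0):
    """Return psi -> G_hat(psi), built from one set of B draws under psi_s."""
    rng = random.Random(seed)
    sampling_pmf = [binom_pmf(y, n, psi_s) for y in range(n + 1)]
    draws = rng.choices(range(n + 1), weights=sampling_pmf, k=B)
    tail_counts = [draws.count(y) for y in range(t + 1)]

    def g_hat(psi):
        return sum(c * binom_pmf(y, n, psi) / sampling_pmf[y]
                   for y, c in enumerate(tail_counts)) / B
    return g_hat

def solve_decreasing(f, target, lo, hi, iters=60):
    """Bisection for f(x) = target when f is decreasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, t, alpha = 20, 5, 0.05
g_hat = make_g_hat(n, t, psi_s=0.45, B=20000, seed=2)
u_hat = solve_decreasing(g_hat, alpha, 0.30, 0.90)
u_tilde = solve_decreasing(lambda psi: exact_tail(psi, n, t), alpha, 0.30, 0.90)
print(u_hat, u_tilde)
```

The bracket starts above \(t/n\) so that every term of the estimated curve is decreasing in \(\psi\), matching the local monotonicity required in case (i).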

About this article

Cite this article

Lloyd, C.J., Li, D. Computing highly accurate confidence limits from discrete data using importance sampling. Stat Comput 24, 663–673 (2014). https://doi.org/10.1007/s11222-013-9409-1

  • DOI: https://doi.org/10.1007/s11222-013-9409-1
