Computing highly accurate confidence limits from discrete data using importance sampling

Lloyd, Chris J.; Li, Degui

doi:10.1007/s11222-013-9409-1

Published: 21 August 2013

Volume 24, pages 663–673, (2014)

Statistics and Computing

Chris J. Lloyd¹ &
Degui Li²

Abstract

For discrete data, frequentist confidence limits based on a normal approximation to standard likelihood-based pivotal quantities can perform poorly, even for quite large sample sizes. Constructing exact limits requires the probability of a suitable tail set as a function of the unknown parameters. In this paper, importance sampling is used to estimate this surface and hence the confidence limits. The technology is simple and straightforward to implement. Unlike the recent methodology of Garthwaite and Jones (in J. Comput. Graph. Stat. 18, 184–200, 2009), the new method allows for nuisance parameters; is an order of magnitude more efficient than the Robbins-Monro bound; does not require any simulation phases or tuning constants; gives a straightforward simulation standard error for the target limit; and includes a simple diagnostic for simulation breakdown.

References

Aleksandrov, V.M., Sysoyev, V.I., Shemeneva, V.V.: Stochastic optimization. Eng. Cybern. 5, 11–16 (1968)
Carpenter, J.: Test inversion bootstrap confidence intervals. J. R. Stat. Soc. B 61, 159–172 (1999)
Chung, K.L.: On a stochastic approximation method. Ann. Math. Stat. 25, 463–483 (1954)
Davison, A.C., Hinkley, D.V.: Bootstrap Methods and Their Application. Cambridge University Press, Cambridge (1997)
Garthwaite, P.H., Buckland, S.T.: Generating Monte Carlo confidence intervals by the Robbins Monro process. Appl. Stat. 41, 159–171 (1992)
Garthwaite, P.H., Jones, M.C.: A stochastic approximation method and its application to confidence intervals. J. Comput. Graph. Stat. 18, 184–200 (2009)
Geweke, J.: Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57, 1317–1339 (1989)
Hoogerheide, L.F., van Dijk, H.K.: Bayesian forecasting of value at risk and expected shortfall using adaptive importance sampling. Int. J. Forecast. 26, 231–247 (2010)
Kabaila, P.V.: Some properties of profile bootstrap confidence intervals. Aust. J. Stat. 35, 205–214 (1993)
Kabaila, P.V., Lloyd, C.J.: Profile upper confidence limits for discrete data. Aust. N. Z. J. Stat. 42, 67–80 (2001)
Kabaila, P.V., Lloyd, C.J.: Improved Buehler limits based on refined designated statistics. J. Stat. Plan. Inference 136, 3145–3155 (2006)
Kallianpur, G.: A note on the Robbins-Monro stochastic approximation method. Ann. Math. Stat. 25, 386–388 (1954)
Kahn, H., Marshall, A.: Methods of reducing sample size in Monte Carlo computations. J. Oper. Res. Soc. Am. 1, 263–278 (1953)
Kroese, D.P., Taimre, T., Botev, Z.I.: Handbook of Monte Carlo Methods. Wiley, New York (2011)
Lloyd, C.J.: Computing highly accurate confidence limits from discrete data using importance sampling. MBS working paper (2011). works.bepress.com/chris_lloyd/23/
Owen, A., Zhou, Y.: Safe and effective importance sampling. J. Am. Stat. Assoc. 95, 135–143 (2000)
Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30, 838–855 (1992)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
Ruppert, D.: A Newton-Raphson version of the multivariate Robbins-Monro procedure. Ann. Stat. 13, 236–245 (1985)
Ruppert, D.: Stochastic approximation. In: Ghosh, B.K., Sen, P.K. (eds.) Handbook of Sequential Analysis, pp. 503–529. Marcel Dekker, New York (1991)
Sadowsky, J.S., Bucklew, J.A.: On large deviations theory and asymptotically efficient Monte Carlo estimation. IEEE Trans. Inf. Theory 36, 579–588 (1990)
Smith, P.J., Shafi, M., Gao, H.: Quick simulation: a review of importance sampling techniques in communications systems. IEEE J. Sel. Areas Commun. 15, 597–612 (1997)
Weisberg, H.I., Derrig, R.A.: Quantitative methods for detecting fraudulent automobile bodily injury claims. Risques 35, 75–99 (1998)

Author information

Authors and Affiliations

Melbourne Business School, University of Melbourne, Carlton, 3053, Australia
Chris J. Lloyd
Department of Econometrics and Business Statistics, Monash University, Caulfield East, 3145, Australia
Degui Li

Corresponding author

Correspondence to Chris J. Lloyd.

Appendix: Proofs of consistency

The notation was established in Sect. 4. The tail probability being simulated is

$$\widetilde{G}(\psi)=\sum_{y_j\in\mathcal {Y}}I_{R(t)}(y_j) \widetilde {p}(y_j;\psi). $$

The proposed estimator $\widehat{G}(\psi)$ was given in (7) and is a sample average of the variables $Z_j(\psi)=I_{R(t)}(Y)\,w_s(Y;\psi)$, where $Y$ is generated from $p_S$. The first proposition is that $\widehat{G}(\psi)$ is consistent and asymptotically normal under some mild conditions. This result is well known and proven in standard texts such as Kroese et al. (2011). The second proposition shows that the standard second moment condition in Proposition 1 can be satisfied in the generalised linear model case.
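As a rough illustration of this kind of estimator, the following Python sketch assumes a single binomial model with no nuisance parameter, a tail set $R(t)=\{y: y\leq t\}$, and importance weights equal to the ratio of the target mass function to the sampling mass function. These choices, the numerical values and all function names are assumptions made for the example and are not taken from the article. The standard error is the usual sample standard deviation of the $Z_j$ divided by $\sqrt{B}$.

```python
import math
import random


def binom_pmf(y, n, psi):
    """Binomial(n, psi) probability mass at y."""
    return math.comb(n, y) * psi**y * (1.0 - psi) ** (n - y)


def draw_binomial(rng, n, psi):
    """One Binomial(n, psi) draw, built as a sum of n Bernoulli trials."""
    return sum(rng.random() < psi for _ in range(n))


def is_tail_estimate(sample, n, t, psi, psi_s):
    """Importance sampling estimate of P(Y <= t; psi) with its standard error.

    `sample` holds draws from the sampling model Binomial(n, psi_s).
    """
    # Z_j = indicator(Y_j in tail set) * weight(Y_j); the weight is assumed
    # to be the ratio of the target pmf to the sampling pmf.
    z = [
        binom_pmf(y, n, psi) / binom_pmf(y, n, psi_s) if y <= t else 0.0
        for y in sample
    ]
    b = len(z)
    g_hat = sum(z) / b
    var_z = sum((v - g_hat) ** 2 for v in z) / (b - 1)
    return g_hat, math.sqrt(var_z / b)


n, t, psi, psi_s = 20, 3, 0.35, 0.2  # illustrative values only
rng = random.Random(1)
sample = [draw_binomial(rng, n, psi_s) for _ in range(20000)]
g_hat, se = is_tail_estimate(sample, n, t, psi, psi_s)
g_exact = sum(binom_pmf(y, n, psi) for y in range(t + 1))
print(f"estimate {g_hat:.4f}, standard error {se:.4f}, exact {g_exact:.4f}")
```

In this toy case the exact tail probability is available by direct summation, so the estimate can be checked against it; in the setting of the article the simulation is needed because such a sum is not practical.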

Proposition 1

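Let $\widetilde{G}(\psi)$ and $\widehat{G}(\psi)$ be defined as above, and $\sigma^2(\psi):=\mathrm{Var}_S(Z_j(\psi))<\infty$. As $B\to\infty$, we have

$$ \sqrt{B} \bigl(\widehat{G}(\psi)-\widetilde{G}(\psi) \bigr)\stackrel{d}{\longrightarrow}N \bigl(0,\sigma^2(\psi) \bigr) $$

(14)

for given $\psi$.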

Proposition 2

Suppose that the model is defined by

$$p(y;\psi,\lambda)=c(\psi,\lambda){\rm exp} \bigl\{\eta^\tau(\psi , \lambda )s(y) \bigr\}h(y), $$

where $c(\psi,\lambda)$ is a normalising constant. Suppose that $p_{S}(\ )=\tilde{p}(\ ;\psi_{s})$ is chosen from the same family. Then $\mathrm{Var}_S(Z_j(\psi))<\infty$.

Proof

Let $\widetilde{c}(\psi)=c(\psi,\hat {\lambda }_{\psi})$ and $\widetilde{\eta}(\psi)=\eta(\psi,\hat{\lambda }_{\psi})$ so that the post-data one-dimensional model family becomes

$$\widetilde{p}(y;\psi)=\widetilde{c}(\psi){\rm exp} \bigl\{\widetilde { \eta}^\tau (\psi)s(y) \bigr\}h(y). $$

Denote by $\Psi^*$ the set of parameter values $\psi$ for which $\widetilde{c}(\psi)$ exists.

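It is easy to show that

$$\mathrm{Var}_S \bigl(Z_j(\psi) \bigr)\leq E_S \bigl\{Z_j^2(\psi) \bigr\}\leq\frac{\widetilde{c}^{\,2}(\psi)}{\widetilde{c}(\psi_S)}\sum_{y\in\mathcal{Y}}{\rm exp} \bigl\{ \bigl(2\widetilde{\eta}(\psi)-\widetilde{\eta}(\psi_S) \bigr)^\tau s(y) \bigr\}h(y). $$

Note that for standard exponential family models, we can find $\psi^*\in\Psi^*$ such that $2\widetilde{\eta}(\psi)-\widetilde{\eta}(\psi _{S})=\widetilde{\eta}(\psi^{*})$, so that the sum on the right-hand side equals $1/\widetilde{c}(\psi^{*})$, which indicates that $\mathrm{Var}_S(Z_j(\psi))<\infty$. □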
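The second moment argument can be checked numerically in a simple case. The sketch below assumes a Poisson model with mean $\psi$ and no nuisance parameter, so that the natural parameter is $\log\psi$ and the normalising constant is $e^{-\psi}$, and it takes the tail set to be the whole sample space, which gives an upper bound on the second moment of $Z_j(\psi)$. The model, the numerical values and the variable names are assumptions for illustration only.

```python
import math


def log_poisson_pmf(y, mean):
    """Log of the Poisson(mean) probability mass at y."""
    return -mean + y * math.log(mean) - math.lgamma(y + 1)


psi, psi_s = 3.0, 2.0  # illustrative target and sampling means

# Natural parameter is log(mean), so 2*eta(psi) - eta(psi_s) = eta(psi_star).
psi_star = math.exp(2.0 * math.log(psi) - math.log(psi_s))

# Second moment of the weight under the sampling model, summed directly.
# The support is infinite; terms beyond y = 200 are negligible here.
second_moment_sum = sum(
    math.exp(2.0 * log_poisson_pmf(y, psi) - log_poisson_pmf(y, psi_s))
    for y in range(201)
)

# Closed form c(psi)^2 / (c(psi_s) * c(psi_star)) with c(mean) = exp(-mean).
second_moment_closed = math.exp(-2.0 * psi + psi_s + psi_star)

print(psi_star, second_moment_sum, second_moment_closed)
```

Here $\psi^*=\psi^2/\psi_s$ is a valid Poisson mean for any positive $\psi$ and $\psi_s$, so the bound is finite even though the support is unbounded.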

We next turn to the consistency of the solution $\hat{u}$ of $\hat{G}(\psi )=\alpha$ for $\tilde{u}$. This involves conditions to ensure the uniform consistency of $\hat{G}(\psi)$ and some further conditions on the regularity of the target function $\tilde{G}(\psi)$. Let ${\mathcal {S}}$ be a compact set, and ${\mathcal{S}}_{j}$, $j=1,\ldots,m_0$, be disjoint sets such that $\bigcup_{j=1}^{m_{0}}{\mathcal{S}}_{j}={\mathcal {S}}$, where $m_0$ is finite.

Proposition 3

Suppose that the conditions of Proposition 1 are satisfied and $\psi$ takes values in a compact set ${\mathcal{S}}$. Suppose that for all $y_j$, $\tilde{p}(y_{j};\psi)$ is continuous on each subset ${\mathcal{S}}_{j}$. Suppose also that $\tilde{G}(\psi)$ is continuous on each subset ${\mathcal{S}}_{j}$. Then, we have

$$ \sup_{\psi\in{\mathcal{S}}} \bigl|\widehat{G}(\psi)-\widetilde {G}(\psi ) \bigr|=o_P(1). $$

(15)

Proof

For $\epsilon>0$, let ${\mathcal{S}}_{j}(k)$, $k=1,\ldots,N_j$, be small and disjoint intervals with center $s_j(k)$ and length $\epsilon$ such that $\bigcup_{k=1}^{N_{j}}{\mathcal {S}}_{j}(k)={\mathcal{S}}_{j}$. Let $N(\epsilon)=\sum_{j=1}^{m_{0}} N_{j}$, which is bounded for any $\epsilon>0$. Now

$$\begin{aligned} &\sup_{\psi\in{\mathcal{S}}} \bigl|\widehat{G}(\psi)-\widetilde {G}(\psi ) \bigr| \\ &\quad {}\leq \max_{1\leq j\leq m_0}\max_{1\leq k\leq N_j} \bigl|\widehat {G} \bigl(s_j(k)\bigr)-\widetilde{G}\bigl(s_j(k)\bigr) \bigr| \\ &\qquad {}+\max_{1\leq j\leq m_0}\max_{1\leq k\leq N_j}\sup _{s\in{\mathcal {S}}_j(k)} \bigl|\widehat{G}(s)-\widehat{G}\bigl(s_j(k) \bigr) \bigr| \\ &\qquad {}+\max_{1\leq j\leq m_0}\max_{1\leq k\leq N_j}\sup _{s\in{\mathcal {S}}_j(k)} \bigl|\widetilde{G}(s)-\widetilde{G}\bigl(s_j(k) \bigr) \bigr| \\ &\quad =:I_{1}+I_{2}+I_{3}. \end{aligned}$$

(16)

By the Markov inequality and the second moment condition in Proposition 1, we have, for any $\epsilon_*>0$,

$$\begin{aligned} {\Pr} (I_{1}>\epsilon_* ) =&{\Pr} \Bigl(\max_{1\leq j\leq m_0}\max _{1\leq k\leq N_j} \bigl|\widehat{G}\bigl(s_j(k)\bigr) \\ &{}-\widetilde{G} \bigl(s_j(k)\bigr) \bigr|>\epsilon_* \Bigr) \\ \leq&N(\epsilon)\max_{j,k}{\Pr} \bigl( \bigl|\widehat{G}\bigl(s_j(k)\bigr)- \widetilde {G}\bigl(s_j(k)\bigr) \bigr|>\epsilon_* \bigr) \\ =&O\bigl(B^{-1}\bigr)=o(1) \end{aligned}$$

(17)

for given ϵ>0.

Next, since $\widetilde{p}(y;\cdot)$ and $\widetilde{G}(\cdot)$ are continuous on the compact sets ${\mathcal{S}}_{j}$ and are also bounded, they are Lipschitz continuous. We can then show that

$$ I_2+I_3=O(\epsilon)=o(1) $$

(18)

by letting ϵ→0. We thus prove (15) by using (16)–(18). □

Remark

Note in particular that we do not require $\tilde{G}(\psi)$ to be continuous, only piecewise continuous. For models with finite support, the piecewise continuity condition on $\tilde{G}(\psi)$ is redundant since it follows from the condition on $\tilde{p}(y_{j};\psi)$.
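The uniform consistency statement can be illustrated numerically. The sketch below reuses a single sample from the sampling model to estimate the whole curve on a grid of $\psi$ values and reports the largest absolute error for several simulation sizes $B$. It assumes a binomial model without nuisance parameters, the tail set $\{y: y\leq t\}$ and ratio-of-mass-function weights; these choices, the grid and the function names are assumptions for the example only.

```python
import math
import random
from collections import Counter


def binom_pmf(y, n, psi):
    """Binomial(n, psi) probability mass at y."""
    return math.comb(n, y) * psi**y * (1.0 - psi) ** (n - y)


def g_exact(n, t, psi):
    """Exact tail probability P(Y <= t; psi)."""
    return sum(binom_pmf(y, n, psi) for y in range(t + 1))


def g_hat(counts, b, n, t, psi, psi_s):
    """Importance sampling estimate from tabulated draws (counts[y] out of b)."""
    return sum(
        counts[y] * binom_pmf(y, n, psi) / binom_pmf(y, n, psi_s)
        for y in range(t + 1)
    ) / b


def sup_error(b, rng, n, t, psi_s, grid):
    """Largest |g_hat - g_exact| over the grid, using one sample of size b."""
    counts = Counter(
        sum(rng.random() < psi_s for _ in range(n)) for _ in range(b)
    )
    return max(
        abs(g_hat(counts, b, n, t, psi, psi_s) - g_exact(n, t, psi))
        for psi in grid
    )


n, t, psi_s = 20, 3, 0.2  # illustrative values only
grid = [0.2 + 0.01 * k for k in range(31)]  # psi from 0.20 to 0.50
rng = random.Random(2)
errors = {b: sup_error(b, rng, n, t, psi_s, grid) for b in (400, 4000, 40000)}
for b, err in errors.items():
    print(b, err)
```

Because the data are discrete, the sample is stored as a table of counts, so evaluating the estimated curve at a new $\psi$ costs only a sum over the tail set.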

Consistency of $\hat{u}$ requires additional regularity conditions on $\tilde{G}(\psi)$. The next proposition requires that $\tilde{G}(\psi)$ be locally monotone and bounded away from α. If $\tilde{G}(\psi)$ is monotone and continuous, as it is in all applications in this article, then the second requirement that it be bounded away from α could be dropped.

Proposition 4

Suppose that the conditions of Proposition 3 are satisfied. Suppose in addition that there exist κ>0 and δ>0 such that either

(i) $\widetilde{G}(\cdot)$ is strictly decreasing on $[\widetilde {u}-\kappa, \widetilde{u}+\kappa]$, and $\widetilde{G}(\psi )<\alpha -\delta$ for all $\psi>\widetilde{u}+\kappa$; or
(ii) $\widetilde{G}(\cdot)$ is strictly monotone on $[\widetilde {u}-\kappa,\widetilde{u}]$, and $\widetilde{G}(\psi)<\alpha-\delta $ for all $\psi>\widetilde{u}$.

Then $\widehat{u}$ is a consistent estimator of $\widetilde{u}$.

Proof

(i) For any ϵ>0, we have

$$ {\Pr} \bigl( |\widehat{u}-\widetilde{u} |>\epsilon \bigr)\leq {\Pr} ( \widehat{u}>\widetilde{u}+\epsilon )+ {\Pr} (\widehat{u}<\widetilde{u}-\epsilon ). $$

(19)

We need only prove that the two probabilities on the right-hand side of (19) tend to zero.

If $\widetilde{u}+\epsilon<\widehat{u}\leq\widetilde{u}+\kappa$ with ϵ<κ, we have

$$\widetilde{G}(\widehat{u})<\widetilde{G}(\widetilde{u}+\epsilon )<\widetilde{G}( \widetilde{u})=\alpha. $$

Note that $\widehat{G}(\widehat{u})\rightarrow\widetilde {G}(\widehat {u})$ in probability by Proposition 3, which indicates that $\widehat {G}(\widehat{u})<\alpha$ in probability. However, by the definition of $\widehat{u}$, we have $\widehat {G}(\widehat {u})=\alpha$, which is a contradiction. Meanwhile, by Proposition 2, as $\widehat{G}(\cdot)$ is piecewise continuous, it is not possible that $\widehat{u}>\widetilde{u}+\kappa$ in probability. Hence, we have

$$ {\Pr} (\widehat{u}>\widetilde{u}+\epsilon )\rightarrow0. $$

(20)

If $\widehat{u}<\widetilde{u}-\epsilon$ with ϵ<κ, we have

$$\widetilde{G}(\widehat{u})>\widetilde{G}(\widetilde{u}-\epsilon )>\widetilde{G}( \widetilde{u})=\alpha. $$

Note that $\widehat{G}(\widehat{u})\rightarrow\widetilde {G}(\widehat {u})$ in probability by Proposition 3 again, which indicates that $\widehat{G}(\widehat{u})>\alpha$ in probability. However, by the definition of $\widehat{u}$, we have $\widehat {G}(\widehat{u})=\alpha$, which is again a contradiction. Hence, we have

$$ {\Pr} (\widehat{u}<\widetilde{u}-\epsilon )\rightarrow0. $$

(21)

By (19)–(21), this completes the proof of Proposition 4 for case (i).

(ii) As in case (i), we need to prove that the two probabilities on the right-hand side of (19) tend to zero. Following a similar argument to the one above, we can prove (20). If $\widehat{u}<\widetilde{u}-\epsilon$ for any ϵ>0, we have

$$\widetilde{G}(\widehat{u})<\alpha+\delta<\alpha, $$

which, together with Proposition 3, indicates that it is not possible to obtain $\widehat{G}(\widehat{u})=\alpha$ in probability for $\widehat {u}<\widetilde{u}-\epsilon$. Hence, we have proven (21). By (19)–(21), this completes the proof of Proposition 4 for case (ii). □
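To make the object of Proposition 4 concrete, the sketch below solves $\widehat{G}(\psi)=\alpha$ by bisection for one fixed sample and compares the root with the solution of the exact equation. It assumes a binomial model with no nuisance parameter, the tail set $\{y: y\leq t\}$ and ratio-of-mass-function weights, under which both curves are decreasing for $\psi>t/n$. The model, the bisection solver, the numerical values and all names are assumptions for illustration and are not the authors' implementation.

```python
import math
import random
from collections import Counter


def binom_pmf(y, n, psi):
    """Binomial(n, psi) probability mass at y."""
    return math.comb(n, y) * psi**y * (1.0 - psi) ** (n - y)


def bisect_decreasing(f, lo, hi, target, tol=1e-10):
    """Solve f(x) = target for a decreasing f with f(lo) > target > f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, t, alpha, psi_s, b = 20, 3, 0.05, 0.3, 20000  # illustrative values only
rng = random.Random(3)
# One fixed sample from the sampling model, stored as a table of counts.
counts = Counter(sum(rng.random() < psi_s for _ in range(n)) for _ in range(b))


def g_exact(psi):
    """Exact tail probability P(Y <= t; psi)."""
    return sum(binom_pmf(y, n, psi) for y in range(t + 1))


def g_hat(psi):
    """Importance sampling estimate of the same tail probability."""
    return sum(
        counts[y] * binom_pmf(y, n, psi) / binom_pmf(y, n, psi_s)
        for y in range(t + 1)
    ) / b


lo, hi = t / n, 0.99  # both curves are decreasing on this interval
u_exact = bisect_decreasing(g_exact, lo, hi, alpha)
u_hat = bisect_decreasing(g_hat, lo, hi, alpha)
print(f"exact limit {u_exact:.4f}, simulated limit {u_hat:.4f}")
```

Holding the sample fixed while $\psi$ varies keeps the estimated curve smooth in $\psi$, so a deterministic root finder can be applied to it directly.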

About this article

Cite this article

Lloyd, C.J., Li, D. Computing highly accurate confidence limits from discrete data using importance sampling. Stat Comput 24, 663–673 (2014). https://doi.org/10.1007/s11222-013-9409-1

Received: 16 February 2012
Accepted: 29 May 2013
Published: 21 August 2013
Issue Date: July 2014
DOI: https://doi.org/10.1007/s11222-013-9409-1
