Abstract
For discrete data, frequentist confidence limits based on a normal approximation to standard likelihood-based pivotal quantities can perform poorly, even for quite large sample sizes. Constructing exact limits requires the probability of a suitable tail set as a function of the unknown parameters. In this paper, importance sampling is used to estimate this surface and hence the confidence limits. The technology is simple and straightforward to implement. Unlike the recent methodology of Garthwaite and Jones (J. Comput. Graph. Stat. 18, 184–200, 2009), the new method allows for nuisance parameters; is an order of magnitude more efficient than the Robbins–Monro bound; does not require any simulation phases or tuning constants; gives a straightforward simulation standard error for the target limit; and includes a simple diagnostic for simulation breakdown.
References
Aleksandrov, V.M., Sysoyev, V.I., Shemeneva, V.V.: Stochastic optimization. Eng. Cybern. 5, 11–16 (1968)
Carpenter, J.: Test inversion bootstrap confidence intervals. J. R. Stat. Soc. B 61, 159–172 (1999)
Chung, K.L.: On a stochastic approximation method. Ann. Math. Stat. 25, 463–483 (1954)
Davison, A.C., Hinkley, D.V.: Bootstrap Methods and Their Application. Cambridge University Press, Cambridge (1997)
Garthwaite, P.H., Buckland, S.T.: Generating Monte Carlo confidence intervals by the Robbins–Monro process. Appl. Stat. 41, 159–171 (1992)
Garthwaite, P.H., Jones, M.C.: A stochastic approximation method and its application to confidence intervals. J. Comput. Graph. Stat. 18, 184–200 (2009)
Geweke, J.: Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57, 1317–1339 (1989)
Hoogerheide, L.F., van Dijk, H.K.: Bayesian forecasting of value at risk and expected shortfall using adaptive importance sampling. Int. J. Forecast. 26, 231–247 (2010)
Kabaila, P.V.: Some properties of profile bootstrap confidence intervals. Aust. J. Stat. 35, 205–214 (1993)
Kabaila, P.V., Lloyd, C.J.: Profile upper confidence limits for discrete data. Aust. N. Z. J. Stat. 42, 67–80 (2001)
Kabaila, P.V., Lloyd, C.J.: Improved Buehler limits based on refined designated statistics. J. Stat. Plan. Inference 136, 3145–3155 (2006)
Kahn, H., Marshall, A.: Methods of reducing sample size in Monte Carlo computations. J. Oper. Res. Soc. Am. 1, 263–278 (1953)
Kallianpur, G.: A note on the Robbins–Monro stochastic approximation method. Ann. Math. Stat. 25, 386–388 (1954)
Kroese, D.P., Taimre, T., Botev, Z.I.: Handbook of Monte Carlo Methods. Wiley, New York (2011)
Lloyd, C.J.: Computing highly accurate confidence limits from discrete data using importance sampling. MBS working paper (2011). works.bepress.com/chris_lloyd/23/
Owen, A., Zhou, Y.: Safe and effective importance sampling. J. Am. Stat. Assoc. 95, 135–143 (2000)
Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30, 838–855 (1992)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22, 400–407 (1951)
Ruppert, D.: A Newton-Raphson version of the multivariate Robbins-Monro procedure. Ann. Stat. 13, 236–245 (1985)
Ruppert, D.: Stochastic approximation. In: Ghosh, B.K., Sen, P.K. (eds.) Handbook of Sequential Analysis, pp. 503–529. Marcel Dekker, New York (1991)
Sadowsky, J.S., Bucklew, J.A.: On large deviations theory and asymptotically efficient Monte Carlo estimation. IEEE Trans. Inf. Theory 36, 579–588 (1990)
Smith, P.J., Shafi, M., Gao, H.: Quick simulation: a review of importance sampling techniques in communications systems. IEEE J. Sel. Areas Commun. 15, 597–612 (1997)
Weisberg, H.I., Derrig, R.A.: Quantitative methods for detecting fraudulent automobile bodily injury claims. Risques 35, 75–99 (1998)
Appendix: Proofs of consistency
The notation was established in Sect. 4. The tail probability being simulated is
\[
\widetilde{G}(\psi)=\Pr\bigl\{Y\in R(t);\psi,\hat{\lambda}_{\psi}\bigr\}
  =\sum_{y\in R(t)}\tilde{p}(y;\psi).
\]
The proposed estimator \(\widehat{G}(\psi)\) was given in (7) and is a sample average of the variables \(Z_{j}(\psi)=I_{R(t)}(Y)w_{S}(Y;\psi)\), where Y is generated from \(p_{S}\). The first proposition is that \(\widehat{G}(\psi)\) is consistent and asymptotically normal under some mild conditions. This result is well known and is proven in standard texts such as Kroese et al. (2011). The second proposition shows that the second moment condition in Proposition 1 can be satisfied for the generalised linear models case.
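For concreteness, the estimator can be sketched as follows for a hypothetical single Binomial(n, ψ) observation with no nuisance parameter, tail set R(t) = {y ≤ t} and design value ψ_S; the numbers and the function names (G_hat, G_se) are illustrative only and are not taken from the paper's examples.

```python
import numpy as np
from scipy.stats import binom
from scipy.optimize import brentq

# Illustrative setting (not from the paper): one Binomial(n, psi) count,
# no nuisance parameter, observed value t, tail set R(t) = {y <= t}.
rng = np.random.default_rng(0)
n, t, alpha, B = 50, 3, 0.025, 100_000

psi_S = 0.10                          # design value; p_S is Binomial(n, psi_S)
Y = rng.binomial(n, psi_S, size=B)    # B independent draws from p_S
in_tail = (Y <= t)                    # indicator I_{R(t)}(Y)

def Z(psi):
    """Z_j(psi) = I_{R(t)}(Y_j) * w_S(Y_j; psi) for every draw j."""
    w = binom.pmf(Y, n, psi) / binom.pmf(Y, n, psi_S)
    return in_tail * w

def G_hat(psi):
    """Importance-sampling estimate of G(psi) = Pr(Y <= t; psi)."""
    return Z(psi).mean()

def G_se(psi):
    """Simulation standard error of G_hat(psi)."""
    return Z(psi).std(ddof=1) / np.sqrt(B)

# Upper confidence limit: the root of G_hat(psi) = alpha in psi.
u_hat = brentq(lambda psi: G_hat(psi) - alpha, 1e-6, 1 - 1e-6)
print(u_hat, G_se(u_hat))
```

Because the same simulated sample Y is reweighted for every value of ψ, the whole surface \(\widehat{G}(\psi)\) is available from one simulation run, which is what makes solving \(\widehat{G}(\psi)=\alpha\) cheap.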
Proposition 1
Let \(\widetilde{G}(\psi)\) and \(\widehat{G}(\psi)\) be defined as above, and suppose that \(\sigma^{2}(\psi):=\operatorname{Var}_{S}(Z_{j}(\psi))<\infty\). As B→∞, we have
\[
\sqrt{B}\,\bigl\{\widehat{G}(\psi)-\widetilde{G}(\psi)\bigr\}
  \;\xrightarrow{\;d\;}\;N\bigl(0,\sigma^{2}(\psi)\bigr)
\]
for given ψ.
Proposition 2
Suppose that the model is defined by
\[
p(y;\psi,\lambda)=c(\psi,\lambda)\exp\bigl\{\eta(\psi,\lambda)^{\top}T(y)\bigr\},
\]
where c(ψ,λ) is a normalising constant. Suppose that \(p_{S}(\cdot)=\tilde{p}(\,\cdot\,;\psi_{S})\) is chosen from the same family. Then \(\operatorname{Var}_{S}(Z_{j}(\psi))<\infty\).
Proof
Let \(\widetilde{c}(\psi)=c(\psi,\hat{\lambda}_{\psi})\) and \(\widetilde{\eta}(\psi)=\eta(\psi,\hat{\lambda}_{\psi})\), so that the post-data one-dimensional model family becomes
\[
\tilde{p}(y;\psi)=\widetilde{c}(\psi)\exp\bigl\{\widetilde{\eta}(\psi)^{\top}T(y)\bigr\}.
\]
Denote by \(\Psi^{*}\) the set of parameter values ψ for which \(\widetilde{c}(\psi)\) exists.
It is easy to show that
\[
\mathrm{E}_{S}\bigl\{Z_{j}^{2}(\psi)\bigr\}
  =\sum_{y\in R(t)}\frac{\tilde{p}(y;\psi)^{2}}{\tilde{p}(y;\psi_{S})}
  =\frac{\widetilde{c}(\psi)^{2}}{\widetilde{c}(\psi_{S})}
   \sum_{y\in R(t)}\exp\bigl\{\bigl[2\widetilde{\eta}(\psi)-\widetilde{\eta}(\psi_{S})\bigr]^{\top}T(y)\bigr\}.
\]
Note that for standard exponential family models, we can find \(\psi^{*}\in\Psi^{*}\) such that \(2\widetilde{\eta}(\psi)-\widetilde{\eta}(\psi_{S})=\widetilde{\eta}(\psi^{*})\), which implies that
\[
\mathrm{E}_{S}\bigl\{Z_{j}^{2}(\psi)\bigr\}
  \le\frac{\widetilde{c}(\psi)^{2}}{\widetilde{c}(\psi_{S})\,\widetilde{c}(\psi^{*})}<\infty ,
\]
and hence \(\operatorname{Var}_{S}(Z_{j}(\psi))<\infty\). □
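As a simple illustration (our own example, not one from the paper), consider a single binomial observation with success probability ψ and no nuisance parameter, so that \(\widetilde{\eta}(\psi)=\operatorname{logit}\psi\) maps (0,1) onto the whole real line. A suitable \(\psi^{*}\) then always exists:
\[
2\,\widetilde{\eta}(\psi)-\widetilde{\eta}(\psi_{S})
  =2\operatorname{logit}\psi-\operatorname{logit}\psi_{S}
  =\operatorname{logit}\psi^{*}
\quad\text{with}\quad
\psi^{*}=\operatorname{expit}\bigl\{2\operatorname{logit}\psi-\operatorname{logit}\psi_{S}\bigr\}\in(0,1),
\]
so the second moment of \(Z_{j}(\psi)\) is finite for every ψ and every design value \(\psi_{S}\) in (0,1).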
We next turn to the consistency of the solution \(\hat{u}\) of \(\hat{G}(\psi)=\alpha\) for \(\tilde{u}\). This involves conditions to ensure the uniform consistency of \(\hat{G}(\psi)\) and some further conditions on the regularity of the target function \(\tilde{G}(\psi)\). Let \({\mathcal{S}}\) be a compact set, and \({\mathcal{S}}_{j}\), \(j=1,\ldots,m_{0}\), be disjoint sets such that \(\bigcup_{j=1}^{m_{0}}{\mathcal{S}}_{j}={\mathcal{S}}\), where \(m_{0}\) is bounded.
Proposition 3
Suppose that the conditions of Proposition 1 are satisfied and that ψ takes values in a compact set \({\mathcal{S}}\). Suppose that for all \(y_{j}\), \(\tilde{p}(y_{j};\psi)\) is continuous on each subset \({\mathcal{S}}_{j}\). Suppose also that \(\tilde{G}(\psi)\) is continuous on each subset \({\mathcal{S}}_{j}\). Then we have
\[
\sup_{\psi\in{\mathcal{S}}}\bigl|\widehat{G}(\psi)-\widetilde{G}(\psi)\bigr|
  \;\xrightarrow{\;p\;}\;0
\quad\text{as } B\to\infty.
\tag{15}
\]
Proof
For ϵ>0, let \({\mathcal{S}}_{j}(k)\), \(k=1,\ldots,N_{j}\), be small, disjoint intervals with centre \(s_{j}(k)\) and length ϵ such that \(\bigcup_{k=1}^{N_{j}}{\mathcal{S}}_{j}(k)={\mathcal{S}}_{j}\). Let \(N(\epsilon)=\sum_{j=1}^{m_{0}}N_{j}\), which is bounded for any ϵ>0. Now
\[
\sup_{\psi\in{\mathcal{S}}}\bigl|\widehat{G}(\psi)-\widetilde{G}(\psi)\bigr|
  \le\max_{j,k}\,\sup_{\psi\in{\mathcal{S}}_{j}(k)}
    \Bigl\{\bigl|\widehat{G}(\psi)-\widehat{G}(s_{j}(k))\bigr|
      +\bigl|\widehat{G}(s_{j}(k))-\widetilde{G}(s_{j}(k))\bigr|
      +\bigl|\widetilde{G}(s_{j}(k))-\widetilde{G}(\psi)\bigr|\Bigr\}.
\tag{16}
\]
By the Markov inequality and the second moment condition in Proposition 1, we have, for any \(\epsilon^{*}>0\),
\[
\Pr\Bigl\{\max_{j,k}\bigl|\widehat{G}(s_{j}(k))-\widetilde{G}(s_{j}(k))\bigr|>\epsilon^{*}\Bigr\}
  \le\frac{N(\epsilon)\max_{j,k}\sigma^{2}(s_{j}(k))}{B\,\epsilon^{*2}}
  \;\to\;0
\quad\text{as } B\to\infty,
\tag{17}
\]
for given ϵ>0.
Next, since \(\widetilde{p}(y;\cdot)\) and \(\widetilde{G}(\cdot)\) are continuous on the compact sets \({\mathcal{S}}_{j}\) and are also bounded, they are Lipschitz continuous. We can then show that
\[
\max_{j,k}\,\sup_{\psi\in{\mathcal{S}}_{j}(k)}
  \Bigl\{\bigl|\widehat{G}(\psi)-\widehat{G}(s_{j}(k))\bigr|
    +\bigl|\widetilde{G}(s_{j}(k))-\widetilde{G}(\psi)\bigr|\Bigr\}
  \;\to\;0
\tag{18}
\]
by letting ϵ→0. We thus prove (15) by using (16)–(18). □
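As a rough numerical check of this uniform closeness, in the toy binomial setting of the earlier sketch the exact \(\widetilde{G}(\psi)\) is simply a binomial tail probability, so the worst error of \(\widehat{G}\) over a grid can be computed directly (in genuine applications \(\widetilde{G}\) is of course unavailable). This continues the earlier illustrative code and reuses its objects.

```python
# Continuing the earlier toy binomial sketch (illustrative only): compare G_hat
# with the exact tail probability G(psi) = Pr(Y <= t; psi) over a grid of psi.
psis = np.linspace(0.01, 0.60, 200)
G_exact = binom.cdf(t, n, psis)               # exact G(psi) in the toy model
G_est = np.array([G_hat(p) for p in psis])    # importance-sampling estimates
print(np.max(np.abs(G_est - G_exact)))        # worst error over the grid
```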
Remark
Note in particular that we do not require \(\tilde{G}(\psi)\) to be continuous, only piecewise continuous. For models with finite support, the piecewise continuity condition on \(\tilde{G}(\psi)\) is redundant since it follows from the condition on \(\tilde{p}(y_{j};\psi)\).
Consistency of \(\hat{u}\) requires additional regularity conditions on \(\tilde{G}(\psi)\). The next proposition requires that \(\tilde{G}(\psi)\) be locally monotone and bounded away from α. If \(\tilde{G}(\psi)\) is monotone and continuous, as it is in all applications in this article, then the second requirement that it be bounded away from α could be dropped.
Proposition 4
Suppose that the conditions of Proposition 3 are satisfied. Suppose in addition that there exist κ>0 and δ>0 such that either
(i) \(\widetilde{G}(\cdot)\) is strictly decreasing on \([\widetilde{u}-\kappa,\widetilde{u}+\kappa]\), and \(\widetilde{G}(\psi)<\alpha-\delta\) for all \(\psi>\widetilde{u}+\kappa\); or
(ii) \(\widetilde{G}(\cdot)\) is strictly monotone on \([\widetilde{u}-\kappa,\widetilde{u}]\), and \(\widetilde{G}(\psi)<\alpha-\delta\) for all \(\psi>\widetilde{u}\).
Then \(\widehat{u}\) is a consistent estimator of \(\widetilde{u}\).
Proof
(i) For any ϵ>0, we have
\[
\Pr\bigl(|\widehat{u}-\widetilde{u}|>\epsilon\bigr)
  \le\Pr\bigl(\widehat{u}>\widetilde{u}+\epsilon\bigr)
   +\Pr\bigl(\widehat{u}<\widetilde{u}-\epsilon\bigr).
\tag{19}
\]
We only need to prove that the two probabilities on the right-hand side of (19) tend to zero.
If \(\widetilde{u}+\epsilon<\widehat{u}\leq\widetilde{u}+\kappa\) with ϵ<κ, we have
\[
\widetilde{G}(\widehat{u})<\widetilde{G}(\widetilde{u}+\epsilon)<\widetilde{G}(\widetilde{u})=\alpha .
\]
Note that \(\widehat{G}(\widehat{u})\rightarrow\widetilde{G}(\widehat{u})\) in probability by Proposition 3, which indicates that \(\widehat{G}(\widehat{u})<\alpha\) in probability. However, according to the definition of \(\widehat{u}\), we have \(\widehat{G}(\widehat{u})=\alpha\), which is a contradiction. Meanwhile, by Proposition 2, as \(\widehat{G}(\cdot)\) is piecewise continuous, it is not possible that \(\widehat{u}>\widetilde{u}+\kappa\) in probability. Hence, we have
\[
\Pr\bigl(\widehat{u}>\widetilde{u}+\epsilon\bigr)\to 0 .
\tag{20}
\]
If \(\widehat{u}<\widetilde{u}-\epsilon\) with ϵ<κ, we have
\[
\widetilde{G}(\widehat{u})>\widetilde{G}(\widetilde{u}-\epsilon)>\widetilde{G}(\widetilde{u})=\alpha .
\]
Note that \(\widehat{G}(\widehat{u})\rightarrow\widetilde{G}(\widehat{u})\) in probability by Proposition 3 again, which indicates that \(\widehat{G}(\widehat{u})>\alpha\) in probability. However, according to the definition of \(\widehat{u}\), we have \(\widehat{G}(\widehat{u})=\alpha\), which is also a contradiction. Hence, we have
\[
\Pr\bigl(\widehat{u}<\widetilde{u}-\epsilon\bigr)\to 0 .
\tag{21}
\]
By (19)–(21), this completes the proof of Proposition 4 for case (i).
(ii) As in case (i), we need to prove that the two probabilities on the right-hand side of (19) tend to zero in case (ii). Following a similar argument to the above, we can prove (20). If \(\widehat{u}<\widetilde{u}-\epsilon\) for any ϵ>0, we have
\[
\bigl|\widetilde{G}(\widehat{u})-\alpha\bigr|
  \ge\bigl|\widetilde{G}(\widetilde{u}-\epsilon)-\widetilde{G}(\widetilde{u})\bigr|>0 ,
\]
which, together with Proposition 3, indicates that it is not possible to obtain \(\widehat{G}(\widehat{u})=\alpha\) in probability for \(\widehat{u}<\widetilde{u}-\epsilon\). Hence, we have proven (21). By (19)–(21), this completes the proof of Proposition 4 for case (ii). □
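To illustrate this consistency in the toy binomial setting used in the earlier sketches (again our own illustration), the root \(\widehat{u}\) of \(\widehat{G}(\psi)=\alpha\) can be compared with the exact Clopper-Pearson upper limit, which solves \(\Pr(Y\le t;u)=\alpha\) and equals the 1−α quantile of a Beta(t+1, n−t) distribution, for increasing simulation sizes B.

```python
from scipy.stats import beta

# Continuing the toy binomial sketch: exact upper limit solves Pr(Y <= t; u) = alpha.
u_exact = beta.ppf(1 - alpha, t + 1, n - t)
for B_try in (10_000, 100_000, 1_000_000):
    Y_try = rng.binomial(n, psi_S, size=B_try)
    G_try = lambda psi: ((Y_try <= t) * binom.pmf(Y_try, n, psi)
                         / binom.pmf(Y_try, n, psi_S)).mean()
    u_hat = brentq(lambda psi: G_try(psi) - alpha, 1e-6, 1 - 1e-6)
    print(B_try, u_hat, u_exact)   # u_hat should approach u_exact as B grows
```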
Cite this article
Lloyd, C.J., Li, D. Computing highly accurate confidence limits from discrete data using importance sampling. Stat Comput 24, 663–673 (2014). https://doi.org/10.1007/s11222-013-9409-1