Numerical Methods of Sufficient Sample Size Estimation for Generalised Linear Models

Lobachevskii Journal of Mathematics

Abstract

This paper investigates the problem of reducing the cost of data collection procedures. To select an adequate regression or classification model, a sample set of minimum sufficient size must be collected. The sample set is modelled according to the data generation hypotheses; namely, generalised linear regression models assume that the target variable is independent and identically distributed. The paper analyses several numerical methods of sample size estimation, including statistical, heuristic and Bayesian methods, and compares them in practical terms. Because the practical goal of collecting a sample set is modelling, some methods involve analysis of the model parameters. The computational experiment is carried out on widely used sample sets. The open-source code and software are provided for practitioners to use in data collection planning.
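To make the idea of a "sufficient" sample size concrete, the following is a minimal sketch of one heuristic of the parameter-analysis kind mentioned above. It is an illustration only and is not taken from the paper or its published code. It assumes ordinary least-squares regression (the simplest generalised linear model), and it treats a sample size m as sufficient once the bootstrap spread of the fitted coefficients drops below a threshold. The function name `bootstrap_sufficient_size`, the threshold `eps`, the candidate grid `sizes` and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np


def bootstrap_sufficient_size(X, y, sizes, eps=0.1, n_boot=200, seed=0):
    """Return the smallest m in `sizes` whose bootstrap coefficient spread
    is below `eps`, or None if no candidate size qualifies.

    Illustrative assumption: the "spread" is the largest standard deviation,
    across coefficients, of least-squares fits on bootstrap subsamples.
    """
    rng = np.random.default_rng(seed)
    n, n_features = X.shape
    for m in sizes:
        coefs = np.empty((n_boot, n_features))
        for b in range(n_boot):
            # Draw m row indices with replacement from the available sample.
            idx = rng.integers(0, n, size=m)
            # Least-squares fit on the subsample (handles rank deficiency).
            coefs[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        if coefs.std(axis=0).max() < eps:
            return m
    return None


# Usage on synthetic data (hypothetical, for demonstration only).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 3))
w_true = np.array([1.0, -2.0, 0.5])
y_demo = X_demo @ w_true + 0.5 * rng.normal(size=500)

candidate_sizes = [5, 10, 20, 50, 100, 200]
m_star = bootstrap_sufficient_size(X_demo, y_demo, candidate_sizes, eps=0.1)
print("estimated sufficient sample size:", m_star)
```

The returned value depends on the threshold and the candidate grid, so in practice both would have to be chosen with the downstream modelling task in mind.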

Author information

Corresponding authors

Correspondence to A. V. Grabovoy, T. T. Gadaev, A. P. Motrenko or V. V. Strijov.

Additional information

(Submitted by A. I. Volodin)

About this article

Cite this article

Grabovoy, A.V., Gadaev, T.T., Motrenko, A.P. et al. Numerical Methods of Sufficient Sample Size Estimation for Generalised Linear Models. Lobachevskii J Math 43, 2453–2462 (2022). https://doi.org/10.1134/S1995080222120125

  • DOI: https://doi.org/10.1134/S1995080222120125
