Abstract
This paper investigates the problem of reducing the cost of data collection procedures. To select an adequate regression or classification model, a sample set of minimum sufficient size must be collected. The sample set is assumed to follow the data generation hypotheses; namely, generalised linear regression models assume an independent and identically distributed target variable. The paper analyses several numerical methods of sample size estimation, including statistical, heuristic and Bayesian methods, and compares them in practical terms. Because the practical goal of collecting a sample set is modelling, some methods involve analysis of the model parameters. The computational experiment uses widely used sample sets. Open-source code and software are provided for practitioners to use in data collection planning.
Additional information
(Submitted by A. I. Volodin)
About this article
Cite this article
Grabovoy, A.V., Gadaev, T.T., Motrenko, A.P. et al. Numerical Methods of Sufficient Sample Size Estimation for Generalised Linear Models. Lobachevskii J Math 43, 2453–2462 (2022). https://doi.org/10.1134/S1995080222120125
DOI: https://doi.org/10.1134/S1995080222120125