Abstract
Experimental design for a generalized linear model (GLM) is important but challenging because the design criterion often depends on the model specification, including the link function, the linear predictor, and the unknown regression coefficients. Before locally or globally optimal designs are constructed, a pilot experiment is usually conducted to provide some insight into the model specification. At the pilot stage, however, little information on the model specification of the GLM is available. Surprisingly, there is very limited research on the design of pilot experiments for GLMs. In this work, we obtain some theoretical understanding of design efficiency in pilot experiments for GLMs. Guided by the theory, we propose adopting a low-discrepancy design with respect to some target distribution for pilot experiments. The performance of the proposed design is assessed through several numerical examples.
Acknowledgements
The authors would like to sincerely thank the Associate Editor and reviewers for their insightful comments. Deng’s work was partly supported by National Science Foundation CISE Expedition Grant CCF-1918770.
This article is part of the topical collection “Special Issue: State of the art in research on design and analysis of experiments” guest edited by John Stufken, Abhyuday Mandal, and Rakhi Singh.
Appendix
Derivation of the discrepancy in (10).
We first consider the case \(d=1\). We integrate the kernel once:
Then we integrate once more:
Generalizing this to the d-dimensional case yields
Thus, the discrepancy of a design \(\xi \) for the uniform distribution on \([-1,1]^d\) is
Derivation of the discrepancy in (11).
Following the same procedure as the derivation of \(D^2(\xi ; F_{\mathrm{unif}})\),
and thus, the discrepancy of a design \(\xi \) for the arcsine distribution on \([-1,1]^d\) is
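The two derivations above follow the usual pattern for a kernel discrepancy: integrate a product kernel once against the target distribution, integrate once more, and combine both with the double sum over design points, giving \(D^2(\xi ; F) = \int \int K\,dF\,dF - \frac{2}{n}\sum_i \int K(x_i,t)\,dF(t) + \frac{1}{n^2}\sum_{i,j} K(x_i,x_j)\). The sketch below illustrates that pattern for the uniform distribution on \([-1,1]^d\). The kernel `k1` is an assumption chosen only because its integrals have a simple closed form; it is not necessarily the kernel behind (10), and the function names are hypothetical.

```python
from math import prod

# ASSUMPTION: an illustrative one-dimensional kernel on [-1, 1];
# the kernel used for (10) in the paper may differ.
def k1(x, t):
    return 1.0 + 0.5 * (abs(x) + abs(t) - abs(x - t))

def k1_int(x):
    """Integral of k1(x, t) over t ~ Uniform[-1, 1] (density 1/2).

    Uses E|t| = 1/2 and E|x - t| = (1 + x^2) / 2 for x in [-1, 1].
    """
    return 1.0 + 0.5 * abs(x) - 0.25 * x * x

# Integral of k1_int(x) over x ~ Uniform[-1, 1]: 1 + 1/4 - 1/12.
K1_DOUBLE_INT = 7.0 / 6.0

def squared_discrepancy_uniform(design):
    """Squared kernel discrepancy of a design w.r.t. Uniform[-1, 1]^d.

    `design` is a list of n points, each a tuple of d coordinates.
    The d-dimensional kernel is the product of k1 over coordinates,
    so every integral factorizes into one-dimensional pieces.
    """
    n = len(design)
    d = len(design[0])
    double_integral = K1_DOUBLE_INT ** d
    single_integrals = sum(prod(k1_int(x) for x in p) for p in design)
    double_sum = sum(
        prod(k1(x, t) for x, t in zip(p, q))
        for p in design
        for q in design
    )
    return double_integral - 2.0 * single_integrals / n + double_sum / n**2

# Two small designs in d = 2: a spread-out 2 x 2 grid and a design
# with all four runs piled onto one corner point.
grid = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]
clustered = [(0.9, 0.9)] * 4

d2_grid = squared_discrepancy_uniform(grid)
d2_clustered = squared_discrepancy_uniform(clustered)
```

Under this assumed kernel the spread-out grid has a smaller squared discrepancy than the clustered design, which is the behavior one wants from a space-filling criterion. For another target distribution, such as the arcsine distribution, only `k1_int` and `K1_DOUBLE_INT` would need to be re-derived; the double sum over design points is unchanged.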
Cite this article
Li, Y., Deng, X. On Efficient Design of Pilot Experiment for Generalized Linear Models. J Stat Theory Pract 15, 83 (2021). https://doi.org/10.1007/s42519-021-00222-y
DOI: https://doi.org/10.1007/s42519-021-00222-y
Keywords
- Design efficiency
- Discrepancy
- Model uncertainty
- Optimal design