Analyzing supersaturated designs for discrete responses via generalized linear models

Balakrishnan, N.; Koukouvinos, C.; Parpoula, C.

doi:10.1007/s00362-013-0569-z

Regular Article
Published: 19 November 2013

Statistical Papers, Volume 56, pages 121–145 (2015)

Abstract

A supersaturated design is a factorial design in which the number of factors to be estimated is larger than the available number of experimental runs. Supersaturated designs can reduce the cost and time of many industrial experiments, since their main goal is to identify, at minimal cost, the few factors under consideration that have dominant effects. While most of the literature on supersaturated designs has focused on their construction and optimality properties, the analysis of data from such designs has received much less attention. In this paper, we propose a method for analyzing supersaturated main-effects designs that assumes generalized linear models for discrete responses and simultaneously identifies the significant effects. An empirical study shows that the method performs well, with low Type I and Type II error rates. The proposed method is therefore useful because it allows supersaturated designs to be used with discrete-response regression models.
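
To make the setting concrete, the following is a minimal sketch that simulates one dataset of the kind described above: a two-level design with more factors than runs and a count response generated from a Poisson generalized linear model with a log link. It is an illustration under stated assumptions only: a random ±1 matrix stands in for a properly constructed supersaturated design, and the run size, number of factors, active factors and coefficients are hypothetical values, not the 44 models of Table 1. The function name `simulate_poisson_ssd` is likewise hypothetical.

```python
import numpy as np

def simulate_poisson_ssd(n_runs=14, n_factors=23, active=(0, 1, 2),
                         coefs=(1.0, -0.8, 0.6), intercept=1.0, seed=0):
    """Simulate one count-response dataset on a hypothetical two-level design.

    Assumptions (illustrative only): a random +/-1 matrix stands in for a
    constructed supersaturated design, and the response follows a Poisson
    GLM with log link in which only the factors listed in `active` matter.
    """
    rng = np.random.default_rng(seed)
    # Rows are runs, columns are factors; n_factors > n_runs makes it supersaturated.
    X = rng.choice([-1, 1], size=(n_runs, n_factors))
    beta = np.zeros(n_factors)
    beta[list(active)] = coefs          # effect sparsity: few nonzero effects
    eta = intercept + X @ beta          # linear predictor
    mu = np.exp(eta)                    # inverse of the log link
    y = rng.poisson(mu)                 # discrete (count) response
    return X, y, beta

X, y, beta = simulate_poisson_ssd()
print(X.shape, y[:5])
```

Other discrete responses fit the same template by changing the response distribution and link function; the screening problem is that all `n_factors` main effects cannot be estimated at once from only `n_runs` observations.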

Acknowledgments

The research of the third author was financially supported by a scholarship awarded by the Secretariat of the Research Committee of National Technical University of Athens. The authors would like to thank the Associate Editor and the referees for their constructive and useful suggestions which resulted in an improvement on an earlier version of this manuscript.

Author information

Authors and Affiliations

Department of Mathematics and Statistics, McMaster University, Hamilton, ON, L8S 4K1, Canada
N. Balakrishnan
Department of Mathematics, National Technical University of Athens, 15773, Athens, Zografou, Greece
C. Koukouvinos & C. Parpoula

Corresponding author

Correspondence to C. Koukouvinos.

Appendix

For each of the 44 models presented in Table 1, 1,000 datasets were generated under each Scenario considered, and the results obtained by applying our method are presented in Tables 6, 7, 8 and 9 for Scenarios I, II, III and IV, respectively, for each of the threshold values examined. The first column gives the number of each model. The columns labelled "Type I" and "Type II" give the Type I and Type II error rates, averaged over the 1,000 simulations, for every threshold value. The last line of each table gives the average Type I and Type II error rates over the 44 models considered.
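
The appendix does not restate how the error rates are computed, so the following sketch assumes the definitions commonly used in screening studies: for one dataset, the Type I error rate is the fraction of truly inactive factors declared active, and the Type II error rate is the fraction of truly active factors that are missed. It computes both rates per dataset and averages them over replications, in the way each table entry is described as an average over 1,000 simulations. The function names and the toy selections are hypothetical.

```python
from statistics import mean

def error_rates(true_active, selected, n_factors):
    """Return (type_I, type_II) for one simulated dataset.

    Assumed definitions: Type I = inactive factors declared active / number of
    inactive factors; Type II = active factors not declared active / number of
    active factors.
    """
    true_active, selected = set(true_active), set(selected)
    n_inactive = n_factors - len(true_active)
    type_I = len(selected - true_active) / n_inactive
    type_II = len(true_active - selected) / len(true_active)
    return type_I, type_II

def average_error_rates(true_active, selections, n_factors):
    """Average both rates over replications (one selected set per dataset)."""
    rates = [error_rates(true_active, s, n_factors) for s in selections]
    return mean(r[0] for r in rates), mean(r[1] for r in rates)

# Toy example: 10 factors, factors 0 and 1 truly active, three replications.
print(average_error_rates({0, 1}, [{0, 1}, {0, 5}, {0, 1, 7, 8}], n_factors=10))
```

Averaging the per-model results once more over the 44 models gives the kind of summary reported in the last line of each table.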

The comparative results for each scenario are as follows:

Scenario I: From Table 2, the average Type I and Type II error rates over the 44 models are 0.23 and 0.06, respectively, when \(SU=\mathrm{median}(\mathbf{su})\) and \(w=\frac{k}{2}\). With this choice of thresholds, the proposed method has substantially lower Type II error rates and slightly higher Type I error rates than with the other threshold values examined (see Table 6 for comparison).
Scenario II: From Table 3, the average Type I and Type II error rates over the 44 models are 0.29 and 0.04, respectively, when \(SU=\mathrm{median}(\mathbf{su})\) and \(w=\frac{k}{2}\). With this choice of thresholds, the proposed method has substantially lower Type II error rates and slightly higher Type I error rates than with the other threshold values examined (see Table 7 for comparison).
Scenario III: From Table 4, the average Type I and Type II error rates over the 44 models are 0.29 and 0.05, respectively, when \(SU=\mathrm{median}(\mathbf{su})\) and \(w=\frac{k}{2}\). With this choice of thresholds, the proposed method has substantially lower Type II error rates and slightly higher Type I error rates than with the other threshold values examined (see Table 8 for comparison).
Scenario IV: From Table 5, the average Type I and Type II error rates over the 44 models are 0.30 and 0.07, respectively, when \(SU=\mathrm{median}(\mathbf{su})\) and \(w=\frac{k}{2}\). With this choice of thresholds, the proposed method has substantially lower Type II error rates and slightly higher Type I error rates than with the other threshold values examined (see Table 9 for comparison).

Table 6 Empirical performance of the proposed algorithm for the models listed in Table 1 for Scenario I

Table 7 Empirical performance of the proposed algorithm for the models listed in Table 1 for Scenario II

Table 8 Empirical performance of the proposed algorithm for the models listed in Table 1 for Scenario III

Table 9 Empirical performance of the proposed algorithm for the models listed in Table 1 for Scenario IV

These results suggest that, with \(SU=\mathrm{median}(\mathbf{su})\) and \(w=\frac{k}{2}\), the proposed method is quite robust for modelling count responses. The average Type I and Type II error rates are nearly identical under Scenarios II, III and IV.
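
The claim of near-identical results can be checked directly from the averages quoted in the scenario summaries for \(SU=\mathrm{median}(\mathbf{su})\) and \(w=\frac{k}{2}\). The short computation below uses only those reported numbers and gives the range (maximum minus minimum) of each error rate across Scenarios II, III and IV; the names `reported` and `spread` are illustrative.

```python
# Average (Type I, Type II) error rates for SU = median(su), w = k/2,
# as quoted in the scenario summaries (from Tables 2-5).
reported = {
    "I": (0.23, 0.06),
    "II": (0.29, 0.04),
    "III": (0.29, 0.05),
    "IV": (0.30, 0.07),
}

def spread(scenarios, index):
    """Range (max - min) of one error rate across the given scenarios."""
    values = [reported[s][index] for s in scenarios]
    return round(max(values) - min(values), 2)

later = ["II", "III", "IV"]
print("Type I range:", spread(later, 0))   # 0.01
print("Type II range:", spread(later, 1))  # 0.03
```

Scenario I stands apart mainly through its lower average Type I error rate (0.23).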

In general, we conclude that for each Scenario considered, the proposed method performs well when \(SU\) is set to \(\mathrm{median}(\mathbf{su})\) and \(w=\frac{k}{2}\). With this choice of thresholds, the method achieves the lowest Type II error rates in all Scenarios considered. The slightly higher Type I error rates that accompany this choice are not a serious concern, since supersaturated designs (SSDs) are used mainly to screen the factors that should be considered for further investigation: a factor wrongly retained can be discarded in follow-up experiments, whereas an active factor that is missed is not investigated further. Hence low Type II error rates are especially desirable, even though both Type I and Type II error rates are important and should be kept as low as possible. Moreover, because effect sparsity holds in SSDs, most factors are inactive, so Type I errors are quite likely to occur.
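
The point about effect sparsity can be made concrete with a back-of-the-envelope calculation. Suppose \(m_0\) of the factors are inactive and each is declared active with a small probability \(p\). Under the idealized assumption that these decisions are independent (they are not exactly independent for a supersaturated design, whose columns are correlated), the probability of at least one false positive is \(1-(1-p)^{m_0}\), while the expected number of false positives is \(p\,m_0\) regardless of dependence. The values of \(p\) and \(m_0\) below are hypothetical.

```python
def prob_at_least_one_false_positive(p, n_inactive):
    """P(at least one inactive factor is declared active).

    Idealized assumption: each of the n_inactive factors is declared active
    independently with the same probability p.
    """
    return 1.0 - (1.0 - p) ** n_inactive

def expected_false_positives(p, n_inactive):
    """Expected number of inactive factors declared active (linearity of expectation)."""
    return p * n_inactive

# Hypothetical example: 20 inactive factors, each flagged with probability 0.05.
print(round(prob_at_least_one_false_positive(0.05, 20), 3))  # 0.642
print(expected_false_positives(0.05, 20))                    # 1.0
```

Even a modest per-factor rate therefore yields at least one false positive in most datasets when many factors are inactive, which is why a low Type II error rate is given priority in screening.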

About this article

Cite this article

Balakrishnan, N., Koukouvinos, C. & Parpoula, C. Analyzing supersaturated designs for discrete responses via generalized linear models. Stat Papers 56, 121–145 (2015). https://doi.org/10.1007/s00362-013-0569-z

Received: 27 January 2013
Revised: 29 October 2013
Published: 19 November 2013
Issue Date: February 2015
DOI: https://doi.org/10.1007/s00362-013-0569-z