Analyzing supersaturated designs for discrete responses via generalized linear models

  • Regular Article
Statistical Papers

Abstract

A supersaturated design is a factorial design in which the number of factors whose effects are to be estimated exceeds the number of experimental runs. Supersaturated designs can reduce the cost and time of many industrial experiments, since their main goal is to identify, at minimal cost, the few factors under consideration that have dominant effects. While most of the literature on supersaturated designs has focused on their construction and optimality properties, the analysis of data from such designs has received much less attention. In this paper, we propose a method for analyzing main-effects supersaturated designs that assumes generalized linear models for discrete responses and simultaneously identifies the significant effects. An empirical study demonstrates that the method performs well, with low Type I and Type II error rates. The proposed method therefore makes it possible to use supersaturated designs when the data are analyzed with discrete-response regression models.
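To make the definition concrete, the following sketch builds a small supersaturated design with the half-fraction method of Lin (1993): take the 12-run Plackett–Burman design, keep the six runs in which a chosen branching column is at its high level, and drop that column, leaving 6 runs for 10 factors. It also computes the \(E(s^2)\) criterion, the average of the squared column inner products \(s_{ij}\) over all pairs of columns. This is an illustration only: it is not claimed to be one of the designs analyzed in this paper, and the function names are hypothetical.

```python
from itertools import combinations

# Standard generator (first row) of the 12-run Plackett-Burman design.
PB12_GENERATOR = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]

def plackett_burman_12():
    """12 x 11 Plackett-Burman design: 11 cyclic shifts of the generator plus a row of -1s."""
    n = len(PB12_GENERATOR)
    rows = [[PB12_GENERATOR[(j - i) % n] for j in range(n)] for i in range(n)]
    rows.append([-1] * n)
    return rows

def half_fraction_ssd(design, branch_col=0):
    """Lin-style half fraction: keep runs with +1 in the branching column, then drop it."""
    return [[x for j, x in enumerate(row) if j != branch_col]
            for row in design if row[branch_col] == 1]

def e_s2(design):
    """E(s^2): mean of s_ij^2 over all column pairs, s_ij = inner product of columns i and j."""
    k = len(design[0])
    sq = [sum(row[i] * row[j] for row in design) ** 2
          for i, j in combinations(range(k), 2)]
    return sum(sq) / len(sq)

ssd = half_fraction_ssd(plackett_burman_12())
print(len(ssd), "runs,", len(ssd[0]), "factors")  # 6 runs, 10 factors
print("E(s^2) =", e_s2(ssd))                       # E(s^2) = 4.0
```

Every pair of columns in this design has \(s_{ij} = \pm 2\), so the columns are balanced but not orthogonal; with only 6 runs, the intercept and all 10 main effects cannot be estimated jointly by least squares, which is why a dedicated analysis method is needed.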

Acknowledgments

The research of the third author was financially supported by a scholarship awarded by the Secretariat of the Research Committee of the National Technical University of Athens. The authors would like to thank the Associate Editor and the referees for their constructive and useful suggestions, which improved an earlier version of this manuscript.

Author information

Correspondence to C. Koukouvinos.

Appendix

For each of the 44 models presented in Table 1, 1,000 datasets were generated under each Scenario, and the results of applying our method, for each of the threshold values examined, are presented in Tables 6, 7, 8 and 9 for Scenarios I, II, III and IV, respectively. The first column gives the number of each model. The columns headed “Type I” and “Type II” give the Type I and Type II error rates, averaged over the 1,000 simulations, for each threshold value. The last row of each table gives the average Type I and Type II error rates over the 44 models considered.
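The averaging described above can be sketched as follows. The sketch assumes the definitions commonly used in the supersaturated-design literature: the Type I error rate is the proportion of truly inactive factors declared active, and the Type II error rate is the proportion of truly active factors not declared active. The helper names and the toy selections are hypothetical; the paper's own definitions are given in the main text, which is not reproduced here.

```python
from statistics import mean

def error_rates(declared_active, true_active, k):
    """Type I and Type II error rates for one dataset with factors indexed 0..k-1."""
    declared, active = set(declared_active), set(true_active)
    inactive = set(range(k)) - active
    # Type I: share of truly inactive factors that were declared active.
    type1 = len(declared & inactive) / len(inactive) if inactive else 0.0
    # Type II: share of truly active factors that were not declared active.
    type2 = len(active - declared) / len(active) if active else 0.0
    return type1, type2

def average_error_rates(selections, true_active, k):
    """Average both rates over all simulated datasets (1,000 per model in the study)."""
    rates = [error_rates(sel, true_active, k) for sel in selections]
    return mean(r[0] for r in rates), mean(r[1] for r in rates)

# Hypothetical example: k = 10 factors, factors 0, 3 and 7 truly active, three simulated selections.
selections = [[0, 3, 7], [0, 3, 5], [0, 1, 2, 3, 7]]
t1, t2 = average_error_rates(selections, true_active=[0, 3, 7], k=10)
print(round(t1, 3), round(t2, 3))  # 0.143 0.111
```

In this toy case the per-dataset Type I rates are \(0\), \(1/7\) and \(2/7\), averaging to \(1/7\), while the Type II rates are \(0\), \(1/3\) and \(0\), averaging to \(1/9\).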

The comparative results for each scenario are as follows:

  • Scenario I: From Table 2, the average Type I and Type II error rates over the 44 models considered are 0.23 and 0.06, respectively, for \(SU = \operatorname{median}(\mathbf{su})\) and \(w=\frac{k}{2}\). With this choice of thresholds, the proposed method has considerably lower Type II error rates and slightly higher Type I error rates than with the other threshold values examined (see Table 6 for comparison).

  • Scenario II: From Table 3, the average Type I and Type II error rates over the 44 models considered are 0.29 and 0.04, respectively, for \(SU = \operatorname{median}(\mathbf{su})\) and \(w=\frac{k}{2}\). With this choice of thresholds, the proposed method has considerably lower Type II error rates and slightly higher Type I error rates than with the other threshold values examined (see Table 7 for comparison).

  • Scenario III: From Table 4, the average Type I and Type II error rates over the 44 models considered are 0.29 and 0.05, respectively, for \(SU = \operatorname{median}(\mathbf{su})\) and \(w=\frac{k}{2}\). With this choice of thresholds, the proposed method has considerably lower Type II error rates and slightly higher Type I error rates than with the other threshold values examined (see Table 8 for comparison).

  • Scenario IV: From Table 5, the average Type I and Type II error rates over the 44 models considered are 0.30 and 0.07, respectively, for \(SU = \operatorname{median}(\mathbf{su})\) and \(w=\frac{k}{2}\). With this choice of thresholds, the proposed method has considerably lower Type II error rates and slightly higher Type I error rates than with the other threshold values examined (see Table 9 for comparison).
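This excerpt does not define \(\mathbf{su}\), \(SU\) or \(w\). Purely as an illustration, and as an assumption rather than a reconstruction of the authors' procedure, suppose that \(su_j\) is the symmetrical uncertainty \(2\,I(X_j;Y)/(H(X_j)+H(Y))\) between factor \(j\) and the discrete response (the filter measure of Yu and Liu 2003), and that a factor is retained when \(su_j\) exceeds \(SU = \operatorname{median}(\mathbf{su})\). The role of \(w = k/2\) is not reconstructed here, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2
from statistics import median

def entropy(values):
    """Shannon entropy, in bits, of a discrete sample."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), which lies in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: no information to share
    mutual_info = hx + hy - entropy(list(zip(x, y)))  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return 2.0 * mutual_info / (hx + hy)

def screen_factors(design, y):
    """HYPOTHETICAL rule: retain factors whose su exceeds SU = median(su)."""
    k = len(design[0])
    su = [symmetrical_uncertainty([row[j] for row in design], y) for j in range(k)]
    threshold = median(su)
    kept = [j for j in range(k) if su[j] > threshold]
    return su, threshold, kept

# Toy 4-run, 3-factor two-level design with a binary response (illustrative values only).
design = [[1, 1, -1], [1, -1, 1], [-1, 1, 1], [-1, -1, -1]]
y = [1, 1, 0, 0]
su, threshold, kept = screen_factors(design, y)
print(su, threshold, kept)  # [1.0, 0.0, 0.0] 0.0 [0]
```

Here factor 0 reproduces the response pattern exactly (\(su_0 = 1\)), while factors 1 and 2 carry no information about it (\(su = 0\)), so only factor 0 clears the median threshold.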

Table 6 Empirical performance of the proposed algorithm for the models listed in Table 1 for Scenario I
Table 7 Empirical performance of the proposed algorithm for the models listed in Table 1 for Scenario II
Table 8 Empirical performance of the proposed algorithm for the models listed in Table 1 for Scenario III
Table 9 Empirical performance of the proposed algorithm for the models listed in Table 1 for Scenario IV

These results suggest that, with \(SU = \operatorname{median}(\mathbf{su})\) and \(w=\frac{k}{2}\), the proposed method is quite robust for count response modelling: the average Type I and Type II error rates are almost identical under Scenarios II, III and IV (0.29 and 0.04, 0.29 and 0.05, and 0.30 and 0.07, respectively).
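Count responses of the kind referred to above can be generated from a generalized linear model. The sketch below assumes a Poisson model with the canonical log link, \(\log \mu_i = \beta_0 + \sum_j \beta_j x_{ij}\); the Poisson choice, the design, the coefficient values and the function names are all hypothetical, and the actual simulation settings of the scenarios are specified in the main text, which is not reproduced here. Python's standard `random` module has no Poisson sampler, so Knuth's multiplication method is used.

```python
import math
import random

def poisson_draw(mu, rng):
    """One Poisson(mu) variate via Knuth's multiplication method (adequate for small mu)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_counts(design, beta0, beta, rng):
    """Counts y_i ~ Poisson(mu_i) with log link: log(mu_i) = beta0 + sum_j beta_j * x_ij."""
    counts = []
    for row in design:
        eta = beta0 + sum(b * x for b, x in zip(beta, row))  # linear predictor
        counts.append(poisson_draw(math.exp(eta), rng))     # inverse link, then sample
    return counts

# Hypothetical 4-run design; only factor 0 is active (beta = 0.8), factors 1 and 2 are inert.
design = [[1, 1, -1], [1, -1, 1], [-1, 1, 1], [-1, -1, -1]]
rng = random.Random(2015)
print(simulate_counts(design, beta0=1.0, beta=[0.8, 0.0, 0.0], rng=rng))
```

Seeding a `random.Random` instance keeps each simulated dataset reproducible, which matters when error rates are averaged over many replicates.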

In general, we conclude that under every Scenario considered the proposed method performs well when \(SU\) is taken to be \(\operatorname{median}(\mathbf{su})\) and \(w=\frac{k}{2}\); with this choice of thresholds it achieves the lowest Type II error rates in all Scenarios. The accompanying slightly higher Type I error rates are not troublesome, since SSDs are used mainly to screen the factors that should be considered for further investigation: a factor wrongly declared active can be eliminated in follow-up experiments, whereas an active factor missed at the screening stage is usually not examined again. Hence low Type II error rates are especially desirable, although both Type I and Type II error rates are important and should be kept as low as possible. Moreover, under the effect sparsity assumed for SSDs, most factors are inactive, so Type I errors are, of course, quite likely to occur.
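A short calculation shows why many inactive factors make Type I errors likely. Assume, for illustration only, that each of the \(k - p\) inactive factors (with \(p\) the hypothetical number of active factors) is declared active independently with probability \(\alpha\). Then the probability of at least one false declaration grows quickly with \(k - p\); for the hypothetical values \(k - p = 20\) and \(\alpha = 0.05\):

```latex
\Pr(\text{at least one inactive factor declared active}) = 1 - (1 - \alpha)^{k - p},
\qquad 1 - 0.95^{20} \approx 1 - 0.358 = 0.642 .
```

Even a small per-factor false-positive probability therefore makes at least one spurious declaration more likely than not when most factors are inert.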

About this article

Cite this article

Balakrishnan, N., Koukouvinos, C. & Parpoula, C. Analyzing supersaturated designs for discrete responses via generalized linear models. Stat Papers 56, 121–145 (2015). https://doi.org/10.1007/s00362-013-0569-z
