An information theoretical algorithm for analyzing supersaturated designs for a binary response

Metrika

Abstract

A supersaturated design is a factorial design in which the number of effects to be estimated exceeds the number of runs. Such designs are used in many experiments for screening purposes, i.e., for studying a large number of factors and identifying the active ones. In this paper, we propose a method for screening out the important factors from a large set of potentially active variables using the symmetrical uncertainty measure combined with the information gain measure. We develop an information theoretical analysis method that applies Shannon entropy and other entropy measures, such as Rényi entropy, Havrda–Charvát entropy, and Tsallis entropy, to the data, assuming a generalized linear model for a Bernoulli response. The method is advantageous because it enables supersaturated designs to be used for analyzing data under generalized linear models. An empirical study demonstrates that the method performs well, giving low Type I and Type II error rates for every entropy measure considered. Moreover, in terms of error rates, the proposed method is more efficient than the existing ROC methodology for identifying the significant factors for a dichotomous response.

Author information

Corresponding author

Correspondence to C. Koukouvinos.

About this article

Cite this article

Balakrishnan, N., Koukouvinos, C. & Parpoula, C. An information theoretical algorithm for analyzing supersaturated designs for a binary response. Metrika 76, 1–18 (2013). https://doi.org/10.1007/s00184-011-0373-5

  • DOI: https://doi.org/10.1007/s00184-011-0373-5
