Automated discovery of test statistics using genetic programming

Abstract

Developing new test statistics is laborious: it requires manually constructing and evaluating mathematical functions that satisfy several theoretical properties. Automating this process, which has not been done before, would greatly accelerate the discovery of much-needed new test statistics. Automation is challenging because the discovery method must know something about the desirable properties of a good test statistic and must also have an engine that can develop and explore candidate mathematical solutions with an intuitive representation. In this paper we describe a genetic programming-based system for the automated discovery of new test statistics. Specifically, our system was able to discover test statistics as powerful as the t test for comparing sample means from two distributions with equal variances.

Acknowledgements

This work was supported by National Institutes of Health (USA) Grants LM012601, AI116794, and DK112217. We would like to thank the reviewers for their thoughtful suggestions.

Author information

Corresponding author

Correspondence to Moshe Sipper.

About this article

Cite this article

Moore, J.H., Olson, R.S., Chen, Y. et al. Automated discovery of test statistics using genetic programming. Genet Program Evolvable Mach 20, 127–137 (2019). https://doi.org/10.1007/s10710-018-9338-z

Keywords

  • Genetic programming
  • Statistics
  • Optimization
  • t test