
Automated discovery of test statistics using genetic programming

  • Jason H. Moore
  • Randal S. Olson
  • Yong Chen
  • Moshe Sipper
Letter

Abstract

The process of developing new test statistics is laborious, requiring the manual development and evaluation of mathematical functions that satisfy several theoretical properties. Automating this process, which has not been done before, would greatly accelerate the discovery of much-needed new test statistics. Such automation is challenging because the discovery method must encode knowledge of the desirable properties of a good test statistic, and it must also have an engine that can develop and explore candidate mathematical solutions with an intuitive representation. In this paper we describe a genetic programming-based system for the automated discovery of new test statistics. Specifically, our system discovered test statistics as powerful as the t test for comparing sample means from two distributions with equal variances.
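The abstract benchmarks discovered statistics against the pooled (equal-variance) two-sample t test, so "as powerful as" is a claim about rejection rates at a fixed significance level. The Python below is a minimal sketch of that comparison, not the authors' system: it computes the pooled t statistic and estimates the empirical power of any candidate statistic by Monte Carlo simulation, calibrating a two-sided critical value under the null instead of using the t distribution. The normal data-generating settings, sample size, effect size, and the `mean_difference` candidate are illustrative assumptions; in a genetic programming system an evolved expression would take the candidate's place.

```python
import math
import random
import statistics
from typing import Callable, Sequence

Statistic = Callable[[Sequence[float], Sequence[float]], float]


def _sample_var(v: Sequence[float], m: float) -> float:
    """Sample variance with the n - 1 denominator, given the mean m."""
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def pooled_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample t statistic assuming equal variances (pooled estimate)."""
    n1, n2 = len(x), len(y)
    m1, m2 = statistics.fmean(x), statistics.fmean(y)
    sp2 = ((n1 - 1) * _sample_var(x, m1) + (n2 - 1) * _sample_var(y, m2)) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def mean_difference(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical candidate statistic: unscaled difference of sample means."""
    return statistics.fmean(x) - statistics.fmean(y)


def empirical_power(stat: Statistic, n: int = 20, shift: float = 0.8,
                    sigma: float = 1.0, alpha: float = 0.05,
                    reps: int = 2000, seed: int = 0) -> float:
    """Estimate two-sided power of `stat` for normal samples of size n."""
    rng = random.Random(seed)

    def draw(mu: float) -> list:
        return [rng.gauss(mu, sigma) for _ in range(n)]

    # Calibrate the critical value from |stat| simulated under equal means.
    null = sorted(abs(stat(draw(0.0), draw(0.0))) for _ in range(reps))
    crit = null[math.ceil((1 - alpha) * reps) - 1]
    # Power: share of samples with a true mean shift whose |stat| exceeds it.
    hits = sum(abs(stat(draw(shift), draw(0.0))) > crit for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    print("pooled t:", empirical_power(pooled_t))
    print("candidate:", empirical_power(mean_difference))
```

Calibrating the critical value by simulation lets every candidate be compared at the same size alpha even when its null distribution is unknown, which is the situation for an arbitrary evolved expression.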

Keywords

Genetic programming · Statistics · Optimization · t test

Notes

Acknowledgements

This work was supported by National Institutes of Health (USA) Grants LM012601, AI116794, and DK112217. We would like to thank the reviewers for their thoughtful suggestions.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
  2. Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel