Statistics and Computing, Volume 20, Issue 1, pp 63–73

Non-linear regression models for Approximate Bayesian Computation

Article

Abstract

Approximate Bayesian inference on the basis of summary statistics is well suited to complex problems for which the likelihood is either mathematically or computationally intractable. However, methods that use rejection suffer from the curse of dimensionality as the number of summary statistics increases. Here we propose a machine-learning approach to the estimation of the posterior density that introduces two innovations. The new method fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves estimation using importance sampling. The new algorithm is compared with state-of-the-art approximate Bayesian methods and achieves a considerable reduction of the computational burden in two examples: inference in statistical genetics and inference in a queueing model.
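
The regression step described in the abstract can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the authors' implementation. The simulator, the uniform prior, the function names (`simulate_summary`, `fit_line`, `abc_regression`) and all numeric settings are hypothetical. Straight-line fits stand in for the feed-forward neural networks named in the keywords, and the adaptive importance-sampling stage is omitted. The adjustment used, theta* = m(s_obs) + (theta - m(s)) * sd(s_obs) / sd(s), is assumed here as the usual form of a heteroscedastic regression correction; the abstract itself gives no formula.

```python
import math
import random


def simulate_summary(theta, n_obs, rng):
    """Toy simulator: the summary statistic is the mean of n_obs N(theta, 1) draws."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def abc_regression(s_obs, n_sims=5000, n_keep=500, n_obs=20, seed=0):
    """Return (raw accepted draws, regression-adjusted draws) for the toy model."""
    rng = random.Random(seed)
    # 1. Simulate (summary, parameter) pairs from a Uniform(0, 10) prior (assumed).
    thetas = [rng.uniform(0.0, 10.0) for _ in range(n_sims)]
    sims = [(simulate_summary(t, n_obs, rng), t) for t in thetas]
    # 2. Rejection step: keep the n_keep simulations closest to s_obs.
    sims.sort(key=lambda pair: abs(pair[0] - s_obs))
    s = [pair[0] for pair in sims[:n_keep]]
    theta = [pair[1] for pair in sims[:n_keep]]
    # 3. Conditional mean m(s): a straight line here, a stand-in for a neural network.
    a, b = fit_line(s, theta)
    resid = [t - (a + b * si) for si, t in zip(s, theta)]
    # 4. Conditional log-variance, fitted to the log squared residuals.
    c, d = fit_line(s, [math.log(r * r + 1e-12) for r in resid])

    def sd(v):
        return math.exp(0.5 * (c + d * v))

    # 5. Adjustment: move each residual to s_obs and rescale by the sd ratio.
    m_obs = a + b * s_obs
    adjusted = [m_obs + r * sd(s_obs) / sd(si) for si, r in zip(s, resid)]
    return theta, adjusted
```

Calling `abc_regression(4.0)` returns the raw accepted draws and their adjusted counterparts. In this toy setting the adjusted sample should be more concentrated than the raw one, because the regression removes the spread introduced by accepting simulations whose summaries only approximately match the observed value.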

Keywords

Likelihood-free inference; Curse of dimensionality; Feed forward neural networks; Heteroscedasticity; Coalescent models; Approximate Bayesian computation; Conditional density estimation; Implicit statistical models; Importance sampling; Non-linear regression; Indirect inference

Supplementary material

11222_2009_9116_MOESM1_ESM.pdf (275 kb)
Below is the link to the electronic supplementary material

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Centre National de la Recherche Scientifique, TIMC-IMAG, Faculty of Medicine of Grenoble, La Tronche, France
  2. Institut National Polytechnique de Grenoble, TIMC-IMAG, Faculty of Medicine of Grenoble, La Tronche, France
