Pattern Analysis and Applications

Volume 19, Issue 2, pp 475–485

Approximate variational inference based on a finite sample of Gaussian latent variables

  • Nikolaos Gianniotis
  • Christoph Schnörr
  • Christian Molkenthin
  • Sanjay Singh Bora
Short Paper

Abstract

Variational methods are employed in situations where exact Bayesian inference becomes intractable due to the difficulty in performing certain integrals. Typically, variational methods postulate a tractable posterior and formulate a lower bound on the desired integral to be approximated, e.g. the marginal likelihood. The lower bound is then optimised with respect to its free parameters, the so-called variational parameters. However, this is not always possible, as for certain integrals it is very challenging (or tedious) to come up with a suitable lower bound. Here, we propose a simple scheme that overcomes some of the awkward cases where the usual variational treatment becomes difficult. The scheme relies on a rewriting of the lower bound on the model log-likelihood. We demonstrate the proposed scheme on a number of synthetic and real-world examples, as well as on a geophysical model for which standard variational approaches are inapplicable.
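The idea described in the abstract — rewriting the lower bound so that its expectation term is approximated by an average over a finite sample of Gaussian latent variables — can be illustrated with a minimal sketch. The model, the function names, and all parameter values below are hypothetical choices for illustration; the sketch assumes a Gaussian variational posterior q(w) = N(μ, σ²) and reparameterises the latent samples as w = μ + σ·ε with ε drawn once from a standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)

def elbo_estimate(mu, sigma, log_joint, n_samples=100, eps=None):
    """Monte Carlo estimate of a variational lower bound for a
    Gaussian posterior q(w) = N(mu, sigma^2) (illustrative sketch).

    The expectation E_q[log p(x, w)] is approximated by an average
    over a finite sample of latent variables w_s = mu + sigma * eps_s,
    with eps_s ~ N(0, 1); the Gaussian entropy term is closed-form.
    """
    if eps is None:
        eps = rng.standard_normal(n_samples)
    w = mu + sigma * eps                       # reparameterised latent samples
    expected_log_joint = np.mean([log_joint(ws) for ws in w])
    entropy = 0.5 * np.log(2.0 * np.pi * np.e * sigma ** 2)  # H[q(w)]
    return expected_log_joint + entropy

# Toy model: one observation x = 1.5, Gaussian likelihood, standard-normal prior
x = 1.5
log_joint = lambda w: (-0.5 * (x - w) ** 2    # log-likelihood (up to a constant)
                       - 0.5 * w ** 2)        # log-prior (up to a constant)

print(elbo_estimate(mu=0.75, sigma=0.7, log_joint=log_joint))
```

Because the sample of ε values can be drawn once and held fixed, the resulting estimate is a deterministic function of the variational parameters (μ, σ) and can be optimised with standard gradient-based routines.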

Keywords

Bayesian inference · Posterior estimation · Expectation maximisation

Notes

Acknowledgments

The RESORCE database [1] was used in this work with the kind permission of the SIGMA project. N. Gianniotis was partially funded by the BMBF project “Potsdam Research Cluster for Georisk Analysis, Environmental Change and Sustainability”. C. Molkenthin and S. S. Bora were funded by the graduate research school GeoSim of the Geo.X initiative.

References

  1. Akkar S, Sandikkaya M, Senyurt M, Azari Sisi A, Ay BO, Traversa P, Douglas J, Cotton F, Luzi L, Hernandez B, Godey S (2014) Reference database for seismic ground-motion in Europe (RESORCE). Bull Earthq Eng 12(1):311–339
  2. Archambeau C, Delannay N, Verleysen M (2006) Robust probabilistic projections. In: Proceedings of the 23rd international conference on machine learning. ACM, pp 33–40
  3. Azzalini A (2005) The skew-normal distribution and related multivariate families. Scand J Stat 32(2):159–188
  4. Barber D, Bishop CM (1998) Ensemble learning in Bayesian neural networks. In: Generalization in neural networks and machine learning. Springer, pp 215–237
  5. Beal MJ (2003) Variational algorithms for approximate Bayesian inference. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London
  6. Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
  7. Boore DM (2003) Simulation of ground motion using the stochastic method. Pure Appl Geophys 160(3–4):635–676
  8. Bottou L (2012) Stochastic gradient tricks. In: Neural networks, tricks of the trade, reloaded, LNCS, vol 7700. Springer
  9. Cseke B, Heskes T (2011) Approximate marginals in latent Gaussian models. J Mach Learn Res 12:417–454
  10. Georghiades AS, Belhumeur PN, Kriegman D (2001) From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell 23(6):643–660
  11. Jaakkola T, Jordan M (2000) Bayesian parameter estimation via variational methods. Stat Comput 10(1):25–37
  12. Kushner H, Yin G (2003) Stochastic approximation and recursive algorithms and applications, 2nd edn. Springer, New York
  13. Møller MF (1993) A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw 6(4):525–533
  14. Opper M, Archambeau C (2009) The variational Gaussian approximation revisited. Neural Comput 21:786–792
  15. Petersen KB, Pedersen MS (2012) The matrix cookbook. http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=3274
  16. Psorakis I, Damoulas T, Girolami MA (2010) Multiclass relevance vector machines: sparsity and accuracy. IEEE Trans Neural Netw 21(10):1588–1598
  17. Reiter L (1991) Earthquake hazard analysis: issues and insights. Columbia University Press, New York
  18. Rue H, Martino S, Chopin N (2009) Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J R Stat Soc: Ser B 71(2):319–392
  19. Tipping ME (2001) Sparse Bayesian learning and the relevance vector machine. J Mach Learn Res 1:211–244
  20. Tipping ME, Bishop CM (1999) Probabilistic principal component analysis. J R Stat Soc: Ser B 61(3):611–622
  21. Tzikas D, Likas C, Galatsanos N (2008) The variational approximation for Bayesian inference. IEEE Signal Process Mag 25(6):131–146
  22. Xie P, Xing EP (2014) Cauchy principal component analysis. CoRR abs/1412.6506. http://arxiv.org/abs/1412.6506. Accessed 25 Mar 2015

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Nikolaos Gianniotis 1, 2
  • Christoph Schnörr 3
  • Christian Molkenthin 1
  • Sanjay Singh Bora 1
  1. Institute of Earth and Environmental Science, University of Potsdam, Potsdam, Germany
  2. Heidelberg Institute for Theoretical Studies, Astroinformatics Group, Heidelberg, Germany
  3. Image and Pattern Analysis Group, University of Heidelberg, Heidelberg, Germany
