Maximal Information Divergence from Statistical Models Defined by Neural Networks

  • Guido Montúfar
  • Johannes Rauh
  • Nihat Ay
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8085)

Abstract

We review recent results about the maximal values of the Kullback-Leibler information divergence from statistical models defined by neural networks, including naïve Bayes models, restricted Boltzmann machines, deep belief networks, and various classes of exponential families. We illustrate approaches to compute the maximal divergence from a given model starting from simple sub- or super-models. We give a new result for deep and narrow belief networks with finite-valued units.
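To make the central quantity concrete (an illustration of ours, not taken from the paper): the divergence from a model M is D(p‖M) = min_{q∈M} D(p‖q), and one asks for its maximal value over all target distributions p. The simplest instance is the independence model of two binary variables, where the rI-projection of p is the product of its marginals, so D(p‖M) equals the mutual information; the maximum is log 2, attained by a perfectly correlated distribution. The following Python sketch (function names are our own) computes this divergence numerically.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p || q), with the convention 0 log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def divergence_from_independence(p):
    """Divergence of a joint distribution of two binary variables from
    the independence model.

    The rI-projection of p onto the independence model is the product of
    its marginals, so the divergence equals the mutual information of p.
    """
    p = np.asarray(p, dtype=float).reshape(2, 2)
    q = np.outer(p.sum(axis=1), p.sum(axis=0))  # product of the marginals
    return kl(p.ravel(), q.ravel())

# The perfectly correlated distribution attains the maximum, log 2:
print(divergence_from_independence([0.5, 0.0, 0.0, 0.5]))  # ~0.6931 = log 2
print(divergence_from_independence([0.25] * 4))            # 0.0 (independent)
```

For richer models such as restricted Boltzmann machines or deep belief networks, the minimization over q no longer has a closed form, which is why the bounds reviewed in the paper proceed via simple sub- or super-models.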

Keywords

neural network · exponential family · Kullback-Leibler divergence · multi-information



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Guido Montúfar (1)
  • Johannes Rauh (2)
  • Nihat Ay (2, 3)
  1. Department of Mathematics, Pennsylvania State University, University Park, USA
  2. Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  3. Santa Fe Institute, Santa Fe, USA
