
Information Trajectory of Optimal Learning

  • Roman V. Belavkin
Chapter
Part of the Springer Optimization and Its Applications book series (SOIA, volume 40)

Summary

The paper outlines some basic principles of a geometric and nonasymptotic theory of learning systems. The evolution of such a system is represented by points on a statistical manifold, and a topology related to information dynamics is introduced to define trajectories that are continuous in information. It is shown that optimizing learning with respect to a given utility function leads to an evolution described by a continuous trajectory. Path integrals along the trajectory define the optimal utility and information bounds. Closed-form expressions are derived for two important types of utility functions. The presented approach generalizes the use of Orlicz spaces in information geometry, and it gives a new, geometric interpretation of classical information value theory and statistical mechanics. In addition, theoretical predictions are evaluated experimentally by comparing the performance of agents learning in a nonstationary stochastic environment.
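
The summary refers to bounds relating optimal utility to information. A minimal sketch of the classical variational problem that this geometric approach generalizes (in the spirit of Stratonovich's value-of-information theory) is given below; the symbols λ, β, and the reference measure q are illustrative assumptions, not notation taken from the chapter:

\[
  V(\lambda) \;=\; \sup_{p}\Bigl\{\, \mathbb{E}_{p}[u(x)] \;:\; D_{\mathrm{KL}}(p \,\|\, q) \le \lambda \,\Bigr\},
  \qquad
  p_{\beta}(x) \;=\; \frac{q(x)\, e^{\beta u(x)}}{\int q(y)\, e^{\beta u(y)}\, dy},
\]

where β > 0 is the Lagrange multiplier associated with the information constraint. The optimal measures form an exponential (Gibbs) family, which is the link to statistical mechanics mentioned in the summary.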

Keywords

Utility function · Probability measure · Path integral · Learning system · Optimal trajectory



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

Middlesex University, London, UK
