Cross-entropic learning of a machine for the decision in a partially observable universe

Abstract

In this paper, we are interested in optimal decisions in a partially observable universe. Our approach is to directly approximate an optimal strategic tree that depends on the observations. This approximation is made by means of a parameterized probabilistic law. A particular family of Hidden Markov Models (HMMs), with input and output, is considered as a policy model. A method for optimizing the parameters of these HMMs is proposed and applied. The optimization is based on the cross-entropic (CE) principle for rare-event simulation developed by Rubinstein.
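
To make the approach concrete, here is a minimal sketch, in Python, of the kind of learning loop the abstract describes: a stochastic finite-state controller (an HMM whose input is the observation and whose output is the action) is repeatedly re-fitted, by the cross-entropy method, to the empirical frequencies of its highest-scoring trajectories. The toy environment, the controller size, and all hyper-parameters (N_MEM, N_SAMPLES, ELITE_FRAC, SMOOTH) are assumptions of this sketch, not the paper's actual model or experiments.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy POMDP (illustration only, not the paper's benchmark).
# The hidden world state persists with prob. 0.9 and flips with prob. 0.1;
# the observation matches it with prob. 0.8; the reward is +1 for an action
# that guesses the hidden state, -1 otherwise.
N_WORLD, N_OBS, N_ACT = 2, 2, 2
T_WORLD = np.array([[[0.9, 0.1]] * N_ACT, [[0.1, 0.9]] * N_ACT])  # P(w' | w, a)
O_WORLD = np.array([[0.8, 0.2], [0.2, 0.8]])                      # P(o | w)
REWARD = np.array([[1.0, -1.0], [-1.0, 1.0]])                     # R(w, a)

# Stochastic finite-state controller: an HMM with input and output.
N_MEM = 3
trans = np.full((N_MEM, N_OBS, N_MEM), 1.0 / N_MEM)  # P(m' | m, o)
emit = np.full((N_MEM, N_ACT), 1.0 / N_ACT)          # P(a | m')

def rollout(trans, emit, horizon=20):
    """Simulate one episode; return its total reward and the visit counts."""
    w, m, total = rng.integers(N_WORLD), 0, 0.0
    t_cnt, e_cnt = np.zeros_like(trans), np.zeros_like(emit)
    for _ in range(horizon):
        o = rng.choice(N_OBS, p=O_WORLD[w])          # observe (controller input)
        m_next = rng.choice(N_MEM, p=trans[m, o])    # update internal memory
        a = rng.choice(N_ACT, p=emit[m_next])        # act (controller output)
        t_cnt[m, o, m_next] += 1
        e_cnt[m_next, a] += 1
        total += REWARD[w, a]
        w = rng.choice(N_WORLD, p=T_WORLD[w, a])
        m = m_next
    return total, t_cnt, e_cnt

# Cross-entropy iterations: re-fit the controller to its elite episodes.
N_SAMPLES, ELITE_FRAC, SMOOTH, EPS = 200, 0.1, 0.7, 1e-3
for it in range(30):
    episodes = [rollout(trans, emit) for _ in range(N_SAMPLES)]
    scores = np.array([ep[0] for ep in episodes])
    cut = np.quantile(scores, 1.0 - ELITE_FRAC)
    elite = [ep for ep in episodes if ep[0] >= cut]
    t_cnt = sum(ep[1] for ep in elite) + EPS
    e_cnt = sum(ep[2] for ep in elite) + EPS
    # The normalised elite frequencies are the CE update; SMOOTH damps it.
    trans = SMOOTH * t_cnt / t_cnt.sum(axis=-1, keepdims=True) + (1 - SMOOTH) * trans
    emit = SMOOTH * e_cnt / e_cnt.sum(axis=-1, keepdims=True) + (1 - SMOOTH) * emit
    print(f"iteration {it}: mean reward {scores.mean():+.2f}, elite cutoff {cut:+.2f}")

The smoothing coefficient plays the usual role in the CE method of keeping the probability tables away from degenerate 0/1 values after a single iteration; see Rubinstein and Kroese [9] for the general scheme.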

References

  1. Bakker, B., Schmidhuber, J.: Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization. In: Proceedings of the 8th Conference on Intelligent Autonomous Systems, pp. 438–445, Amsterdam, The Netherlands (2004)

  2. Bellman, R.: Dynamic Programming. Princeton University Press, Princeton, NJ (1957)

  3. de Boer, P.-T., Kroese, D.P., Mannor, S., Rubinstein, R.Y.: A tutorial on the cross-entropy method. http://www.cs.utwente.nl/~ptdeboer/ce/

  4. Cassandra, A.R.: Exact and approximate algorithms for partially observable Markov decision processes. PhD thesis, Brown University, Providence, Rhode Island (1998)

  5. Fine, S., Singer, Y., Tishby, N.: The hierarchical hidden Markov model: analysis and applications. Machine Learning 32(1), 41–62 (1998)

  6. Homem-de-Mello, T., Rubinstein, R.Y.: Rare event estimation for static models via cross-entropy and importance sampling. http://users.iems.nwu.edu/~tito/list.htm

  7. Meuleau, N., Peshkin, L., Kim, K.-E., Kaelbling, L.P.: Learning finite-state controllers for partially observable environments. In: Proceedings of UAI-99, pp. 427–436, Stockholm (1999)

  8. Murphy, K., Paskin, M.: Linear time inference in hierarchical HMMs. In: Proceedings of Neural Information Processing Systems, Vancouver, Canada (2001)

  9. Rubinstein, R.Y., Kroese, D.P.: The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Information Science & Statistics, Springer, Berlin (2004)

  10. Sondik, E.J.: The optimal control of partially observable Markov processes. PhD thesis, Stanford University, Stanford, California (1971)

  11. Sutton, R.S., Barto, A.G.: Reinforcement Learning. MIT Press, Cambridge, MA (2000)

  12. Theocharous, G.: Hierarchical learning and planning in partially observable Markov decision processes. PhD thesis, Michigan State University (2002)

Author information

Correspondence to Frédéric Dambreville.

About this article

Cite this article

Dambreville, F. Cross-entropic learning of a machine for the decision in a partially observable universe. J Glob Optim 37, 541–555 (2007). https://doi.org/10.1007/s10898-006-9061-9
