Minds and Machines, Volume 18, Issue 4, pp 521–526

Optimism in the Face of Uncertainty Should be Refutable

  • Ronald Ortner


We give an example from the theory of Markov decision processes which shows that the “optimism in the face of uncertainty” heuristic may fail to make any progress. The reason is that a belief that a (transition) probability is larger than 0 cannot be falsified: no finite number of observations in which the transition fails to occur contradicts it. Our example shows the utility of Popper’s demand for falsifiability of hypotheses in the area of artificial intelligence.
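The falsifiability point can be illustrated with a small numerical sketch. This is not the Markov decision process example from the paper; it only assumes independent trials of a single event, a confidence level `delta`, and hypothetical helper names chosen for illustration. It contrasts the hypothesis "p > 0", which survives any finite run of non-occurrences, with the hypothesis "p >= eps", which is rejected after finitely many.

```python
import math


def zero_success_likelihood(p: float, n: int) -> float:
    """Probability that an event of probability p never occurs in n independent trials."""
    return (1.0 - p) ** n


def refuted_lower_bound(eps: float, n: int, delta: float) -> bool:
    """True if 'p >= eps' is rejected at level delta after n trials without an occurrence.

    Assumption: a hypothesis is rejected once the observed data would have
    probability below delta under every p it allows; for 'p >= eps' the most
    favourable case is p = eps.
    """
    return zero_success_likelihood(eps, n) < delta


def trials_to_refute(eps: float, delta: float) -> int:
    """Smallest n for which (1 - eps)^n < delta, i.e. 'p >= eps' becomes refutable."""
    return math.floor(math.log(delta) / math.log(1.0 - eps)) + 1


def surviving_positive_p(n: int, delta: float) -> float:
    """Some p > 0 still consistent with n trials without an occurrence.

    (1 - p)^n >= delta holds for all p <= 1 - delta**(1/n); half of that bound
    is returned to stay clear of floating-point rounding at the boundary.
    """
    return 0.5 * (1.0 - delta ** (1.0 / n))


if __name__ == "__main__":
    delta = 0.05
    print("trials needed to reject p >= 0.1:", trials_to_refute(0.1, delta))
    for n in (10, 1000, 10**6):
        p = surviving_positive_p(n, delta)
        # However large n is, a strictly positive p remains unrefuted.
        print(n, p, zero_success_likelihood(p, n) >= delta)
```

Under these assumptions, an optimistic learner that only maintains "p > 0" always has a consistent positive value to fall back on, whereas a learner committed to "p >= eps" must eventually give the belief up.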


Markov decision processes, Refutability, Reinforcement learning



The author would like to thank Georg Dorn for comments on a prior version of this paper. This work was supported in part by the Austrian Science Fund FWF (S9104-N13 SP4) and the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778. This publication only reflects the author’s views.


  1. Auer, P., & Ortner, R. (2006). Logarithmic online regret bounds for reinforcement learning. In B. Schölkopf, J. C. Platt, & T. Hofmann (Eds.), Advances in Neural Information Processing Systems (Vol. 19, pp. 49–56). Cambridge, MA: MIT Press.
  2. Auer, P., Jaksch, T., & Ortner, R. (2008). Near-optimal regret bounds for reinforcement learning (submitted).
  3. Brafman, R. I., & Tennenholtz, M. (2002). R-max – a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213–231.
  4. Kemeny, J. G., Snell, J. L., & Knapp, A. W. (1976). Denumerable Markov chains. New York: Springer.
  5. Popper, K. R. (1969). Logik der Forschung (3rd ed.). Tübingen: Mohr.
  6. Puterman, M. L. (1994). Markov decision processes. Discrete stochastic programming. New York: Wiley.
  7. Strehl, A. L., & Littman, M. L. (2004). An empirical evaluation of interval estimation for Markov decision processes. In 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2004) (pp. 128–135). IEEE Computer Society.
  8. Strehl, A. L., & Littman, M. L. (2005). A theoretical analysis of model-based interval estimation. In L. De Raedt & S. Wrobel (Eds.), Machine learning, Proceedings of the Twenty-Second International Conference (ICML 2005) (pp. 857–864). ACM.

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  1. Department Mathematik und Informationstechnologie, Montanuniversität Leoben, Leoben, Austria
