Machine Learning, Volume 85, Issue 3, pp 299–332

Model selection in reinforcement learning


DOI: 10.1007/s10994-011-5254-7

Cite this article as:
Farahmand, A. & Szepesvári, C. Mach Learn (2011) 85: 299. doi:10.1007/s10994-011-5254-7


We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting when the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BErMin, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the problem in which the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that, under some additional technical conditions, BErMin leads to a procedure whose rate of convergence matches, up to a constant factor, that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.
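The general idea of complexity-regularized selection over a countable candidate set can be illustrated with a minimal sketch. This is not the BErMin procedure itself, whose details are not given in the abstract. The function name `select_candidate`, the penalty form, and the parameter `delta` are all hypothetical, and the Bellman error estimates are assumed to be supplied by some separate estimation step.

```python
import math


def select_candidate(bellman_error_estimates, num_samples, delta=0.05):
    """Return the index of the candidate with the smallest penalized score.

    bellman_error_estimates: list of estimated (squared) Bellman errors,
        one per candidate; how they are obtained is outside this sketch.
    num_samples: number of samples behind the estimates.
    delta: hypothetical confidence parameter used only in the penalty.
    """
    best_index, best_score = None, math.inf
    for k, estimate in enumerate(bellman_error_estimates):
        # Assumed penalty: log(1 / prior_k) with prior_k = 1 / ((k+1)(k+2)),
        # which sums to 1 over a countable index set, scaled by 1 / n.
        penalty = (math.log((k + 1) * (k + 2)) + math.log(1.0 / delta)) / num_samples
        score = estimate + penalty
        if score < best_score:
            best_index, best_score = k, score
    return best_index


if __name__ == "__main__":
    # With many samples the penalty is small and the estimates dominate.
    print(select_candidate([0.5, 0.1, 0.3], num_samples=1000))
```

In this sketch the penalty shrinks as the number of samples grows, so with enough data the selection approaches that of the candidate with the smallest estimated error, while with few samples earlier (lower-penalty) candidates are favored.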


Reinforcement learning, Model selection, Complexity regularization, Adaptivity, Offline learning, Off-policy learning, Finite-sample bounds

Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Department of Computing Science, University of Alberta, Edmonton, Canada