Abstract
In this paper, we present a method to iteratively refine the parameters of a Markov Decision Process (MDP) by leveraging constraints implied by an expert’s review of the policy. For every case where the expert’s recommendation differs from the policy’s recommendation, we impose a constraint on the model parameters. We show that consistency with the expert’s feedback leads to non-convex constraints on these parameters. We refine the model parameters under these constraints by partitioning the parameter space and iteratively applying alternating optimization. We demonstrate how the approach applies to both flat and factored MDPs and present results based on diagnostic sessions from a manufacturing scenario.
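To make the abstract concrete, the following is a minimal sketch (not the authors’ implementation) of the idea: a small flat MDP with known rewards and unknown transition parameters, where each feedback pair (state, expert action) induces the constraint Q(s, a_expert) ≥ Q(s, a) for every other action a. The constraint is non-convex in the transitions because Q depends on them through the Bellman fixed point, so the loop below alternates over blocks of the parameter space. The crude random local search within each block is only a stand-in for the paper’s partition-based alternating optimization; all names (q_values, violation, the toy sizes) are illustrative.

    import numpy as np

    S, A, gamma = 4, 2, 0.9
    rng = np.random.default_rng(0)
    R = rng.uniform(size=(S, A))                 # known rewards
    T = rng.dirichlet(np.ones(S), size=(S, A))   # initial transition guess
    feedback = [(0, 1), (2, 0)]                  # (state, expert action) pairs

    def q_values(T):
        """Q-values of the optimal policy under transitions T (value iteration)."""
        V = np.zeros(S)
        for _ in range(500):
            Q = R + gamma * T @ V                # shape (S, A)
            V = Q.max(axis=1)
        return R + gamma * T @ V

    def violation(T):
        """Total margin by which the expert constraints Q(s, a*) >= Q(s, a) fail."""
        Q = q_values(T)
        return sum(Q[s].max() - Q[s, a_star] for s, a_star in feedback)

    # Alternating optimization: sweep over (state, action) blocks of T and
    # locally search each block over the probability simplex while every
    # other block stays fixed, until the expert constraints are satisfied.
    for sweep in range(20):
        for s in range(S):
            for a in range(A):
                best, best_v = T[s, a].copy(), violation(T)
                for _ in range(10):              # candidate points in this block
                    T[s, a] = rng.dirichlet(np.ones(S))
                    v = violation(T)
                    if v < best_v:
                        best, best_v = T[s, a].copy(), v
                T[s, a] = best
        if violation(T) == 0.0:
            break

    print("remaining constraint violation:", violation(T))

The block structure here (one transition distribution per state-action pair) is one natural way to partition the parameters; within a block the simplex constraint is easy to respect, while the coupling through the value function is what makes the joint problem non-convex.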
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Khan, O.Z., Poupart, P., Agosta, J.M. (2013). Iterative Model Refinement of Recommender MDPs Based on Expert Feedback. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol. 8188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40988-2_11
DOI: https://doi.org/10.1007/978-3-642-40988-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40987-5
Online ISBN: 978-3-642-40988-2