Iterative Model Refinement of Recommender MDPs Based on Expert Feedback

  • Omar Zia Khan
  • Pascal Poupart
  • John Mark Agosta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8188)


In this paper, we present a method to iteratively refine the parameters of a Markov Decision Process by leveraging constraints implied by an expert’s review of the policy. We impose a constraint on the parameters of the model for every case where the expert’s recommendation differs from that of the policy. We demonstrate that consistency with an expert’s feedback leads to non-convex constraints on the model parameters. We refine the parameters of the model, under these constraints, by partitioning the parameter space and iteratively applying alternating optimization. We demonstrate how the approach can be applied to both flat and factored MDPs and present results based on diagnostic sessions from a manufacturing scenario.
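The alternating scheme described above can be illustrated with a toy sketch. This is not the paper’s algorithm (which partitions the parameter space to handle the non-convex feasible region); it is a minimal projected-gradient variant of the same idea, with all function names (`refine`, `project_simplex`) hypothetical: fix the value function, nudge the transition parameters at the disputed state so the expert’s action attains the highest Q-value, project back onto the probability simplex, then recompute values, and repeat until the policy agrees with the expert.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=200):
    """Compute V and Q for the greedy policy under transition tensor P[s, a, s']."""
    nS, nA, _ = P.shape
    V = np.zeros(nS)
    for _ in range(iters):
        Q = R + gamma * P @ V          # Q[s, a] = R[s, a] + gamma * sum_s' P[s,a,s'] V[s']
        V = Q.max(axis=1)
    return V, Q

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / (np.arange(len(v)) + 1))[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def refine(P, R, s, a_expert, gamma=0.9, lr=0.5, outer=50):
    """Alternating optimization: (1) fix V and take a projected gradient step on
    the transition rows at state s so the expert's action gains Q-value margin,
    then (2) recompute V under the updated transitions."""
    P = P.copy()
    for _ in range(outer):
        V, Q = value_iteration(P, R, gamma)
        a_best = Q[s].argmax()
        if a_best == a_expert:
            break                      # policy now agrees with the expert
        # With V fixed, dQ(s, a)/dP[s, a, s'] = gamma * V[s'], so shift mass
        # in the expert row toward high-value states and away in the rival row.
        P[s, a_expert] = project_simplex(P[s, a_expert] + lr * gamma * V)
        P[s, a_best] = project_simplex(P[s, a_best] - lr * gamma * V)
    return P
```

Because the constraint couples the transition parameters to the value function they induce, neither step alone is sufficient; only the alternation converges to a model whose optimal policy matches the expert’s recommendation.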


Keywords: Transition Function · Optimal Policy · Recommender System · Reinforcement Learning · Transition Parameter




References

  1. Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: Twenty-First International Conference on Machine Learning, ICML (2004)
  2. Abbeel, P., Ng, A.Y.: Exploration and apprenticeship learning in reinforcement learning. In: Twenty-Second International Conference on Machine Learning, ICML (2005)
  3. Agosta, J.M., Khan, O.Z., Poupart, P.: Evaluation results for a query-based diagnostics application. In: Fifth Workshop on Probabilistic Graphical Models, PGM (2010)
  4. Altman, E.: Constrained Markov Decision Processes. CRC Press (1999)
  5. Bai, H., Hsu, D., Lee, W.S., Ngo, V.A.: Monte Carlo value iteration for continuous-state POMDPs. In: Hsu, D., Isler, V., Latombe, J.-C., Lin, M.C. (eds.) Algorithmic Foundations of Robotics IX. STAR, vol. 68, pp. 175–191. Springer, Heidelberg (2010)
  6. de Campos, C.P., Ji, Q.: Improving Bayesian network parameter learning using constraints. In: International Conference on Pattern Recognition, ICPR (2008)
  7. Geibel, P.: Reinforcement learning for MDPs with constraints. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 646–653. Springer, Heidelberg (2006)
  8. Khan, O.Z., Poupart, P., Agosta, J.M.: Automated refinement of Bayes networks’ parameters based on test ordering constraints. In: Neural Information Processing Systems, NIPS (2011)
  9. Ng, A.Y., Russell, S.J.: Algorithms for inverse reinforcement learning. In: Seventeenth International Conference on Machine Learning, ICML (2000)
  10. Niculescu, R.S., Mitchell, T.M., Rao, R.B.: Bayesian network learning with parameter constraints. Journal of Machine Learning Research (JMLR) 7, 1357–1383 (2006)
  11. Price, B., Boutilier, C.: Accelerating reinforcement learning through implicit imitation. Journal of Artificial Intelligence Research (JAIR) 19, 569–629 (2003)
  12. Ratliff, N., Bagnell, J.A., Zinkevich, M.: Maximum margin planning. In: Twenty-Third International Conference on Machine Learning, ICML (2006)
  13. Regan, K., Boutilier, C.: Robust policy computation in reward-uncertain MDPs using nondominated policies. In: Twenty-Fourth Conference on Artificial Intelligence, AAAI (2010)
  14. Shani, G., Pineau, J., Kaplow, R.: A survey of point-based POMDP solvers. Autonomous Agents and Multi-Agent Systems 27, 1–51 (2013)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Omar Zia Khan (1)
  • Pascal Poupart (1)
  • John Mark Agosta (2)

  1. David R. Cheriton School of Computer Science, University of Waterloo, Canada
  2. Toyota InfoTechnology Center, Mountain View, USA
