Abstracting Reinforcement Learning Agents with Prior Knowledge

  • Nicolas Bougie
  • Ryutaro Ichise
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11224)


Recent breakthroughs in reinforcement learning have enabled the creation of learning agents that solve a wide variety of sequential decision problems. However, these methods require a large number of iterations in complex environments. A standard way to tackle this challenge is to extend reinforcement learning with function approximation based on deep learning. Yet the lack of interpretability of such methods, and the impossibility of introducing background knowledge into them, limit their usability in many safety-critical real-world scenarios. In this paper, we propose a new agent architecture that combines reinforcement learning with external knowledge. We derive a rule-based variant of the Sarsa(\(\lambda \)) algorithm, which we call Sarsa-rb(\(\lambda \)), that augments data with complex knowledge and exploits similarities among states. We apply our method to a trading task in a stock market environment. We show that the resulting agent not only achieves much better performance but also trains faster than the Deep Q-learning (DQN) and Deep Deterministic Policy Gradients (DDPG) algorithms.
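
For orientation, the textbook tabular Sarsa(\(\lambda \)) update that the rule-based variant starts from can be sketched as follows. This is only the standard baseline, assuming replacing eligibility traces; it is not the Sarsa-rb(\(\lambda \)) algorithm, whose rule handling is not described here. The function name, the nested-list tables, and the hyperparameter values are illustrative assumptions.

```python
from typing import List

Table = List[List[float]]  # indexed as table[state][action]


def sarsa_lambda_step(
    Q: Table,
    E: Table,
    s: int,
    a: int,
    r: float,
    s_next: int,
    a_next: int,
    alpha: float = 0.1,   # learning rate (assumed value)
    gamma: float = 0.9,   # discount factor (assumed value)
    lam: float = 0.8,     # trace decay lambda (assumed value)
    terminal: bool = False,
) -> float:
    """Apply one Sarsa(lambda) update in place and return the TD error."""
    # On-policy target: bootstrap from the action actually chosen in s_next.
    target = r if terminal else r + gamma * Q[s_next][a_next]
    delta = target - Q[s][a]

    # Replacing trace: the visited pair is reset to 1 rather than incremented.
    E[s][a] = 1.0

    # Every state-action pair is updated in proportion to its trace,
    # then all traces decay by gamma * lambda.
    for i in range(len(Q)):
        for j in range(len(Q[i])):
            Q[i][j] += alpha * delta * E[i][j]
            E[i][j] *= gamma * lam
    return delta


# Usage on a hypothetical problem with 2 states and 2 actions.
Q = [[0.0, 0.0], [0.0, 0.0]]
E = [[0.0, 0.0], [0.0, 0.0]]
sarsa_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=1, a_next=0)
sarsa_lambda_step(Q, E, s=1, a=0, r=0.0, s_next=0, a_next=1)
```

In the second call, the earlier pair (state 0, action 1) is updated again through its decayed trace, which is how Sarsa(\(\lambda \)) propagates credit backwards along the trajectory.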


Reinforcement learning · Learning agent · Symbolic reinforcement learning · Reasoning about knowledge · Agent architecture



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Sokendai, The Graduate University for Advanced Studies, Tokyo, Japan
  2. National Institute of Informatics, Tokyo, Japan
