Leveraging Domain Knowledge for Reinforcement Learning Using MMC Architectures

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 11728)


Despite the success of reinforcement learning methods in various simulated robotic applications, end-to-end training suffers from long training times due to its high sample complexity and does not scale well to realistic systems. In this work, we speed up reinforcement learning by incorporating domain knowledge into policy learning. We revisit an architecture based on the mean of multiple computations (MMC) principle known from computational biology and adapt it to solve a “reacher task”. We approximate the policy using a simple MMC network, experimentally compare this idea to end-to-end deep learning architectures, and show that our approach reduces the number of interactions required to approximate a suitable policy by a factor of ten.
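The abstract names the mean of multiple computations (MMC) principle without spelling it out. The sketch below illustrates the general idea on a two-segment planar arm of the kind used in a reacher task, under stated assumptions: the single kinematic relation D = L1 + L2 is rearranged so that each segment vector can also be computed from the other variables (L1 = D - L2, L2 = D - L1), each variable is updated to the weighted mean of its damped previous value and that computation, the end-effector vector D is clamped to the target, and the segments are rescaled to their fixed lengths after every step. This is an illustration of the principle only, not the policy network used in the paper; the function names `mmc_reach` and `forward`, the `damping` weight, the iteration count, the unit segment lengths and the bent initial posture are all hypothetical choices.

```python
import math


def _normalize(v, length, fallback):
    """Rescale 2-D vector v to a fixed segment length (length constraint)."""
    n = math.hypot(v[0], v[1])
    if n < 1e-12:  # degenerate case: keep the previous segment vector
        return fallback
    return (v[0] * length / n, v[1] * length / n)


def mmc_reach(target, lengths=(1.0, 1.0), angles=(0.0, math.pi / 2),
              damping=2.0, iterations=200):
    """Relax a hypothetical two-segment MMC arm model toward `target`.

    Variables: segment vectors L1, L2 and end-effector vector D.
    D is clamped to the target, so the relaxation acts as an
    inverse-kinematics solver. Returns (theta1, theta2), with theta2
    measured relative to the first segment.
    """
    l1, l2 = lengths
    t1, t2 = angles  # assumed initial posture (bent elbow)
    L1 = (l1 * math.cos(t1), l1 * math.sin(t1))
    L2 = (l2 * math.cos(t1 + t2), l2 * math.sin(t1 + t2))
    D = target
    w = damping + 1.0
    for _ in range(iterations):
        # Mean of the damped old value and the value computed from the
        # kinematic relation; both updates use the previous values only.
        new_L1 = ((damping * L1[0] + D[0] - L2[0]) / w,
                  (damping * L1[1] + D[1] - L2[1]) / w)
        new_L2 = ((damping * L2[0] + D[0] - L1[0]) / w,
                  (damping * L2[1] + D[1] - L1[1]) / w)
        # Enforce the fixed segment lengths after each step.
        L1 = _normalize(new_L1, l1, L1)
        L2 = _normalize(new_L2, l2, L2)
    theta1 = math.atan2(L1[1], L1[0])
    theta2 = math.atan2(L2[1], L2[0]) - theta1
    theta2 = math.atan2(math.sin(theta2), math.cos(theta2))  # wrap angle
    return theta1, theta2


def forward(angles, lengths=(1.0, 1.0)):
    """Forward kinematics: end-effector position for the given joint angles."""
    t1, t2 = angles
    l1, l2 = lengths
    return (l1 * math.cos(t1) + l2 * math.cos(t1 + t2),
            l1 * math.sin(t1) + l2 * math.sin(t1 + t2))


if __name__ == "__main__":
    joint_angles = mmc_reach((0.0, 1.5))
    print(joint_angles, forward(joint_angles))
```

Starting from a bent posture matters in this sketch: with equal segment lengths, a straight initial posture gives both segments identical updates, so the arm stays straight and cannot reach a target inside its workspace.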




Author information

Correspondence to Rajkumar Ramamurthy.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Ramamurthy, R., Bauckhage, C., Sifa, R., Schücker, J., Wrobel, S. (2019). Leveraging Domain Knowledge for Reinforcement Learning Using MMC Architectures. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science, vol 11728. Springer, Cham.


  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30483-6

  • Online ISBN: 978-3-030-30484-3

  • eBook Packages: Computer Science, Computer Science (R0)