Skip to main content

Specification Aware Multi-Agent Reinforcement Learning

  • Conference paper
  • First Online:
Agents and Artificial Intelligence (ICAART 2021)

Abstract

Engineering intelligent industrial systems is challenging due to high complexity and uncertainty with respect to domain dynamics and multiple agents. If industrial systems act autonomously, their choices and results must be within specified bounds to satisfy these requirements. Reinforcement learning (RL) is promising to find solutions that outperform known or handcrafted heuristics. However in industrial scenarios, it also is crucial to prevent RL from inducing potentially undesired or even dangerous behavior. This paper considers specification alignment in industrial scenarios with multi-agent reinforcement learning (MARL). We propose to embed functional and non-functional requirements into the reward function, enabling the agents to learn to align with the specification. We evaluate our approach in a smart factory simulation representing an industrial lot-size-one production facility, where we train up to eight agents using DQN, VDN, and QMIX. Our results show that the proposed approach enables agents to satisfy a given set of requirements.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Amodei, D., Olah, C., Steinhardt, J., Christiano, P.F., Schulman, J., Mané, D.: Concrete problems in AI safety. arXiv:1606.06565 (2016)

  2. Belzner, L., Beck, M.T., Gabor, T., Roelle, H., Sauer, H.: Software engineering for distributed autonomous real-time systems. In: 2016 IEEE/ACM 2nd International Workshop on Software Engineering for Smart Cyber-Physical Systems (SEsCPS), pp. 54–57. IEEE (2016)

    Google Scholar 

  3. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, pp. 41–48 (2009)

    Google Scholar 

  4. Bures, T., et al.: Software engineering for smart cyber-physical systems: challenges and promising solutions. ACM SIGSOFT Softw. Eng. Notes 42(2), 19–24 (2017)

    Article  Google Scholar 

  5. Chang, Y.H., Ho, T., Kaelbling, L.P.: All learning is local: multi-agent learning in global reward games. In: Advances in Neural Information Processing Systems, pp. 807–814 (2004)

    Google Scholar 

  6. Cheng, B.H.C., et al.: Software engineering for self-adaptive systems: a research roadmap. In: Cheng, B.H.C., de Lemos, R., Giese, H., Inverardi, P., Magee, J. (eds.) Software Engineering for Self-Adaptive Systems. LNCS, vol. 5525, pp. 1–26. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02161-9_1

    Chapter  Google Scholar 

  7. Devlin, S., Kudenko, D.: Dynamic potential-based reward shaping. In: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2012, vol. 1, pp. 433–440 (2012)

    Google Scholar 

  8. Devlin, S., Yliniemi, L., Kudenko, D., Tumer, K.: Potential-based difference rewards for multiagent reinforcement learning. In: Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS 2014, pp. 165–172 (2014)

    Google Scholar 

  9. Foerster, J., Assael, I.A., de Freitas, N., Whiteson, S.: Learning to communicate with deep multi-agent reinforcement learning. In: Advances in Neural Information Processing Systems, pp. 2137–2145 (2016)

    Google Scholar 

  10. Foerster, J.N., Farquhar, G., Afouras, T., Nardelli, N., Whiteson, S.: Counterfactual multi-agent policy gradients. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)

    Google Scholar 

  11. García, J., Fernández, F.: A comprehensive survey on safe reinforcement learning. J. Mach. Learn. Res. 16(42), 1437–1480 (2015)

    MathSciNet  MATH  Google Scholar 

  12. Grześ, M.: Reward shaping in episodic reinforcement learning. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2017, pp. 565–573 (2017)

    Google Scholar 

  13. Gupta, J.K., Egorov, M., Kochenderfer, M.: Cooperative multi-agent control using deep reinforcement learning. In: Sukthankar, G., Rodriguez-Aguilar, J.A. (eds.) AAMAS 2017. LNCS (LNAI), vol. 10642, pp. 66–83. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71682-4_5

    Chapter  Google Scholar 

  14. Hendrycks, D., Carlini, N., Schulman, J., Steinhardt, J.: Unsolved problems in ML safety (2021)

    Google Scholar 

  15. Jaderberg, M., et al.: Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364(6443), 859–865 (2019)

    Article  MathSciNet  Google Scholar 

  16. Laurent, G.J., Matignon, L., Fort-Piat, L., et al.: The world of Independent Learners is not Markovian. J. Knowl.-Based Intell. Eng. Syst. 15, 55–64 (2011)

    Article  Google Scholar 

  17. Leibo, J.Z., Zambaldi, V., Lanctot, M., Marecki, J., Graepel, T.: Multi-agent reinforcement learning in sequential social dilemmas. In: Proceedings of the 16th Conference on Autonomous Agents and Multiagent Systems, pp. 464–473 (2017)

    Google Scholar 

  18. Leike, J., Krueger, D., Everitt, T., Martic, M., Maini, V., Legg, S.: Scalable agent alignment via reward modeling: a research direction (2018)

    Google Scholar 

  19. Leike, J., et al.: AI safety gridworlds. arXiv:1711.09883 (2017)

  20. Liu, S., Lever, G., Merel, J., Tunyasuvunakool, S., Heess, N., Graepel, T.: Emergent coordination through competition. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA (2019)

    Google Scholar 

  21. Lowd, D., Meek, C.: Adversarial learning. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 641–647. ACM (2005)

    Google Scholar 

  22. Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I.: Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in Neural Information Processing Systems, pp. 6379–6390 (2017)

    Google Scholar 

  23. Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning (2016)

    Google Scholar 

  24. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015)

    Article  Google Scholar 

  25. Ng, A.Y., Harada, D., Russell, S.J.: Policy invariance under reward transformations: theory and application to reward shaping. In: Proceedings of the Sixteenth International Conference on Machine Learning, ICML 1999, pp. 278–287 (1999)

    Google Scholar 

  26. Phan, D.T., Grosu, R., Jansen, N., Paoletti, N., Smolka, S.A., Stoller, S.D.: Neural simplex architecture. In: Lee, R., Jha, S., Mavridou, A., Giannakopoulou, D. (eds.) NFM 2020. LNCS, vol. 12229, pp. 97–114. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-55754-6_6

    Chapter  Google Scholar 

  27. Phan, T., Belzner, L., Gabor, T., Schmid, K.: Leveraging statistical multi-agent online planning with emergent value function approximation. In: Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS, pp. 730–738 (2018)

    Google Scholar 

  28. Phan, T., Belzner, L., Gabor, T., Sedlmeier, A., Ritz, F., Linnhoff-Popien, C.: Resilient multi-agent reinforcement learning with adversarial value decomposition. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 13, pp. 11308–11316 (2021)

    Google Scholar 

  29. Phan, T., et al.: Learning and testing resilience in cooperative multi-agent systems. In: Proceedings of the 19th Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2020 (2020)

    Google Scholar 

  30. Rashid, T., Samvelyan, M., de Witt, C.S., Farquhar, G., Foerster, J., Whiteson, S.: QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. In: International Conference on Machine Learning, pp. 4292–4301 (2018)

    Google Scholar 

  31. Ritz, F., et al.: SAT-MARL: specification aware training in multi-agent reinforcement learning. In: Proceedings of the 13th International Conference on Agents and Artificial Intelligence, Volume 1: ICAART, pp. 28–37. SciTePress (2021). https://doi.org/10.5220/0010189500280037

  32. Seurin, M., Preux, P., Pietquin, O.: “I’m sorry Dave, I’m afraid I can’t do that” deep q-learning from forbidden action. arXiv:1910.02078 (2019)

  33. Silver, D., et al.: A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science 362(6419), 1140–1144 (2018). https://doi.org/10.1126/science.aar6404

    Article  MathSciNet  MATH  Google Scholar 

  34. Son, K., Kim, D., Kang, W.J., Hostallero, D.E., Yi, Y.: QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: International Conference on Machine Learning, pp. 5887–5896 (2019)

    Google Scholar 

  35. Sunehag, P., et al.: Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems (Extended Abstract), IFAAMAS, pp. 2085–2087 (2018)

    Google Scholar 

  36. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. A Bradford Book, Cambridge (2018)

    Google Scholar 

  37. Tampuu, A., et al.: Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 12(4), e0172395 (2017)

    Google Scholar 

  38. Wang, S., Wan, J., Zhang, D., Li, D., Zhang, C.: Towards smart factory for industry 4.0: a self-organized multi-agent system with big data based feedback and coordination. Comput. Netw. 101, 158–168 (2016)

    Article  Google Scholar 

  39. Watkins, C.J., Dayan, P.: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)

    MATH  Google Scholar 

  40. Wolpert, D.H., Tumer, K.: Optimal payoff functions for members of collectives. In: Modeling Complexity in Economic and Social Systems, pp. 355–369. World Scientific (2002)

    Google Scholar 

  41. Zahavy, T., Haroush, M., Merlis, N., Mankowitz, D.J., Mannor, S.: Learn what not to learn: action elimination with deep reinforcement learning. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31, pp. 3562–3573. Curran Associates, Inc. (2018)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Fabian Ritz .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Ritz, F. et al. (2022). Specification Aware Multi-Agent Reinforcement Learning. In: Rocha, A.P., Steels, L., van den Herik, J. (eds) Agents and Artificial Intelligence. ICAART 2021. Lecture Notes in Computer Science(), vol 13251. Springer, Cham. https://doi.org/10.1007/978-3-031-10161-8_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-10161-8_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10160-1

  • Online ISBN: 978-3-031-10161-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics