Agents teaching agents: a survey on inter-agent transfer learning

Abstract

While recent work in reinforcement learning (RL) has led to agents capable of solving increasingly complex tasks, high sample complexity is still a major concern. This issue has motivated the development of additional techniques that augment RL methods in an attempt to increase task learning speed. In particular, inter-agent teaching, which endows agents with the ability to respond to instructions from others, has been responsible for many of these developments. RL agents that can leverage instruction from a more competent teacher have been shown to learn tasks significantly faster than agents that cannot take advantage of such instruction. That said, the inter-agent teaching paradigm presents many new challenges due to, among other factors, differences between the agents involved in the teaching interaction. As a result, many inter-agent teaching methods work only in restricted settings and have proven difficult to generalize to new domains or scenarios. In this article, we propose two frameworks that provide a comprehensive view of the challenges associated with inter-agent teaching. We highlight state-of-the-art solutions, open problems, and prospective applications, and argue that new research in this area should be developed in the context of the proposed frameworks.

Notes

  1.

    Notice that the initial policy definition might be implicit (e.g., assuming the agent starts with a random policy), effectively enabling the inter-agent interaction to start immediately at the beginning of the learning process.

Acknowledgements

This work has taken place in the Learning Agents Research Group (LARG) at UT Austin. LARG research is supported in part by NSF (IIS-1637736, IIS-1651089, IIS-1724157), ONR (N00014-18-2243), FLI (RFP2-000), ARL, DARPA, Intel, Raytheon, and Lockheed Martin. Peter Stone serves on the Board of Directors of Cogitai, Inc. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research. We also gratefully acknowledge financial support from CNPq, Grants 425860/2016-7 and 307027/2017-1 and São Paulo Research Foundation (FAPESP), Grants 2015/16310-4 and 2018/00344-5.

Author information

Corresponding author

Correspondence to Felipe Leno Da Silva.

About this article

Cite this article

Da Silva, F.L., Warnell, G., Costa, A.H.R. et al. Agents teaching agents: a survey on inter-agent transfer learning. Auton Agent Multi-Agent Syst 34, 9 (2020). https://doi.org/10.1007/s10458-019-09430-0

Keywords

  • Multiagent learning
  • Transfer learning
  • Reinforcement learning