While recent work in reinforcement learning (RL) has produced agents capable of solving increasingly complex tasks, high sample complexity remains a major concern. This concern has motivated techniques that augment RL methods in an attempt to increase task learning speed. In particular, inter-agent teaching—endowing agents with the ability to respond to instruction from others—has driven many of these developments. RL agents that can leverage instruction from a more competent teacher have been shown to learn tasks significantly faster than agents that cannot take advantage of such instruction. That said, the inter-agent teaching paradigm presents many new challenges due to, among other factors, differences between the agents involved in the teaching interaction. As a result, many inter-agent teaching methods work only in restricted settings and have proven difficult to generalize to new domains or scenarios. In this article, we propose two frameworks that provide a comprehensive view of the challenges associated with inter-agent teaching. We highlight state-of-the-art solutions, open problems, and prospective applications, and we argue that new research in this area should be developed in the context of the proposed frameworks.
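The teacher-student interaction described above can be illustrated with a minimal, self-contained sketch of budget-limited action advising (loosely in the spirit of "teaching on a budget"; the toy chain MDP, the `QAgent` class, and the gap-based importance heuristic are illustrative inventions, not any paper's exact method):

```python
import random

random.seed(0)

# Toy chain MDP: states 0..4, action 0 moves left, action 1 moves right.
# Reaching state 4 (the goal) yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4

def step(state, action):
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

class QAgent:
    """Minimal tabular Q-learner (illustrative, not a specific published method)."""
    def __init__(self, eps=0.2, alpha=0.5, gamma=0.9):
        self.q = [[0.0, 0.0] for _ in range(N_STATES)]
        self.eps, self.alpha, self.gamma = eps, alpha, gamma

    def act(self, s):
        if random.random() < self.eps:
            return random.randrange(2)          # occasional exploration
        return 0 if self.q[s][0] > self.q[s][1] else 1

    def update(self, s, a, r, s2, done):
        target = r if done else r + self.gamma * max(self.q[s2])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

def advised_episode(student, teacher_q, budget, threshold=0.1):
    """The teacher overrides the student's action in 'important' states
    (large Q-value gap) while its advice budget lasts."""
    s, done, steps = 0, False, 0
    while not done and steps < 100:
        gap = abs(teacher_q[s][0] - teacher_q[s][1])
        if budget > 0 and gap > threshold:
            a = 0 if teacher_q[s][0] > teacher_q[s][1] else 1  # advised action
            budget -= 1
        else:
            a = student.act(s)                  # student acts on its own
        s2, r, done = step(s, a)
        student.update(s, a, r, s2, done)       # student learns either way
        s, steps = s2, steps + 1
    return budget, steps

# A competent teacher: Q-values that always prefer moving right.
teacher_q = [[0.0, 1.0] for _ in range(N_STATES)]

student, budget = QAgent(), 20
for _ in range(10):
    budget, steps = advised_episode(student, teacher_q, budget)
```

Here the advice budget covers roughly the first five episodes; afterwards the student acts on the Q-values it acquired while being guided, which is the mechanism by which advised learners reach good behavior with fewer of their own exploratory mistakes.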
Notice that the initial policy definition might be implicit (e.g., assuming the agent starts with a random policy), effectively enabling the inter-agent interaction to start immediately at the beginning of the learning process.
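As a purely illustrative sketch of this point, consider a learner whose value table is empty at the start: greedy action selection with random tie-breaking already yields a well-defined uniform-random policy, so a teacher could intervene from the very first step (the class and method names below are hypothetical):

```python
import random

random.seed(1)

class Student:
    """A fresh learner whose initial policy is only implicit: with an empty
    value table, greedy selection over all-zero estimates degenerates into
    a uniform-random choice."""
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.q = {}  # empty value table; nothing has been learned yet

    def act(self, state):
        # Unseen states default to all-zero values, so every action ties for
        # the maximum and the tie-break is random: an implicit uniform-random
        # initial policy, defined before any learning has occurred.
        values = self.q.get(state, [0.0] * self.n_actions)
        best = max(values)
        return random.choice([a for a, v in enumerate(values) if v == best])

student = Student(n_actions=4)
first_action = student.act("start")  # well defined before any training or advice
```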
This work has taken place in the Learning Agents Research Group (LARG) at UT Austin. LARG research is supported in part by NSF (IIS-1637736, IIS-1651089, IIS-1724157), ONR (N00014-18-2243), FLI (RFP2-000), ARL, DARPA, Intel, Raytheon, and Lockheed Martin. Peter Stone serves on the Board of Directors of Cogitai, Inc. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research. We also gratefully acknowledge financial support from CNPq, Grants 425860/2016-7 and 307027/2017-1 and São Paulo Research Foundation (FAPESP), Grants 2015/16310-4 and 2018/00344-5.
Da Silva, F.L., Warnell, G., Costa, A.H.R. et al. Agents teaching agents: a survey on inter-agent transfer learning. Auton Agent Multi-Agent Syst 34, 9 (2020). https://doi.org/10.1007/s10458-019-09430-0
- Multiagent learning
- Transfer learning
- Reinforcement learning