Creating Advice-Taking Reinforcement Learners

  • Richard Maclin
  • Jude W. Shavlik


Learning from reinforcements is a promising approach for creating intelligent agents. However, reinforcement learning usually requires a large number of training episodes. We present and evaluate a design that addresses this shortcoming by allowing a connectionist Q-learner to accept advice given, at any time and in a natural manner, by an external observer. In our approach, the advice-giver watches the learner and occasionally makes suggestions, expressed as instructions in a simple imperative programming language. Based on techniques from knowledge-based neural networks, we insert these programs directly into the agent’s utility function. Subsequent reinforcement learning further integrates and refines the advice. We present empirical evidence that investigates several aspects of our approach and shows that, given good advice, a learner can achieve statistically significant gains in expected reward. A second experiment shows that advice improves the expected reward regardless of the stage of training at which it is given, while another study demonstrates that subsequent advice can result in further gains in reward. Finally, we present experimental results that indicate our method is more powerful than a naive technique for making use of advice.
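The paper's agent is a connectionist Q-learner whose utility function is biased by advice and then refined by further reinforcement learning. As a minimal illustration of that idea (not the paper's actual method, which compiles imperative advice into a neural network), the sketch below uses a tabular Q-function, encodes advice as an additive bias on recommended state-action utilities, and applies the standard one-step Q-learning update; all names here are hypothetical.

```python
# Hypothetical sketch: advice-biased tabular Q-learning.
# The paper uses a neural-network Q-function and a richer advice
# language; a table keeps this illustration self-contained.

ACTIONS = ["left", "right"]

def make_q(states):
    """Initialize a Q-table with zero utility for every (state, action)."""
    return {(s, a): 0.0 for s in states for a in ACTIONS}

def apply_advice(q, advice, bonus=1.0):
    """Advice is a list of (state, action) pairs the observer recommends.
    Here it is folded into the utility function as an additive bias,
    which subsequent Q-learning updates can refine or override."""
    for s, a in advice:
        q[(s, a)] += bonus

def q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Standard one-step Q-learning update (Watkins, 1989)."""
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
```

For example, in a five-state corridor with the goal at state 4, the advice "always move right" raises the utility of every right move before training, so greedy action selection follows the advice immediately, while `q_update` continues to adjust those utilities from experienced rewards.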


Keywords: reinforcement learning, advice-giving, neural networks, Q-learning, learning from instruction, theory refinement, knowledge-based neural networks, adaptive agents





Copyright information

© Kluwer Academic Publishers 1996

Authors and Affiliations

  • Richard Maclin¹
  • Jude W. Shavlik¹

  1. Computer Sciences Dept., University of Wisconsin, Madison
