Introspective Agents: Confidence Measures for General Value Functions

  • Craig Sherstan
  • Adam White
  • Marlos C. Machado
  • Patrick M. Pilarski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9782)

Abstract

Agents of general intelligence deployed in real-world scenarios must adapt to ever-changing environmental conditions. While such adaptive agents may leverage engineered knowledge, they will also require the capacity to construct and evaluate knowledge themselves from their own experience in a bottom-up, constructivist fashion. This position paper builds on the idea of encoding knowledge as temporally extended predictions through the use of general value functions. Prior work has focused on learning predictions of externally derived signals about a task or environment (e.g., battery level, joint position). Here we advocate that the agent should also predict internally generated signals regarding its own learning process, such as the agent's confidence in its learned predictions. Finally, we suggest how such information would benefit the creation of an introspective agent that is able to learn to make good decisions in a complex, changing world.
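To make the abstract's two kinds of prediction concrete, here is a minimal sketch. It is not the authors' method. It assumes linear TD(0) learning, one-hot features, and a toy five-state ring environment. It also makes one hypothetical choice of internally generated signal: the magnitude of the first prediction's TD error. The first general value function predicts the discounted sum of an external cumulant. The second, "introspective" one uses that TD-error magnitude as its cumulant, so a larger value suggests lower confidence in the first prediction. The names `TDGVF`, `one_hot`, and `run`, along with all parameter values, are illustrative.

```python
import random


class TDGVF:
    """A general value function learned with linear TD(0).

    gamma: continuation (discount) for this prediction.
    alpha: step size.
    """

    def __init__(self, n_features, gamma, alpha):
        self.w = [0.0] * n_features
        self.gamma = gamma
        self.alpha = alpha

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, cumulant, x_next):
        # TD error: delta = c + gamma * v(x') - v(x)
        delta = cumulant + self.gamma * self.predict(x_next) - self.predict(x)
        for i, xi in enumerate(x):
            self.w[i] += self.alpha * delta * xi
        return delta


def one_hot(i, n):
    x = [0.0] * n
    x[i] = 1.0
    return x


def run(steps=20000, n_states=5, noise=0.0, seed=0):
    rng = random.Random(seed)
    # GVF about an externally derived signal (hypothetical toy cumulant).
    signal_gvf = TDGVF(n_states, gamma=0.9, alpha=0.1)
    # Hypothetical introspective GVF: its cumulant is |delta| of signal_gvf.
    error_gvf = TDGVF(n_states, gamma=0.5, alpha=0.1)
    s = 0
    for _ in range(steps):
        s_next = (s + 1) % n_states  # deterministic ring of states
        # External cumulant: 1 on arriving at state 0, plus optional noise.
        c = (1.0 if s_next == 0 else 0.0) + rng.gauss(0.0, noise)
        x, x_next = one_hot(s, n_states), one_hot(s_next, n_states)
        delta = signal_gvf.update(x, c, x_next)
        error_gvf.update(x, abs(delta), x_next)
        s = s_next
    return signal_gvf, error_gvf
```

With `noise=0.0`, the first prediction converges and the introspective prediction decays toward zero. With a noisy cumulant (e.g., `noise=1.0`), the introspective prediction stays high. In this toy setting, that flags states where the learned external prediction is less reliable.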

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Craig Sherstan (1)
  • Adam White (2)
  • Marlos C. Machado (1)
  • Patrick M. Pilarski (1)
  1. University of Alberta, Edmonton, Canada
  2. Indiana University, Bloomington, USA