Reinforcement Learning for Mobile Robot Perceptual Learning

  • Xiaochun Wang
  • Xiali Wang
  • Don Mitchell Wilkes


With the development of a computer vision system, autonomous robots possess the ability to detect target objects in image sequences and to build a model of an unknown environment. The next step in mobile robot autonomous navigation is to acquire control through some form of learning. Reinforcement learning is of great interest because of the large number of practical applications it can address, ranging from problems in artificial intelligence to control engineering. Unlike supervised learning, reinforcement learning gives the learner only partial feedback about its predictions, and this feedback may have long-term effects by influencing the future states of the controlled system; the goal of reinforcement learning is to learn to control a system so as to maximize a numerical performance measure that expresses a long-term objective. In this chapter, we provide an overview of the field of reinforcement learning in general and of the concepts relevant to the proposed work in particular. We begin with a review of the fundamental concepts of reinforcement learning, considered from the artificial intelligence or computer science perspective on solving sequential decision-making problems. This introduction is followed by a brief introduction to one of the fundamental reinforcement learning methods, the temporal difference learning algorithm, and to its implementation based on a psychological model of memory, working memory, to represent the learned knowledge.
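The core idea behind the temporal difference learning algorithm mentioned above can be illustrated with a minimal sketch. The following tabular TD(0) example uses a hypothetical five-state chain task (not the book's robot environment, and the step size, discount factor, and transition rule are illustrative assumptions): after each transition, the value estimate of the current state is nudged toward the bootstrapped target, the immediate reward plus the discounted value of the successor state.

```python
import random

# Tabular TD(0) on a toy 5-state chain (hypothetical environment):
# states 0..4; every episode starts in state 0 and ends at state 4,
# where a reward of 1 is received. All other transitions yield reward 0.
ALPHA, GAMMA = 0.1, 0.9          # step size and discount factor (assumed)
N_STATES, TERMINAL = 5, 4
V = [0.0] * N_STATES             # value estimates, initialised to zero

random.seed(0)
for _ in range(2000):            # run many episodes
    s = 0
    while s != TERMINAL:
        # stochastic forward move of 1 or 2 states, capped at the goal
        s_next = min(s + random.choice([1, 1, 2]), TERMINAL)
        r = 1.0 if s_next == TERMINAL else 0.0
        # TD(0) update: move V(s) toward the target r + gamma * V(s')
        V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])
        s = s_next

print([round(v, 2) for v in V])
```

After training, the learned values increase monotonically toward the rewarded goal state, reflecting the long-term objective that the abstract describes: only delayed, partial feedback (the terminal reward) is available, yet the agent propagates it backward through its value estimates.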


Keywords: Reinforcement learning · Temporal difference learning · Working memory · Conjunctive coding



Copyright information

© Xi'an Jiaotong University Press 2020

Authors and Affiliations

  • Xiaochun Wang (1)
  • Xiali Wang (2)
  • Don Mitchell Wilkes (3)
  1. School of Software Engineering, Xi’an Jiaotong University, Xi’an, China
  2. School of Information Engineering, Chang’an University, Xi’an, China
  3. Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, USA