Science China Technological Sciences, Volume 56, Issue 11, pp 2745–2761

The Skinner automaton: A psychological model formalizing the theory of operant conditioning

Abstract

Operant conditioning is one of the fundamental mechanisms of animal learning; it holds that the behavior of all animals, from protists to humans, is guided by its consequences. We present a new stochastic learning automaton, the Skinner automaton, a psychological model that formalizes the theory of operant conditioning. We identify animal operant learning with a thermodynamic process and derive the so-called Skinner algorithm from the Monte Carlo method, the Metropolis algorithm, and simulated annealing. Under certain conditions, we prove that the Skinner automaton is expedient, ε-optimal, and optimal, and that the operant probabilities converge to the set of stable roots with probability 1. The Skinner automaton enables machines to learn autonomously in an animal-like way.
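
The abstract names the ingredients (a Boltzmann distribution over operants and an annealed temperature) but does not state the update rule itself. The following is a minimal sketch of that general idea only, not the paper's Skinner algorithm: the function names, the running-mean reward estimate, and the geometric cooling schedule are all assumptions made for illustration.

```python
import math
import random


def boltzmann_probs(values, temperature):
    """Boltzmann (softmax) distribution over actions: p_i proportional to exp(v_i / T)."""
    peak = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - peak) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def run_automaton(reward_probs, steps=2000, t_start=1.0, t_end=0.05, seed=0):
    """Hypothetical learner: running-mean reward estimates with annealed Boltzmann selection.

    reward_probs[i] is the (assumed) probability that the environment
    rewards action i. This is an illustrative stand-in, not the update
    rule from the paper.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    estimates = [0.0] * n
    counts = [0] * n
    cooling = (t_end / t_start) ** (1.0 / steps)  # geometric cooling (assumed schedule)
    temperature = t_start
    for _ in range(steps):
        probs = boltzmann_probs(estimates, temperature)
        action = rng.choices(range(n), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # incremental running mean of the rewards observed for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
        temperature *= cooling
    return boltzmann_probs(estimates, t_end), counts
```

Calling `run_automaton([0.2, 0.8, 0.4])` returns the final action probabilities and the per-action selection counts. At high temperature the choice is close to uniform (exploration); as the temperature falls, probability mass concentrates on the actions with the highest estimated reward, which is the annealing behaviour the abstract alludes to.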

Keywords

Learning automata; Boltzmann distribution; operant conditioning; operant learning; simulated annealing

Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Institute of Artificial Intelligence and Robots, School of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China
