
The Skinner automaton: A psychological model formalizing the theory of operant conditioning

Science China Technological Sciences

Abstract

Operant conditioning is one of the fundamental mechanisms of animal learning. Its theory holds that the behavior of all animals, from protists to humans, is guided by the consequences of that behavior. We present a new stochastic learning automaton, called the Skinner automaton, as a psychological model that formalizes the theory of operant conditioning. We identify animal operant learning with a thermodynamic process and derive the so-called Skinner algorithm from the Monte Carlo method, the Metropolis algorithm, and simulated annealing. Under certain conditions, we prove that the Skinner automaton is expedient, ε-optimal, and optimal, and that the operant probabilities converge to the set of stable roots with probability 1. The Skinner automaton enables machines to learn autonomously in an animal-like way.
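The abstract names the ingredients of the Skinner algorithm (Monte Carlo sampling, the Metropolis acceptance rule, and simulated annealing) but does not give its update equations. The Python sketch below is only a rough illustration that combines those ingredients in a generic way. The learner sits on one operant and proposes another uniformly at random. It accepts the proposal with the Metropolis criterion at a temperature that is cooled geometrically. Each operant's value is the running mean of the rewards it has earned. Everything here is a hypothetical assumption and is not the authors' method: the class `AnnealedOperantLearner`, the function `simulate`, the parameters `t0`, `cooling`, and `t_min`, and the Bernoulli reward environment. The sketch makes no claim to the expediency or optimality properties proved in the paper.

```python
import math
import random


class AnnealedOperantLearner:
    """Hypothetical sketch, NOT the paper's Skinner algorithm: operants are
    selected by a Metropolis chain whose temperature is annealed, and each
    operant's value is the running mean of the consequences it produced."""

    def __init__(self, n_operants, t0=1.0, cooling=0.995, t_min=0.01, seed=0):
        if n_operants < 2:
            raise ValueError("need at least two operants")
        self.rng = random.Random(seed)
        self.values = [0.0] * n_operants  # running mean reward per operant
        self.counts = [0] * n_operants    # how often each operant was emitted
        self.current = 0                  # operant the chain currently sits on
        self.temperature = t0
        self.cooling = cooling            # geometric cooling factor per trial
        self.t_min = t_min                # temperature floor

    def choose(self):
        """One Metropolis step over operants; returns the operant to emit."""
        n = len(self.values)
        # Symmetric proposal: pick one of the other n - 1 operants uniformly.
        proposal = self.rng.randrange(n - 1)
        if proposal >= self.current:
            proposal += 1
        delta = self.values[proposal] - self.values[self.current]
        # Always accept a better-valued operant; accept a worse one with
        # probability exp(delta / T), as in the Metropolis criterion.
        if delta >= 0 or self.rng.random() < math.exp(delta / self.temperature):
            self.current = proposal
        return self.current

    def reinforce(self, operant, reward):
        """Fold the observed consequence into the operant's running mean,
        then cool the temperature (never below t_min)."""
        self.counts[operant] += 1
        self.values[operant] += (reward - self.values[operant]) / self.counts[operant]
        self.temperature = max(self.t_min, self.temperature * self.cooling)

    def operant_probabilities(self):
        """Boltzmann distribution exp(v / T) / Z that the chain targets at the
        current values and temperature."""
        scaled = [v / self.temperature for v in self.values]
        top = max(scaled)
        weights = [math.exp(s - top) for s in scaled]  # subtract max for stability
        total = sum(weights)
        return [w / total for w in weights]


def simulate(reward_probs, steps=5000, seed=0):
    """Run the learner against a hypothetical Bernoulli environment in which
    operant i is rewarded with probability reward_probs[i]."""
    env_rng = random.Random(seed + 1)
    learner = AnnealedOperantLearner(len(reward_probs), seed=seed)
    for _ in range(steps):
        operant = learner.choose()
        reward = 1.0 if env_rng.random() < reward_probs[operant] else 0.0
        learner.reinforce(operant, reward)
    return learner


if __name__ == "__main__":
    learner = simulate([0.2, 0.5, 0.8])
    print("values:", [round(v, 2) for v in learner.values])
    print("counts:", learner.counts)
    print("operant probabilities:", [round(p, 3) for p in learner.operant_probabilities()])
```

Consider a deterministic environment such as `[0.0, 0.0, 1.0]`. Once the temperature has cooled, the learner emits operant 2 almost exclusively. With noisy rewards, cooling too quickly can lock the chain onto a suboptimal operant before its value estimates are reliable, so the cooling schedule matters.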



Author information

Corresponding author

Correspondence to Xuan Wu.


About this article

Cite this article

Ruan, X., Wu, X. The Skinner automaton: A psychological model formalizing the theory of operant conditioning. Sci. China Technol. Sci. 56, 2745–2761 (2013). https://doi.org/10.1007/s11431-013-5369-0



  • DOI: https://doi.org/10.1007/s11431-013-5369-0
