Abstract
Operant conditioning is one of the fundamental mechanisms of animal learning; it holds that the behavior of all animals, from protists to humans, is guided by its consequences. We present a new stochastic learning automaton, called the Skinner automaton, which is a psychological model that formalizes the theory of operant conditioning. We identify animal operant learning with a thermodynamic process and derive a so-called Skinner algorithm from the Monte Carlo method as well as the Metropolis algorithm and simulated annealing. Under certain conditions, we prove that the Skinner automaton is expedient, ε-optimal, and optimal, and that the operant probabilities converge to the set of stable roots with probability 1. The Skinner automaton enables machines to learn autonomously in an animal-like way.
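To make the thermodynamic analogy concrete, the following is a minimal sketch of a generic annealed stochastic learning automaton. It is not the Skinner algorithm itself, whose details the abstract does not give. It assumes that action probabilities follow a Boltzmann distribution over an "energy" defined as the negative estimated reward, and that the temperature cools geometrically. The names `boltzmann_probabilities` and `run_automaton`, the Bernoulli reward environment, and all parameter values are hypothetical.

```python
import math
import random


def boltzmann_probabilities(energies, temperature):
    """Return selection probabilities p_i proportional to exp(-E_i / T)."""
    lowest = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - lowest) / temperature) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def run_automaton(reward_probs, steps=2000, t_start=1.0, t_end=0.02, seed=0):
    """Hypothetical annealed learning automaton (illustration only).

    reward_probs[i] is the assumed chance that action i is rewarded.
    Returns the action probabilities used at the final step.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    counts = [0] * n
    mean_reward = [0.0] * n
    probs = [1.0 / n] * n
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        frac = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** frac
        # Energy is the negative estimated reward: good actions sit low.
        energies = [-m for m in mean_reward]
        probs = boltzmann_probabilities(energies, temperature)
        action = rng.choices(range(n), weights=probs)[0]
        # The environment reinforces the chosen action at random.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the running mean reward for this action.
        mean_reward[action] += (reward - mean_reward[action]) / counts[action]
    return probs


if __name__ == "__main__":
    final = run_automaton([0.2, 0.5, 0.8])
    print([round(p, 3) for p in final])
```

At high temperature the automaton explores nearly uniformly; as the temperature falls, probability mass concentrates on the actions with the highest estimated reward, which mirrors how simulated annealing settles into low-energy states.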
Cite this article
Ruan, X., Wu, X. The Skinner automaton: A psychological model formalizing the theory of operant conditioning. Sci. China Technol. Sci. 56, 2745–2761 (2013). https://doi.org/10.1007/s11431-013-5369-0
DOI: https://doi.org/10.1007/s11431-013-5369-0