The Skinner automaton: A psychological model formalizing the theory of operant conditioning
Abstract
Operant conditioning is one of the fundamental mechanisms of animal learning; it holds that the behavior of all animals, from protists to humans, is guided by its consequences. We present a new stochastic learning automaton, called the Skinner automaton, which is a psychological model that formalizes the theory of operant conditioning. We identify animal operant learning with a thermodynamic process and derive the so-called Skinner algorithm from the Monte Carlo method as well as the Metropolis algorithm and simulated annealing. Under certain conditions, we prove that the Skinner automaton is expedient, ε-optimal, and optimal, and that the operant probabilities converge to the set of stable roots with probability 1. The Skinner automaton enables machines to learn autonomously in an animal-like way.
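The abstract does not spell out the Skinner algorithm itself, so the following is only a minimal sketch of the general ingredients it names: a stochastic learning automaton whose action ("operant") probabilities follow a Boltzmann distribution over estimated rewards, with a temperature that is lowered over time as in simulated annealing. The function names, the Bernoulli reward environment, the geometric cooling schedule, and all parameter values are illustrative assumptions, not the authors' method.

```python
import math
import random


def boltzmann_probabilities(values, temperature):
    """Boltzmann (softmax) distribution over values at a given temperature."""
    scaled = [v / temperature for v in values]
    largest = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - largest) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def run_automaton(reward_probs, steps=5000, t_start=1.0, t_end=0.02, seed=0):
    """Hypothetical annealed Boltzmann automaton in a Bernoulli environment.

    reward_probs[i] is the assumed probability that action i is rewarded.
    Returns the final action probabilities and the reward estimates.
    """
    rng = random.Random(seed)
    n = len(reward_probs)
    estimates = [0.0] * n  # running mean reward observed for each action
    counts = [0] * n
    for step in range(steps):
        # Assumed geometric cooling from t_start down to t_end.
        fraction = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction
        probs = boltzmann_probabilities(estimates, temperature)
        action = rng.choices(range(n), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental update of the sample mean for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return boltzmann_probabilities(estimates, t_end), estimates


probs, estimates = run_automaton([0.2, 0.5, 0.8])
print("final action probabilities:", probs)
print("estimated rewards:", estimates)
```

At high temperature the automaton tries all actions almost uniformly; as the temperature falls, probability mass concentrates on the action with the highest estimated reward, which is the informal sense in which such schemes approach optimal behavior.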
Keywords
Learning automata, Boltzmann distribution, operant conditioning, operant learning, simulated annealing