Abstract
Objective
Living creatures can learn or improve their behaviour by temporally correlating sensory cues, where near-senses (e.g., touch, taste) follow far-senses (e.g., vision, smell). This type of learning is related to classical and/or operant conditioning. Algorithmically, all these approaches are very simple and consist of a single learning unit. The current study addresses this limitation by focusing on chained learning architectures in a simple closed-loop behavioural context.
Methods
We applied temporal sequence learning (Porr B and Wörgötter F 2006) in a closed-loop behavioural system in which a driving robot learns to follow a line. Here, for the first time, we introduce two types of chained learning architectures, named the linear chain and the honeycomb chain. We analyse these architectures in open- and closed-loop contexts and compare them to the simple learning unit.
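The learning rule referred to above (ICO learning, Porr and Wörgötter 2006) changes a plastic weight in proportion to the correlation between a predictive input and the temporal derivative of the reflex input. The following is a minimal discrete-time sketch of that rule; it is our own toy example with unfiltered square-pulse inputs, not the filtered signals or the robot setup used in the paper:

```python
import numpy as np

def ico_step(u_pred, u_ref, u_ref_prev, rho, mu=0.01):
    # ICO rule: the plastic weight changes with the correlation of the
    # predictive input u_pred and the temporal derivative of the
    # reflex input u_ref (here a simple one-step difference).
    du_ref = u_ref - u_ref_prev
    return rho + mu * u_pred * du_ref

# Toy stimulus: the predictive pulse precedes the reflex pulse, so the
# predictive input overlaps the rising flank of the reflex input.
T = 200
u_pred = np.zeros(T); u_pred[50:80] = 1.0   # early (predictive) signal
u_ref  = np.zeros(T); u_ref[70:100] = 1.0   # late (reflex) signal

rho, prev = 0.0, 0.0
for t in range(T):
    rho = ico_step(u_pred[t], u_ref[t], prev, rho)
    prev = u_ref[t]

print(rho)  # positive: the unit has learned to react to the early cue
```

Because the weight change is driven by the derivative of the reflex input, learning stops by itself once anticipatory behaviour suppresses the reflex signal, which is what makes the rule usable in a closed behavioural loop.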
Conclusions
By implementing two types of simple chained learning architectures, we demonstrate that stable behaviour can also be obtained with such architectures. Our results further suggest that chained architectures can outperform simple architectures in cases where inputs are sparse in time and learning with a single unit normally fails because of weak correlations.
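The precise linear-chain and honeycomb-chain architectures are defined in the paper's Methods. As a purely illustrative sketch of the chaining idea, under our own simplifying assumption that one unit's output serves as the reflex-like signal of the next unit, the second unit can only start learning once the first one has learned, so repeated trials are needed:

```python
import numpy as np

def ico(u_pred, du_ref, rho, mu=0.01):
    # ICO-style update: correlate the predictive input with the
    # derivative of the (reflex-like) reference signal.
    return rho + mu * u_pred * du_ref

# Three square pulses per trial, arriving progressively earlier:
T = 150
x2 = np.zeros(T); x2[30:61]  = 1.0   # earliest cue
x1 = np.zeros(T); x1[60:91]  = 1.0   # intermediate cue
x0 = np.zeros(T); x0[90:121] = 1.0   # reflex signal

rho1 = rho2 = 0.0
for trial in range(50):
    prev_u0 = prev_v1 = 0.0
    for t in range(T):
        # Unit 1: x1 predicts the reflex x0.
        rho1 = ico(x1[t], x0[t] - prev_u0, rho1)
        v1 = rho1 * x1[t] + x0[t]        # unit 1 output (reflex weight fixed at 1)
        # Unit 2: x2 predicts unit 1's output, closing the chain.
        rho2 = ico(x2[t], v1 - prev_v1, rho2)
        prev_u0, prev_v1 = x0[t], v1

print(rho1 > 0, rho2 > 0)  # both weights grow: the chain has learned
```

In the first trial the output of unit 1 carries no predictive component, so unit 2 sees no early correlation; only after unit 1's weight has grown does unit 2 pick up the even earlier cue. This second-order effect is why chaining helps when individual correlations are weak.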
References
Agostini E, Celaya A (2004) Trajectory tracking control of a rotational joint using feature-based categorization learning. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems, IEEE, Sendai, Japan
Ashby WR (1956). An introduction to cybernetics. Methuen, London
Bailey CH, Giustetto M, Huang YY, Hawkins RD and Kandel ER (2000). Is heterosynaptic modulation essential for stabilizing Hebbian plasticity and memory? Nat Rev Neurosci 1(1): 11–20
Barto A (1995). Reinforcement learning in motor control. In: Arbib M (ed) Handbook of brain theory and neural networks, pp 809–812. MIT Press, Cambridge
Barto AG, Sutton RS and Anderson CW (1983). Neuronlike elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern 13: 835–846
Braitenberg V (1984). Vehicles: experiments in synthetic psychology. MIT Press, Cambridge
Gewirtz JC and Davis M (2000). Using pavlovian higher-order conditioning paradigms to investigate the neural substrates of emotional learning and memory. Learn Mem 7(5): 257–266
Gomi H and Kawato M (1993). Neural network control for a closed-loop system using feedback-error-learning. Neural Netw 6(7): 933–946
Humeau Y, Shaban H, Bissiere S and Luthi A (2003). Presynaptic induction of heterosynaptic associative plasticity in the mammalian brain. Nature 426(6968): 841–845
Ikeda H, Akiyama G, Fujii Y, Minowa R, Koshikawa N and Cools A (2003). Role of AMPA and NMDA receptors in the nucleus accumbens shell in turning behaviour of rats: interaction with dopamine receptors. Neuropharmacology 44: 81–87
Jara E, Vila J and Maldonado A (2006). Second-order conditioning of human causal learning. Learn Motiv 37: 230–246
Jay T (2003). Dopamine: a potential substrate for synaptic plasticity and memory mechanisms. Prog Neurobiol 69(6): 375–390
Jodogne S, Scalzo F, Piater JH (2005) Task-driven learning of spatial combinations of visual features. In: Proceedings of the IEEE workshop on learning in computer vision and pattern recognition, IEEE, San Diego (CA, USA)
Kelley AE (1999). Functional specificity of ventral striatal compartments in appetitive behaviors. Ann NY Acad Sci 877: 71–90
Klopf AH (1988). A neuronal model of classical conditioning. Psychobiology 16(2): 85–123
Kolodziejski C, Wörgötter F, Porr B (2007) Mathematical properties of neuronal TD-rules and differential Hebbian learning: A comparison. Biol Cybern (submitted)
Kosko B (1986) Differential Hebbian learning. In: Denker JS (ed) Neural networks for computing: AIP Conference Proceedings, vol. 151. American Institute of Physics, New York
Land MF (2001) Does steering a car involve perception of the velocity flow field? In: Zanker JM, Zeil J (eds) Motion vision—computational, neural, and ecological constraints, pp. 227–235
Manoonpong P, Geng T, Kulvicius T, Porr B, Wörgötter F (2007) Adaptive, fast walking in a biped robot under neuronal control and learning. PLoS Comput Biol 3(7):e134 doi:10.1371/journal.pcbi.0030134
McClelland JL, Rumelhart DE and Hinton GE (1987). Parallel distributed processing, vol 1. MIT Press, Cambridge
McFarland DJ (1971). Feedback mechanisms in animal behaviour. Academic, London
McKinstry JL, Edelman GM and Krichmar JL (2006). A cerebellar model for predictive motor control tested in a brain-based device. Proc Natl Acad Sci USA 103(9): 3387–3392
Montague PR, Dayan P, Person C and Sejnowski TJ (1995). Bee foraging in uncertain environments using predictive Hebbian learning. Nature 377: 725–728
Nakanishi J and Schaal S (2004). Feedback error learning and nonlinear adaptive control. Neural Netw 17: 1453–1465
Niv Y, Joel D, Meilijson I and Ruppin E (2002). Evolution of reinforcement learning in uncertain environments: a simple explanation for complex foraging behaviors. Adapt Behav 10(1): 5–24
Pomerleau D (1996). Neural network vision for robot driving. In: Nayar S and Poggio T (eds) Early visual learning, pp 161–181. Oxford University Press, New York
Porr B and Wörgötter F (2003a). Isotropic sequence order learning. Neural Comp 15: 831–864
Porr B and Wörgötter F (2003b). Isotropic sequence order learning in a closed loop behavioural system. Philos Trans R Soc Lond A 361(1811): 2225–2244
Porr B and Wörgötter F (2006). Strongly improved stability and faster convergence of temporal sequence learning by utilising input correlations only. Neural Comp 18(6): 1380–1412
Porr B, von Ferber C and Wörgötter F (2003). ISO learning approximates a solution to the inverse-controller problem in an unsupervised behavioral paradigm. Neural Comp 15: 865–884
Rescorla RA (1980). Pavlovian second-order conditioning: studies in associative learning. Erlbaum, Hillsdale
Schultz W and Suri RE (2001). Temporal difference model reproduces anticipatory neural activity. Neural Comp 13(4): 841–862
Suri RE and Schultz W (1998). Learning of sequential movements by neural network model with dopamine-like reinforcement signal. Exp Brain Res 121: 350–354
Sutton R and Barto A (1981). Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev 88: 135–170
Sutton RS (1988). Learning to predict by the methods of temporal differences. Mach Learn 3: 9–44
Sutton RS and Barto AG (1990). Time-derivative models of Pavlovian reinforcement. In: Gabriel M and Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge
Sutton RS and Barto AG (1998). Reinforcement learning: an introduction. MIT Press, Cambridge
Tsukamoto M, Yasui T, Yamada MK, Nishiyama N, Matsuki N and Ikegaya Y (2003). Mossy fibre synaptic NMDA receptors trigger non-Hebbian long-term potentiation at entorhino-CA3 synapses in the rat. J Physiol 546(3): 665–675
Verschure P and Althaus P (2003). A real-world rational agent: unifying old and new AI. Cogn Sci 27: 561–590
Verschure P and Coolen A (1991). Adaptive fields: distributed representations of classically conditioned associations. Network 2: 189–206
Walter WG (1950). An imitation of life. Sci Am 182: 42–45
Watkins CJCH (1989) Learning from delayed rewards. PhD Thesis, University of Cambridge, Cambridge, England
Watkins CJCH and Dayan P (1992). Technical note: Q-Learning. Mach Learn 8: 279–292
Webb B (2002). Robots in invertebrate neuroscience. Nature 417: 359–363
Wiener N (1961). Cybernetics—or control and communication in the animal and the machine, 2nd edn. The MIT Press, Cambridge
Witten IH (1977). An adaptive optimal controller for discrete-time Markov environments. Inf Control 34: 286–295
Wörgötter F and Porr B (2005). Temporal sequence learning for prediction and control - a review of different models and their relation to biological mechanisms. Neural Comp 17: 245–319
Wyss R, König P and Verschure PFMJ (2004). Involving the motor system in decision making. Proc Biol Sci 271(Suppl 3): 50–52
Kulvicius, T., Porr, B. & Wörgötter, F. Chained learning architectures in a simple closed-loop behavioural context. Biol Cybern 97, 363–378 (2007). https://doi.org/10.1007/s00422-007-0176-y