Abstract
Objective
Living creatures can learn or improve their behaviour by temporally correlating sensory cues, where near-senses (e.g., touch, taste) follow far-senses (e.g., vision, smell). This type of learning is related to classical and/or operant conditioning. Algorithmically, all these approaches are very simple and consist of a single learning unit. The current study addresses this limitation by focusing on chained learning architectures in a simple closed-loop behavioural context.
Methods
We applied temporal sequence learning (Porr B and Wörgötter F 2006) in a closed-loop behavioural system in which a driving robot learns to follow a line. Here, for the first time, we introduce two types of chained learning architectures, named the linear chain and the honeycomb chain. We analyse these architectures in open- and closed-loop contexts and compare them to the simple learning unit.
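The learning rule referred to above (ICO learning, Porr and Wörgötter 2006) changes a plastic weight in proportion to the correlation between a predictive input and the temporal derivative of the reflex input. The following is a minimal discrete-time sketch of that rule; it is our own toy example with unfiltered square-pulse inputs, not the filtered signals or the robot setup used in the paper:

```python
import numpy as np

def ico_step(u_pred, u_ref, u_ref_prev, rho, mu=0.01):
    # ICO rule: the plastic weight changes with the correlation of the
    # predictive input u_pred and the temporal derivative of the
    # reflex input u_ref (here a simple one-step difference).
    du_ref = u_ref - u_ref_prev
    return rho + mu * u_pred * du_ref

# Toy stimulus: the predictive pulse precedes the reflex pulse, so the
# predictive input overlaps the rising flank of the reflex input.
T = 200
u_pred = np.zeros(T); u_pred[50:80] = 1.0   # early (predictive) signal
u_ref  = np.zeros(T); u_ref[70:100] = 1.0   # late (reflex) signal

rho, prev = 0.0, 0.0
for t in range(T):
    rho = ico_step(u_pred[t], u_ref[t], prev, rho)
    prev = u_ref[t]

print(rho)  # positive: the unit has learned to react to the early cue
```

Because the weight change is driven by the derivative of the reflex input, learning stops by itself once anticipatory behaviour suppresses the reflex signal, which is what makes the rule usable in a closed behavioural loop.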
Conclusions
By implementing two types of simple chained learning architectures, we demonstrate that stable behaviour can also be obtained with such architectures. Our results further suggest that chained architectures can outperform simple architectures in cases where inputs are sparse in time and learning with a single unit normally fails because of weak correlations.
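The precise linear-chain and honeycomb-chain architectures are defined in the paper's Methods. As a purely illustrative sketch of the chaining idea, under our own simplifying assumption that one unit's output serves as the reflex-like signal of the next unit, the second unit can only start learning once the first one has learned, so repeated trials are needed:

```python
import numpy as np

def ico(u_pred, du_ref, rho, mu=0.01):
    # ICO-style update: correlate the predictive input with the
    # derivative of the (reflex-like) reference signal.
    return rho + mu * u_pred * du_ref

# Three square pulses per trial, arriving progressively earlier:
T = 150
x2 = np.zeros(T); x2[30:61]  = 1.0   # earliest cue
x1 = np.zeros(T); x1[60:91]  = 1.0   # intermediate cue
x0 = np.zeros(T); x0[90:121] = 1.0   # reflex signal

rho1 = rho2 = 0.0
for trial in range(50):
    prev_u0 = prev_v1 = 0.0
    for t in range(T):
        # Unit 1: x1 predicts the reflex x0.
        rho1 = ico(x1[t], x0[t] - prev_u0, rho1)
        v1 = rho1 * x1[t] + x0[t]        # unit 1 output (reflex weight fixed at 1)
        # Unit 2: x2 predicts unit 1's output, closing the chain.
        rho2 = ico(x2[t], v1 - prev_v1, rho2)
        prev_u0, prev_v1 = x0[t], v1

print(rho1 > 0, rho2 > 0)  # both weights grow: the chain has learned
```

In the first trial the output of unit 1 carries no predictive component, so unit 2 sees no early correlation; only after unit 1's weight has grown does unit 2 pick up the even earlier cue. This second-order effect is why chaining helps when individual correlations are weak.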
References
Agostini E, Celaya A (2004) Trajectory tracking control of a rotational joint using feature-based categorization learning. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems, IEEE, Sendai, Japan
Ashby WR (1956). An introduction to cybernetics. Methuen, London
Bailey CH, Giustetto M, Huang YY, Hawkins RD and Kandel ER (2000). Is heterosynaptic modulation essential for stabilizing Hebbian plasticity and memory? Nat Rev Neurosci 1(1): 11–20
Barto A (1995). Reinforcement learning in motor control. In: Arbib M (ed) Handbook of brain theory and neural networks, pp 809–812. MIT Press, Cambridge
Barto AG, Sutton RS and Anderson CW (1983). Neuronlike elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern 13: 835–846
Braitenberg V (1984). Vehicles: experiments in synthetic psychology. MIT Press, Cambridge
Gewirtz JC and Davis M (2000). Using pavlovian higher-order conditioning paradigms to investigate the neural substrates of emotional learning and memory. Learn Mem 7(5): 257–266
Gomi H and Kawato M (1993). Neural network control for a closed-loop system using feedback-error-learning. Neural Netw 6(7): 933–946
Humeau Y, Shaban H, Bissiere S and Luthi A (2003). Presynaptic induction of heterosynaptic associative plasticity in the mammalian brain. Nature 426(6968): 841–845
Ikeda H, Akiyama G, Fujii Y, Minowa R, Koshikawa N and Cools A (2003). Role of AMPA and NMDA receptors in the nucleus accumbens shell in turning behaviour of rats: interaction with dopamine receptors. Neuropharmacology 44: 81–87
Jara E, Vila J and Maldonado A (2006). Second-order conditioning of human causal learning. Learn Motiv 37: 230–246
Jay T (2003). Dopamine: a potential substrate for synaptic plasticity and memory mechanisms. Prog Neurobiol 69(6): 375–390
Jodogne S, Scalzo F, Piater JH (2005) Task-driven learning of spatial combinations of visual features. In: Proceedings of the IEEE workshop on learning in computer vision and pattern recognition, IEEE, San Diego (CA, USA)
Kelley AE (1999). Functional specificity of ventral striatal compartments in appetitive behaviors. Ann NY Acad Sci 877: 71–90
Klopf AH (1988). A neuronal model of classical conditioning. Psychobiology 16(2): 85–123
Kolodziejski C, Wörgötter F, Porr B (2007) Mathematical properties of neuronal TD-rules and differential Hebbian learning: A comparison. Biol Cybern (submitted)
Kosko B (1986) Differential Hebbian learning. In: Denker JS (ed) Neural networks for computing: AIP Conference Proceedings, vol. 151. American Institute of Physics, New York
Land MF (2001) Does steering a car involve perception of the velocity flow field? In: Zanker JM, Zeil J (eds) Motion vision—computational, neural, and ecological constraints, pp. 227–235
Manoonpong P, Geng T, Kulvicius T, Porr B, Wörgötter F (2007) Adaptive, fast walking in a biped robot under neuronal control and learning. PLoS Comput Biol 3(7):e134 doi:10.1371/journal.pcbi.0030134
McClelland JL, Rumelhart DE and Hinton GE (1987). Parallel distributed processing, vol 1. MIT Press, Cambridge
McFarland DJ (1971). Feedback mechanisms in animal behaviour. Academic, London
McKinstry JL, Edelman GM and Krichmar JL (2006). A cerebellar model for predictive motor control tested in a brain-based device. Proc Natl Acad Sci USA 103(9): 3387–3392
Montague PR, Dayan P, Person C and Sejnowski TJ (1995). Bee foraging in uncertain environments using predictive Hebbian learning. Nature 377: 725–728
Nakanishi J and Schaal S (2004). Feedback error learning and nonlinear adaptive control. Neural Netw 17: 1453–1465
Niv Y, Joel D, Meilijson I and Ruppin E (2002). Evolution of reinforcement learning in uncertain environments: a simple explanation for complex foraging behaviors. Adapt Behav 10(1): 5–24
Pomerleau D (1996). Neural network vision for robot driving. In: Nayar S and Poggio T (eds) Early visual learning, pp 161–181. Oxford University Press, New York
Porr B and Wörgötter F (2003a). Isotropic sequence order learning. Neural Comp 15: 831–864
Porr B and Wörgötter F (2003b). Isotropic sequence order learning in a closed loop behavioural system. Philos Trans R Soc Lond A 361(1811): 2225–2244
Porr B and Wörgötter F (2006). Strongly improved stability and faster convergence of temporal sequence learning by utilising input correlations only. Neural Comp 18(6): 1380–1412
Porr B, von Ferber C and Wörgötter F (2003). ISO learning approximates a solution to the inverse-controller problem in an unsupervised behavioral paradigm. Neural Comp 15: 865–884
Rescorla RA (1980). Pavlovian second-order conditioning: studies in associative learning. Erlbaum, Hillsdale
Schultz W and Suri RE (2001). Temporal difference model reproduces anticipatory neural activity. Neural Comp 13(4): 841–862
Suri RE and Schultz W (1998). Learning of sequential movements by neural network model with dopamine-like reinforcement signal. Exp Brain Res 121: 350–354
Sutton R and Barto A (1981). Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev 88: 135–170
Sutton RS (1988). Learning to predict by the methods of temporal differences. Mach Learn 3: 9–44
Sutton RS and Barto AG (1990). Time-derivative models of Pavlovian reinforcement. In: Gabriel M and Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge
Sutton RS and Barto AG (1998). Reinforcement learning: an introduction. MIT Press, Cambridge
Tsukamoto M, Yasui T, Yamada MK, Nishiyama N, Matsuki N and Ikegaya Y (2003). Mossy fibre synaptic NMDA receptors trigger non-Hebbian long-term potentiation at entorhino-CA3 synapses in the rat. J Physiol 546(3): 665–675
Verschure P and Althaus P (2003). A real-world rational agent: unifying old and new AI. Cogn Sci 27: 561–590
Verschure P and Coolen A (1991). Adaptive fields: distributed representations of classically conditioned associations. Network 2: 189–206
Walter WG (1950). An imitation of life. Sci Am 182: 42–45
Watkins CJCH (1989) Learning from delayed rewards. PhD Thesis, University of Cambridge, Cambridge, England
Watkins CJCH and Dayan P (1992). Technical note: Q-Learning. Mach Learn 8: 279–292
Webb B (2002). Robots in invertebrate neuroscience. Nature 417: 359–363
Wiener N (1961). Cybernetics—or control and communication in the animal and the machine, 2nd edn. The MIT Press, Cambridge
Witten IH (1977). An adaptive optimal controller for discrete-time Markov environments. Inf Control 34: 286–295
Wörgötter F and Porr B (2005). Temporal sequence learning for prediction and control - a review of different models and their relation to biological mechanisms. Neural Comp 17: 245–319
Wyss R, König P and Verschure PFMJ (2004). Involving the motor system in decision making. Proc Biol Sci 271(Suppl 3): 50–52
Kulvicius, T., Porr, B. & Wörgötter, F. Chained learning architectures in a simple closed-loop behavioural context. Biol Cybern 97, 363–378 (2007). https://doi.org/10.1007/s00422-007-0176-y