Chained learning architectures in a simple closed-loop behavioural context

  • Original Paper
  • Biological Cybernetics

Abstract

Objective

Living creatures can learn or improve their behaviour by temporally correlating sensory cues, where near-senses (e.g. touch, taste) follow far-senses (e.g. vision, smell). This type of learning is related to classical and/or operant conditioning. Algorithmically, however, all these approaches are very simple, consisting of a single learning unit. The current study addresses this limitation by focusing on chained learning architectures in a simple closed-loop behavioural context.

Methods

We applied temporal sequence learning (Porr and Wörgötter 2006) in a closed-loop behavioural system in which a driving robot learns to follow a line. Here, for the first time, we introduced two types of chained learning architectures, named linear chain and honeycomb chain. We analyzed these architectures in an open- and a closed-loop context and compared them to the single learning unit.
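The core of this kind of temporal sequence learning can be sketched in a few lines: the weight of an early, predictive input changes in proportion to the temporal derivative of the filtered reflex input. The Python toy below is an illustrative sketch only; the leaky-integrator traces, pulse timings, and all parameter values are assumptions (the original formulation uses band-pass resonator filters rather than simple leaky traces).

```python
import numpy as np

def ico_learning(x_reflex, x_pred, mu=0.01, tau=5.0):
    """Minimal sketch of input-correlation ("ICO"-style) learning.

    The plastic weight of the predictive pathway changes with the
    product of the filtered predictive input and the temporal
    derivative of the filtered reflex input:
        dw/dt = mu * u_pred * d(u_reflex)/dt
    Leaky-integrator traces stand in for the band-pass filters of
    the original formulation (an illustrative simplification).
    """
    u_reflex = 0.0
    u_pred = 0.0
    w = 0.0
    for t in range(len(x_reflex)):
        u_reflex_prev = u_reflex
        u_reflex += (x_reflex[t] - u_reflex) / tau   # reflex trace
        u_pred += (x_pred[t] - u_pred) / tau         # predictive trace
        w += mu * u_pred * (u_reflex - u_reflex_prev)
    return w

# Toy episode: a "far-sense" cue at t=50 predicts a "near-sense"
# reflex event at t=60 (timings are arbitrary assumptions).
T = 200
x_pred = np.zeros(T); x_pred[50] = 1.0
x_reflex = np.zeros(T); x_reflex[60] = 1.0

w_forward = ico_learning(x_reflex, x_pred)   # cue precedes reflex
w_reversed = ico_learning(x_pred, x_reflex)  # reflex precedes cue
print(w_forward > 0, w_reversed < 0)
```

Reversing the temporal order drives the weight down instead of up, which is what makes the rule sensitive to the cue-before-reflex sequence described above.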

Conclusions

By implementing two types of simple chained learning architectures, we have demonstrated that stable behaviour can also be obtained with such architectures. The results further suggest that chained architectures can be employed to obtain better behavioural performance than simple architectures in cases where inputs are sparse in time and learning normally fails because of weak correlations.
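Why chaining helps when inputs are sparse in time can be seen in a toy calculation. The sketch below is an illustrative assumption, not the paper's actual linear- or honeycomb-chain implementation: the trace filter, event timings, and the two-link chain are all made up for the example. It compares the direct cue-to-reflex correlation against the two shorter links of a chain.

```python
import numpy as np

def trace(x, tau=5.0):
    """Leaky-integrator trace of a pulse train (illustrative filter)."""
    u = np.zeros_like(x, dtype=float)
    for t in range(1, len(x)):
        u[t] = u[t - 1] + (x[t] - u[t - 1]) / tau
    return u

def link_strength(x_pred, x_reflex, mu=0.05):
    """Final weight of one unit: sum of u_pred * d(u_reflex)/dt."""
    u_p, u_r = trace(x_pred), trace(x_reflex)
    return mu * float(np.sum(u_p[1:] * np.diff(u_r)))

# Three sparse events: cue A (t=20), intermediate event B (t=40),
# reflex R (t=60); all timings are illustrative assumptions.
T = 120
A = np.zeros(T); A[20] = 1.0
B = np.zeros(T); B[40] = 1.0
R = np.zeros(T); R[60] = 1.0

w_direct = link_strength(A, R)   # single unit: A is 40 steps from R
w_chain_1 = link_strength(B, R)  # chained unit 1 links B -> R
w_chain_2 = link_strength(A, B)  # chained unit 2 links A -> B
print(w_chain_2 > w_direct)
```

Because each trace decays between events, the 40-step gap from A to R leaves almost no overlap, while each 20-step link retains a much stronger correlation; a chain of units can therefore still learn where a single unit effectively cannot.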


References

  • Agostini A, Celaya E (2004) Trajectory tracking control of a rotational joint using feature-based categorization learning. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems, IEEE, Sendai, Japan

  • Ashby WR (1956). An introduction to cybernetics. Methuen, London


  • Bailey CH, Giustetto M, Huang YY, Hawkins RD and Kandel ER (2000). Is heterosynaptic modulation essential for stabilizing Hebbian plasticity and memory? Nat Rev Neurosci 1(1): 11–20


  • Barto A (1995). Reinforcement learning in motor control. In: Arbib M (ed) Handbook of brain theory and neural networks, pp 809–812. MIT Press, Cambridge


  • Barto AG, Sutton RS and Anderson CW (1983). Neuronlike elements that can solve difficult learning control problems. IEEE Trans Syst Man Cybern 13: 835–846


  • Braitenberg V (1984). Vehicles: experiments in synthetic psychology. MIT Press, Cambridge


  • Gewirtz JC and Davis M (2000). Using Pavlovian higher-order conditioning paradigms to investigate the neural substrates of emotional learning and memory. Learn Mem 7(5): 257–266


  • Gomi H and Kawato M (1993). Neural network control for a closed-loop system using feedback-error-learning. Neural Netw 6(7): 933–946


  • Humeau Y, Shaban H, Bissiere S and Luthi A (2003). Presynaptic induction of heterosynaptic associative plasticity in the mammalian brain. Nature 426(6968): 841–845


  • Ikeda H, Akiyama G, Fujii Y, Minowa R, Koshikawa N and Cools A (2003). Role of AMPA and NMDA receptors in the nucleus accumbens shell in turning behaviour of rats: interaction with dopamine and receptors. Neuropharmacology 44: 81–87


  • Jara E, Vila J and Maldonado A (2006). Second-order conditioning of human causal learning. Learn Motiv 37: 230–246


  • Jay T (2003). Dopamine: a potential substrate for synaptic plasticity and memory mechanisms. Prog Neurobiol 69(6): 375–390


  • Jodogne S, Scalzo F, Piater JH (2005) Task-driven learning of spatial combinations of visual features. In: Proceedings of the IEEE workshop on learning in computer vision and pattern recognition, IEEE, San Diego (CA, USA)

  • Kelley AE (1999). Functional specificity of ventral striatal compartments in appetitive behaviors. Ann NY Acad Sci 877: 71–90


  • Klopf AH (1988). A neuronal model of classical conditioning. Psychobiology 16(2): 85–123


  • Kolodziejski C, Wörgötter F, Porr B (2007) Mathematical properties of neuronal TD-rules and differential Hebbian learning: A comparison. Biol Cybern (submitted)

  • Kosko B (1986) Differential Hebbian learning. In: Denker JS (ed) Neural networks for computing: AIP conference proceedings, vol 151. American Institute of Physics, New York

  • Land MF (2001) Does steering a car involve perception of the velocity flow field? In: Zanker JM, Zeil J (eds) Motion vision—computational, neural, and ecological constraints. Springer, Berlin, pp 227–235

  • Manoonpong P, Geng T, Kulvicius T, Porr B, Wörgötter F (2007) Adaptive, fast walking in a biped robot under neuronal control and learning. PLoS Comput Biol 3(7): e134. doi:10.1371/journal.pcbi.0030134

  • McClelland JL, Rumelhart DE and Hinton GE (1987). Parallel distributed processing, vol 1. MIT Press, Cambridge


  • McFarland DJ (1971). Feedback mechanisms in animal behaviour. Academic, London


  • McKinstry JL, Edelman GM and Krichmar JL (2006). A cerebellar model for predictive motor control tested in a brain-based device. Proc Natl Acad Sci USA 103(9): 3387–3392


  • Montague PR, Dayan P, Person C and Sejnowski TJ (1995). Bee foraging in uncertain environments using predictive Hebbian learning. Nature 377: 725–728


  • Nakanishi J and Schaal S (2004). Feedback error learning and nonlinear adaptive control. Neural Netw 17: 1453–1465


  • Niv Y, Joel D, Meilijson I and Ruppin E (2002). Evolution of reinforcement learning in uncertain environments: a simple explanation for complex foraging behaviors. Adapt Behav 10(1): 5–24


  • Pomerleau D (1996). Neural network vision for robot driving. In: Nayar S and Poggio T (eds) Early visual learning, pp 161–181. Oxford University Press, New York


  • Porr B and Wörgötter F (2003a). Isotropic sequence order learning. Neural Comp 15: 831–864


  • Porr B and Wörgötter F (2003b). Isotropic sequence order learning in a closed loop behavioural system. R Soc Phil Trans Math Phys Eng Sci 361(1811): 2225–2244


  • Porr B and Wörgötter F (2006). Strongly improved stability and faster convergence of temporal sequence learning by utilising input correlations only. Neural Comp 18(6): 1380–1412



  • Porr B, von Ferber C and Wörgötter F (2003). ISO-learning approximates a solution to the inverse-controller problem in an unsupervised behavioral paradigm. Neural Comp 15: 865–884


  • Rescorla RA (1980). Pavlovian second-order conditioning: studies in associative learning. Erlbaum, Hillsdale


  • Schultz W and Suri RE (2001). Temporal difference model reproduces anticipatory neural activity. Neural Comp 13(4): 841–862


  • Suri RE and Schultz W (1998). Learning of sequential movements by neural network model with dopamine-like reinforcement signal. Exp Brain Res 121: 350–354


  • Sutton R and Barto A (1981). Towards a modern theory of adaptive networks: expectation and prediction. Psychol Rev 88: 135–170


  • Sutton RS (1988). Learning to predict by the methods of temporal differences. Mach Learn 3: 9–44


  • Sutton RS and Barto AG (1990). Time-derivative models of Pavlovian reinforcement. In: Gabriel M and Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge


  • Sutton RS and Barto AG (1998). Reinforcement learning: an introduction. MIT Press, Cambridge


  • Tsukamoto M, Yasui T, Yamada MK, Nishiyama N, Matsuki N and Ikegaya Y (2003). Mossy fibre synaptic NMDA receptors trigger non-Hebbian long-term potentiation at entorhino-CA3 synapses in the rat. J Physiol 546(3): 665–675


  • Verschure P and Althaus P (2003). A real-world rational agent: unifying old and new AI. Cogn Sci 27: 561–590


  • Verschure P and Coolen A (1991). Adaptive fields: distributed representations of classically conditioned associations. Network 2: 189–206


  • Walter WG (1950). An imitation of life. Sci Am 182: 42–45


  • Watkins CJCH (1989) Learning from delayed rewards. PhD Thesis, University of Cambridge, Cambridge, England

  • Watkins CJCH and Dayan P (1992). Technical note: Q-Learning. Mach Learn 8: 279–292


  • Webb B (2002). Robots in invertebrate neuroscience. Nature 417: 359–363


  • Wiener N (1961). Cybernetics—or control and communication in the animal and the machine, 2nd edn. The MIT Press, Cambridge


  • Witten IH (1977). An adaptive optimal controller for discrete-time Markov environments. Inf Control 34: 286–295


  • Wörgötter F and Porr B (2005). Temporal sequence learning for prediction and control - a review of different models and their relation to biological mechanisms. Neural Comp 17: 245–319


  • Wyss R, König P and Verschure PFMJ (2004). Involving the motor system in decision making. Proc Biol Sci 271(Suppl 3): 50–52



Author information


Correspondence to Florentin Wörgötter.

Cite this article

Kulvicius, T., Porr, B. & Wörgötter, F. Chained learning architectures in a simple closed-loop behavioural context. Biol Cybern 97, 363–378 (2007). https://doi.org/10.1007/s00422-007-0176-y
