# Information-driven self-organization: the dynamical system approach to autonomous robot behavior

## Abstract

In recent years, information theory has become a focus for researchers interested in the sensorimotor dynamics of both robots and living beings. One root of these approaches is the idea that living beings are information-processing systems and that optimizing these processes should confer an evolutionary advantage. Apart from these more fundamental questions, there has recently been much interest in how a robot can be equipped with an internal drive for innovation or curiosity that may serve as a drive for open-ended, self-determined development of the robot. The success of these approaches depends essentially on the choice of a suitable measure of information. This article studies in some detail the use of the predictive information (PI), also called excess entropy or effective measure complexity, of the sensorimotor process. The PI of a process quantifies the total information from past experience that can be used to predict future events. However, the application of information-theoretic measures in robotics is mostly restricted to the case of a finite, discrete state-action space. This article aims to apply the PI in the dynamical systems approach to robot control. We study linear systems as a first step and derive exact results for the PI together with explicit learning rules for the parameters of the controller. Interestingly, these learning rules are of Hebbian nature and local in the sense that the synaptic update is given by the product of activities available directly at the pertinent synaptic ports. The general findings are exemplified by a number of case studies. In particular, in a two-dimensional system designed to mimic embodied systems with latent oscillatory locomotion patterns, it is shown that maximizing the PI means recognizing and amplifying the latent modes of the robotic system. This and many other examples show that the learning rules derived from the maximum PI principle are a versatile tool for the self-organization of behavior in complex robotic systems.
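
As a minimal illustration of what the PI looks like in the simplest linear case, the sketch below assumes a one-dimensional stationary process `x[t+1] = a * x[t] + noise` with Gaussian white noise and `|a| < 1`. For such a first-order Markov process the PI reduces to the mutual information between successive states, which for jointly Gaussian variables is `-0.5 * ln(1 - a^2)`. The scalar model and the function names (`analytic_pi`, `simulate`, `estimated_pi`) are assumptions made for this example; they are not the article's notation and do not reproduce its learning rules.

```python
import math
import random


def analytic_pi(a: float) -> float:
    """Closed-form PI in nats for the assumed scalar process, valid for |a| < 1."""
    if not abs(a) < 1.0:
        raise ValueError("stationarity requires |a| < 1")
    return -0.5 * math.log(1.0 - a * a)


def simulate(a: float, n: int, noise_std: float = 1.0, seed: int = 0) -> list:
    """Draw n samples of x[t+1] = a * x[t] + noise, started in the stationary state."""
    rng = random.Random(seed)
    # The stationary standard deviation is noise_std / sqrt(1 - a^2).
    x = rng.gauss(0.0, noise_std / math.sqrt(1.0 - a * a))
    xs = [x]
    for _ in range(n - 1):
        x = a * x + rng.gauss(0.0, noise_std)
        xs.append(x)
    return xs


def estimated_pi(xs: list) -> float:
    """Estimate the PI from data using the lag-one correlation (Gaussian assumption)."""
    past, future = xs[:-1], xs[1:]
    n = len(past)
    mean_p = sum(past) / n
    mean_f = sum(future) / n
    cov = sum((p - mean_p) * (f - mean_f) for p, f in zip(past, future)) / n
    var_p = sum((p - mean_p) ** 2 for p in past) / n
    var_f = sum((f - mean_f) ** 2 for f in future) / n
    rho_sq = cov * cov / (var_p * var_f)
    return -0.5 * math.log(1.0 - rho_sq)


if __name__ == "__main__":
    a = 0.5
    samples = simulate(a, n=50_000)
    print("analytic PI :", analytic_pi(a))
    print("estimated PI:", estimated_pi(samples))
```

For `a = 0.5` the closed form gives about 0.144 nats, and the value grows without bound as `|a|` approaches 1, so in this toy setting a larger PI corresponds to a more persistent, more predictable mode.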

## Keywords

- Autonomous systems
- Predictive information
- Self-organization
- Sensorimotor loop
- Embodiment
- Hebbian learning
- Intrinsic motivation

## Notes

### Acknowledgments

Part of this work was completed during a stay of Nihat Ay and Ralf Der at the CSIRO in Sydney, Australia. Hospitality and financial support are gratefully acknowledged. Nihat Ay also acknowledges support by the Santa Fe Institute at the early stage of the paper. Mikhail Prokopenko thanks the Max Planck Institute of Mathematics in the Sciences in Leipzig, Germany, for support and hospitality at the Institute. The authors thank the anonymous reviewer for many important comments that helped to improve the paper substantially.
