Challenges of Machine Learning for Living Machines

  • Jordi-Ysard Puigbò (corresponding author)
  • Xerxes D. Arsiwalla
  • Paul F. M. J. Verschure
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10928)


Machine learning algorithms, and in particular reinforcement learning (RL), have proved very successful in recent years. They have achieved super-human performance in many different tasks, from video games and board games to complex cognitive tasks such as path planning and Theory of Mind (ToM) in artificial agents. Nonetheless, this super-human performance is also super-artificial: although some metrics exceed what a human can achieve (e.g. cumulative reward), on less common metrics (e.g. time to reach the learning asymptote) performance is significantly worse. Moreover, the means by which these results are achieved do little to extend our understanding of the human or mammalian brain, and most approaches rely on black-box optimization, making any comparison beyond performance (e.g. at the architectural level) difficult. In this position paper, we review the origins of reinforcement learning and propose extending it with models of learning derived from fear and avoidance behaviors. We argue that avoidance-based mechanisms are required when training embodied, situated systems in order to ensure fast and safe convergence, and that they could potentially overcome some of the current limitations of the RL paradigm.
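The avoidance-based extension argued for above can be illustrated with a minimal sketch. This is our own illustration, not the paper's model: a tabular Q-learning agent on a small corridor task that learns a separate avoidance value from punishments and subtracts it, weighted, from the reward value when selecting actions, so that hazardous states are avoided early in training. The task, the variable names, and the weighting parameter `BETA` are all hypothetical choices made for this sketch.

```python
import random

# Hypothetical 1-D corridor: states 0..5. State 5 yields reward (goal),
# state 0 yields punishment (hazard); both are terminal.
N_STATES = 6
ACTIONS = [-1, +1]            # move left / move right
ALPHA, GAMMA = 0.1, 0.9       # learning rate, discount factor
BETA = 2.0                    # weight of the avoidance signal at action selection

# Two separate value tables: Q estimates discounted reward,
# A estimates discounted punishment (the "avoidance" value).
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
A = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Return (next_state, reward, punishment, done)."""
    nxt = max(0, min(N_STATES - 1, state + action))
    if nxt == N_STATES - 1:
        return nxt, 1.0, 0.0, True   # reached the goal
    if nxt == 0:
        return nxt, 0.0, 1.0, True   # hit the hazard
    return nxt, 0.0, 0.0, False

def choose(state, eps=0.1):
    """Epsilon-greedy on reward value minus the learned avoidance value."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)] - BETA * A[(state, a)])

random.seed(0)
for _ in range(500):
    s, done = 2, False
    while not done:
        a = choose(s)
        s2, r, p, done = step(s, a)
        best_q = max(Q[(s2, b)] for b in ACTIONS)   # best reward prospect
        worst_a = max(A[(s2, b)] for b in ACTIONS)  # worst punishment prospect
        Q[(s, a)] += ALPHA * (r + GAMMA * best_q - Q[(s, a)])
        A[(s, a)] += ALPHA * (p + GAMMA * worst_a - A[(s, a)])
        s = s2

# The avoidance-biased policy should steer right, away from the hazard.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)] - BETA * A[(s, a)])
          for s in range(1, N_STATES - 1)}
print(policy)
```

In this sketch, the punishment signal is learned on its own pessimistic (max over next actions) backup rather than folded into a single scalar reward, which is one simple way to keep an agent away from hazards even before the reward gradient has propagated back from the goal.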


Keywords: Reinforcement learning · Neural networks · Avoidance



This work is supported by the European Research Council's CDAC project: "The Role of Consciousness in Adaptive Behavior: A Combined Empirical, Computational and Robot based Approach" (ERC-2013-ADG 341196).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Jordi-Ysard Puigbò (1, 2) — corresponding author
  • Xerxes D. Arsiwalla (1, 2, 3)
  • Paul F. M. J. Verschure (2, 3, 4)

  1. UPF, Universitat Pompeu Fabra, Barcelona, Spain
  2. IBEC, Institute for BioEngineering of Catalonia, Barcelona, Spain
  3. BIST, Barcelona Institute of Science and Technology, Barcelona, Spain
  4. ICREA, Catalan Institute for Research and Advanced Studies, Barcelona, Spain
