
Heuristically accelerated reinforcement learning modularization for multi-agent multi-objective problems


This article presents two new algorithms for finding the optimal solution of a multi-agent multi-objective reinforcement learning problem. Both algorithms apply modularization and acceleration by a heuristic function to standard reinforcement learning algorithms, in order to simplify and speed up the learning process of an agent that learns in a multi-agent multi-objective environment. To evaluate the performance of the proposed algorithms, we considered a predator-prey environment in which the learning agent plays the role of a prey that must escape a pursuing predator while reaching food at a fixed location. The results show that combining modularization and acceleration by a heuristic function indeed simplifies and speeds up learning in a complex problem, compared with algorithms that use neither technique, such as Q-Learning and Minimax-Q.
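The acceleration described in the abstract works by biasing action selection with a heuristic function H, as in HAQL-style algorithms: the agent acts greedily on Q(s, a) + ξ·H(s, a) instead of on Q alone. A minimal sketch, assuming a tabular Q and H stored as dictionaries (the function name and data layout are illustrative, not the paper's implementation):

```python
import random

def ha_epsilon_greedy(Q, H, state, actions, xi=1.0, epsilon=0.1, rng=random):
    """Epsilon-greedy action choice over Q(s, a) + xi * H(s, a).

    Q and H map (state, action) pairs to floats; missing entries count as 0.
    Setting xi = 0 recovers plain epsilon-greedy Q-Learning selection.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore uniformly at random
    # exploit: greedy on the heuristically boosted value
    return max(actions,
               key=lambda a: Q.get((state, a), 0.0) + xi * H.get((state, a), 0.0))
```

With a well-chosen H (e.g. favoring moves toward the food location), the heuristic steers exploration early on, while the Q-values still dominate asymptotically.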




Footnote 1: The modules used in this problem were: navigate the room, do not hit a wall, pass through a door, find the recharging base, and recharge the battery.
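Each module above learns its own value function for one sub-goal; one standard way to arbitrate between them is the "greatest mass" scheme of modular Q-Learning (Tham and Prager): sum each action's Q-values across modules and pick the action with the largest total. A minimal sketch, assuming tabular per-module Q-tables (the layout is an illustrative assumption):

```python
def greatest_mass_action(module_q_tables, state, actions):
    """'Greatest mass' arbitration over a list of per-module Q-tables.

    Each table maps (state, action) pairs to floats (missing entries
    count as 0).  Returns the action with the largest summed Q-value.
    """
    def total(a):
        return sum(q.get((state, a), 0.0) for q in module_q_tables)
    return max(actions, key=total)
```

A module with strong preferences (e.g. "do not hit a wall" near a wall) contributes a large Q-value gap and so dominates the vote, while indifferent modules contribute little.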





Acknowledgements

The authors would like to thank the National Laboratory for Scientific Computing (LNCC) for providing the equipment that made the experiments possible. Leonardo Anjoletto Ferreira acknowledges support from CNPq (grant 151521/2010-7) and CAPES. Carlos H. C. Ribeiro thanks CNPq (grant 305772/2010-4).

Author information



Corresponding author

Correspondence to Leonardo Anjoletto Ferreira.


About this article


Cite this article

Ferreira, L.A., Costa Ribeiro, C.H. & da Costa Bianchi, R.A. Heuristically accelerated reinforcement learning modularization for multi-agent multi-objective problems. Appl Intell 41, 551–562 (2014).



Keywords

  • Reinforcement learning
  • Heuristically accelerated reinforcement learning
  • Multi-agent systems
  • Multi-objective problems