Combining Learning Algorithms: An Approach to Markov Decision Processes

  • Richardson Ribeiro
  • Fábio Favarim
  • Marco A. C. Barbosa
  • Alessandro L. Koerich
  • Fabrício Enembreck
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 141)


In this paper we present a technique for estimating policies in Markovian environments that combines instance-based learning and reinforcement learning algorithms. The approach was developed to speed up the convergence of adaptive intelligent agents that use reinforcement learning algorithms. Speeding up the learning of an intelligent agent is a complex task: an inadequate choice of updating technique may delay the learning process, or it may induce an unexpected acceleration that causes the agent to converge to an unsatisfactory policy. Experimental results in real-world scenarios show that the proposed technique speeds up the convergence of the agents while still achieving optimal policies, thereby overcoming problems of classical reinforcement learning approaches.
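The abstract does not spell out how the two learners are combined. Purely as an illustration, the Python sketch below assumes one simple reading: tabular Q-learning estimates action values, the greedy action of each state is stored as an instance, and a k-nearest-neighbour vote over those stored instances proposes the action for a queried state. The grid world, its reward, the parameter values, and the names `q_learning`, `build_instances`, and `knn_action` are hypothetical and are not taken from the paper; this is not the authors' method.

```python
import random
from collections import Counter, defaultdict

# Hypothetical deterministic grid world (not from the paper): states are (row, col).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIZE = 4
GOAL = (SIZE - 1, SIZE - 1)


def step(state, action):
    """Move one cell, clipped to the grid; reward 1.0 on reaching GOAL, else 0.0."""
    dr, dc = ACTIONS[action]
    row = min(max(state[0] + dr, 0), SIZE - 1)
    col = min(max(state[1] + dc, 0), SIZE - 1)
    nxt = (row, col)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))  # explore
            else:
                # exploit, breaking ties between equal Q-values at random
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            nxt, reward, done = step(state, action)
            # terminal transitions do not bootstrap from the next state
            target = reward if done else reward + gamma * max(q[(nxt, b)] for b in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def build_instances(q):
    """Hypothetical instance base: one (state, greedy action) pair per non-goal state."""
    return [
        ((r, c), max(ACTIONS, key=lambda a: q[((r, c), a)]))
        for r in range(SIZE)
        for c in range(SIZE)
        if (r, c) != GOAL
    ]


def knn_action(instances, state, k=3):
    """Majority vote over the k stored instances nearest to `state` (Manhattan distance)."""
    def distance(inst):
        return abs(inst[0][0] - state[0]) + abs(inst[0][1] - state[1])

    nearest = sorted(instances, key=distance)[:k]
    return Counter(action for _, action in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    instances = build_instances(q_learning())
    print(knn_action(instances, (0, 0), k=3))
```

With `k=1` the instance base simply reproduces the greedy Q-learning policy; a larger `k` smooths action choices across neighbouring states, which is one way an instance-based component could generalize to states the agent has rarely visited.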


Keywords: Reinforcement learning · Dynamic environments · Adaptive agents

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Richardson Ribeiro (1)
  • Fábio Favarim (1)
  • Marco A. C. Barbosa (1)
  • Alessandro L. Koerich (2)
  • Fabrício Enembreck (2)

  1. Graduate Program in Computer Engineering, Federal Technological University of Paraná, Pato Branco, Brazil
  2. Post-Graduate Program in Computer Science, Pontifical Catholic University of Paraná, Curitiba, Brazil
