A Framework for Dynamic Decision Making by Multi-agent Cooperative Fault Pair Algorithm (MCFPA) in Retail Shop Application

  • Deepak A. Vidhate
  • Parag Kulkarni
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 107)


This paper presents a novel framework for dynamic decision making in a retail shop application, based on a proposed improvement to Nash Q-learning called the Fault Pair Algorithm. The approach models three retailer shops in a retail market. The shops must support one another to maximize revenue, drawing on cooperatively shared knowledge while each learns its own policy. The suppliers are intelligent agents that use cooperative learning to adapt to the market situation. Under reasonable assumptions about each shop's storage plan, restocking time, and customer arrival process, the problem is formulated as a Markov decision process, which makes it feasible to develop the learning algorithms. The proposed algorithms explicitly learn the changing market situation. The paper reports results of cooperative reinforcement learning using the improved Nash Q-learning by Fault Pair Algorithm for three shop agents over a one-year sales period, and compares the two approaches: Nash Q-learning and improved Nash Q-learning by Fault Pair. Each agent maintains Q-functions over joint actions and performs updates based on the Nash equilibrium of the current Q-values. The paper finds that the agents converge to a jointly optimal path with Nash Q-learning, and that agent performance improves further with Fault Pair Nash Q-learning.
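The joint-action update described above follows the Nash Q-learning scheme of Hu and Wellman: each agent bootstraps its Q-value with its payoff at a Nash equilibrium of the next state's stage game, defined by all agents' current Q-tables. A minimal two-agent sketch in Python, assuming tabular Q-functions and restricting the equilibrium search to pure strategies for simplicity (the full algorithm uses mixed equilibria); the function names and data layout are illustrative, not the paper's implementation:

```python
import itertools

def pure_nash(q1, q2):
    """Return a pure-strategy Nash equilibrium (a1, a2) of the stage game
    given by payoff tables q1[a1][a2] and q2[a1][a2]; fall back to the
    first joint action if no pure equilibrium exists (a simplification)."""
    n1, n2 = len(q1), len(q1[0])
    for a1, a2 in itertools.product(range(n1), range(n2)):
        # Neither agent can gain by unilaterally deviating.
        if all(q1[b1][a2] <= q1[a1][a2] for b1 in range(n1)) and \
           all(q2[a1][b2] <= q2[a1][a2] for b2 in range(n2)):
            return a1, a2
    return 0, 0

def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.9):
    """One Nash Q-learning step: each agent's Q-value for the joint action
    (a1, a2) in state s is moved toward its reward plus the discounted
    Nash-equilibrium payoff of the next state's stage game."""
    e1, e2 = pure_nash(Q1[s_next], Q2[s_next])
    nash1, nash2 = Q1[s_next][e1][e2], Q2[s_next][e1][e2]
    Q1[s][a1][a2] += alpha * (r1 + gamma * nash1 - Q1[s][a1][a2])
    Q2[s][a1][a2] += alpha * (r2 + gamma * nash2 - Q2[s][a1][a2])
```

In a retail setting of this kind, states would encode stock levels and demand, joint actions the shops' pricing or restocking choices, and rewards the resulting revenues.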


Cooperative learning · Fault pair learning · Reinforcement learning · Multi-agent learning · Nash Q-learning



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computer Engineering, College of Engineering, Pune, India
  2. iKnowlation Research Laboratory Pvt. Ltd., Pune, India
