Abstract
In real-world applications with very large state and action spaces and unknown state transition probabilities, classical reinforcement learning algorithms usually perform poorly. One way to address this problem is to approximate the policy or the value function. Fuzzy rule-based systems are among the well-known function approximators. This paper presents a Flexible Fuzzy Reinforcement Learning algorithm in which the value function is approximated by a fuzzy rule-based system. The proposed algorithm has a separate module for tuning the structure of the fuzzy rules, and the parameters of the system are tuned during the learning phase. The algorithm is then applied to the problem of inventory control in supply chains, in which a fuzzy agent (the supplier) must determine the order amount for each retailer based on that retailer's utility to the supplier, subject to the supplier's limited supply capacity. Finally, a simulation is performed to demonstrate the capability of the proposed algorithm.
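To make the idea of approximating a value function with a fuzzy rule-based system more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a one-dimensional state, Gaussian membership functions with a shared width, a zero-order Takagi-Sugeno rule base, and a plain TD(0) update of the rule consequents. The class name FuzzyValueFunction, its parameters, and the example numbers are all hypothetical, and the structure-tuning module mentioned in the abstract is not modeled.

```python
import math


class FuzzyValueFunction:
    """Zero-order Takagi-Sugeno approximator: V(s) = sum_i phi_i(s) * q_i.

    Illustrative assumption only; the abstract does not specify the rule form.
    """

    def __init__(self, centers, width):
        self.centers = list(centers)          # membership function centers (assumed fixed)
        self.width = width                    # shared Gaussian width (assumed)
        self.q = [0.0] * len(self.centers)    # rule consequents, tuned during learning

    def firing_strengths(self, s):
        # Normalized firing strength of each rule for state s.
        # Assumes s lies close enough to the centers that the sum is nonzero.
        w = [math.exp(-((s - c) / self.width) ** 2) for c in self.centers]
        total = sum(w)
        return [wi / total for wi in w]

    def value(self, s):
        # Weighted average of rule consequents.
        phi = self.firing_strengths(s)
        return sum(p * q for p, q in zip(phi, self.q))

    def td_update(self, s, reward, s_next, alpha=0.1, gamma=0.9):
        # TD(0): move each consequent along its firing strength,
        # which is the gradient of V(s) with respect to q_i.
        phi = self.firing_strengths(s)
        delta = reward + gamma * self.value(s_next) - self.value(s)
        self.q = [q + alpha * delta * p for q, p in zip(self.q, phi)]
        return delta


# Hypothetical usage: the state is an inventory level between 0 and 10.
vf = FuzzyValueFunction(centers=[0.0, 5.0, 10.0], width=2.5)
vf.td_update(s=4.0, reward=1.0, s_next=6.0)
print(vf.value(4.0))
```

Because the approximator is linear in the consequents q, the update is the standard linear TD(0) rule, with the normalized firing strengths acting as features.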
Cite this article
Zarandi, M.H.F., Moosavi, S.V. & Zarinbal, M. A fuzzy reinforcement learning algorithm for inventory control in supply chains. Int J Adv Manuf Technol 65, 557–569 (2013). https://doi.org/10.1007/s00170-012-4195-z
DOI: https://doi.org/10.1007/s00170-012-4195-z