Abstract
The common belief is that reinforcement learning (RL) methods with bootstrapping give better results than those without. However, including bootstrapping increases the complexity of an RL implementation and requires significant effort. This study investigates whether bootstrapping is worth the effort when RL is applied to inventory problems. Specifically, we investigate bootstrapping of the temporal difference (TD) learning method through eligibility traces. In addition, we develop a new bootstrapping extension to the Residual Gradient method to supplement our investigation. The results cast doubt on the benefit of bootstrapping for inventory problems: significance tests could not confirm that bootstrapping significantly reduced the costs of inventory controlled by an RL agent. Our empirical results are based on a variety of problem settings, including different demand correlations, demand variances, and cost structures.
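The abstract contrasts TD learning with and without eligibility traces. As an illustration only, the following is a minimal sketch of linear TD(λ) with accumulating eligibility traces, evaluating a fixed base-stock ordering policy on a hypothetical single-item, lost-sales inventory problem. Setting `lam=0.0` removes the trace and recovers one-step TD(0). The inventory model, the function names `td_lambda_step` and `evaluate_base_stock`, and every parameter value (order-up-to level `S`, holding cost `h`, shortage penalty `p`, Poisson demand) are assumptions for this sketch; it is not the authors' experimental setup, and it does not implement their Residual Gradient extension.

```python
import numpy as np


def td_lambda_step(w, z, phi, phi_next, cost, alpha, gamma, lam):
    """One linear TD(lambda) update with accumulating eligibility traces.

    w: weights, z: eligibility trace, phi / phi_next: features of the current
    and next state. The learned quantity is expected discounted cost, so the
    one-step cost takes the place of the reward.
    """
    delta = cost + gamma * (w @ phi_next) - (w @ phi)  # TD error
    z = gamma * lam * z + phi                          # decay trace, add current features
    w = w + alpha * delta * z                          # credit recently visited states
    return w, z


def evaluate_base_stock(lam, episodes=200, steps=50, S=10, max_inv=20,
                        h=1.0, p=4.0, mean_demand=5.0,
                        alpha=0.05, gamma=0.9, seed=0):
    """Estimate the discounted cost of a hypothetical order-up-to-S policy."""
    rng = np.random.default_rng(seed)
    n = max_inv + 1

    def features(x):
        phi = np.zeros(n)
        phi[x] = 1.0  # one-hot (tabular) features over inventory levels
        return phi

    w = np.zeros(n)
    for _ in range(episodes):
        x = int(rng.integers(0, n))  # random starting inventory level
        z = np.zeros(n)              # simplification: reset the trace each episode
        for _ in range(steps):
            order = max(S - x, 0)                    # base-stock ordering rule
            demand = int(rng.poisson(mean_demand))
            available = min(x + order, max_inv)
            sold = min(available, demand)
            x_next = available - sold
            cost = h * x_next + p * (demand - sold)  # holding cost + lost-sales penalty
            w, z = td_lambda_step(w, z, features(x), features(x_next),
                                  cost, alpha, gamma, lam)
            x = x_next
    return w
```

Running `evaluate_base_stock(lam=0.0)` and `evaluate_base_stock(lam=0.8)` with the same seed gives two cost-to-go estimates that differ only in the use of the trace. Comparing inventory controllers, as the study does, would additionally require policy improvement and significance testing over repeated runs.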
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Katanyukul, T., Chong, E.K.P., Duff, W.S. (2012). Intelligent Inventory Control: Is Bootstrapping Worth Implementing?. In: Shi, Z., Leake, D., Vadera, S. (eds) Intelligent Information Processing VI. IIP 2012. IFIP Advances in Information and Communication Technology, vol 385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32891-6_10
DOI: https://doi.org/10.1007/978-3-642-32891-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32890-9
Online ISBN: 978-3-642-32891-6
eBook Packages: Computer Science, Computer Science (R0)