Abstract
Heuristic production control policies such as CONWIP, kanban, and various hybrid policies have been in use for years as better alternatives to MRP-based push control policies. These policies, although efficient, are far from optimal. Our goal is to develop a methodology that, for a given system, finds a dynamic control policy via intelligent agents. Such a policy, while achieving the productivity (i.e., demand service rate) goal of the system, optimizes a cost/reward function based on the WIP inventory. To achieve this goal we applied a simulation-based optimization technique called Reinforcement Learning (RL) to a four-station serial line. The control policy obtained by the RL algorithm was compared with existing policies on the basis of total average WIP and average cost of WIP. We also developed a heuristic control policy informed by a close examination of the policies obtained by the RL algorithm. This heuristic policy, named Behavior-Based Control (BBC), although second to the RL policy, proved to be a more efficient and leaner control policy than most of the existing policies in the literature. The performance of the BBC policy was found to be comparable to that of the Extended Kanban Control System (EKCS), which, in our experiments, turned out to be the best of the existing policies. The numerical results used for comparison were obtained from a four-station serial line with two different demand arrival processes (constant and Poisson).
Cite this article
Paternina-Arboleda, C.D., Das, T.K. Intelligent dynamic control policies for serial production lines. IIE Transactions 33, 65–77 (2001). https://doi.org/10.1023/A:1007641824604
DOI: https://doi.org/10.1023/A:1007641824604