Intelligent dynamic control policies for serial production lines

IIE Transactions

Abstract

Heuristic production control policies such as CONWIP, kanban, and other hybrid policies have been used for years as better alternatives to MRP-based push control policies. These policies, although efficient, are far from optimal. Our goal is to develop a methodology that, for a given system, finds a dynamic control policy via intelligent agents. Such a policy, while achieving the productivity (i.e., demand service rate) goal of the system, will optimize a cost/reward function based on the WIP inventory. To this end, we applied a simulation-based optimization technique called reinforcement learning (RL) to a four-station serial line. The control policy obtained by the RL algorithm was compared with existing policies on the basis of total average WIP and average WIP cost. We also develop a heuristic control policy in light of the insight gained from a close examination of the policies obtained by the RL algorithm. This heuristic policy, named Behavior-Based Control (BBC), although second to the RL policy, proved to be a more efficient and leaner control policy than most existing policies in the literature. The performance of the BBC policy was comparable to that of the Extended Kanban Control System (EKCS), which, in our experiments, turned out to be the best of the existing policies. The numerical results used for comparison were obtained from a four-station serial line with two different (constant and Poisson) demand arrival processes. A sketch of the kind of agent-environment loop such a methodology relies on is given below.
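
The following minimal Python sketch is an illustration only, not the authors' method: the line model is a heavily simplified synchronous approximation, every parameter (BUF_CAP, HOLD_COST, SERVICE_REWARD, P_FINISH, P_DEMAND) is an invented assumption, and plain discounted tabular Q-learning stands in for the average-reward RL algorithms cited in the references (e.g., Das et al., 1999). The agent's only decision is whether to release a new part into the line, trading WIP holding cost against demand service.

import random

N_STATIONS = 4         # serial line length, matching the paper's test case
BUF_CAP = 3            # assumed cap on buffer levels to keep the state space small
HOLD_COST = 1.0        # assumed per-part, per-step WIP holding cost
SERVICE_REWARD = 10.0  # assumed reward for serving a demand from stock
P_FINISH = 0.7         # assumed probability a station completes a part per step
P_DEMAND = 0.5         # assumed probability a demand arrives per step

def step(state, release):
    """Advance the line one decision period; return (next_state, reward)."""
    b = list(state)
    reward = 0.0
    # A demand arrival consumes a unit from the last (finished-goods) buffer.
    if random.random() < P_DEMAND and b[-1] > 0:
        b[-1] -= 1
        reward += SERVICE_REWARD
    # Each station may move a part downstream, last station first.
    for i in range(N_STATIONS - 1, 0, -1):
        if b[i - 1] > 0 and b[i] < BUF_CAP and random.random() < P_FINISH:
            b[i - 1] -= 1
            b[i] += 1
    # The agent's action: authorize (1) or withhold (0) a raw-part release.
    if release == 1 and b[0] < BUF_CAP:
        b[0] += 1
    reward -= HOLD_COST * sum(b)  # charge holding cost on all WIP in the line
    return tuple(b), reward

def q_learn(episodes=2000, horizon=200, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning over buffer-level states and {0, 1} release actions."""
    q = {}  # maps (state, action) -> estimated value
    for _ in range(episodes):
        s = (0,) * N_STATIONS  # start each episode with an empty line
        for _ in range(horizon):
            if random.random() < eps:  # explore
                a = random.choice((0, 1))
            else:                      # exploit the current estimates
                a = max((0, 1), key=lambda x: q.get((s, x), 0.0))
            s2, r = step(s, a)
            best_next = max(q.get((s2, x), 0.0) for x in (0, 1))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s2
    return q

if __name__ == "__main__":
    q = q_learn()
    empty = (0,) * N_STATIONS
    # The greedy action at the empty-line state; one would expect "release" (1).
    print("action at empty line:",
          max((0, 1), key=lambda a: q.get((empty, a), 0.0)))

Inspecting the greedy action of such a learned table state by state is the kind of close examination from which a compact heuristic like BBC can be distilled.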

References

  • Abounadi, J. (1998) Stochastic approximation for non-expansive maps: applications to Q-learning algorithms. Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA.

  • Arkin, R.C. (1998) Behavior-Based Robotics, 1st edn, The MIT Press, Cambridge, MA.

  • Askin, R.G. and Standridge, C.R. (1993) Modeling and Analysis of Manufacturing Systems, 1st edn, John Wiley & Sons, New York, NY.

  • Berkley, B.J. (1992) A review of the kanban production control research literature. Production and Operations Management, 1(4), 393-411.

  • Bertsekas, D.P. and Tsitsiklis, J.N. (1996) Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.

  • Bonvik, A.M., Couch, C.E. and Gershwin, S.B. (1997) A comparison of production-line control mechanisms. International Journal of Production Research, 35(3), 789-804.

  • Buzacott, J.A. and Shanthikumar, J.G. (1992) A general approach for coordinating production in multiple-cell manufacturing systems. Production and Operations Management, 1(1), 34-52.

  • Dallery, Y. and Liberopoulos, G. (2000) Extended kanban control system: combining kanban and base stock. IIE Transactions, 32(4), 369-386.

  • Das, T.K., Gosavi, A., Mahadevan, S. and Marchalleck, N. (1999) Solving semi-Markov decision problems using average reward reinforcement learning. Management Science, 45(4), 560-574.

  • Das, T.K. and Sarkar, S. (1999) Optimal preventive maintenance in a production/inventory system. IIE Transactions, 31(6), 537-551.

  • Frein, Y., Di Mascolo, M. and Dallery, Y. (2000) On the design of generalized kanban control systems. International Journal of Operations and Production Management (in press).

  • Gershwin, S.B. (1994) Manufacturing Systems Engineering, Prentice Hall, Englewood Cliffs, NJ.

  • Gosavi, A. (1999) An algorithm for solving semi-Markov decision problems using reinforcement learning: convergence analysis and numerical results. Ph.D. thesis, Department of Industrial Engineering, University of South Florida, Tampa, FL.

  • Kaelbling, L.P., Littman, M.L. and Moore, A.W. (1996) Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4, 237-285.

  • Law, A.M. and Kelton, W.D. (1991) Simulation Modeling and Analysis, McGraw-Hill, Inc., New York, NY.

  • Lutz, C.M., Davis, K.R. and Sun, M. (1998) Determining buffer location and size in production lines using tabu search. European Journal of Operational Research, 106(2/3), 301-316.

  • Mahadevan, S. and Theocharous, G. (1998) Optimizing production manufacturing using reinforcement learning, in Proceedings of the Eleventh International FLAIRS Conference, AAAI Press, Menlo Park, CA, pp. 372-377.

  • Muckstadt, J.A. and Tayur, S.R. (1995a) A comparison of alternative kanban control mechanisms. I. Background and structural results. IIE Transactions, 27(2), 140-150.

  • Muckstadt, J.A. and Tayur, S.R. (1995b) A comparison of alternative kanban control mechanisms. II. Experimental results. IIE Transactions, 27(2), 151-161.

  • Puterman, M.L. (1994) Markov Decision Processes, Wiley Interscience, New York, NY.

  • Sethi, S. and Zhang, Q. (1994) Hierarchical Decision Making in Stochastic Manufacturing Systems, Birkhäuser, Boston, MA.

  • Sethi, S., Zhang, H. and Zhang, Q. (1997) Hierarchical production control in a stochastic manufacturing system with long-run average cost. Journal of Mathematical Analysis and Applications, 214, 151-172.

  • So, K.C. and Pinault, S.C. (1988) Allocating buffer storages in a pull system. International Journal of Production Research, 26(12), 1959-1980.

  • Spearman, M.L., Woodruff, D.L. and Hopp, W.J. (1990) CONWIP: a pull alternative to kanban. International Journal of Production Research, 28(5), 879-894.

  • Sugimori, Y., Kusunoki, K., Cho, F. and Uchikawa, S. (1977) Toyota production system and kanban system: materialization of just-in-time and respect-for-human systems. International Journal of Production Research, 15(6), 553-564.

  • Sutton, R.S. (1988) Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.

  • Sutton, R.S. and Barto, A.G. (1998) Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA.

  • Tabe, T., Muramatsu, R. and Tanaka, Y. (1980) Analysis of production ordering quantities and inventory variations in a multi-stage production ordering system. International Journal of Production Research, 18(2), 245-257.

  • Van Ryzin, G., Lou, S.X. and Gershwin, S.B. (1993) Production control for a tandem two-machine system. IIE Transactions, 25(5), 5-20.

  • Veatch, M.H. and Wein, L.M. (1994) Optimal control of a two-station tandem production/inventory system. Operations Research, 42(2), 337-350.

  • Watkins, C.J. (1989) Learning from delayed rewards. Ph.D. thesis, King's College, Cambridge, England.

Cite this article

Paternina-Arboleda, C.D., Das, T.K. Intelligent dynamic control policies for serial production lines. IIE Transactions 33, 65–77 (2001). https://doi.org/10.1023/A:1007641824604
