A Cultural Algorithm for POMDPs from Stochastic Inventory Control

  • S. D. Prestwich
  • S. A. Tarim
  • R. Rossi
  • B. Hnich
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5296)


Reinforcement Learning algorithms such as SARSA with an eligibility trace, and Evolutionary Computation methods such as genetic algorithms, are competing approaches to solving Partially Observable Markov Decision Processes (POMDPs) which occur in many fields of Artificial Intelligence. A powerful form of evolutionary algorithm that has not previously been applied to POMDPs is the cultural algorithm, in which evolving agents share knowledge in a belief space that is used to guide their evolution. We describe a cultural algorithm for POMDPs that hybridises SARSA with a noisy genetic algorithm, and inherits the latter’s convergence properties. Its belief space is a common set of state-action values that are updated during genetic exploration, and conversely used to modify chromosomes. We use it to solve problems from stochastic inventory control by finding memoryless policies for nondeterministic POMDPs. Neither SARSA nor the genetic algorithm dominates the other on these problems, but the cultural algorithm outperforms the genetic algorithm, and on highly non-Markovian instances also outperforms SARSA.
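The hybrid described above can be illustrated with a minimal sketch: a population of memoryless policies (chromosomes mapping observation to action) is evolved by a noisy genetic algorithm, while rollouts perform SARSA-style updates on a shared Q-table that plays the role of the belief space, and the belief space in turn occasionally writes greedy actions back into offspring chromosomes. The toy nondeterministic POMDP, all parameters, and the update schedule below are illustrative assumptions, not the authors' inventory-control benchmark or implementation.

```python
import random

# Toy nondeterministic POMDP: 4 hidden states, 2 observations, 2 actions.
# This environment and all parameters are illustrative assumptions only.
N_STATES, N_OBS, N_ACTIONS = 4, 2, 2
OBS = [0, 0, 1, 1]                      # two hidden states per observation

def step(state, action):
    """Noisy transition and reward (nondeterministic dynamics)."""
    nxt = (state + action + random.choice([0, 1])) % N_STATES
    reward = 1.0 if (state == 2 and action == 1) else 0.0
    return nxt, reward

def rollout(policy, q, alpha=0.1, gamma=0.9, horizon=30):
    """Evaluate a memoryless policy; as a side effect, apply SARSA-style
    updates to the shared Q-table (the belief space)."""
    s, total = random.randrange(N_STATES), 0.0
    o, a = OBS[s], policy[OBS[s]]
    for _ in range(horizon):
        s2, r = step(s, a)
        o2 = OBS[s2]
        a2 = policy[o2]
        q[(o, a)] += alpha * (r + gamma * q[(o2, a2)] - q[(o, a)])
        total += r
        s, o, a = s2, o2, a2
    return total

def fitness(policy, q, samples=20):
    """Noisy-GA fitness: average several noisy rollouts."""
    return sum(rollout(policy, q) for _ in range(samples)) / samples

def cultural_ga(pop_size=20, generations=50, seed=0):
    random.seed(seed)
    q = {(o, a): 0.0 for o in range(N_OBS) for a in range(N_ACTIONS)}
    pop = [[random.randrange(N_ACTIONS) for _ in range(N_OBS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda p: fitness(p, q), reverse=True)
        parents = scored[:pop_size // 2]
        children = []
        for _ in range(pop_size - len(parents)):
            p1, p2 = random.sample(parents, 2)
            child = [random.choice(g) for g in zip(p1, p2)]  # uniform crossover
            o = random.randrange(N_OBS)
            if random.random() < 0.5:
                # Influence step: the belief space (Q-table) writes its
                # greedy action into the chromosome.
                child[o] = max(range(N_ACTIONS), key=lambda a: q[(o, a)])
            else:
                child[o] = random.randrange(N_ACTIONS)       # plain mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda p: fitness(p, q)), q

best, q = cultural_ga()
print("best memoryless policy:", best)
```

Averaging several rollouts per fitness call is the standard noisy-GA device for stochastic objectives; the influence step is what makes the sketch cultural rather than a plain GA.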


Keywords: Reinforcement Learning · Order Quantity · Memetic Algorithm · Inventory Control · Partially Observable Markov Decision Process
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • S. D. Prestwich (1)
  • S. A. Tarim (2)
  • R. Rossi (1)
  • B. Hnich (3)
  1. Cork Constraint Computation Centre, Ireland
  2. Department of Management, Hacettepe University, Turkey
  3. Faculty of Computer Science, Izmir University of Economics, Turkey
