Tuning Local Search by Average-Reward Reinforcement Learning

  • Steven Prestwich
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5313)

Abstract

Reinforcement learning and local search have been combined in a variety of ways in order to learn how to solve combinatorial problems more efficiently. Most approaches optimise the total reward, where the reward for each action is the change in objective function. We argue that it is more appropriate to optimise the average reward. We use R-learning to dynamically tune the noise parameter of standard SAT local search algorithms on single instances. Experiments show that noise tuning can be successfully automated in this way.
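
To make the learning setup concrete, the sketch below shows one plausible way (not the paper's actual implementation) to wire R-learning to a WalkSAT-style noise parameter: the state space is collapsed to a single abstract state, actions nudge the noise up or down, and the reward for each step is the change in objective function, as described above. The class name, the three-action set, and all hyperparameter values are illustrative assumptions.

```python
import random

class RLearningNoiseTuner:
    """A minimal sketch of R-learning (Schwartz, 1993) tuning the noise
    parameter p of a WalkSAT-style local search. Single abstract state;
    actions adjust p; reward is the change in objective function."""

    def __init__(self, actions=(-0.05, 0.0, 0.05),
                 alpha=0.1, beta=0.01, epsilon=0.1):
        self.actions = actions              # nudge p down / keep / nudge up
        self.alpha = alpha                  # learning rate for Q-values
        self.beta = beta                    # learning rate for average reward
        self.epsilon = epsilon              # exploration probability
        self.rho = 0.0                      # average-reward estimate
        self.q = {a: 0.0 for a in actions}  # one state, so Q is indexed by action
        self.p = 0.5                        # current noise parameter
        self.last = self.greedy = actions[0]

    def select(self):
        """Epsilon-greedy choice of a noise adjustment; returns the new p."""
        self.greedy = max(self.q, key=self.q.get)
        self.last = (random.choice(self.actions)
                     if random.random() < self.epsilon else self.greedy)
        self.p = min(1.0, max(0.0, self.p + self.last))
        return self.p

    def learn(self, reward):
        """R-learning update after observing the reward for the last action,
        e.g. the decrease in the number of unsatisfied clauses."""
        best = max(self.q.values())         # single state, so s' == s
        self.q[self.last] += self.alpha * (reward - self.rho
                                           + best - self.q[self.last])
        if self.last == self.greedy:
            # Average reward is updated on greedy actions only; with a single
            # state the full R-learning rho update reduces to this form.
            self.rho += self.beta * (reward - self.rho)
```

A driver loop would call select() before each flip (or block of flips), perform the flip with noise self.p, and pass the resulting change in the unsatisfied-clause count to learn(); optimising the average rather than the total reward is what keeps the tuner focused on sustained per-move progress.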

Keywords

Local Search, Reinforcement Learning, Local Move, Local Search Algorithm, Average Reward

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Steven Prestwich
  1. Cork Constraint Computation Centre, Department of Computer Science, University College Cork, Ireland
