Learning While Optimizing an Unknown Fitness Surface

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5313)


This paper is about Reinforcement Learning (RL) applied to online parameter tuning in Stochastic Local Search (SLS) methods. In particular a novel application of RL is considered in the Reactive Tabu Search (RTS) method, where the appropriate amount of diversification in prohibition-based (Tabu) local search is adapted in a fast online manner to the characteristics of a task and of the local configuration. We model the parameter-tuning policy as a Markov Decision Process where the states summarize relevant information about the recent history of the search, and we determine a near-optimal policy by using the Least Squares Policy Iteration (LSPI) method. Preliminary experiments on Maximum Satisfiability (MAX-SAT) instances show very promising results indicating that the learnt policy is competitive with previously proposed reactive strategies.


Local Search Optimal Policy Reinforcement Learn Markov Decision Process Tabu Search Algorithm 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Battiti, R., Tecchiolli, G.: The reactive tabu search. ORSA Journal on Computing 6(2), 126–140 (1994)CrossRefzbMATHGoogle Scholar
  2. 2.
    Battiti, R., Brunato, M.: Reactive search: machine learning for memory-based heuristics. In: Gonzalez, T.F. (ed.) Approximation Algorithms and Metaheuristics, pp. 21–1 – 21–17. Taylor and Francis Books, CRC Press, Washington (2007)Google Scholar
  3. 3.
    Battiti, R., Brunato, M., Mascia, F.: Reactive Search and Intelligent Optimization. In: Operations research/Computer Science Interfaces. Springer, Heidelberg (in press, 2008)Google Scholar
  4. 4.
    Battiti, R.: Machine learning methods for parameter tuning in heuristics. In: 5th DIMACS Challenge Workshop: Experimental Methodology Day, Rutgers University (October 1996)Google Scholar
  5. 5.
    Bertsekas, D., Tsitsiklis, J.: Neuro-Dynamic Programming. Athena Scientific (1996)Google Scholar
  6. 6.
    Lagoudakis, M., Parr, R.: Least-Squares Policy Iteration. Journal of Machine Learning Research 4(6), 1107–1149 (2004)MathSciNetzbMATHGoogle Scholar
  7. 7.
    Baluja, S., Barto, A., Boese, K., Boyan, J., Buntine, W., Carson, T., Caruana, R., Cook, D., Davies, S., Dean, T., et al.: Statistical Machine Learning for Large-Scale Optimization. Neural Computing Surveys 3, 1–58 (2000)Google Scholar
  8. 8.
    Boyan, J.A., Moore, A.W.: Learning evaluation functions for global optimization and boolean satisfability. In: Press, A. (ed.) Proc. of 15th National Conf. on Artificial Intelligence (AAAI), pp. 3–10 (1998)Google Scholar
  9. 9.
    Zhang, W., Dietterich, T.: A reinforcement learning approach to job-shop scheduling. In: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, vol. 1114 (1995)Google Scholar
  10. 10.
    Zhang, W., Dietterich, T.: High-performance job-shop scheduling with a time-delay TD (λ) network. Advances in Neural Information Processing Systems 8, 1024–1030 (1996)Google Scholar
  11. 11.
    Lagoudakis, M., Littman, M.: Learning to select branching rules in the DPLL procedure for satisfiability. In: LICS 2001 Workshop on Theory and Applications of Satisfiability Testing, SAT 2001 (2001)Google Scholar
  12. 12.
    Lagoudakis, M., Littman, M.: Algorithm selection using reinforcement learning. In: Proceedings of the Seventeenth International Conference on Machine Learning, pp. 511–518 (2000)Google Scholar
  13. 13.
    Muller, S., Schraudolph, N., Koumoutsakos, P.: Step size adaptation in evolution strategies using reinforcementlearning. In: Proceedings of the 2002 Congress on Evolutionary Computation, 2002. CEC 2002, vol. 1, pp. 151–156 (2002)Google Scholar
  14. 14.
    Eiben, A., Horvath, M., Kowalczyk, W., Schut, M.: Reinforcement learning for online control of evolutionary algorithms. In: Brueckner, S.A., Hassas, S., Jelasity, M., Yamins, D. (eds.) ESOA 2006. LNCS (LNAI), vol. 4335. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  15. 15.
    Battiti, R., Protasi, M.: Approximate algorithms and heuristics for MAX-SAT. In: Du, D., Pardalos, P. (eds.) Handbook of Combinatorial Optimization, vol. 1, pp. 77–148. Kluwer Academic Publishers, Dordrecht (1998)CrossRefGoogle Scholar
  16. 16.
    Lagoudakis, M., Parr, R.: LSPI: Least-squares policy iteration (as of September 1, 2007),
  17. 17.
    Mitchell, D., Selman, B., Levesque, H.: Hard and easy distributions of SAT problems. In: Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI 1992), San Jose, Ca, pp. 459–465 (July 1992)Google Scholar
  18. 18.
    Steinmann, O., Strohmaier, A., Stutzle, T.: Tabu search vs. random walk. In: KI - Kunstliche Intelligenz, pp. 337–348 (1997)Google Scholar
  19. 19.
    Selman, B., Levesque, H., Mitchell, D.: A new method for solving hard satisfiability problems. In: Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI 1992), San Jose, Ca, pp. 440–446 (July 1992)Google Scholar
  20. 20.
    McAllester, D., Selman, B., Kautz, H.: Evidence for invariants in local search. In: Proceedings of the national conference on artificial intelligence (14), pp. 321–326. John Wiley & sons LTD., USA (1997)Google Scholar
  21. 21.
    Selman, B., Kautz, H., Cohen, B.: Noise strategies for improving local search. In: Proceedings of the national conference on artificial intelligence, vol. 12. John Wiley & sons LTD., USA (1994)Google Scholar
  22. 22.
    Tompkins, D.A.D., Hoos, H.H.: Novelty +  and adaptive novelty + . SAT 2004 Competition Booklet (solver description) (2004)Google Scholar
  23. 23.
    Tompkins, F.H.D., Hoos, H.: Scaling and probabilistic smoothing: Efficient dynamic local search for sat. In: Van Hentenryck, P. (ed.) CP 2002. LNCS, vol. 2470, p. 233. Springer, Heidelberg (2002)CrossRefGoogle Scholar
  24. 24.
    Schuurmans, D., Southey, F., Holte, R.: The exponentiated subgradient algorithm for heuristic boolean programming. In: Proceedings of the international joint conference on artificial intelligence, vol. 17, pp. 334–341. Lawrence Erlbaum associates LTD., USA (2001)Google Scholar
  25. 25.
    Battiti, R., Protasi, M.: Reactive search, a history-sensitive heuristic for MAX-SAT. ACM Journal of Experimental Algorithmics 2 (ARTICLE 2) (1997),

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  1. 1.DISI - Dipartimento di Ingegneria e Scienza dell’InformazioneUniversità di TrentoItaly

Personalised recommendations