Abstract
Reactive Search Optimization advocates the adoption of learning mechanisms as an integral part of a heuristic optimization scheme. This work studies reinforcement learning methods for the online tuning of parameters in stochastic local search algorithms. In particular, the reactive tuning is obtained by learning a (near-)optimal policy in a Markov decision process whose states summarize relevant information about the recent history of the search. The learning process is performed by the Least Squares Policy Iteration (LSPI) method. The proposed framework is applied to tuning the prohibition value in the Reactive Tabu Search, the noise parameter in the Adaptive WalkSAT, and the smoothing probability in the Reactive Scaling and Probabilistic Smoothing (RSAPS) algorithm. The novel approach is experimentally compared with the original ad hoc reactive schemes.
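To make the approach concrete, the following is a minimal, self-contained sketch of one LSPI loop applied to reactive parameter tuning. Everything here is a toy illustration, not the chapter's actual setup: the two abstract states ("progressing" vs. "stagnating"), the two actions (decrease or increase a hypothetical prohibition value), and the synthetic transition samples are all assumptions made for the example. Only the LSTD-Q solve and greedy policy-improvement step follow the LSPI scheme of Lagoudakis and Parr.

```python
import numpy as np

# Toy LSPI sketch for reactive parameter tuning. States abstract the
# recent search history (0 = search progressing, 1 = search stagnating);
# actions adjust a hypothetical prohibition value (0 = decrease,
# 1 = increase). The experience samples below are synthetic.

N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.9  # discount factor

def phi(s, a):
    """One-hot feature vector over (state, action) pairs."""
    v = np.zeros(N_STATES * N_ACTIONS)
    v[s * N_ACTIONS + a] = 1.0
    return v

def lstdq(samples, policy):
    """LSTD-Q: solve A w = b for the Q-weights of the given policy."""
    k = N_STATES * N_ACTIONS
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, a, r, s2 in samples:
        f = phi(s, a)
        A += np.outer(f, f - GAMMA * phi(s2, policy[s2]))
        b += r * f
    # Small ridge term keeps A invertible for sparse sample sets.
    return np.linalg.solve(A + 1e-6 * np.eye(k), b)

def greedy(w):
    """Policy improvement: pick the action maximizing the learned Q."""
    return [max(range(N_ACTIONS), key=lambda a: phi(s, a) @ w)
            for s in range(N_STATES)]

# Synthetic experience (s, a, r, s'): raising the prohibition while
# stagnating is rewarded, lowering it while progressing is rewarded.
samples = [(0, 0, 1.0, 0), (0, 1, -1.0, 1),
           (1, 1, 1.0, 0), (1, 0, -1.0, 1)] * 10

policy = [0, 0]
for _ in range(5):  # alternate evaluation and improvement
    w = lstdq(samples, policy)
    policy = greedy(w)

print(policy)  # → [0, 1]: decrease when progressing, increase when stagnating
```

In the chapter's actual setting the states are richer summaries of the recent search trajectory and the rewards come from the solver's observed progress; the fixed-point structure of the loop, however, is the same: evaluate the current tuning policy from collected transitions, then act greedily with respect to the learned Q-function.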
Keywords
- Local Search
- Optimal Policy
- Reinforcement Learning
- Markov Decision Process
- Constraint Satisfaction Problem
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Battiti, R., Campigotto, P. (2011). An Investigation of Reinforcement Learning for Reactive Search Optimization. In: Hamadi, Y., Monfroy, E., Saubion, F. (eds) Autonomous Search. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21434-9_6
DOI: https://doi.org/10.1007/978-3-642-21434-9_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21433-2
Online ISBN: 978-3-642-21434-9
eBook Packages: Computer Science, Computer Science (R0)