Optimization of parallel iterated local search algorithms on graphics processing unit

The Journal of Supercomputing

Abstract

Local search metaheuristics (LSMs) are efficient methods for solving hard optimization problems in science, engineering, economics and technology. LSMs can obtain a satisfactory solution (an approximate optimum) in a reasonable time. However, solving large problem instances remains very CPU time-consuming. As graphics processing units (GPUs) have evolved to support general-purpose computing, they have become a major accelerator in scientific and industrial computing. In this paper, we present an optimized parallel iterated local search algorithm that is efficiently accelerated on GPUs, and we test it on a typical case study in computational science, the Travelling Salesman Problem (TSP). Our contributions are as follows. First, we present an efficient mapping between a neighborhood and a GPU thread. Second, we use the Roofline model to analyze the performance of existing GPU-based 2-opt kernels. Based on this analysis, we identify the limiting factor of these 2-opt kernels and provide our optimization approaches. Furthermore, we test our algorithm on standard TSP instances with up to 4461 cities, on which our strategy achieves a speedup of \(279\times\) over the sequential counterpart. We compare our approach with existing high-performance GPU-based local search algorithms, and the results demonstrate that the proposed algorithm is competitive.
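
As an illustration only, and not the authors' implementation, the Python sketch below shows one common way to realize the ideas named in the abstract. It assumes a triangular-number inversion to map a flat thread index k to a 2-opt neighbor (i, j), evaluates the tour-length change of each neighbor, and includes the standard Roofline bound, min(peak, bandwidth × arithmetic intensity). All function names are hypothetical, and the sequential loop stands in for the threads a GPU kernel would launch.

```python
import math


def index_to_pair(k):
    """Map a flat index k >= 0 to a pair (i, j) with 0 <= i < j.

    Assumed mapping: pairs are ordered (0,1), (0,2), (1,2), (0,3), ...
    so j is the largest integer with j*(j-1)/2 <= k.
    """
    j = (1 + math.isqrt(1 + 8 * k)) // 2
    i = k - j * (j - 1) // 2
    return i, j


def two_opt_delta(tour, coords, i, j):
    """Length change if tour[i+1..j] is reversed (negative means shorter)."""
    n = len(tour)
    a, b = coords[tour[i]], coords[tour[i + 1]]
    c, d = coords[tour[j]], coords[tour[(j + 1) % n]]
    # Edges (a,b) and (c,d) are replaced by (a,c) and (b,d).
    return math.dist(a, c) + math.dist(b, d) - math.dist(a, b) - math.dist(c, d)


def best_two_opt_move(tour, coords):
    """Scan the whole 2-opt neighborhood and return (delta, i, j) of the best move.

    On a GPU each k would be handled by its own thread, followed by a
    min-reduction; here a plain loop plays both roles.
    """
    n = len(tour)
    best = (0.0, -1, -1)
    for k in range(n * (n - 1) // 2):
        i, j = index_to_pair(k)
        delta = two_opt_delta(tour, coords, i, j)
        if delta < best[0]:
            best = (delta, i, j)
    return best


def apply_move(tour, i, j):
    """Return a new tour with the segment tour[i+1..j] reversed."""
    return tour[: i + 1] + tour[i + 1 : j + 1][::-1] + tour[j + 1 :]


def roofline_bound(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable GFLOP/s under the Roofline model."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


if __name__ == "__main__":
    # Unit square visited in a self-crossing order (made-up example data).
    coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    tour = [0, 2, 1, 3]
    delta, i, j = best_two_opt_move(tour, coords)
    print(delta, apply_move(tour, i, j))
```

In this sketch a kernel with a low flops-per-byte ratio falls on the bandwidth-limited slope of the roofline, which is the kind of limiting factor such an analysis is meant to expose; which regime the 2-opt kernels studied in the paper actually fall in is not stated in the abstract.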

Notes

  1. http://olab.is.s.u-tokyo.ac.jp/~kamil.rocki/logo-tsp-src_v_0_62.zip.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments. This paper is supported by the National Science Foundation of China (Grant No. 61472289) and Hubei Province Science Foundation (Grant No. 2015CFB254).

Author information

Corresponding author

Correspondence to Fazhi He.

About this article

Cite this article

Zhou, Y., He, F. & Qiu, Y. Optimization of parallel iterated local search algorithms on graphics processing unit. J Supercomput 72, 2394–2416 (2016). https://doi.org/10.1007/s11227-016-1738-3

  • DOI: https://doi.org/10.1007/s11227-016-1738-3
