The Journal of Supercomputing, Volume 71, Issue 2, pp 648–672

A parallel local search in CPU/GPU for scheduling independent tasks on large heterogeneous computing systems

  • Santiago Iturriaga
  • Sergio Nesmachnow
  • Francisco Luna
  • Enrique Alba


This article presents the parallel implementation on CPU/GPU of two variants of a stochastic local search method to efficiently solve the scheduling problem in heterogeneous computing systems. Both methods are based on a set of simple operators to keep the computational complexity as low as possible, thus allowing large instances of the scheduling problem to be efficiently addressed. The experimental analysis demonstrates that both versions of the parallel CPU/GPU stochastic local search are able to compute accurate suboptimal schedules in significantly shorter execution times than state-of-the-art schedulers, while also outperforming a recently published GPU parallel evolutionary scheduler in terms of both efficiency and solution quality.
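To make the idea of a stochastic local search built from simple operators concrete, the following is a minimal sequential sketch in Python. It is not the authors' CPU/GPU implementation or their operator set. It assumes a hypothetical matrix `etc[t][m]` holding the estimated time of task `t` on machine `m`, a greedy initial schedule, and a single "move one task off the most loaded machine" operator that is accepted only when the makespan cannot increase. All function and parameter names are illustrative.

```python
import random


def makespan(loads):
    """Makespan of a schedule: the finishing time of the most loaded machine."""
    return max(loads)


def stochastic_local_search(etc, iterations=10_000, seed=0):
    """Illustrative local search; etc[t][m] is a hypothetical task/machine time matrix."""
    rng = random.Random(seed)
    n_tasks, n_machines = len(etc), len(etc[0])

    # Greedy start: place each task on the machine that would finish it earliest.
    assign = [0] * n_tasks
    loads = [0.0] * n_machines
    for t in range(n_tasks):
        m = min(range(n_machines), key=lambda j: loads[j] + etc[t][j])
        assign[t] = m
        loads[m] += etc[t][m]

    for _ in range(iterations):
        # The most loaded machine determines the makespan, so perturb it.
        src = max(range(n_machines), key=loads.__getitem__)
        candidates = [t for t in range(n_tasks) if assign[t] == src]
        if not candidates:
            break
        t = rng.choice(candidates)
        dst = rng.randrange(n_machines)
        if dst == src:
            continue
        new_dst_load = loads[dst] + etc[t][dst]
        # Accept only if the destination stays below the current makespan.
        if new_dst_load < loads[src]:
            loads[src] -= etc[t][src]
            loads[dst] = new_dst_load
            assign[t] = dst

    return assign, loads
```

Calling `stochastic_local_search(etc)` returns the task-to-machine assignment and the per-machine loads. Because each accepted move keeps every load at or below the previous makespan, the result is never worse than the greedy start. A parallel variant would evaluate many such candidate moves concurrently, which is the part this sketch leaves out.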


Keywords: Heterogeneous computing, Scheduling, GPU computing



The work of S. Iturriaga and S. Nesmachnow has been partially supported by ANII and PEDECIBA, Uruguay. The work of F. Luna and E. Alba has been partially funded by FEDER (TIN2011-28194). The experiments were carried out using the HPC facility of the University of Luxembourg.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Santiago Iturriaga (1)
  • Sergio Nesmachnow (1)
  • Francisco Luna (2)
  • Enrique Alba (3)
  1. Universidad de la República, Montevideo, Uruguay
  2. Universidad de Extremadura, Mérida, Spain
  3. Universidad de Málaga, Málaga, Spain
