
Reconstructing permutation table to improve the Tabu Search for the PFSP on GPU

The Journal of Supercomputing

Abstract

General-purpose computing on graphics processing units (GPGPU) has been adopted to accelerate long-running applications in various problem domains. Tabu Search, a metaheuristic optimization method, has been used to find suboptimal solutions to NP-hard problems within a reasonable time. In this paper, we investigate how to improve the performance of the Tabu Search algorithm on GPGPU, taking the permutation flow shop scheduling problem (PFSP) as our case study. A recently proposed approach for solving the PFSP by Tabu Search on GPUs stores all job permutations in global memory, which successfully eliminates branch divergence. However, that algorithm requires a large amount of global memory space, and the resulting volume of global memory accesses degrades system performance. We propose a new approach to address this problem. The main contribution of this paper is an efficient multiple-loop structure that generates most of the permutations on the fly, which decreases the size of the permutation table and significantly reduces the amount of global memory access. Computational experiments on PFSP benchmark instances show that our approach achieves a best-case performance improvement of about 100% compared with the previous work.



Acknowledgements

We thank the reviewers for their valuable comments, and we thank the National Science Council, Taiwan, for financially supporting this research under Contract No. MOST104-2221-E-018-007.

Author information


Corresponding author

Correspondence to Chao-Chin Wu.

Appendix

This appendix argues that each permutation segment table contains at most five block areas.

The permutation table is constructed as follows. Each child permutation is generated by swapping two positions of the parent permutation, so there are \(C_2^n = n(n-1)/2\) child permutations in total. These child permutations are placed into the permutation table in the order defined in Table 4.
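The construction can be sketched in Python. This is an illustrative sketch, not the authors' GPU code: it assumes that Table 4 orders columns by “From” and then by ascending “To” (consistent with the grouping described below), and the function name `build_permutation_table` is hypothetical. It also makes the memory cost of an explicit table visible: storing every child requires \(n \cdot n(n-1)/2\) job indices.

```python
from itertools import combinations
from math import comb


def build_permutation_table(parent):
    """Build every swap child of `parent`.

    Returns (pairs, table): pairs[k] is the 1-based (From, To) swap used for
    column k and table[k] is the child permutation stored in that column.
    Assumption: columns are ordered by From, then by To, both ascending.
    """
    n = len(parent)
    # combinations() yields pairs in lexicographic order: From-major, To ascending.
    pairs = list(combinations(range(1, n + 1), 2))
    table = []
    for frm, to in pairs:
        child = list(parent)
        child[frm - 1], child[to - 1] = child[to - 1], child[frm - 1]
        table.append(child)
    return pairs, table


if __name__ == "__main__":
    pairs, table = build_permutation_table([0, 1, 2, 3])
    print(len(table), comb(4, 2))  # both equal 6
    print(pairs[0], table[0])      # (1, 2) [1, 0, 2, 3]
```

For a 20-job instance, such an explicit table already holds 20 × 190 = 3,800 entries, and its size grows cubically with n. This is the global-memory cost that motivates generating most of the table on the fly.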

Table 4 The indices of “From” and “To” for each ordered child permutation
Fig. 19 Conceptual overview of the s\(^{\mathrm{th}}\) group of child permutations

The two indices “From” and “To” identify the two positions of the parent permutation that are swapped to produce a child permutation, where \(1\le From\le n-1\) and \(2\le To\le n\). Note that “From” is smaller than “To” for every child permutation.
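On a GPU, each thread typically handles one column, so it needs the (From, To) pair for its column index. A minimal Python sketch of that mapping follows, assuming the same From-major, To-ascending column order; `column_to_swap` is a hypothetical helper, and a real kernel might replace the loop with a closed form or a small lookup table.

```python
def column_to_swap(k, n):
    """Map a 0-based column index k to its 1-based (From, To) swap pair.

    Assumes columns are ordered by From, then by To, so that the group with
    From = f holds the n - f pairs (f, f+1), ..., (f, n).
    """
    if not 0 <= k < n * (n - 1) // 2:
        raise ValueError("column index out of range")
    frm = 1
    while k >= n - frm:   # skip whole groups until k falls inside one
        k -= n - frm
        frm += 1
    return frm, frm + 1 + k  # guarantees 1 <= From < To <= n


if __name__ == "__main__":
    print([column_to_swap(k, 4) for k in range(6)])
    # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```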

The permutation table can be divided, from left to right, into \(n-1\) groups, where all child permutations in a group share the same “From” value. For instance, in the 7\(^{\mathrm{th}}\) group, the “From” index of each child permutation is 7. Because “To” must exceed “From”, the s\(^{\mathrm{th}}\) group contains exactly \(n-s\) child permutations, one for each value \(To = s+1, \ldots, n\). Fig. 19 gives a conceptual overview of the s\(^{\mathrm{th}}\) group. Each column contains exactly two shaded cells, representing the two swapped positions.
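Since the s\(^{\mathrm{th}}\) group holds \(n-s\) columns, its first column index equals the total size of groups 1 through \(s-1\), namely \(\sum_{i=1}^{s-1}(n-i) = (s-1)n - s(s-1)/2\). The Python sketch below turns this into a half-open column range; `group_bounds` is a hypothetical helper, columns are 0-based, and the From-major, To-ascending order is assumed as before.

```python
def group_bounds(s, n):
    """Half-open 0-based column range [start, end) of the s-th group (From = s).

    Group s holds the n - s child permutations with To = s+1, ..., n, so it
    starts after sum_{i=1}^{s-1} (n - i) = (s-1)*n - s*(s-1)/2 columns.
    """
    if not 1 <= s <= n - 1:
        raise ValueError("group index must satisfy 1 <= s <= n-1")
    start = (s - 1) * n - s * (s - 1) // 2  # s*(s-1) is always even
    return start, start + (n - s)


if __name__ == "__main__":
    print([group_bounds(s, 4) for s in range(1, 4)])  # [(0, 3), (3, 5), (5, 6)]
```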

The permutation table is then divided, from left to right, into permutation segment tables (PSTs). Each PST consists of 32 consecutive columns because a warp contains 32 threads.
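The partition into PSTs can be sketched as follows. Assumptions: PST p covers the 0-based columns \([32p, 32p+32)\); the last PST is simply truncated when \(n(n-1)/2\) is not a multiple of 32 (the text does not say how this is handled); and `pst_columns` and `groups_in_pst` are hypothetical helpers. The second function reports which groups a PST overlaps, which is what determines the cases analysed below.

```python
WARP_SIZE = 32


def pst_columns(p, n, width=WARP_SIZE):
    """Half-open 0-based column range of the p-th permutation segment table."""
    total = n * (n - 1) // 2
    start = p * width
    if not 0 <= start < total:
        raise ValueError("PST index out of range")
    return start, min(start + width, total)  # assumption: last PST may be short


def groups_in_pst(p, n, width=WARP_SIZE):
    """1-based group indices (From values) that overlap the p-th PST."""
    start, end = pst_columns(p, n, width)
    groups, g_start = [], 0
    for s in range(1, n):
        g_end = g_start + (n - s)          # group s holds n - s columns
        if g_start < end and start < g_end:
            groups.append(s)
        g_start = g_end
    return groups


if __name__ == "__main__":
    n = 20  # 190 columns, so 6 PSTs; the last one holds only 30 columns
    total = n * (n - 1) // 2
    for p in range((total + WARP_SIZE - 1) // WARP_SIZE):
        print(p, pst_columns(p, n), groups_in_pst(p, n))
```

For n = 20, every PST already straddles two or more groups, so the single-group cases below only arise once a group is wide enough to contain a whole PST.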

Fig. 20 Three cases for dividing permutation segment tables

First, assume that a PST falls entirely within one group of child permutations. There are three cases, as shown in Fig. 20. Case 1 arises when the PST begins at the first column of the group; it contains three block areas (BAs). Case 2 arises when the PST ends at the final column of the group; it contains five BAs. Case 3 covers all remaining situations and contains four BAs. Case 2 thus has five BAs in general. However, if the PST in case 2 coincides with the entire s\(^{\mathrm{th}}\) group, only two BAs remain, because BA3 and BA5 do not exist and BA4 merges into BA2.
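The three single-group cases depend only on whether the PST touches the first or the final column of its group. The Python sketch below classifies a PST accordingly and returns the BA count stated in the text for that case; it does not derive the block areas themselves. It assumes the same column order and 32-column PSTs as above, and `single_group_case` is a hypothetical helper.

```python
WARP_SIZE = 32


def single_group_case(p, n, width=WARP_SIZE):
    """Return (case, BA count stated in the text) for the p-th PST,
    or None if the PST crosses into another group."""
    total = n * (n - 1) // 2
    start = p * width
    if not 0 <= start < total:
        raise ValueError("PST index out of range")
    end = min(start + width, total)
    # Locate the group [g_start, g_end) that contains the PST's first column.
    g_start = 0
    for s in range(1, n):
        g_end = g_start + (n - s)   # group s holds n - s columns
        if start < g_end:
            break
        g_start = g_end
    if end > g_end:
        return None                 # spans several groups (cases 4 and 5)
    if end == g_end:                # case 2: ends at the group's final column
        # If the PST is the whole group, BA3 and BA5 vanish and BA4 merges into BA2.
        return (2, 2) if start == g_start else (2, 5)
    if start == g_start:            # case 1: begins at the group's first column
        return (1, 3)
    return (3, 4)                   # case 3: all remaining single-group PSTs
```

A 32-column PST can coincide with a whole group only when \(n - s = 32\), so the two-BA variant of case 2 can appear only for \(n \ge 33\).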

Fig. 21 Two cases when one PST is across two groups of child permutations

Next, consider the cases in which a PST spans more than one group of child permutations, as shown in Fig. 21. Case 5 contains the last columns of the s\(^{\mathrm{th}}\) group and the first column of the \((s+1)^{\mathrm{th}}\) group; here, each of several rows between row s and row n holds the same value throughout the row. Case 5 contains four BAs. Case 4 illustrates the general case in which a PST is composed of multiple groups. If a PST is formed from more than two groups, the rows with distinct data merge into BA2, resulting in two BAs in total.
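A companion sketch for PSTs that span several groups follows. It reads case 5 as “the tail of group s plus exactly one column of group s+1”, which is an interpretation of the description above, and it returns `None` for the BA count where the text gives none (case 4 with exactly two groups). `group_start` and `multi_group_case` are hypothetical helpers, and the column order and 32-column PSTs are assumed as before.

```python
WARP_SIZE = 32


def group_start(s, n):
    """0-based first column of group s (From = s); group s holds n - s columns."""
    return (s - 1) * n - s * (s - 1) // 2


def multi_group_case(p, n, width=WARP_SIZE):
    """For the p-th PST, return (case, groups, stated BA count) if it spans
    several groups, or None if it lies inside a single group.

    The BA count is None where the text does not state one.
    """
    total = n * (n - 1) // 2
    start = p * width
    if not 0 <= start < total:
        raise ValueError("PST index out of range")
    end = min(start + width, total)
    groups = [s for s in range(1, n)
              if group_start(s, n) < end and start < group_start(s, n) + (n - s)]
    if len(groups) == 1:
        return None
    # Interpretation of case 5: only the first column of the second group is included.
    if len(groups) == 2 and end == group_start(groups[1], n) + 1:
        return 5, groups, 4
    # Case 4: the text states two BAs when more than two groups form the PST.
    return 4, groups, (2 if len(groups) > 2 else None)
```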

According to the above analysis, the maximum number of BAs is five.


About this article


Cite this article

Wei, KC., Sun, X., Chu, H. et al. Reconstructing permutation table to improve the Tabu Search for the PFSP on GPU. J Supercomput 73, 4711–4738 (2017). https://doi.org/10.1007/s11227-017-2041-7

