Parallel Partition and Merge QuickSort (PPMQSort) on Multicore CPUs

The Journal of Supercomputing

Abstract

The explosive growth of data places heavy demands on sorting, searching, and indexing. Sorting is one of the basic problems in computer science and must be fast and efficient to serve Big Data workloads. This paper presents an efficient and scalable algorithm, Parallel Partition and Merge QuickSort (PPMQSort), that runs on shared-memory, multicore, and multi-socket systems. PPMQSort is built on OpenMP 3.0 and is designed to be compatible with, and benchmarked against, the C/C++ standard library qsort(). It recursively divides an unsorted input array into partially sorted partitions, down to a Cutoff length, using nested multithreading. Those independent partitions are then sorted (conquered) with qsort(), so no synchronization is needed among them. A Speedup of 12.29× is achieved on a dual-socket 8-core Xeon E5520 when sorting 200 M random 32-bit integers with 16 threads. With the same configuration, a 4-core AMD A6-3600 CPU (without HyperThreading) reaches up to 4.67×, a superlinear Speedup. The results indicate that PPMQSort exploits all available cache levels and HyperThreaded CPU cores well, utilizing up to 83% and 96% of the CPU on the E5520 and A6-3600, respectively.
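The divide-then-qsort() idea described in the abstract can be illustrated with a minimal sketch in C. This is not the authors' PPMQSort: it replaces their parallel partition and merge step with a plain serial Hoare-style partition at each level, and only shows how OpenMP 3.0 tasks can split an array into independent ranges that are each handed to qsort() once they reach the cutoff. The names `sketch_parallel_sort`, `sketch_sort_rec`, and `cmp_int`, and the `cutoff` parameter, are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparator for int values, in the form qsort() expects. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Assumption: a simplified stand-in for the paper's partitioning phase.
 * Ranges of at most `cutoff` elements go straight to qsort(); larger ranges
 * are partitioned serially and the two halves are spawned as OpenMP tasks. */
static void sketch_sort_rec(int *a, long n, long cutoff)
{
    if (n < 2 || n <= cutoff) {
        qsort(a, (size_t)n, sizeof a[0], cmp_int);
        return;
    }

    /* Hoare-style partition around the middle element's value. */
    int pivot = a[n / 2];
    long i = 0, j = n - 1;
    while (i <= j) {
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++; j--;
        }
    }

    /* Now a[0..j] <= pivot <= a[i..n-1]. The two ranges do not overlap,
     * so the tasks below never touch the same element. Local variables are
     * firstprivate in each task, so no taskwait is required here. */
    #pragma omp task
    sketch_sort_rec(a, j + 1, cutoff);
    #pragma omp task
    sketch_sort_rec(a + i, n - i, cutoff);
}

/* Entry point: one thread seeds the recursion, the team executes the tasks.
 * The implicit barrier closing the parallel region waits for all tasks. */
void sketch_parallel_sort(int *a, long n, long cutoff)
{
    assert(n >= 0 && cutoff >= 1);
    #pragma omp parallel
    {
        #pragma omp single nowait
        sketch_sort_rec(a, n, cutoff);
    }
}
```

A call such as `sketch_parallel_sort(data, n, 1 << 16)` sorts `data` in place; the cutoff value here is arbitrary and would need tuning per machine. With GCC or Clang, compiling with `-fopenmp` enables the tasks, and without that flag the pragmas are ignored and the same code runs serially.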

Acknowledgments

The authors wish to thank Mr. Apisit Rattanatranurak and Mr. Surapong Towtiamton for experiments and discussions on some of the algorithms in this paper. They also thank the reviewers for their insightful comments, which greatly improved the paper.

Author information

Corresponding author

Correspondence to Ratthaslip Ranokphanuwat.

About this article

Cite this article

Ranokphanuwat, R., Kittitornkun, S. Parallel Partition and Merge QuickSort (PPMQSort) on Multicore CPUs. J Supercomput 72, 1063–1091 (2016). https://doi.org/10.1007/s11227-016-1641-y

  • DOI: https://doi.org/10.1007/s11227-016-1641-y
