Abstract
In this paper, we show how the theory of sorting networks can be applied to synthesize optimized general-purpose sorting libraries. Standard sorting libraries are often based on the classic Quicksort algorithm combined with insertion sort, which is applied as the base case for small, fixed numbers of inputs. Unrolling the code for the base case by ignoring loop conditions eliminates branching, resulting in code equivalent to a sorting network. By replacing it with faster sorting networks, we can improve the performance of these algorithms. We show that considering the number of comparisons and swaps alone does not predict any real advantage of this approach. However, significant speed-ups are obtained by taking advantage of instruction-level parallelism and non-branching conditional assignment instructions, both of which are common in modern CPU architectures. Furthermore, close control over how often registers have to be spilled to memory gives us a complete explanation of the performance of different sorting networks, allowing us to choose an optimal one for each particular architecture. Our experimental results show that using code synthesized from these efficient sorting networks as the base case for Quicksort libraries results in significant real-world speed-ups.
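As a rough illustration of the idea in the abstract, the following C sketch contrasts insertion sort on four inputs, unrolled with its loop conditions ignored, against a smaller sorting network for the same input size. The function names (`cswap`, `sort4_network`, `sort4_unrolled_insertion`) are hypothetical and chosen for this example only; this is not the code synthesized in the paper. The compare-and-swap is written with conditional expressions on the assumption that the compiler lowers them to conditional-move instructions, which is common but not guaranteed.

```c
/* Compare-and-swap written with conditional expressions instead of an
   if statement. Compilers often translate this into conditional moves,
   but that depends on the compiler and target (assumption). */
static inline void cswap(int *a, int *b) {
    int x = *a, y = *b;
    *a = x < y ? x : y;   /* smaller value goes to the lower position */
    *b = x < y ? y : x;   /* larger value goes to the upper position */
}

/* Insertion sort on 4 inputs, unrolled with the early loop exit dropped:
   a fixed sequence of 6 comparators, i.e. a sorting network. */
static void sort4_unrolled_insertion(int *v) {
    cswap(&v[0], &v[1]);                        /* insert v[1] */
    cswap(&v[1], &v[2]); cswap(&v[0], &v[1]);   /* insert v[2] */
    cswap(&v[2], &v[3]); cswap(&v[1], &v[2]);   /* insert v[3] */
    cswap(&v[0], &v[1]);
}

/* A 5-comparator sorting network for 4 inputs, arranged in 3 layers.
   Comparators within a layer touch disjoint positions, so they are
   independent and can overlap on a superscalar CPU. */
static void sort4_network(int *v) {
    cswap(&v[0], &v[1]); cswap(&v[2], &v[3]);   /* layer 1 */
    cswap(&v[0], &v[2]); cswap(&v[1], &v[3]);   /* layer 2 */
    cswap(&v[1], &v[2]);                        /* layer 3 */
}
```

Both functions execute the same comparators regardless of the input values, so neither contains a data-dependent branch. A Quicksort implementation could call such a routine once a partition shrinks to four elements; the specific cutoff sizes and networks evaluated in the paper are not reproduced here.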
Additional information
Communicated by Augusto Sampaio and Moreno Falaschi
About this article
Cite this article
Codish, M., Cruz-Filipe, L., Nebel, M. et al. Optimizing sorting algorithms by using sorting networks. Form Asp Comp 29, 559–579 (2017). https://doi.org/10.1007/s00165-016-0401-3
DOI: https://doi.org/10.1007/s00165-016-0401-3