Abstract
Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as the best practical comparison-based sorting algorithm for distributed-memory parallel computers. We show that sample sort is also useful on a single processor. The main algorithmic insight is that element comparisons can be decoupled from expensive conditional branching by using predicated instructions. This transformation enables optimizations such as loop unrolling and software pipelining. The final implementation, although cache efficient, is limited by a linear number of memory accesses rather than by the \(\mathcal{O}\!\left(n\log n\right)\) comparisons. On an Itanium 2 machine, we obtain a speedup of up to 2 over std::sort from the GCC STL library, which is known to be one of the fastest available quicksort implementations.
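The central idea of the abstract is to replace the branch in each element comparison with arithmetic on the comparison result. It can be sketched in C++ as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. The names `Classifier` and `sampleSort` are hypothetical, as are the heap-layout splitter tree, the 16 buckets, the oversampling factor of 4, the base-case size of 64, and the fallback used when one bucket receives every element. Loop unrolling and software pipelining are left to the compiler, and nothing guarantees that it actually emits predicated or conditional-move code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Branch-free bucket classifier over k = 2^levels buckets. The k-1 sorted
// splitters are stored as an implicit binary search tree (heap layout,
// tree[1] is the root). Each comparison result is added to the tree index
// as 0 or 1, so the loop has no data-dependent branch.
struct Classifier {
    int levels;
    std::vector<int> tree;  // tree[1..k-1]; tree[0] is unused

    Classifier(const std::vector<int>& splitters, int lv)
        : levels(lv), tree(std::size_t{1} << lv, 0) {
        assert(splitters.size() + 1 == tree.size());
        build(splitters, 0, splitters.size(), 1);
    }

    // Store the middle splitter of s[lo, hi) at node j, then recurse, so an
    // in-order walk of the tree visits the splitters in sorted order.
    void build(const std::vector<int>& s, std::size_t lo, std::size_t hi, std::size_t j) {
        if (lo >= hi) return;
        std::size_t mid = lo + (hi - lo) / 2;
        tree[j] = s[mid];
        build(s, lo, mid, 2 * j);
        build(s, mid + 1, hi, 2 * j + 1);
    }

    // Bucket b receives keys e with splitter[b-1] < e <= splitter[b].
    int bucket(int e) const {
        std::size_t j = 1;
        for (int l = 0; l < levels; ++l)
            j = 2 * j + static_cast<std::size_t>(e > tree[j]);
        return static_cast<int>(j - tree.size());
    }
};

// Sample sort sketch: choose k-1 splitters from a sorted random sample,
// classify every element once (remembering its bucket), scatter into a
// temporary array, copy back, and recurse on each bucket.
void sampleSort(int* first, int* last, int levels = 4, std::size_t baseCase = 64) {
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n <= baseCase) { std::sort(first, last); return; }
    const std::size_t k = std::size_t{1} << levels;
    const std::size_t oversample = 4;  // hypothetical oversampling factor

    std::mt19937 rng(static_cast<std::mt19937::result_type>(n));
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    std::vector<int> sample(oversample * k);
    for (int& s : sample) s = first[pick(rng)];
    std::sort(sample.begin(), sample.end());
    std::vector<int> splitters(k - 1);
    for (std::size_t i = 0; i + 1 < k; ++i) splitters[i] = sample[(i + 1) * oversample];
    const Classifier cls(splitters, levels);

    // Pass 1: classify and count bucket sizes.
    std::vector<int> oracle(n);
    std::vector<std::size_t> count(k, 0);
    for (std::size_t i = 0; i < n; ++i) {
        oracle[i] = cls.bucket(first[i]);
        ++count[static_cast<std::size_t>(oracle[i])];
    }
    // Simplified guard (an assumption, not from the paper): if one bucket got
    // everything, e.g. all keys are equal, recursion would not shrink the input.
    if (std::find(count.begin(), count.end(), n) != count.end()) {
        std::sort(first, last);
        return;
    }
    // Pass 2: prefix sums give bucket starts; scatter elements into place.
    std::vector<std::size_t> start(k + 1, 0);
    for (std::size_t b = 0; b < k; ++b) start[b + 1] = start[b] + count[b];
    std::vector<std::size_t> pos(start.begin(), start.end() - 1);
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[pos[static_cast<std::size_t>(oracle[i])]++] = first[i];
    std::copy(out.begin(), out.end(), first);
    for (std::size_t b = 0; b < k; ++b)
        sampleSort(first + start[b], first + start[b + 1], levels, baseCase);
}
```

Each element is classified exactly once and its bucket number is stored, so every recursion level touches the data a constant number of times. This illustrates why, as the abstract notes, memory accesses rather than comparisons become the limiting factor.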
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sanders, P., Winkel, S. (2004). Super Scalar Sample Sort. In: Albers, S., Radzik, T. (eds) Algorithms – ESA 2004. ESA 2004. Lecture Notes in Computer Science, vol 3221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30140-0_69
DOI: https://doi.org/10.1007/978-3-540-30140-0_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23025-0
Online ISBN: 978-3-540-30140-0
eBook Packages: Springer Book Archive