Abstract
A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads. Using commodity processors, memory, and arrays of SCSI disks, AlphaSort runs the industrystandard sort benchmark in seven seconds. This beats the best published record on a 32-CPU 32-disk Hypercube by 8:1. On another benchmark, AlphaSort sorted more than a gigabyte in one minute. AlphaSort is a cache-sensitive, memoryintensive sort algorithm. We argue that modern architectures require algorithm designers to re-examine their use of the memory hierarchy. AlphaSort uses clustered data structures to get good cache locality, file striping to get high disk bandwidth, QuickSort to generate runs, and replacement-selection to merge the runs. It uses shared memory multiprocessors to break the sort into subsort chores. Because startup times are becoming a significant part of the total time, we propose two new benchmarks: (1) MinuteSort: how much can you sort in one minute, and (2) PennySort: how much can you sort for one penny.
Similar content being viewed by others
References
Anon et-al. A measure of transaction processing power.Datamation, 31 (7):112–118, 1985. Also in: Stonebraker, M.J., ed.Readings in Database Systems. San Mateo, CA: Morgan Kaufmann, 1989.
Baer, J.L. and Lin, Y.B. Improving Quicksort performance with codeword data structure.IEEE Transactions on Software Engineering, 15(5):622–631, 1989.
Baugsto, B.A.W. and Greipsland, J.F. Parallel sorting methods for large data volumes on a hypercube database computer.Proceedings of the Sixth International Workshop on Database Machines, Deauville, France, 1989.
Baugsto, B.A.W., Greipsland, J.F., and Kamerbeek, J. Sorting large data files on POMA.Proceedings of CONPAR-90VAPP IV, Zurich, 1990.
Beck, M., Bitton, D., and Wilkenson, W.K. Sorting large files on a backend multiprocessor.IEEE Transactions on Computers, V, 37(7):769–778, 1988.
Bitton, D. Design, analysis and implementation of parallel external sorting algorithms. Ph.D. Thesis, University of Wisconsin, Madison, WI 1981.
Conner, W.M., Offset value coding.IBM Technical Disclosure Bulletin, 20(7):2832–2837, 1977.
Cvetanovic, Z. and Bhandarkar, D. Characterization of Alpha AXP performance using TP and SPEC workloads.Proceedings of the Twenty-First International Symposium on Computer Architecture, Chicago, 1994.
DeWitt, D.J., Naughton, J.F., and Schneider, D.A. Parallel sorting on a shared-nothing architecture using probabilistic splitting.Proceedings of the First International Conference on Parallel and Distributed Information Systems, Los Alamitos, NM, 1992.
Graefe, G. Parallel external sorting in Volcano. University of Colorado Computer Science Technical Report 459, June, 1990.
Graefe, G. and Thakkar, S.S. Tuning a parallel sort algorithm on a shared-memory multiprocessor.Software Practice and Experience, 22(7):495, 1992.
Gray, J., ed.The Benchmark Handbook for Database and Transaction Processing Systems. San Mateo, CA: Morgan Kaufmann, 1991.
Kaivalya, D. The SPEC benchmark suite. In:The Benchmark Handbook for Database and Transaction Processing Systems, Second Edition. San Mateo, CA: Morgan Kaufmann, 1993.
Kim, M.Y. Synchronized disk interleaving.IEEE TOCS, 35(11):978–988, 1986.
Kitsuregawa, M., Yang, W., and Fushimi, S. Evaluation of an 18-stage pipeline hardware sorter.Proceedings of the Sixth International Workshop on Database Machines, Deauville, France, 1989.
Knuth, D.E.,Sorting and Searching, The Art of Computer Programming, Reading, MA: Addison Wesley, 1973.
Lorie, R.A. and Young, H. C. A. low communications sort algorithm for a parallel database machine.Proceedings of the Fifteenth VLDB Amsterdam, 1989.
Lorin, H.Sorting. Englewood Cliffs, NJ: Addison Wesley, 1974.
Nyberg, C., Barclay, T., Cvetanovic, Z., Gray, J., Lomet, D. AlphaSort: A RISC machine sort.Proceedings of the ACM SIGMOD International Conference on Management of Data, Minneapolis, MN, 1994.
Salzberg, B., Tsukerman, A., Gray, J., Stewart, M., Uren, S. Vaughn, B. FastSort: An external sort using parallel processing.Proceedings of SIGMOD, Atlantic City, NJ, 1990.
Tsukerman, A. FastSort: An external sort using parallel processing.Tandem Systems Review, 3(4):57–72, 1986.
Weinberger, P.J. Private communication, 1986.
Yamane, Y. and Take, R.: Parallel partition sort for database machines. In: Kitsuregawa, M. and Tanaka, H., eds.Database Machines and Knowledge Based Machines. Boston: Kluwar Academic Publishers, 1988, pp. 117–130.
Author information
Authors and Affiliations
Rights and permissions
About this article
Cite this article
Nyberg, C., Barclay, T., Cvetanovic, Z. et al. Alphasort: A cache-sensitive parallel external sort. VLDB Journal 4, 603–627 (1995). https://doi.org/10.1007/BF01354877
Received:
Revised:
Accepted:
Issue Date:
DOI: https://doi.org/10.1007/BF01354877