Alphasort: A cache-sensitive parallel external sort

  • Special System-oriented Section: The Best of SIGMOD '94
The VLDB Journal

Abstract

A new sort algorithm, called AlphaSort, demonstrates that commodity processors and disks can handle commercial batch workloads. Using commodity processors, memory, and arrays of SCSI disks, AlphaSort runs the industry-standard sort benchmark in seven seconds. This beats the best published record on a 32-CPU 32-disk Hypercube by 8:1. On another benchmark, AlphaSort sorted more than a gigabyte in one minute. AlphaSort is a cache-sensitive, memory-intensive sort algorithm. We argue that modern architectures require algorithm designers to re-examine their use of the memory hierarchy. AlphaSort uses clustered data structures to get good cache locality, file striping to get high disk bandwidth, QuickSort to generate runs, and replacement-selection to merge the runs. It uses shared memory multiprocessors to break the sort into subsort chores. Because startup times are becoming a significant part of the total time, we propose two new benchmarks: (1) MinuteSort: how much can you sort in one minute, and (2) PennySort: how much can you sort for one penny.
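The two-phase structure named in the abstract (sort chunks into runs, then merge the runs) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: Python's built-in sort stands in for QuickSort, `heapq.merge` stands in for the replacement-selection merge, and the names `RECORD_SIZE`, `KEY_SIZE`, `generate_runs`, `merge_runs`, and `two_phase_sort`, along with the record layout, are assumptions made for illustration only.

```python
import heapq
import random

# Hypothetical record layout for illustration: fixed-size byte records
# whose leading bytes form the sort key.
RECORD_SIZE = 100
KEY_SIZE = 10


def record_key(record):
    """Return the leading key bytes of a record."""
    return record[:KEY_SIZE]


def generate_runs(records, run_length):
    """Phase 1: cut the input into chunks and sort each chunk into a run.

    The built-in sort is a stand-in for the QuickSort run generation
    that the abstract mentions.
    """
    runs = []
    for start in range(0, len(records), run_length):
        run = records[start:start + run_length]
        run.sort(key=record_key)
        runs.append(run)
    return runs


def merge_runs(runs):
    """Phase 2: merge the sorted runs into one sorted output.

    heapq.merge is a heap-based k-way merge, used here as a stand-in
    for the replacement-selection merge named in the abstract.
    """
    return list(heapq.merge(*runs, key=record_key))


def two_phase_sort(records, run_length=1000):
    """Sort records by generating runs and then merging them."""
    return merge_runs(generate_runs(list(records), run_length))


if __name__ == "__main__":
    rng = random.Random(0)
    data = [bytes(rng.randrange(256) for _ in range(RECORD_SIZE))
            for _ in range(5000)]
    out = two_phase_sort(data, run_length=512)
    print(len(out), "records sorted")
```

Smaller values of `run_length` produce more runs and shift work from the run-generation phase to the merge phase; the sketch leaves out the disk striping, cache-conscious data layout, and multiprocessor parallelism that the abstract describes.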


About this article

Cite this article

Nyberg, C., Barclay, T., Cvetanovic, Z. et al. Alphasort: A cache-sensitive parallel external sort. VLDB Journal 4, 603–627 (1995). https://doi.org/10.1007/BF01354877

  • DOI: https://doi.org/10.1007/BF01354877
