Skip to main content
Log in

Sorting networks on FPGAs

  • Regular Paper
  • Published:
The VLDB Journal Aims and scope Submit manuscript

Abstract

Computer architectures are quickly changing toward heterogeneous many-core systems. Such a trend opens up interesting opportunities but also raises immense challenges since the efficient use of heterogeneous many-core systems is not a trivial problem. Software-configurable microprocessors and FPGAs add further diversity but also increase complexity. In this paper, we explore the use of sorting networks on field-programmable gate arrays (FPGAs). FPGAs are very versatile in terms of how they can be used and can also be added as additional processing units in standard CPU sockets. Our results indicate that efficient usage of FPGAs involves non-trivial aspects such as having the right computation model (a sorting network in this case); a careful implementation that balances all the design constraints in an FPGA; and the proper integration strategy to link the FPGA to the rest of the system. Once these issues are properly addressed, our experiments show that FPGAs exhibit performance figures competitive with those of modern general-purpose CPUs while offering significant advantages in terms of power consumption and parallel stream evaluation.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Abadi D. J., Carney D., Çetintemel U., Cherniack M., Convey C., Lee S., Stonebraker M., Tatbul N., Zdonik S.: Aurora: a new model and architecture for data stream management. VLDB J. 12(2), 120–139 (2003)

    Article  Google Scholar 

  2. Abadi, D.J., Ahmad, Y., Balazinska, M., Çetintemel, U., Cherniack, M., Hwang, J.H., Lindner, W., Maskey, A.S., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y., Zdonik, S.: The design of the Borealis stream processing engine. In: Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA (2005)

  3. Ajtai, M., Komlós, J., Szemerédi, E.: An O(n log n) sorting network. In: ACM Symposium on Theory of Computing (STOC), pp. 1–9 (1983)

  4. Arasu A., Babu S., Widom J.: The cql continuous query language: semantic foundations and query execution. VLDB J. 15(2), 121–142 (2006)

    Article  Google Scholar 

  5. Batcher, K.E.: Sorting networks and their applications. In: AFIPS Spring Joint Computer Conference, pp. 307–314 (1968)

  6. Burleson, W.P., Ciesielski, M., Klass, F., Liu, W.: Wave-pipelining: a tutorial and research survey. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 6(3), 464–474. doi:10.1109/92.711317

  7. Chhugani J., Nguyen A. D., Lee V. W., Macy W., Hagog M., Chen Y. K., Baransi A., Kumar S., Dubey P.: Efficient implementation of sorting on multi-core SIMD CPU architecture. Proc. VLDB Endow. 1(2), 1313–1324 (2008)

    Google Scholar 

  8. Cormen T. H., Leiserson C. E., Rivest R. L., Stein C.: Introduction to Algorithms. 2nd edn. MIT Press, Cambridge (2001)

    MATH  Google Scholar 

  9. DeWitt D.J. DIRECT—a multiprocessor organization for supporting relational database management systems. IEEE Trans. Comput. 28(6) (1979)

  10. Furtak, T., Amaral, J.N., Niewiadomski, R.: Using SIMD registers and instructions to enable instruction-level parallelism in sorting algorithms. In: ACM Symposium on Parallel Algorithms and Architectures (SPAA), pp. 348–357 (2007)

  11. Gedik, B., Bordawekar, R.R., Yu, P.S.: CellSort: high performance sorting on the cell processor. In: Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), Vienna, Austria, pp. 1286–1297 (2007)

  12. Gold, B.T., Ailamaki, A., Huston, L., Falsafi, B.: Accelerating database operators using a network processor. In: International Workshop on Data Management on New Hardware (DaMoN), Baltimore, MD, USA (2005)

  13. Govindaraju, N.K., Lloyd, B., Wang, W., Lin, M., Manocha, D.: Fast computation of database operations using graphics processors. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of data, Paris, France, pp. 215–226 (2004)

  14. Govindaraju, N.K., Gray, J., Kumar, R., Manocha, D.: GPUTeraSort: high performance graphics coprocessor sorting for large database management. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, Chicago, IL, USA, pp. 325–336 (2006)

  15. Greaves, D.J., Singh, S.: Kiwi: Synthesis of FPGA circuits from parallel programs. In: IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM) (2008)

  16. Harizopoulos, S., Shkapenyuk, V., Ailamaki, A.: QPipe: a simultaneously pipelined relational query engine. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, Baltimore, MD, USA (2005)

  17. Huang, S.S., Hormati, A., Bacon, D.F., Rabbah, R.: Liquid metal: object-oriented programming across the hardware/software boundary. In: European Conference on Object-Oriented Programming, Paphos, Cyprus (2008)

  18. Inoue, H., Moriyama, T., Komatsu, H., Nakatani, T.: AA-sort a new parallel sorting algorithm for multi-core SIMD processors. In: International Conference on Parallel Architecture and Compilation Techniques (PACT), Brasov, Romania, pp. 189–198 (2007)

  19. Kickfire: http://www.kickfire.com/ (2009)

  20. Knuth D. E.: The Art of Computer Programming, Volume 3: Sorting and Searching. 2nd edn. Addison-Wesley, Reading (1998)

    Google Scholar 

  21. Manegold S., Boncz P. A., Kersten M. L.: Optimizing database architecture for the new bottleneck: Memory access. VLDB J. 9(3), 231–246 (2000)

    Article  Google Scholar 

  22. Mitra, A., Vieira, M.R., Bakalov, P., Tsotras, V.J., Najjar, W.: Boosting XML filtering through a scalable FPGA-based architecture. In: Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA (2009)

  23. Mueller, R.: Data processing on embedded devices. PhD thesis, ETH Zurich, Diss. ETH No. 19163 (2010)

  24. Mueller, R., Eguro, K.: FPGA-accelerated deserialization of object structures. Technical report MSR-TR-2009-126, Microsoft Research Redmond (2009)

  25. Mueller, R., Teubner, J., Alonso, G.: Data processing on fpgas. Proc. VLDB Endow. 2(1) (2009a)

  26. Mueller, R., Teubner, J., Alonso, G.: Streams on wires—a query compiler for FPGAs. Proc. VLDB Endow. 2(1) (2009b)

  27. Netezza: http://www.netezza.com/ (2009)

  28. Oflazer K.: Design and implementation of a single-chip 1-d median filter. IEEE Trans. Acoust. Speech Signal Process. 31, 1164–1168 (1983)

    Article  Google Scholar 

  29. Q6700 datasheet: Intel Core 2 Extreme Quad-Core processor XQ6000 Sequence and Intel Core 2 Quad Processor Q600 Sequence Datasheet. Intel (2007)

  30. Rabiner L. R., Sambur M. R., Schmidt C. E.: Applications of a nonlinear smoothing algorithm to speech processing. IEEE Trans. Acoust. Speech Signal Process. 23(6), 552–557 (1975)

    Article  Google Scholar 

  31. Tukey J. W.: Exploratory Data Analysis. Addison-Wesley, Reading (1977)

  32. Wendt, P.D., Coyle, E.J., Gallagher, N.C., Jr.: Stack filters. IEEE Trans. Acoust. Speech Signal Process. 34(4) (1986)

  33. Wentzlaff, D., Griffin, P., Hoffmann, H., Bao, L., Edwards, B., Ramey, C., Mattina, M., Miao, C.C., Brown, J.F., Agarwal, A.: On-chip interconnection architecture of the tile processor. IEEE Micro 27(5) (2007)

  34. Xilinx: Virtex-5 FGPA Data Sheet: DC and Switching Characteristics. Xilinx Inc., v5.0 edn (2009a)

  35. Xilinx: Virtex-5 FPGA User Guide. Xilinx Inc., v4.5 edn (2009b)

  36. XtremeData: http://www.xtremedatainc.com/ (2009)

  37. Zhou, J., Ross, K.A.: Implementing database operations using SIMD instructions. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, WI, USA (2002)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Rene Mueller.

Additional information

The work reported in this article was done while Rene Mueller was at ETH Zurich.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Mueller, R., Teubner, J. & Alonso, G. Sorting networks on FPGAs. The VLDB Journal 21, 1–23 (2012). https://doi.org/10.1007/s00778-011-0232-z

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00778-011-0232-z

Keywords

Navigation