Abstract
Computer architectures are rapidly shifting toward heterogeneous many-core systems. This trend opens up interesting opportunities but also raises immense challenges, since using heterogeneous many-core systems efficiently is not a trivial problem. Software-configurable microprocessors and field-programmable gate arrays (FPGAs) add further diversity but also increase complexity. In this paper, we explore the use of sorting networks on FPGAs. FPGAs are versatile in how they can be used, and they can also be added as additional processing units in standard CPU sockets. Our results indicate that efficient use of FPGAs involves non-trivial aspects: choosing the right computation model (a sorting network in this case); a careful implementation that balances all the design constraints of an FPGA; and a proper integration strategy to link the FPGA to the rest of the system. Once these issues are properly addressed, our experiments show that FPGAs exhibit performance figures competitive with those of modern general-purpose CPUs while offering significant advantages in terms of power consumption and parallel stream evaluation.
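For readers unfamiliar with the computation model named in the abstract, a sorting network is a fixed sequence of compare-and-swap operations whose positions do not depend on the input data. The following is a minimal software sketch in Python, assuming the well-known 4-input network with five comparators. It is an illustration only, not the paper's FPGA implementation, and the names `NETWORK_4` and `sort4` are hypothetical.

```python
# Comparator layout of a 4-input sorting network (5 comparators, 3 layers).
# Each pair (i, j) with i < j moves the smaller value to position i
# and the larger value to position j.
# Assumption: this is the textbook 4-input network, chosen for illustration.
NETWORK_4 = [
    (0, 1), (2, 3),  # layer 1: the two comparators are independent
    (0, 2), (1, 3),  # layer 2: the two comparators are independent
    (1, 2),          # layer 3: final comparator
]


def sort4(values):
    """Sort exactly four comparable items using the fixed network above."""
    v = list(values)
    if len(v) != 4:
        raise ValueError("sort4 expects exactly four values")
    for i, j in NETWORK_4:
        # Compare-and-swap: the only operation a sorting network performs.
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v


if __name__ == "__main__":
    print(sort4([3, 2, 1, 0]))
```

Because the comparator positions are fixed in advance, comparators within the same layer touch disjoint positions and can be evaluated in parallel, which is what makes the model a natural candidate for a hardware circuit.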
Additional information
The work reported in this article was done while Rene Mueller was at ETH Zurich.
Cite this article
Mueller, R., Teubner, J. & Alonso, G. Sorting networks on FPGAs. The VLDB Journal 21, 1–23 (2012). https://doi.org/10.1007/s00778-011-0232-z
DOI: https://doi.org/10.1007/s00778-011-0232-z