Theoretical Model of Computation and Algorithms for FPGA-Based Hardware Accelerators

  • Martin Hora
  • Václav Končický
  • Jakub Tětek
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11436)

Abstract

While FPGAs have been used extensively as hardware accelerators in industrial computation [20], no theoretical model of computation has been devised for the study of FPGA-based accelerators. In this paper, we present a theoretical model of computation, based on the word-RAM, for a system with a conventional CPU and an FPGA. We show several algorithms in this model that are asymptotically faster than their word-RAM counterparts. Specifically, we give algorithms for sorting and for evaluating an associative operation, as well as general techniques for speeding up some recursive algorithms and some dynamic programs. We also derive lower bounds on the running times needed to solve some problems.

References

  1. Ajtai, M., Komlós, J., Szemerédi, E.: An O(n log n) sorting network. In: Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, STOC 1983, pp. 1–9. ACM, New York (1983). http://doi.acm.org/10.1145/800061.808726
  2. Alam, N.: Implementation of genetic algorithms in FPGA-based reconfigurable computing systems. Master’s thesis, Clemson University (2009). https://tigerprints.clemson.edu/all_theses/618/?utm_source=tigerprints.clemson.edu%2Fall_theses%2F618&utm_medium=PDF&utm_campaign=PDFCoverPages
  3. Batcher, K.E.: Sorting networks and their applications. In: Proceedings of the Spring Joint Computer Conference, 30 April–2 May 1968, AFIPS 1968 (Spring), pp. 307–314. ACM, New York (1968). http://doi.acm.org/10.1145/1468075.1468121
  4. Che, S., Li, J., Sheaffer, J.W., Skadron, K., Lach, J.: Accelerating compute-intensive applications with GPUs and FPGAs. In: 2008 Symposium on Application Specific Processors, pp. 101–107, June 2008
  5. Chodowiec, P., Gaj, K.: Very compact FPGA implementation of the AES algorithm. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003. LNCS, vol. 2779, pp. 319–333. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45238-6_26
  6. Chrysos, G., et al.: Opportunities from the use of FPGAs as platforms for bioinformatics algorithms. In: 2012 IEEE 12th International Conference on Bioinformatics & Bioengineering (BIBE), pp. 559–565, November 2012
  7. Cormen, T.H., Leiserson, C.E.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)
  8. Demaine, E.: Cache-oblivious algorithms and data structures. EEF Summer Sch. Massive Data Sets 8(4), 1–249 (2002)
  9. Grozea, C., Bankovic, Z., Laskov, P.: FPGA vs. multi-core CPUs vs. GPUs: hands-on experience with a sorting application. In: Keller, R., Kramer, D., Weiss, J.-P. (eds.) Facing the Multicore-Challenge. LNCS, vol. 6310, pp. 105–117. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16233-6_12
  10. Guo, Z., Najjar, W., Vahid, F., Vissers, K.: A quantitative analysis of the speedup factors of FPGAs over processors. In: Proceedings of the 2004 ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, FPGA 2004, pp. 162–170. ACM, New York (2004). http://doi.acm.org/10.1145/968280.968304
  11. Hagerup, T.: Sorting and searching on the word RAM. In: Morvan, M., Meinel, C., Krob, D. (eds.) STACS 1998. LNCS, vol. 1373, pp. 366–398. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0028575
  12. Harper, L.H.: An \(n \log n\) lower bound on synchronous combinational complexity. Proc. Am. Math. Soc. 64(2), 300–306 (1977). http://www.jstor.org/stable/2041447
  13. Huffstetler, J.: Intel processors and FPGAs-better together, May 2018. https://itpeernetwork.intel.com/intel-processors-fpga-better-together/
  14. Hussain, H.M., Benkrid, K., Seker, H., Erdogan, A.T.: FPGA implementation of k-means algorithm for bioinformatics application: an accelerated approach to clustering microarray data. In: 2011 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), pp. 248–255, June 2011
  15. Karatsuba, A., Ofman, Y.: Multiplication of many-digital numbers by automatic computers. In: Dokl. Akad. Nauk SSSR, vol. 145, pp. 293–294 (1962). http://mi.mathnet.ru/dan26729
  16. Karkooti, M., Cavallaro, J.R., Dick, C.: FPGA implementation of matrix inversion using QRD-RLS algorithm. In: Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers 2005, pp. 1625–1629 (2005)
  17. Ma, L., Agrawal, K., Chamberlain, R.D.: A memory access model for highly-threaded many-core architectures. Future Gener. Comput. Syst. 30, 202–215 (2014). http://www.sciencedirect.com/science/article/pii/S0167739X13001349. Special Issue on Extreme Scale Parallel Architectures and Systems, Cryptography in Cloud Computing and Recent Advances in Parallel and Distributed Systems, ICPADS 2012 Selected Papers
  18. Mahram, A.: FPGA acceleration of sequence analysis tools in bioinformatics (2013). https://open.bu.edu/handle/2144/11126
  19. Reed, B.: The height of a random binary search tree. J. ACM 50(3), 306–332 (2003). https://doi.org/10.1145/765568.765571
  20. Romoth, J., Porrmann, M., Rückert, U.: Survey of FPGA applications in the period 2000–2015 (Technical report) (2017)
  21. van Rooij, J.M., Bodlaender, H.L.: Exact algorithms for dominating set. Discrete Appl. Math. 159(17), 2147–2164 (2011). http://www.sciencedirect.com/science/article/pii/S0166218X11002393
  22. Sklavos, D.: DDR3 vs. DDR4: raw bandwidth by the numbers, September 2015. https://www.techspot.com/news/62129-ddr3-vs-ddr4-raw-bandwidth-numbers.html
  23. Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969). https://doi.org/10.1007/BF02165411
  24. Vitter, J.S.: Algorithms and data structures for external memory. Found. Trends Theor. Comput. Sci. 2(4), 54–63 (2008). https://doi.org/10.1561/0400000014
  25. Vollmer, H.: Introduction to Circuit Complexity: A Uniform Approach. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-662-03927-4
  26. Woeginger, G.J.: Exact algorithms for NP-hard problems: a survey. In: Jünger, M., Reinelt, G., Rinaldi, G. (eds.) Combinatorial Optimization - Eureka, You Shrink!. LNCS, vol. 2570, pp. 185–207. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36478-1_17. http://dl.acm.org/citation.cfm?id=885909
  27. Zwick, U., Gupta, A.: Concrete complexity lecture notes, lecture 3 (1996). www.cs.tau.ac.il/~zwick/circ-comp-new/two.ps

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Computer Science Institute, Charles University, Prague, Czech Republic
  2. Department of Applied Mathematics, Charles University, Prague, Czech Republic
