Abstract
While FPGAs have been used extensively as hardware accelerators in industrial computation [20], no theoretical model of computation has been devised for the study of FPGA-based accelerators. In this paper, we present a theoretical model of computation on a system with a conventional CPU and an FPGA, based on the word-RAM. We show several algorithms in this model that are asymptotically faster than their word-RAM counterparts. Specifically, we give an algorithm for sorting, an algorithm for evaluating an associative operation, and general techniques for speeding up some recursive algorithms and some dynamic programs. We also derive lower bounds on the running times needed to solve some problems.
This work was carried out while the authors were participants in the DIMATIA-DIMACS REU exchange program at Rutgers University.
The work was supported by the grant SVV–2017–260452.
References
Ajtai, M., Komlós, J., Szemerédi, E.: An \(O(n \log n)\) sorting network. In: Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing, STOC 1983, pp. 1–9. ACM, New York (1983). http://doi.acm.org/10.1145/800061.808726
Alam, N.: Implementation of genetic algorithms in FPGA-based reconfigurable computing systems. Master’s thesis, Clemson University (2009). https://tigerprints.clemson.edu/all_theses/618/
Batcher, K.E.: Sorting networks and their applications. In: Proceedings of the Spring Joint Computer Conference, 30 April–2 May 1968, AFIPS 1968 (Spring), pp. 307–314. ACM, New York (1968). http://doi.acm.org/10.1145/1468075.1468121
Che, S., Li, J., Sheaffer, J.W., Skadron, K., Lach, J.: Accelerating compute-intensive applications with GPUs and FPGAs. In: 2008 Symposium on Application Specific Processors, pp. 101–107, June 2008
Chodowiec, P., Gaj, K.: Very compact FPGA implementation of the AES algorithm. In: Walter, C.D., Koç, Ç.K., Paar, C. (eds.) CHES 2003. LNCS, vol. 2779, pp. 319–333. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45238-6_26
Chrysos, G., et al.: Opportunities from the use of FPGAs as platforms for bioinformatics algorithms. In: 2012 IEEE 12th International Conference on Bioinformatics Bioengineering (BIBE), pp. 559–565, November 2012
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)
Demaine, E.: Cache-oblivious algorithms and data structures. EEF Summer Sch. Massive Data Sets 8(4), 1–249 (2002)
Grozea, C., Bankovic, Z., Laskov, P.: FPGA vs. multi-core CPUs vs. GPUs: hands-on experience with a sorting application. In: Keller, R., Kramer, D., Weiss, J.-P. (eds.) Facing the Multicore-Challenge. LNCS, vol. 6310, pp. 105–117. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-16233-6_12
Guo, Z., Najjar, W., Vahid, F., Vissers, K.: A quantitative analysis of the speedup factors of FPGAs over processors. In: Proceedings of the 2004 ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, FPGA 2004, pp. 162–170. ACM, New York (2004). http://doi.acm.org/10.1145/968280.968304
Hagerup, T.: Sorting and searching on the word RAM. In: Morvan, M., Meinel, C., Krob, D. (eds.) STACS 1998. LNCS, vol. 1373, pp. 366–398. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0028575
Harper, L.H.: An \(n \log n\) lower bound on synchronous combinational complexity. Proc. Am. Math. Soc. 64(2), 300–306 (1977). http://www.jstor.org/stable/2041447
Huffstetler, J.: Intel processors and FPGAs-better together, May 2018. https://itpeernetwork.intel.com/intel-processors-fpga-better-together/
Hussain, H.M., Benkrid, K., Seker, H., Erdogan, A.T.: FPGA implementation of k-means algorithm for bioinformatics application: an accelerated approach to clustering microarray data. In: 2011 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), pp. 248–255, June 2011
Karatsuba, A., Ofman, Y.: Multiplication of many-digital numbers by automatic computers. In: Dokl. Akad. Nauk SSSR, vol. 145, pp. 293–294 (1962). http://mi.mathnet.ru/dan26729
Karkooti, M., Cavallaro, J.R., Dick, C.: FPGA implementation of matrix inversion using QRD-RLS algorithm. In: Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers 2005, pp. 1625–1629 (2005)
Ma, L., Agrawal, K., Chamberlain, R.D.: A memory access model for highly-threaded many-core architectures. Future Gener. Comput. Syst. 30, 202–215 (2014). http://www.sciencedirect.com/science/article/pii/S0167739X13001349, special Issue on Extreme Scale Parallel Architectures and Systems, Cryptography in Cloud Computing and Recent Advances in Parallel and Distributed Systems, ICPADS 2012 Selected Papers
Mahram, A.: FPGA acceleration of sequence analysis tools in bioinformatics (2013). https://open.bu.edu/handle/2144/11126
Reed, B.: The height of a random binary search tree. J. ACM 50(3), 306–332 (2003). https://doi.org/10.1145/765568.765571
Romoth, J., Porrmann, M., Rückert, U.: Survey of FPGA applications in the period 2000–2015 (Technical report) (2017)
van Rooij, J.M., Bodlaender, H.L.: Exact algorithms for dominating set. Discrete Appl. Math. 159(17), 2147–2164 (2011). http://www.sciencedirect.com/science/article/pii/S0166218X11002393
Sklavos, D.: DDR3 vs. DDR4: raw bandwidth by the numbers, September 2015. https://www.techspot.com/news/62129-ddr3-vs-ddr4-raw-bandwidth-numbers.html
Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969). https://doi.org/10.1007/BF02165411
Vitter, J.S.: Algorithms and data structures for external memory. Found. Trends Theor. Comput. Sci. 2(4), 54–63 (2008). https://doi.org/10.1561/0400000014
Vollmer, H.: Introduction to Circuit Complexity: A Uniform Approach. Springer, Heidelberg (1999). https://doi.org/10.1007/978-3-662-03927-4
Woeginger, G.J.: Exact algorithms for NP-hard problems: a survey. In: Jünger, M., Reinelt, G., Rinaldi, G. (eds.) Combinatorial Optimization - Eureka, You Shrink!. LNCS, vol. 2570, pp. 185–207. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36478-1_17. http://dl.acm.org/citation.cfm?id=885909
Zwick, U., Gupta, A.: Concrete complexity lecture notes, lecture 3 (1996). www.cs.tau.ac.il/~zwick/circ-comp-new/two.ps
A Simulation of Word-RAM
Theorem 10
A word-RAM with word size w running in time t(n) using m(n) memory can be simulated by a circuit of size \(\mathcal {O}(t(n)m(n)w\log (m(n)w))\) and depth \(\mathcal {O}(t(n) \log (m(n)w))\).
Proof
We first construct an asynchronous circuit. In the proof, we use the following two subcircuits for reading from and writing to the RAM’s memory.
The memory read subcircuit gets as input m(n)w bits consisting of m(n) words of length w, together with a number k that fits into one word when represented in binary. It returns the k-th word. Such a circuit exists with \(\mathcal {O}(m(n)w)\) gates and depth \(\mathcal {O}(\log (m(n)w))\).
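Such a read subcircuit is essentially a balanced tree of word-level multiplexers. The following Python sketch is a software analogue of that tree, not hardware; the function name and the list-based memory encoding are our own illustrative choices. Each loop iteration corresponds to one level of 2:1 word multiplexers and consumes one bit of the address k:

```python
def mux_tree_read(words, k):
    """Software analogue of the memory-read multiplexer tree: each
    iteration halves the candidate list, steered by one bit of k,
    just as one level of 2:1 word multiplexers would."""
    level = list(words)
    bit = 0
    while len(level) > 1:
        b = (k >> bit) & 1  # address bit steering this level
        level = [level[i + b] if i + b < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        bit += 1
    return level[0]
```

With m(n) words the loop runs about \(\log_2 m(n)\) times, mirroring the logarithmic depth of the subcircuit.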
The memory write subcircuit gets as input m(n)w bits consisting of m(n) words of length w, together with two additional numbers k and v, each represented in binary and fitting into a word. The circuit outputs the m(n)w input bits, except that the k-th word is replaced by the value v. Such a circuit exists with \(\mathcal {O}(m(n)w)\) gates and depth \(\mathcal {O}(\log w)\).
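Unlike the read subcircuit, the write subcircuit needs no selection tree: every word position compares its hard-wired index against k independently and keeps either its old content or v, which is why the depth is only \(\mathcal {O}(\log w)\). A hypothetical Python analogue (the function name and encoding are ours):

```python
def demux_write(words, k, v):
    """Software analogue of the memory-write subcircuit: every position
    performs an independent equality test against k (all comparisons
    run in parallel in hardware) and selects v or its old word."""
    return [v if i == k else w for i, w in enumerate(words)]
```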
The circuit consists of t(n) layers, each of depth \(\mathcal {O}(\log (m(n)w))\), and each layer executes one step of the word-RAM. A layer gets as input the memory of the RAM after the execution of the previous instruction, together with the instruction pointer to the instruction to be executed; it outputs the memory after the execution of that instruction, together with a pointer to the next instruction. Each layer works in two phases.
In the first phase, we retrieve from memory the values necessary for execution of the instruction (including the address where the result is to be saved, in case of indirect access). We do this using the memory read subcircuit (or possibly two of them coupled together in case of indirect addressing). This can be done since the addresses from which the program reads can be inferred from the instruction pointer.
In the second phase, we execute all possible instructions on the values retrieved in phase 1. Note that every instruction of the word-RAM can be implemented by a circuit of depth \(\mathcal {O}(\log w)\). Each instruction has an output and optional wires for outputting the next instruction pointer (used only by jumps and conditional jumps – all other instructions output zeros). The correct instruction can be inferred from the instruction pointer, so we can use a binary selection tree to route the output of the correct instruction to the specified wires. This output value is then stored in memory using the memory write subcircuit.
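The two phases can be made concrete with a hypothetical Python sketch of a single layer for a toy instruction set. The instruction names (ADD, JMP), the operand layout, and the "evaluate everything, then select" dictionary are illustrative assumptions, not the model's actual instruction set:

```python
def simulate_layer(memory, program, ip):
    """One layer of the simulation on a toy two-instruction word-RAM.
    Phase 1 reads the operands; phase 2 evaluates every instruction,
    and a selector keeps only the result of the instruction at ip."""
    op, a, b, dst = program[ip]
    x, y = memory[a], memory[b]     # phase 1: memory read subcircuits
    results = {                     # phase 2: evaluate all instructions
        "ADD": (x + y, ip + 1),     # store x + y at dst, fall through
        "JMP": (None, a),           # jump to the (immediate) address a
    }
    value, next_ip = results[op]    # selector picks the live result
    if value is not None:           # memory write subcircuit
        memory = memory[:dst] + [value] + memory[dst + 1:]
    return memory, next_ip
```

A real layer evaluates every instruction unconditionally and selects afterwards, since a circuit cannot branch; the dictionary lookup above stands in for that selection tree.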
The first layer takes as input the input of the RAM. The last layer outputs the output of the RAM.
Finally, to make the circuit synchronous, every signal has to be delayed by at most \(\mathcal {O}(\log (m(n)w))\) steps. The number of gates therefore increases by a factor of at most \(\mathcal {O}(\log (m(n)w))\). \(\square \)
Copyright information
© 2019 Springer Nature Switzerland AG
Hora, M., Končický, V., Tětek, J. (2019). Theoretical Model of Computation and Algorithms for FPGA-Based Hardware Accelerators. In: Gopal, T., Watada, J. (eds) Theory and Applications of Models of Computation. TAMC 2019. Lecture Notes in Computer Science(), vol 11436. Springer, Cham. https://doi.org/10.1007/978-3-030-14812-6_18
Print ISBN: 978-3-030-14811-9
Online ISBN: 978-3-030-14812-6