A reconfigurable approach to low cost media processing
In this paper we have proposed a new approach to the design of media-intensive appliances that uses a CPU together with a modest amount of FPGA logic for hardware acceleration. From implementations of two representative algorithms (a digital filter and a dynamic programming match), we have demonstrated that a small amount of reconfigurable logic can achieve high performance. For these algorithms, the equivalent of around 500 Xilinx CLBs, plus a small local memory, can increase performance by a factor of 12 to 21 compared with an optimised CPU-only implementation.
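To make the two kinds of kernel concrete, the following is a minimal software sketch, not the implementations measured in the paper. It assumes the digital filter is a direct-form FIR filter and uses a plain edit-distance recurrence as a generic stand-in for the dynamic programming match, since this section does not spell out either algorithm. The function names `fir_filter` and `dp_match` are hypothetical.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * samples[n - k].

    Assumption: samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        # Each tap is one multiply-accumulate (MAC). The products do not
        # depend on each other, so hardware could form them in parallel.
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


def dp_match(a, b):
    """Edit distance between sequences a and b (stand-in for a DP match).

    Only the previous row of the cost table is kept. Cells on the same
    anti-diagonal of the table are mutually independent, which is the
    parallelism a hardware implementation could exploit.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]
```

Both inner loops are short, regular, and dominated by simple arithmetic, which is the kind of work the text argues a small amount of reconfigurable logic can accelerate.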
The FPGA is able to exploit the parallelism available in an algorithm. It can also be “bit-width efficient”, using no more bits of precision than are necessary, and its internal store can be used for local data, state and control. The system architecture must, however, be able to support this increased computation rate. We have explored three architectures that enable this.
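As an illustration of bit-width efficiency, the sketch below computes a worst-case accumulator width for a multiply-accumulate chain with signed two's-complement operands. The operand widths and tap count in the usage note are hypothetical and are not taken from the paper; `accumulator_bits` is an illustrative helper name.

```python
def accumulator_bits(sample_bits, coeff_bits, taps):
    """Signed accumulator width that cannot overflow for `taps` MACs.

    A product of two signed values needs sample_bits + coeff_bits bits,
    and summing `taps` such products adds ceil(log2(taps)) bits of growth.
    """
    if taps < 1:
        raise ValueError("taps must be at least 1")
    product_bits = sample_bits + coeff_bits
    growth_bits = (taps - 1).bit_length()  # equals ceil(log2(taps)) for taps >= 1
    return product_bits + growth_bits
```

For example, with 8-bit samples, 8-bit coefficients and 16 taps (all assumed values), `accumulator_bits(8, 8, 16)` gives 20 bits. An FPGA datapath can be built to exactly that width, whereas a CPU would typically perform the same sum in a full machine word.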
In functional unit mode the FPGA is a slave coprocessor to the CPU. This is the simplest mode for partitioning, but performance can be limited by marshalling overheads on the CPU. It can be used where FPGA resources do not permit direct memory access or where only portions of an algorithm's computation can be accommodated.
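The effect of offloading only part of an algorithm, and of paying a marshalling cost on the CPU, can be estimated with an Amdahl's-law style calculation. This is a generic back-of-envelope model, not one given in the paper; the function name and the numbers in the usage note are assumptions.

```python
def overall_speedup(offloaded_fraction, kernel_speedup, marshalling_fraction=0.0):
    """Whole-program speedup when part of the work moves to the FPGA.

    offloaded_fraction   -- share of original run time moved to the FPGA
    kernel_speedup       -- how much faster the FPGA runs that share
    marshalling_fraction -- extra CPU time spent moving operands, expressed
                            as a share of the original run time
    """
    remaining = 1.0 - offloaded_fraction          # work left on the CPU
    accelerated = offloaded_fraction / kernel_speedup
    return 1.0 / (remaining + accelerated + marshalling_fraction)
```

With hypothetical values, offloading 90% of the run time to a kernel that is 20 times faster gives an overall speedup of about 6.9; adding marshalling equal to 5% of the original run time reduces this to about 5.1. This shows why marshalling overheads can dominate in this mode.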
Lockstep mode is very similar to functional unit mode but increases performance by having a direct data path between the FPGA and memory.
Datapath mode offers the best potential performance but, as the CPU and FPGA execute as two independent units, there are many more issues that must be considered when partitioning. Datapath mode is particularly effective when the FPGA can process a large independent part of the algorithm without need for complex control and synchronisation with the CPU.
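The differences between the three modes can be pictured with a toy per-item timing model. This is a deliberately simplified sketch under stated assumptions, not the paper's performance model: functional unit mode is assumed to serialise CPU marshalling with FPGA computation, lockstep mode is assumed to replace marshalling with a smaller CPU control cost because the FPGA reads memory directly, and datapath mode is assumed to overlap CPU and FPGA work at the price of a synchronisation cost. All parameter names are hypothetical.

```python
def per_item_time(mode, fpga_compute, cpu_marshal=0.0, cpu_control=0.0,
                  cpu_work=0.0, sync=0.0):
    """Toy estimate of the time to process one data item in each mode."""
    if mode == "functional_unit":
        # CPU copies operands to the FPGA and collects results itself.
        return cpu_marshal + fpga_compute
    if mode == "lockstep":
        # FPGA fetches data directly; CPU still sequences each step.
        return cpu_control + fpga_compute
    if mode == "datapath":
        # CPU and FPGA run independently; the slower unit sets the pace,
        # plus whatever synchronisation the partition requires.
        return max(cpu_work, fpga_compute) + sync
    raise ValueError(f"unknown mode: {mode}")
```

In this model, datapath mode wins only while the synchronisation cost stays small, which matches the observation that it works best when the FPGA handles a large independent part of the algorithm.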
These architectures provide templates for the synthesis of CPU/FPGA systems. They provide contexts within which partitions can be evaluated. Although the process is manual at the moment, we expect the architectures and techniques discussed in this paper to form the basis for future tools which will automate this process. Research at HPLabs Bristol is targeted at providing these design tools.