Scheduling of Iterative Algorithms with Matrix Operations for Efficient FPGA Design—Implementation of Finite Interval Constant Modulus Algorithm
This paper deals with the optimization of iterative algorithms with matrix operations or nested loops for hardware implementation in Field Programmable Gate Arrays (FPGAs), using Integer Linear Programming (ILP). The method is demonstrated on an implementation of the Finite Interval Constant Modulus Algorithm, an equalization algorithm suitable for modern communication systems (4G and beyond). For the floating-point calculations required by the algorithm, two arithmetic libraries were used in the FPGA implementation: one based on the logarithmic number system, the other on the floating-point number system in the standard IEEE format. Both libraries use pipelined modules. Traditional approaches to the scheduling of nested loops lead to relatively large code, which is unsuitable for FPGA implementation. This paper presents a new high-level synthesis methodology, which models both iterative loops and imperfectly nested loops by means of a system of linear inequalities. Moreover, memory access is considered as an additional resource constraint. Since solving ILP-formulated problems is known to be computationally intensive, an important part of the article is devoted to reducing the problem size.
Keywords: high-level synthesis, cyclic scheduling, iterative algorithms, imperfectly nested loops, integer linear programming, FPGA, VLSI design, blind equalization, implementation
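As an illustration of what modelling an iterative loop by a system of linear inequalities can look like, the following is a minimal sketch and not the paper's formulation. It assumes the common cyclic-scheduling form s_j - s_i >= latency - period * distance for each data dependency, and a single pipelined arithmetic unit that accepts one operation per clock cycle. The function names, the edge tuples, and the three-task example are hypothetical, and a brute-force search stands in for an ILP solver.

```python
from itertools import product


def feasible(starts, edges, period):
    """Check one candidate schedule against the linear inequalities."""
    # Precedence: task j may start only once the result of task i is ready.
    # 'distance' is the number of loop iterations the dependency spans.
    for i, j, latency, distance in edges:
        if starts[j] - starts[i] < latency - period * distance:
            return False
    # Resource (assumed): one pipelined unit, so every task needs its own
    # issue slot modulo the period.
    slots = [s % period for s in starts]
    return len(set(slots)) == len(slots)


def min_period_schedule(n_tasks, edges, max_period=10, horizon=12):
    """Return (period, start times) for the shortest feasible period.

    Brute-force enumeration replaces the ILP solver here; it is only
    practical for very small examples.
    """
    for period in range(1, max_period + 1):
        for starts in product(range(horizon), repeat=n_tasks):
            if feasible(starts, edges, period):
                return period, list(starts)
    return None


# Hypothetical loop body: three operations, each with latency 2 on the
# pipelined unit. Task 2 feeds task 0 of the next iteration (distance 1).
EDGES = [
    (0, 1, 2, 0),
    (1, 2, 2, 0),
    (2, 0, 2, 1),
]

if __name__ == "__main__":
    print(min_period_schedule(3, EDGES))
```

In this toy example the recurrence through all three tasks, rather than the single arithmetic unit, limits the achievable period. The enumeration grows exponentially with the number of tasks, which hints at why reducing the size of the ILP model matters.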