Scheduling of Iterative Algorithms with Matrix Operations for Efficient FPGA Design—Implementation of Finite Interval Constant Modulus Algorithm

  • Přemysl Šůcha
  • Zdeněk Hanzálek
  • Antonín Heřmánek
  • Jan Schier
Article

Abstract

This paper deals with the optimization of iterative algorithms with matrix operations or nested loops for hardware implementation in Field Programmable Gate Arrays (FPGA), using Integer Linear Programming (ILP). The method is demonstrated on an implementation of the Finite Interval Constant Modulus Algorithm, an equalization algorithm suitable for modern communication systems (4G and beyond). For the floating-point calculations required by the algorithm, two arithmetic libraries were used in the FPGA implementation: one based on the logarithmic number system, the other on the floating-point number system in the standard IEEE format. Both libraries use pipelined modules. Traditional approaches to the scheduling of nested loops lead to relatively large code, which is unsuitable for FPGA implementation. This paper presents a new high-level synthesis methodology that models both iterative loops and imperfectly nested loops by means of a system of linear inequalities. Moreover, memory access is considered as an additional resource constraint. Since solving ILP problems is known to be computationally intensive, an important part of the article is devoted to reducing the problem size.
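The idea of modeling an iterative loop "by means of a system of linear inequalities" can be illustrated by its precedence part alone. In a common cyclic-scheduling formulation (assumed here, not taken from the paper), each edge (i, j) of the data-dependence graph carries a latency l and a height h (the number of iterations the dependence spans), and a periodic schedule with period w and start times s must satisfy s_j - s_i >= l - w*h. The Python sketch below checks these inequalities with Bellman-Ford longest paths and returns the smallest integer period they admit. It deliberately ignores resource constraints (a limited number of pipelined arithmetic units, memory ports), which are what turn the paper's problem into an ILP. The function names and the example latencies are hypothetical.

```python
def start_times(n, edges, w):
    """Longest-path start times s with s[j] - s[i] >= l - w*h for every
    edge (i, j, l, h), or None if a positive-weight cycle makes period w
    infeasible."""
    s = [0] * n  # equivalent to a virtual source with 0-weight edges to every task
    for _ in range(n):
        changed = False
        for i, j, l, h in edges:
            bound = s[i] + l - w * h
            if bound > s[j]:
                s[j] = bound
                changed = True
        if not changed:
            return s  # all inequalities hold
    return None  # still relaxing after n rounds: positive cycle


def min_period(n, edges):
    """Smallest integer period w admitted by the precedence constraints alone
    (resource constraints are not modeled)."""
    # With w = sum of positive latencies, every cycle of height >= 1 has
    # non-positive weight, so this is a safe upper bound for the search.
    upper = max(1, sum(max(l, 0) for _, _, l, _ in edges))
    for w in range(1, upper + 1):
        s = start_times(n, edges, w)
        if s is not None:
            return w, s
    return None  # a cycle with zero total height and positive latency


# Hypothetical recurrence: task 0 (latency 3) feeds task 1, whose result
# (latency 9) is consumed by task 0 two iterations later.
print(min_period(2, [(0, 1, 3, 0), (1, 0, 9, 2)]))  # (6, [0, 3])
```

Here the cycle has total latency 12 and total height 2, so no period below 12 / 2 = 6 can satisfy the inequalities; adding limited arithmetic units or memory ports would typically raise the achievable period further.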

Keywords

high-level synthesis; cyclic scheduling; iterative algorithms; imperfectly nested loops; integer linear programming; FPGA; VLSI design; blind equalization; implementation

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Přemysl Šůcha (1)
  • Zdeněk Hanzálek (1)
  • Antonín Heřmánek (2)
  • Jan Schier (2)

  1. Centre for Applied Cybernetics, Department of Control Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
  2. Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Prague, Czech Republic

Personalised recommendations