Low-Power Processor-Level Data Transfer and Storage Exploration

  • Erik Brockmeyer
  • Cedric Ghez
  • Wim Baetens
  • Francky Catthoor
Chapter

Abstract

The current starting point of IMEC’s systematic processor-level data transfer and storage exploration (DTSE) methodology [9, 10] is a system specification with accesses on multi-dimensional (M-D) signals which can be statically ordered. The output is a net-list of memories and address generators (see Fig. 3.1), combined with a transformed specification which is the input for the architecture (high-level) synthesis when custom realizations are envisioned, or for the software compilation stage in the case of predefined processors. The most time-consuming and error-prone steps within this methodology are increasingly supported by tools, developed in the context of the ATOMIUM system exploration environment. The address generators are produced by a separate methodology called address optimisation or ADOPT (see section 3.12).

Keywords

Memory Access, Loop Nest, Design Flow, Memory Hierarchy, Memory Organisation

References

  1. [1]
    More information and related papers on the Acropolis project are available at the IMEC web site: http://www.imec.be/acropolis/Welcome.html.
  2. [2]
    S. Amarasinghe, J. Anderson, M. Lam, and C. Tseng, “The SUIF compiler for scalable parallel machines”, in Proc. of the 7th SIAM Conf. on Parallel Proc. for Scientific Computing, 1995.
  3. [3]
    More information and related papers on the Atomium project are available at the IMEC web site: http://www.imec.be/atomium/Welcome.html.
  4. [4]
    B. M. Baas, “An Energy-Efficient Single-Chip FFT Processor”, in IEEE Symp. on VLSI, Honolulu HI, 13–15 June 1996.
  5. [5]
    U. Banerjee, R. Eigenmann, A. Nicolau, D. Padua, “Automatic program parallelisation”, Proc. of the IEEE, invited paper, Vol. 81, No. 2, Feb. 1993.
  6. [6]
    E. Brockmeyer, S. Wuytack, A. Vandecappelle, F. Catthoor, “Low power storage for hierarchical graphs”, Proc. 3rd ACM/IEEE Design and Test in Europe Conf., Paris, France, User-Forum pp. 249–254, April 2000.
  7. [7]
    F. Catthoor, F. Franssen, S. Wuytack, L. Nachtergaele, H. De Man, “Global communication and memory optimizing transformations for low power signal processing systems”, IEEE workshop on VLSI signal processing, La Jolla CA, Oct. 1994. Also in VLSI Signal Processing VII, J. Rabaey, P. Chau, J. Eldon (eds.), IEEE Press, New York, pp. 178–187, 1994.
  8. [8]
    F. Catthoor, M. Janssen, L. Nachtergaele, H. De Man, “System-level data-flow transformation exploration and power-area trade-offs demonstrated on video coders”, special issue on “Systematic trade-off analysis in signal processing systems design” (eds. M. Ibrahim, W. Wolf) in Journal of VLSI Signal Processing, Vol. 18, No. 1, Kluwer, Boston, pp. 39–50, 1998.
  9. [9]
    F. Catthoor, S. Wuytack, E. De Greef, F. Franssen, L. Nachtergaele, H. De Man, “System-level transformations for low power data transfer and storage”, in paper collection on “Low power CMOS design” (eds. A. Chandrakasan, R. Brodersen), IEEE Press, pp. 609–618, 1998.
  10. [10]
    F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, P. Slock, “System exploration for custom low power data storage and transfer”, chapter in “Digital Signal Processing for Multimedia Systems” (eds. K. Parhi, T. Nishitani), Marcel Dekker, Inc., New York, 1998.
  11. [11]
    F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele, A. Vandecappelle, “Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design”, ISBN 0-7923-8288-9, Kluwer Acad. Publ., Boston, 1998.
  12. [12]
    M. Cupak, F. Catthoor, “Efficient functional validation of system-level loop transformations for multi-media applications”, Proc. Electronic Circuits and Systems Conference, Bratislava, Slovakia, pp. 39–43, Sep. 1997.
  13. [13]
    M. Cupak, C. Kulkarni, F. Catthoor, H. De Man, “Functional Validation of System-level Loop Transformations for Power Efficient Caching”, Proc. Wsh. on System Design Automation, Dresden, Germany, March 1998.
  14. [14]
    F. Catthoor, “Energy-delay efficient data storage and transfer architectures: circuit technology versus design methodology solutions”, Proceedings of DATE’98, Feb. 23–25 1998.
  15. [15]
    K. Danckaert, F. Catthoor, H. De Man, “Platform independent data transfer and storage exploration illustrated on a parallel cavity detection algorithm”, Proc. ACM Conf. on Par. and Dist. Proc. Techniques and Applications, PDPTA’99, Vol. III, pp. 1669–1675, Las Vegas NV, June 1999.
  16. [16]
    K. Danckaert, F. Catthoor, H. De Man, “A loop transformation approach for combined parallelization and data transfer and storage optimization”, accepted for Proc. ACM Conf. on Par. and Dist. Proc. Techniques and Applications, PDPTA’00, Las Vegas NV, June 2000.
  17. [17]
    E. De Greef, F. Catthoor, H. De Man, “A memory-efficient, programmable multiprocessor architecture for real-time motion estimation type algorithms”, Intnl. Workshop on Algorithms and Parallel VLSI Architectures, Leuven, Belgium, August 1994. Also in “Algorithms and Parallel VLSI Architectures III” (eds. M. Moonen, F. Catthoor), Elsevier, pp. 191–202, 1995.
  18. [18]
    E. De Greef, F. Catthoor, H. De Man, “Mapping real-time motion estimation type algorithms to memory-efficient, programmable multi-processor architectures”, Microprocessors and Microprogramming, special issue on “Parallel Programmable Architectures and Compilation for Multi-dimensional Processing” (eds. F. Catthoor, M. Moonen), Elsevier, pp. 409–423, Oct. 1995.
  19. [19]
    E. De Greef, F. Catthoor, H. De Man, “Memory organization for video algorithms on programmable signal processors”, Proc. IEEE Int. Conf. on Computer Design, Austin TX, pp. 552–557, Oct. 1995.
  20. [20]
    E. De Greef, F. Catthoor, H. De Man, “In-Place mapping and its relation to loop parallelisation”, presented at Dagstuhl on Loop Parallelisation, Schloss Dagstuhl, Germany, April 1996.
  21. [21]
    E. De Greef, “Combined flow-graph and polyhedral model”, Internal Research Presentation, IMEC, May 1996.
  22. [22]
    E. De Greef, F. Catthoor, H. De Man, “Memory Size Reduction through Storage Order Optimization for Embedded Parallel Multimedia Applications”, Intnl. Parallel Proc. Symp. (IPPS) in Proc. Workshop on “Parallel Processing and Multimedia”, Geneva, Switzerland, pp. 84–98, April 1997.
  23. [23]
    E. De Greef, F. Catthoor, H. De Man, “Array Placement for Storage Size Reduction in Embedded Multimedia Systems”, Proc. Intnl. Conf. on Applic.-Spec. Array Processors, Zurich, Switzerland, pp. 66–75, July 1997.
  24. [24]
    E. De Greef, F. Catthoor, H. De Man, “Memory Size Reduction through Storage Order Optimization for Embedded Parallel Multimedia Applications”, special issue on “Parallel Processing and Multi-media” (ed. A. Krikelis), in Parallel Computing, Elsevier, Vol. 23, No. 12, Dec. 1997.
  25. [25]
    E. De Greef, “Storage size reduction for multimedia applications”, Doctoral dissertation, ESAT/EE Dept., K. U. Leuven, Belgium, Jan. 1998.
  26. [26]
    J. Z. Fang, M. Lu, “An iteration partition approach for cache or local memory thrashing on parallel processing”, IEEE Trans. on Computers, Vol. C-42, No. 5, pp. 529–546, May 1993.
  27. [27]
    P. Feautrier, “Compiling for massively parallel architectures: a perspective”, Intnl. Workshop on Algorithms and Parallel VLSI Architectures, Leuven, Belgium, August 1994. Also in “Algorithms and Parallel VLSI Architectures III” (eds. M. Moonen, F. Catthoor), Elsevier, pp. 259–270, 1995.
  28. [28]
    W. Geurts, F. Catthoor, H. De Man, “Heuristic techniques for the synthesis of complex functional units”, Proc. 4th ACM/IEEE Europ. Design Automation Conf., Paris, France, pp. 552–556, Feb. 1993.
  29. [29]
    W. Geurts, F. Catthoor, H. De Man, “Quadratic Zero-one Programming-based Synthesis of Application-Specific Data Paths”, IEEE Trans. on Comp.-aided Design, Vol. CAD-14, No. 1, pp. 1–11, Jan. 1995.
  30. [30]
    W. Geurts, “Synthesis of accelerator data-paths for high-throughput signal processing applications”, Doctoral dissertation, ESAT/EE Dept., K. U. Leuven, Belgium, March 1995.
  31. [31]
    W. Geurts, F. Catthoor, S. Vernalde, H. De Man, “Accelerator data-paths synthesis for high-throughput signal processing applications”, Kluwer Academic Publishers, Boston, 1996.
  32. [32]
    J. M. Janssen, F. Catthoor, H. De Man, “A Specification Invariant Technique for Operation Cost Minimisation in Flow-graphs”, Proc. 7th ACM/IEEE Int. Workshop on High-Level Synthesis, Niagara-on-the-Lake, Canada, pp. 146–151, May 1994.
  33. [33]
    J. M. Janssen, F. Catthoor, H. De Man, “A Specification Invariant Technique for Regularity Improvement between Flow-Graph Clusters”, Proc. European Design Automation Conf., Paris, France, pp. 138–143, Feb. 1996.
  34. [34]
    M. Jimenez, J. Llaberia, A. Fernandez, E. Morancho, “A unified transformation technique for multi-level blocking”, Proc. EuroPar Conference, Lyon, France, August 1996. “Lecture notes in computer science” series, Springer Verlag, pp. 402–405.
  35. [35]
    D. Kolson, A. Nicolau, N. Dutt, “Minimization of memory traffic in high-level synthesis”, Proc. 31st ACM/IEEE Design Automation Conf., San Diego, CA, pp. 149–154, June 1994.
  36. [36]
    D. Kulkarni, M. Stumm, R. Unrau, “Implementing flexible computation rules with subexpression-level loop transformations”, Technical report, Comp. Systems Res. Inst., Univ. of Toronto, Canada, 1995.
  37. [37]
    van de Laar, F., Philips, N., and Olde Dubbelink, R., “General-purpose and application-specific design of a DAB channel decoder”, EBU Technical Review, 258 (Winter 1993), 25–35, ISSN 1019–6587.
  38. [38]
    Langen, E., “The Philips DAB 452 test receiver”, DAB Newsletter, F. Kozamernik, Ed., no. 6, European Broadcasting Union, autumn 1994, pp. 6–10.
  39. [39]
    W. Li, K. Pingali, “A singular loop transformation framework based on non-singular matrices”, Proc. 5th Annual Workshop on Languages and Compilers for Parallelism, New Haven CN, August 1992.
  40. [40]
    P. Lippens, J. van Meerbergen, W. Verhaegh, A. van der Werf, “Allocation of multiport memories for hierarchical data streams”, Proc. IEEE Int. Conf. Comp. Aided Design, Santa Clara CA, Nov. 1993.
  41. [41]
    N. Manjikian, T. Abdelrahman, “Reduction of cache conflicts in loop nests”, Technical report CSRI-318, Comp. Systems Res. Inst., Univ. of Toronto, Canada, March 1995.
  42. [42]
    K. Masselos, F. Catthoor, C. E. Goutis, H. De Man, “Low Power Mapping of Video Processing Applications on VLIW Multimedia Processors”, IEEE Alessandro Volta Memorial Intnl. Wsh. on Low Power Design (VOLTA), Como, Italy, pp. 52–60, March 1999.
  43. [43]
    K. Masselos, F. Catthoor, C. E. Goutis, H. De Man, “Code size effects of power optimizing code transformations for embedded multimedia applications”, Proc. IEEE Wsh. on Power and Timing Modeling, Optimization and Simulation (PATMOS), Kos, Greece, pp. 61–70, Oct. 1999.
  44. [44]
    M. Miranda, F. Catthoor, H. De Man, “Address equation optimization and hardware sharing for real-time signal processing applications”, IEEE workshop on VLSI signal processing, La Jolla CA, Oct. 1994. Also in VLSI Signal Processing VII, J. Rabaey, P. Chau, J. Eldon (eds.), IEEE Press, New York, pp. 208–217, 1994.
  45. [45]
    M. Miranda, F. Catthoor, M. Janssen, H. De Man, “ADOPT: Efficient Hardware Address Generation in Distributed Memory Architectures”, Proc. 9th ACM/IEEE Intnl. Symp. on System-Level Synthesis, La Jolla CA, pp. 20–25, Nov. 1996.
  46. [46]
    M. Miranda, M. Kaspar, F. Catthoor, H. De Man, “Architectural Exploration And Optimization for Counter Based Hardware Address Generation”, Proc. European Design Automation Conf., Paris, France, pp. 293–298, Feb. 1997.
  47. [47]
    M. Miranda, F. Catthoor, M. Janssen, H. De Man, “High-level Address Optimisation and Synthesis Techniques for Data-Transfer Intensive Applications”, IEEE Trans. on VLSI Systems, Vol. 6, No. 4, pp. 677–686, Dec. 1998.
  48. [48]
    L. Nachtergaele, F. Catthoor, F. Balasa, F. Franssen, E. De Greef, H. Samsom, H. De Man, “Optimization of memory organization and hierarchy for decreased size and power in video and image processing systems”, Proc. Intnl. Workshop on Memory Technology, Design and Testing, San Jose CA, pp. 82–87, Aug. 1995.
  49. [49]
    S. Note, W. Geurts, F. Catthoor, H. De Man, “Cathedral III: Architecture driven high-level synthesis for high throughput DSP applications”, Proc. 28th ACM/IEEE Design Automation Conf., San Francisco CA, pp. 597–602, June 1991.
  50. [50]
    S. Note, “Mapping high throughput signal processing algorithms into dedicated data-path architectures”, Doctoral dissertation, ESAT/EE Dept., K. U. Leuven, Belgium, March 1991.
  51. [51]
    D. A. Padua, M. J. Wolfe, “Advanced compiler optimizations for supercomputers”, Communications of the ACM, Vol. 29, No. 12, pp. 1184–1201, 1986.
  52. [52]
    P. R. Panda, N. D. Dutt, A. Nicolau, “Memory issues in embedded systems-on-chip: optimization and exploration”, Kluwer Acad. Publ., Boston, 1999.
  53. [53]
    L. Ramachandran, D. Gajski, V. Chaiyakul, “An algorithm for array variable clustering”, Proc. 5th ACM/IEEE Europ. Design and Test Conf., Paris, France, pp. 262–266, Feb. 1994.
  54. [54]
    H. Samsom, L. Claesen, H. De Man, “SynGuide: an environment for doing interactive correctness preserving transformations”, IEEE workshop on VLSI signal processing, Veldhoven, The Netherlands, Oct. 1993. Also in VLSI Signal Processing VI, L. Eggermont, P. Dewilde, E. Deprettere, J. van Meerbergen (eds.), IEEE Press, New York, pp. 269–277, 1993.
  55. [55]
    H. Samsom, F. Franssen, F. Catthoor, H. De Man, “Verification of loop transformations for real time signal processing applications”, IEEE workshop on VLSI signal processing, La Jolla CA, Oct. 1994. Also in VLSI Signal Processing VII, J. Rabaey, P. Chau, J. Eldon (eds.), IEEE Press, New York, pp. 269–277, 1994.
  56. [56]
    H. Samsom, F. Franssen, F. Catthoor, H. De Man, “System-level Verification of Video and Image Processing Specifications”, Proc. 8th ACM/IEEE Intnl. Symp. on System-Level Synthesis, Cannes, pp. 144–149, Sep. 1995.
  57. [57]
    H. Samsom, “Formal verification and transformation of video and image processing applications”, Doctoral dissertation, ESAT/EE Dept., K. U. Leuven, Belgium, Oct. 1995.
  58. [58]
    H. Schmidt and D. Thomas, “Address generation for memories containing multiple arrays”, Proc. IEEE Int. Conf. Comp. Aided Design, San Jose CA, pp. 510–514, Nov. 1995.
  59. [59]
    O. Sentieys, D. Chillet, J. P. Diguet, J. Philippe, “Memory module selection for high-level synthesis”, Proc. IEEE workshop on VLSI signal processing, Monterey CA, Oct. 1996.
  60. [60]
    P. Slock, S. Wuytack, F. Catthoor, G. de Jong, “Fast and extensive system-level memory exploration for ATM applications”, Proc. 10th ACM/IEEE Intnl. Symp. on System-Level Synthesis, Antwerp, Belgium, pp. 74–81, Sep. 1997.
  61. [61]
    B. Vanhoof, M. Kaspar, P. Schaumont, “Address generation within Cathedral-2/3”, SPRITE deliverable report C3. g/IMEC/Y5m12/1, Dec. 1993.
  62. [62]
    J. Vanhoof, I. Bolsens, H. De Man, “Compiling multi-dimensional data streams into distributed DSP ASIC memory”, Proc. IEEE Int. Conf. Comp. Aided Design, Santa Clara CA, pp. 272–275, Nov. 1991.
  63. [63]
    I. Verbauwhede, F. Catthoor, J. Vandewalle, H. De Man, “High-level memory management for real-time signal processing of algebraic algorithms on application-specific micro-coded processors”, Proc. Intnl. Workshop on Algorithms and Parallel VLSI Architectures, Pont-a-Mousson, France, June 1990. Also in Algorithms and Parallel VLSI Architectures, Vol. B, E. Deprettere, A. Van der Veen (eds.), Elsevier, Amsterdam, pp. 353–362, 1991.
  64. [64]
    I. Verbauwhede, F. Catthoor, J. Vandewalle, H. De Man, “In-place memory management of algebraic algorithms on application-specific IC’s”, Journal of VLSI signal processing, Vol. 3, Kluwer, Boston, pp. 193–200, 1991.
  65. [65]
    F. Vermeulen, F. Catthoor, D. Verkest, H. De Man, “A System-Level Reuse Methodology for Embedded Data-Dominated Applications”, Proc. IEEE Wsh. on Signal Processing Systems (SIPS), Boston MA, IEEE Press, pp. 551–560, Oct. 1998.
  66. [66]
    M. van Swaaij, F. Franssen, F. Catthoor, H. De Man, “Modelling data and control flow for high-level memory management”, Proc. 3rd ACM/IEEE Europ. Design Automation Conf., Brussels, Belgium, pp. 8–13, March 1992.
  67. [67]
    M. van Swaaij, F. Franssen, F. Catthoor, H. De Man, “Automating high-level control flow transformations for DSP memory management”, Proc. IEEE workshop on VLSI signal processing, Napa Valley CA, Oct. 1992. Also in VLSI Signal Processing V, K. Yao, R. Jain, W. Przytula (eds.), IEEE Press, New York, pp. 397–406, 1992.
  68. [68]
    S. Wuytack, F. Catthoor, L. Nachtergaele, H. De Man, “Power Exploration for Data Dominated Video Applications”, Proc. IEEE Intnl. Symp. on Low Power Design, Monterey, pp. 359–364, Aug. 1996.
  69. [69]
    S. Wuytack, F. Catthoor, L. Nachtergaele, H. De Man, “Power Exploration for Data Dominated Video Applications”, Proc. IEEE Intnl. Symp. on Low Power Design, Monterey, pp. 359–364, Aug. 1996.
  70. [70]
    S. Wuytack, F. Catthoor, G. De Jong, B. Lin, H. De Man, “Flow Graph Balancing for Minimizing the Required Memory Bandwidth”, Proc. 9th ACM/IEEE Intnl. Symp. on System-Level Synthesis, La Jolla CA, pp. 127–132, Nov. 1996.
  71. [71]
    S. Wuytack, J. P. Diguet, F. Catthoor, H. De Man, “Formalized methodology for data reuse exploration for low-power hierarchical memory mappings”, IEEE Trans. on VLSI Systems, Vol. 6, No. 4, pp. 529–537, Dec. 1998.

Copyright information

© Springer Science+Business Media Dordrecht 2000

Authors and Affiliations

  • Erik Brockmeyer
    • 1
  • Cedric Ghez
    • 1
  • Wim Baetens
    • 1
  • Francky Catthoor
    • 1
  1. IMEC, Leuven, Belgium