L0 Cluster Synthesis and Operation Shuffling

  • Murali Jayapala
  • Tom Vander Aa
  • Francisco Barat
  • Francky Catthoor
  • Henk Corporaal
  • Geert Deconinck
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3254)


Clustered L0 buffers are an interesting alternative for reducing energy consumption in the instruction memory hierarchy of embedded VLIW processors. Currently, L0 clusters are synthesized as a hardware optimization: the compiler generates a schedule, and the clusters are then synthesized from that schedule. The clustering result therefore depends on the schedule, which opens a design space for exploring how scheduling affects clustering. This paper presents a study of the potential that shuffling operations within a VLIW instruction offers for L0 cluster synthesis. Simulation results indicate that shuffling operations can potentially reduce L0 buffer energy by up to 75%.
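As a rough illustration of why the schedule matters for clustering, the toy sketch below counts how often each cluster of issue slots holds at least one real operation, before and after operations are shuffled within each instruction. It is not the synthesis method or the energy model of the paper. The names `cluster_accesses` and `pack_left`, the example schedule, the two-cluster partition, and the assumption that any operation may move to any slot (homogeneous functional units) are all hypothetical simplifications.

```python
from typing import List, Optional, Sequence, Tuple

# One VLIW instruction: one entry per issue slot; None stands for a NOP.
Instruction = List[Optional[str]]


def cluster_accesses(schedule: Sequence[Instruction],
                     clusters: Sequence[Tuple[int, ...]]) -> int:
    """Count (instruction, cluster) pairs where the cluster holds a real operation.

    This is only a proxy for L0 buffer activity, not a calibrated energy model.
    """
    return sum(
        1
        for instr in schedule
        for cluster in clusters
        if any(instr[slot] is not None for slot in cluster)
    )


def pack_left(instr: Instruction) -> Instruction:
    """Shuffle the operations of one instruction toward the lowest-numbered slots.

    Assumes every slot can execute every operation (hypothetical simplification).
    """
    ops = [op for op in instr if op is not None]
    return ops + [None] * (len(instr) - len(ops))


# Hypothetical 4-slot schedule and a partition of the slots into two clusters.
schedule = [
    ["add", None, None, "mul"],
    [None, "ld", None, None],
    ["sub", None, "st", None],
]
clusters = [(0, 1), (2, 3)]

before = cluster_accesses(schedule, clusters)
shuffled = [pack_left(instr) for instr in schedule]
after = cluster_accesses(shuffled, clusters)
print(before, after)  # prints "5 3"
```

In this toy case the shuffled schedule never touches the second cluster, which is the kind of schedule-dependent effect the abstract refers to. A real VLIW target would restrict which slots an operation may occupy, so an actual shuffling pass would have to respect those constraints.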


Functional Unit, Data Cluster, Valid Cluster, Memory Hierarchy, Cluster Tool
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Murali Jayapala (1)
  • Tom Vander Aa (1)
  • Francisco Barat (1)
  • Francky Catthoor (2)
  • Henk Corporaal (3)
  • Geert Deconinck (1)

  1. ESAT/ELECTA, K.U.Leuven, Heverlee, Belgium
  2. IMEC vzw, Heverlee, Belgium
  3. Department of Electrical Engineering, Eindhoven University of Technology (TUE), Eindhoven, The Netherlands
