Exploring Energy-Performance Trade-Offs for Heterogeneous Interconnect Clustered VLIW Processors

  • Rahul Nagpal
  • Y. N. Srikant
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4297)

Abstract

Clustered architecture processors are preferred for embedded systems because centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption. Although clustering improves clock speed, reduces the energy consumption of the logic, and simplifies the design, it introduces the extra overhead of inter-cluster communication. This communication takes place over long global wires, which delays execution and consumes a significant amount of energy.

In this paper, we propose a new instruction scheduling algorithm that jointly exploits the scheduling slack of instructions and the communication slack of data values to achieve better energy-performance trade-offs for clustered architectures with heterogeneous interconnect. For 2-cluster and 4-cluster machines respectively, the algorithm reduces communication energy by 35% and 40% and improves the overall energy-delay product by 4.5% and 6.5%, with a marginal increase in execution time (1.6% and 1.1%). Our test bed uses the Trimaran compiler infrastructure.
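
To relate the reported figures to one another, the following minimal Python sketch back-computes the change in total energy implied by the energy-delay product and execution-time numbers. It assumes the conventional definition EDP = energy × delay, which the abstract does not spell out. The function name `implied_energy_change` is hypothetical, and the result is an inference from the abstract's percentages, not a figure reported by the authors.

```python
def implied_energy_change(edp_improvement: float, time_increase: float) -> float:
    """Fractional change in total energy implied by EDP = energy * delay.

    Assumption (not stated in the abstract): EDP is the plain product of
    total energy and execution time.
    """
    edp_ratio = 1.0 - edp_improvement   # new EDP / old EDP
    delay_ratio = 1.0 + time_increase   # new time / old time
    return edp_ratio / delay_ratio - 1.0


# Figures taken from the abstract: (EDP improvement, execution-time increase).
reported = {2: (0.045, 0.016), 4: (0.065, 0.011)}

for clusters, (edp_gain, slowdown) in reported.items():
    change = implied_energy_change(edp_gain, slowdown)
    print(f"{clusters}-cluster: implied total energy change {change:+.1%}")
```

Under that assumption, the implied reduction in total energy is roughly 6% for the 2-cluster machine and roughly 7.5% for the 4-cluster machine. Both are far smaller than the 35% and 40% communication-energy savings, which is consistent with communication energy being only one component of overall energy.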

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Rahul Nagpal (1)
  • Y. N. Srikant (1)
  1. Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
