UCIFF: Unified Cluster Assignment Instruction Scheduling and Fast Frequency Selection for Heterogeneous Clustered VLIW Cores

  • Vasileios Porpodas
  • Marcelo Cintra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7760)

Abstract

Clustered VLIW processors are scalable, wide-issue, statically scheduled processors. Their design is based on physically partitioning the otherwise shared hardware resources, which leads to both high performance and low energy consumption. In traditional clustered VLIW processors, all clusters operate at the same frequency. Heterogeneous clustered VLIW processors, however, support dynamic voltage and frequency scaling (DVFS) independently per cluster. Effectively controlling DVFS to selectively decrease the frequency of clusters with a lot of slack in their schedule can lead to significant energy savings.
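The slack argument above can be made concrete with a toy calculation. The following is a minimal sketch, not the UCIFF algorithm: it assumes each cluster's work is a single independent block of busy cycles (real schedules have inter-cluster dependencies), that the frequency levels are the hypothetical set {1/2, 3/4, 1} of the nominal clock, and that voltage scales linearly with frequency so that dynamic energy per cycle is proportional to f². The names `pick_frequency`, `relative_energy` and `plan` are illustrative only.

```python
from fractions import Fraction

# Hypothetical frequency levels, expressed as fractions of the nominal clock.
LEVELS = [Fraction(1, 2), Fraction(3, 4), Fraction(1)]

def pick_frequency(busy_cycles, deadline, levels=LEVELS):
    """Return the lowest level at which busy_cycles still fit within deadline.

    Both arguments are measured in nominal-clock cycles.
    """
    for f in sorted(levels):
        if busy_cycles / f <= deadline:
            return f
    raise ValueError("cluster cannot meet the deadline even at the highest level")

def relative_energy(busy_cycles, f):
    """Dynamic energy in arbitrary units.

    Assumption: voltage scales linearly with frequency, so energy per cycle ~ f^2.
    """
    return busy_cycles * f * f

def plan(cluster_busy_cycles, levels=LEVELS):
    """Slow down every cluster as far as the busiest cluster's length allows."""
    deadline = max(cluster_busy_cycles)  # the critical cluster sets the schedule length
    freqs = [pick_frequency(b, deadline, levels) for b in cluster_busy_cycles]
    base = sum(relative_energy(b, Fraction(1)) for b in cluster_busy_cycles)
    scaled = sum(relative_energy(b, f) for b, f in zip(cluster_busy_cycles, freqs))
    return freqs, scaled / base

if __name__ == "__main__":
    freqs, ratio = plan([100, 50, 70])
    print([str(f) for f in freqs], float(ratio))
```

With busy cycles of 100, 50 and 70, the critical cluster stays at full speed while the other two drop to 1/2 and 3/4 without lengthening the schedule. Choosing frequencies only after the schedule is fixed, as this sketch does, is exactly the kind of decoupling that creates a phase-ordering problem.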

In this paper we propose UCIFF, a new scheduling algorithm for heterogeneous clustered VLIW processors with software DVFS control that performs cluster assignment, instruction scheduling and fast frequency selection simultaneously, all in a single compiler pass. The proposed algorithm solves the phase-ordering problem between frequency selection and scheduling that is present in existing algorithms. We compared the quality of the generated code, using both performance and energy-related metrics, against that of the current state-of-the-art and an optimal scheduler. The results show that UCIFF produces better code than the state-of-the-art, very close to the optimal across the Mediabench II benchmarks, while keeping the algorithmic complexity low.

Keywords

clustered VLIW, heterogeneous, DVFS, scheduling, phase-ordering

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Vasileios Porpodas (1)
  • Marcelo Cintra (1)
  1. School of Informatics, University of Edinburgh, UK
