
Aligned Scheduling: Cache-Efficient Instruction Scheduling for VLIW Processors

  • Vasileios Porpodas
  • Marcelo Cintra
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8664)

Abstract

The performance of statically scheduled VLIW processors is highly sensitive to the instruction scheduling performed by the compiler. In this work we identify a major deficiency in existing instruction scheduling for VLIW processors. Unlike most dynamically scheduled processors, a VLIW processor with no load-use hardware interlocks stalls completely upon a cache miss on any of the operations scheduled to run in parallel: all other operations in the same or subsequent instruction words must wait. However, when coupled with a non-blocking cache, the VLIW processor can resolve multiple loads from the same instruction word simultaneously. Existing instruction scheduling algorithms do not optimize for this VLIW-specific problem.
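
The stall semantics above can be made concrete with a small latency model. The sketch below is illustrative only and is not taken from the paper: the helper stall_cycles, the 20-cycle miss penalty, and the two-issue words are all assumptions. It shows that two missing loads placed in different instruction words stall the machine twice, whereas the same two loads packed into one word backed by a non-blocking cache pay the penalty only once, since both misses are outstanding at the same time.

    # Hypothetical latency model (not from the paper): with a lockup-free
    # (non-blocking) cache, misses issued from the same VLIW instruction word
    # overlap, while misses in different words serialize because the whole
    # machine stalls on each one.
    MISS_PENALTY = 20  # assumed miss latency in cycles

    def stall_cycles(schedule):
        """schedule: list of instruction words; each word is a list of bools,
        True meaning 'this slot holds a load that misses in the cache'."""
        stalls = 0
        for word in schedule:
            if any(word):
                # All misses in one word are outstanding simultaneously,
                # so the word pays one miss penalty, not one per missing load.
                stalls += MISS_PENALTY
        return stalls

    spread  = [[True, False], [True, False]]   # misses in different words
    aligned = [[True, True], [False, False]]   # misses aligned in one word

    print(stall_cycles(spread))   # 40: the two stalls serialize
    print(stall_cycles(aligned))  # 20: the two misses overlap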

We propose Aligned Scheduling, a novel instruction scheduling algorithm that improves the performance of VLIW processors with non-blocking caches by enabling them to better cope with unpredictable cache-memory latencies. Aligned Scheduling exploits the VLIW-specific cache-miss semantics to align cache misses on the same scheduling cycle, increasing the probability that they are serviced simultaneously. Our evaluation shows that Aligned Scheduling improves the performance of VLIW processors by up to 20% across a range of benchmarks from the Mediabench II and SPEC CINT2000 suites.
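
As a rough illustration of the idea (a minimal sketch under assumed data structures, not the paper's exact Aligned Scheduling algorithm), a cycle-by-cycle list scheduler can bias its ready list so that once one load has been placed in the current instruction word, other ready loads are preferred for the remaining slots, increasing the chance that any resulting misses line up in the same cycle. The function aligned_list_schedule and its inputs below are hypothetical.

    # Minimal list-scheduling sketch (an illustration of the idea, not the
    # paper's algorithm): once a load occupies the current instruction word,
    # other ready loads are pulled forward so potential cache misses align.

    def aligned_list_schedule(instrs, deps, issue_width):
        """instrs: dict name -> {'is_load': bool}
        deps: dict name -> set of names it depends on (acyclic, unit latency)
        Returns a list of cycles, each a list of instruction names."""
        scheduled = set()
        cycles = []
        while len(scheduled) < len(instrs):
            ready = [i for i in instrs
                     if i not in scheduled and deps[i] <= scheduled]
            word = []
            while ready and len(word) < issue_width:
                if any(instrs[i]['is_load'] for i in word):
                    # A load is already in the word: prefer further loads.
                    ready.sort(key=lambda i: not instrs[i]['is_load'])
                word.append(ready.pop(0))
            cycles.append(word)
            scheduled.update(word)
        return cycles

    instrs = {'ld1': {'is_load': True}, 'add': {'is_load': False},
              'ld2': {'is_load': True}, 'mul': {'is_load': False}}
    deps = {'ld1': set(), 'add': set(), 'ld2': set(), 'mul': {'ld1'}}
    print(aligned_list_schedule(instrs, deps, issue_width=2))
    # [['ld1', 'ld2'], ['add', 'mul']] -- both loads share one word

Under the stall model sketched after the first paragraph, the schedule that packs both loads into one word pays a single miss penalty when both loads miss, whereas a schedule that spreads them over two words pays the penalty twice.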

Keywords

Schedule Algorithm, Cache Size, Current Cycle, Load Instruction, Instruction Schedule


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. School of Informatics, University of Edinburgh, Edinburgh, UK
  2. Intel Labs Braunschweig, Braunschweig, Germany
