Widening the Memory Bottleneck by Automatically-Compiled Application-Specific Speculation Mechanisms

  • Benjamin Thielmann
  • Jens Huthmann
  • Thorsten Wink
  • Andreas Koch


The rate of improvement in the single-thread performance of conventional central processing units (CPUs) has decreased significantly over the last decade, mainly because of the difficulty of achieving higher clock frequencies. As a consequence, the focus of development has shifted to multi-threaded execution models and multi-core CPU designs. Unfortunately, many important algorithms and applications cannot easily be rewritten to take advantage of this new computing paradigm. Thus, the performance gap between parallelizable algorithms and those that depend on single-thread performance has widened significantly. Application-specific hardware accelerators with optimized pipelines can provide improved single-thread performance, but they offer only limited flexibility and require considerably more development effort than programming software-programmable processors (SPPs).




This work was supported by the German Research Foundation (DFG) and by Xilinx Inc.


Copyright information

© Springer Science+Business Media, LLC 2013

Authors and Affiliations

  • Benjamin Thielmann (1)
  • Jens Huthmann (1)
  • Thorsten Wink (1)
  • Andreas Koch (1)

  1. Embedded Systems and Applications Group, Technische Universität Darmstadt, FB20 (Informatik), FG ESA, Darmstadt, Germany