International Journal of Parallel Programming, Volume 29, Issue 1, pp 35–58

An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors

Abstract

The performance of superscalar processors depends on many parameters with correlated effects. This paper explores the relations between some of these parameters, and more particularly the instruction fetch bandwidth requirement. We introduce new enhancements to increase the bandwidth of conventional instruction fetch engines. However, experiments show that performance does not increase in proportion to the fetch bandwidth. Once the measured IPC reaches half the instruction fetch bandwidth, increasing the fetch bandwidth brings very little improvement. To better understand this behavior, we develop a model from the empirical observation that the available instruction parallelism grows as the square root of the instruction window size. From the model, we derive that the fetch bandwidth requirement grows as the square root of the distance between mispredicted branches. We also verify experimentally that, to double the IPC, one should both double the fetch bandwidth and decrease the number of mispredicted branches fourfold.
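
The scaling claims in the abstract can be read as a simple rule of thumb. The sketch below is only an illustration of that arithmetic, not the model developed in the paper: it assumes that IPC is proportional to the square root of the distance between mispredicted branches and that the required fetch bandwidth scales linearly with IPC. The function names are hypothetical and all proportionality constants are omitted.

```python
import math
from typing import Tuple


def relative_ipc(distance_ratio: float) -> float:
    """Relative IPC change when the mean distance between mispredicted
    branches is multiplied by distance_ratio.

    Assumption (from the abstract's square-root observation):
    IPC grows as the square root of that distance.
    """
    return math.sqrt(distance_ratio)


def required_scaling(ipc_ratio: float) -> Tuple[float, float]:
    """Return (fetch bandwidth factor, misprediction reduction factor)
    needed to multiply IPC by ipc_ratio.

    Assumptions: fetch bandwidth must scale linearly with IPC, and the
    distance between mispredicted branches must scale as IPC squared.
    """
    fetch_factor = ipc_ratio
    misprediction_reduction = ipc_ratio ** 2
    return fetch_factor, misprediction_reduction


if __name__ == "__main__":
    fetch, mispredictions = required_scaling(2.0)
    print(f"fetch bandwidth x{fetch:g}, mispredictions divided by {mispredictions:g}")
```

Under these assumptions, asking for twice the IPC gives a fetch bandwidth factor of 2 and a fourfold reduction in mispredicted branches, which matches the figures quoted in the abstract.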

out-of-order superscalar processor; instruction fetch; two-block ahead branch prediction; square-root model

Copyright information

© Plenum Publishing Corporation 2001

Authors and Affiliations

  • Pierre Michaud (1)
  • André Seznec (1)
  • Stéphan Jourdan (2)
  1. IRISA/INRIA, Rennes, France
  2. Intel Corporation, Hillsboro, Oregon