Kilo-instruction Processors

  • Conference paper
High Performance Computing (ISHPC 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2858)

Abstract.

Because processor speed has outpaced memory speed, main memory has steadily moved further from the processor when measured in cycles. Superscalar out-of-order processors cope with these increasing latencies by keeping more instructions in flight from which to extract ILP. With upcoming latencies of 500 cycles and more, this will eventually lead to what we have called Kilo-Instruction Processors, which will have to handle thousands of in-flight instructions. Managing so many in-flight instructions requires a microarchitectural change in the way the re-order buffer, the instruction queues and the physical registers are handled, since simply up-sizing these resources is technologically infeasible. In this paper we present a survey of several techniques that try to solve the problems caused by thousands of in-flight instructions.
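
The link between a 500-cycle latency and "thousands" of in-flight instructions can be illustrated with a rough back-of-the-envelope estimate. This sketch is not taken from the paper: it assumes that hiding one memory access requires roughly latency times sustained issue rate instructions in flight, and the issue rate of 4 instructions per cycle and the function name `window_size_needed` are hypothetical choices made for illustration.

```python
def window_size_needed(memory_latency_cycles: int, instructions_per_cycle: float) -> int:
    """Estimate how many instructions must be in flight to keep issuing
    during one memory access (simple latency x throughput product)."""
    return int(memory_latency_cycles * instructions_per_cycle)


latency = 500  # cycles, the figure quoted in the abstract
ipc = 4        # assumed sustained issue rate, not from the paper

print(window_size_needed(latency, ipc))
```

Under these assumptions the estimate is on the order of two thousand instructions, which is consistent with the "kilo-instruction" scale the abstract describes.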

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cristal, A., Ortega, D., Llosa, J., Valero, M. (2003). Kilo-instruction Processors. In: Veidenbaum, A., Joe, K., Amano, H., Aiso, H. (eds) High Performance Computing. ISHPC 2003. Lecture Notes in Computer Science, vol 2858. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39707-6_2

  • DOI: https://doi.org/10.1007/978-3-540-39707-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20359-9

  • Online ISBN: 978-3-540-39707-6

  • eBook Packages: Springer Book Archive
