Kilo-instruction Processors

Cristal, Adrián; Ortega, Daniel; Llosa, Josep; Valero, Mateo

doi:10.1007/978-3-540-39707-6_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2858)

Included in the following conference series:

International Symposium on High Performance Computing

Abstract.

Because of the gap between processor speed and memory speed, memory has steadily moved further away from the processor when measured in cycles. Superscalar out-of-order processors cope with these increasing latencies by keeping more instructions in flight from which to extract ILP. With upcoming latencies of 500 cycles and more, this will eventually lead to what we have called Kilo-Instruction Processors, which will have to handle thousands of in-flight instructions. Managing such a large number of in-flight instructions requires a microarchitectural change in the way the reorder buffer, the instruction queues and the physical registers are handled, since simply up-sizing these resources is technologically infeasible. In this paper we present a survey of several techniques that try to solve the problems caused by thousands of in-flight instructions.
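
As a rough illustration of why a 500-cycle latency points to thousands of in-flight instructions, the sketch below applies a Little's-law style estimate: instructions in flight are approximately the sustained throughput multiplied by the latency to be hidden. This calculation is not taken from the paper. The function name `in_flight_needed` and the sustained rate of 4 instructions per cycle are assumptions chosen only for illustration.

```python
import math


def in_flight_needed(memory_latency_cycles: int, sustained_ipc: float) -> int:
    """Estimate how many instructions must be in flight to keep the
    pipeline busy while one memory access is outstanding.

    Little's-law style bound: occupancy = throughput x latency.
    Both parameters are illustrative assumptions, not measured values.
    """
    return math.ceil(memory_latency_cycles * sustained_ipc)


if __name__ == "__main__":
    # Assumed sustained rate of 4 instructions per cycle (hypothetical).
    for latency in (100, 500, 1000):
        print(latency, "cycles ->", in_flight_needed(latency, 4.0), "instructions")
```

Under these assumed numbers, a 500-cycle latency already calls for a window on the order of two thousand instructions, which is the regime the abstract describes.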

Author information

Authors and Affiliations

Departamento de Arquitectura de Computadores, Universidad Politécnica de Cataluña, Barcelona, Spain
Adrián Cristal, Daniel Ortega, Josep Llosa & Mateo Valero

Editor information

Editors and Affiliations

Department of Computer Science, University of California (UCI), 3019 Donald Bren Hall, 92697-3435, Irvine, CA, USA
Alex Veidenbaum
Department of Information and Computer Science, Faculty of Science, Nara Women's University, Kitauoyanishi-machi, Nara-city, 630-8506, Nara, Japan
Kazuki Joe
Keio University, Hiyoshi, Kohoku, Yokohama, 223-8522, Kanagawa, Japan
Hideharu Amano
Tokyo University of Technology, 1404-1 Katakura, Hachioji, 192-0982, Tokyo, Japan
Hideo Aiso

Cite this paper

Cristal, A., Ortega, D., Llosa, J., Valero, M. (2003). Kilo-instruction Processors. In: Veidenbaum, A., Joe, K., Amano, H., Aiso, H. (eds) High Performance Computing. ISHPC 2003. Lecture Notes in Computer Science, vol 2858. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39707-6_2

DOI: https://doi.org/10.1007/978-3-540-39707-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20359-9
Online ISBN: 978-3-540-39707-6
eBook Packages: Springer Book Archive
