Abstract.
Because of the gap between processor speed and memory speed, memory has steadily appeared further away from the processor when measured in cycles. Superscalar out-of-order processors cope with these growing latencies by keeping more instructions in flight from which to extract instruction-level parallelism (ILP). With latencies of 500 cycles and more on the horizon, this trend will eventually lead to what we call Kilo-Instruction Processors, which must handle thousands of in-flight instructions. Managing so many in-flight instructions requires a microarchitectural change in the way the reorder buffer, the instruction queues, and the physical registers are handled, since simply enlarging these resources is technologically infeasible. In this paper we survey several techniques that address the problems caused by thousands of in-flight instructions.
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Cristal, A., Ortega, D., Llosa, J., Valero, M. (2003). Kilo-instruction Processors. In: Veidenbaum, A., Joe, K., Amano, H., Aiso, H. (eds) High Performance Computing. ISHPC 2003. Lecture Notes in Computer Science, vol 2858. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39707-6_2
DOI: https://doi.org/10.1007/978-3-540-39707-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20359-9
Online ISBN: 978-3-540-39707-6
eBook Packages: Springer Book Archive