Abstract
In this paper, we propose a novel organization of high-speed memories, known as the register-cache, for a multi-threaded architecture. As the term suggests, it is organized both as a register file and a cache. Viewed from the execution unit, its contents are addressable similar to ordinary CPU registers using relatively short addresses. From the main memory perspective, it is content addressable, i.e., its contents are tagged just as in conventional caches. The register allocation for the register-cache is adaptively performed at runtime, resulting in a dynamically allocated register file.
A program is compiled into a number of instruction threads called super-actors. A super-actor becomes ready for execution only when its input data are physically residing in the register-cache and space is reserved in the register-cache to store its result. Therefore, the execution unit will never stall or ‘freeze’ when accessing instruction or data. Another advantage is that since registers are dynamically assigned at runtime, register allocation difficulties at compile-time, e.g., allocating registers for subscripted variables of large arrays, can be avoided. Architectural support for overlapping executions of super-actors and main memory operations are provided so that the available concurrency in the underlying machine can be better utilized. The preliminary simulation results seem to be very encouraging: with software pipelined loops, a register-cache of moderate size can keep the execution unit usefully busy.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
A. Agarwal, B. H. Lim, D. Kranz, and J. Kubiatowicz. APRIL: A processor architecture for multiprocessing. In Proceedings of the 17th International Symposium on Computer Architecture, pages 104–114, 1990.
R. Alverson et al. The Tera computer system. In Proc. of the 1990 Intl. Conf. on Supercomputing, 1990.
Arvind. Personal communication, 1990.
Arvind and R. A. Iannucci. Two fundamental issues in multiprocessing. Computation Structures Group Memo 226, Laboratory for Computer Science, MIT, 1987.
R. Bell. IBM RISC system/6000 preformance tuning for numerically intensive FORTRAN and C programs. Technical Report GG24–3611, IBM Int’l. Technical Support Center, Aug. 1990.
D. Callahan and A. Porterfield. Data cache performance of supercomputer applications. In Proc. of the Supercomputing ‘80 Conference, pages 564–572, New York, New York, 1990.
David Callahan, Steve Carr, and Ken Kennedy. Improving register allocation for subscripted variables. Proceedings of the SIGPLAN ‘80 Conference on Programming Language Design and Implementation, June 1990. White Plains, NY.
G. R. Gao, H. H. J. Hum, and Y. B. Wong. Towards efficient fine-grain software pipelining. In Proceedings of the ACM International Conference on Supercomputing, Amsterdam, Netherlands, June 1990.
G. R. Gao, R. Tio, and H. J. Hum. Design of an efficient datafiow architecture without datafiow. In Proceedings of the International Conference on Fifth-Generation Computers, pages 861–868, Tokyo, Japan, December 1988.
P.P. Gelsinger et al. Microprocessors circa 2000. IEEE Spectrum, pages 43–47, Oct. 1989.
R. H. Halstead Jr and T. Fujita. MASA: A multithreaded processor architecture for parallel symbolic computing. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 443–451, 1988.
J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. Morgain Kaufman Publishers Inc., San Mateo, CA, 1990.
R. A. Iannucci. Toward a datafiow/von Neumann hybrid architecture. In Proceedings of the 15th Annual International Symposium on Computer Architecture, pages 131–140, 1988.
N.P. Jouppi and D.W. Wall. Available instruction-level parallelism for superscalar and superpipelined machines. In Third Intl. Conf. on Arch. Support for Prog. Lang. and Operating Sys., pages 272–282, 1988.
R. Nikhil and Arvind. Can datafiow subsume von Neumann computing? In Proceedings of the 16th International Symposium on Computer Architecture, pages 262–272, Israel, 1989.
R: S. Nikhil and Arvind. Id: A language with implicit parallelism. Computation Structures Group Memo 305, Laboratory for Computer Science, MIT, 1990.
M.R. Thistle and B.J. Smith. A processor architecture for Horizon. In Proc. of the Supercomputing Conference ‘88, Florida, 1988.
K. R. Traub. Sequential implementation of lenient programming languages. Technical Report MIT/LCS/TR-417, Laboratory for Computer Science, MIT, 1988.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 1991 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hum, H.H.J., Gao, G.R. (1991). A Novel High-Speed Memory Organization for Fine-Grain Multi-Thread Computing. In: Aarts, E.H.L., van Leeuwen, J., Rem, M. (eds) Parle ’91 Parallel Architectures and Languages Europe. Lecture Notes in Computer Science, vol 505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-25209-3_4
Download citation
DOI: https://doi.org/10.1007/978-3-662-25209-3_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-23206-4
Online ISBN: 978-3-662-25209-3
eBook Packages: Springer Book Archive