
The use of intermediate memories for low-latency memory access in supercomputer scalar units

The Journal of Supercomputing

Abstract

One of the prime considerations for high scalar performance in supercomputers is low memory latency. With the increasing disparity between main memory and CPU clock speeds, the use of an intermediate memory in the hierarchy becomes necessary. In this paper, we present an intermediate memory structure called a programmable cache. A programmable cache exploits structural locality to decrease the average memory access time. We evaluate the concept of a programmable cache by using the vector registers in the CRAY X-MP and Y-MP supercomputers as a programmable cache. Our results indicate that a programmable cache can be used profitably to reduce the memory latency if the pattern of references to a data structure can be determined at compile time.
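To make the trade-off in the abstract concrete, here is a minimal cost-model sketch. It assumes a simplified machine in which scalar loads are not overlapped with one another. The parameter values (a 14-cycle memory latency, a 1-cycle register transfer) and the function names `avg_time_direct` and `avg_time_programmable_cache` are hypothetical illustrations, not the authors' model or measured CRAY timings. Only the 64-element vector register length reflects the CRAY X-MP/Y-MP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical timing parameters in clock cycles (illustrative, not measured)."""
    memory_latency: int = 14   # assumed latency of one main-memory access
    transfer_cycles: int = 1   # assumed cost of moving one element vector reg -> scalar reg
    vector_length: int = 64    # elements per vector register (64 on the CRAY X-MP/Y-MP)


def avg_time_direct(num_refs: int, m: CostModel) -> float:
    """Average access time when every scalar reference waits on main memory."""
    return float(m.memory_latency) if num_refs > 0 else 0.0


def avg_time_programmable_cache(num_refs: int, m: CostModel) -> float:
    """Average access time when a compile-time-known run of consecutive
    references is staged through a vector register (the 'programmable cache').

    Each block of up to `vector_length` elements costs one pipelined vector
    load (latency paid once, then one element per cycle). Every scalar
    reference in the block is then served from the vector register.
    """
    if num_refs <= 0:
        return 0.0
    total = 0
    remaining = num_refs
    while remaining > 0:
        block = min(remaining, m.vector_length)
        total += m.memory_latency + block    # one pipelined vector load
        total += block * m.transfer_cycles   # scalar reads served from the register
        remaining -= block
    return total / num_refs


if __name__ == "__main__":
    model = CostModel()
    for n in (1, 16, 64, 1000):
        direct = avg_time_direct(n, model)
        staged = round(avg_time_programmable_cache(n, model), 2)
        print(n, direct, staged)
```

Under these assumed numbers, a run of 128 consecutive references averages about 2.2 cycles per access through the vector register versus 14 cycles directly, while a single isolated reference costs 16 cycles, which is worse than going straight to memory. The sketch also presumes that the compiler knows the run in advance so that it can issue the vector load ahead of the scalar uses. When the reference pattern cannot be determined at compile time, the staging load cannot be scheduled and the direct cost applies.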




Additional information

The work of the first author was supported in part by NSF Grant CCR-8706722.


About this article

Cite this article

Sohi, G.S., Hsu, WC. The use of intermediate memories for low-latency memory access in supercomputer scalar units. J Supercomput 4, 5–21 (1990). https://doi.org/10.1007/BF00162340


  • DOI: https://doi.org/10.1007/BF00162340
