Increasing memory bandwidth for vector computations

  • Sally A. McKee
  • Steven A. Moyer
  • Wm. A. Wulf
  • Charles Hitchcock
Session Papers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 782)

Abstract

Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance microprocessors to vector-like algorithms, including the “Grand Challenge” scientific problems. Caching is not the sole solution for these applications due to the poor temporal and spatial locality of their data accesses. Moreover, the nature of memories themselves has changed. Achieving greater bandwidth requires exploiting the characteristics of memory components “on the other side of the cache”; they should not be treated as uniform access-time RAM. This paper describes the use of hardware-assisted access ordering, a technique that combines compile-time detection of memory access patterns with a memory subsystem that decouples the order of requests generated by the processor from that issued to the memory system. This decoupling permits the requests to be issued in an order that optimizes use of the memory system. Our simulations show significant speedup on important scientific kernels.
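
The benefit of decoupling request order from issue order can be illustrated with a toy cost model. The sketch below is not the authors' simulator or hardware design. It assumes a single memory bank with one open page, and the page size, hit cost, miss cost, vector layout, and batch size are all hypothetical values chosen for illustration. It compares the order a processor would naturally generate for a loop such as z[i] = x[i] + y[i] with a reordered stream that services several consecutive elements of one vector before switching to the next.

```python
# Toy model, NOT the authors' simulator: one memory bank with a single open page.
# PAGE_WORDS, HIT_COST and MISS_COST are illustrative assumptions.
PAGE_WORDS = 512   # words per memory page (assumed)
HIT_COST = 1       # cycles for an access to the currently open page (assumed)
MISS_COST = 4      # cycles for an access that must open a new page (assumed)


def access_cost(addresses):
    """Total cycles to service word addresses in the given order."""
    open_page = None
    cycles = 0
    for addr in addresses:
        page = addr // PAGE_WORDS
        cycles += HIT_COST if page == open_page else MISS_COST
        open_page = page
    return cycles


def natural_order(bases, n):
    """Order generated by a loop that touches each vector once per iteration."""
    return [base + i for i in range(n) for base in bases]


def batched_order(bases, n, batch):
    """Reordered stream: 'batch' consecutive elements of one vector at a time."""
    order = []
    for start in range(0, n, batch):
        for base in bases:
            order.extend(base + i for i in range(start, min(start + batch, n)))
    return order


# Three vectors on distinct pages (hypothetical layout), as in z[i] = x[i] + y[i].
# Reads and writes are treated alike in this simplified model.
bases = [0, 4096, 8192]
n = 64
natural = access_cost(natural_order(bases, n))
batched = access_cost(batched_order(bases, n, batch=16))
print(natural, batched)  # prints "768 228" under the assumed costs
```

Both orders touch exactly the same addresses; only the sequence differs. Under these assumed costs every access in the natural order opens a new page, while the batched order pays the page-open cost once per group of 16 accesses.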

Keywords

Memory System, Computer Architecture, Memory Bandwidth, Memory Bank, Peak Bandwidth
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Sally A. McKee (1)
  • Steven A. Moyer (1)
  • Wm. A. Wulf (1)
  • Charles Hitchcock (2)

  1. Department of Computer Science, University of Virginia, Charlottesville
  2. Thayer School of Engineering, Dartmouth College, Hanover
