Memory System Support for Irregular Applications

  • John Carter
  • Wilson Hsieh
  • Mark Swanson
  • Lixin Zhang
  • Erik Brunvand
  • Al Davis
  • Chen-Chi Kuo
  • Ravindra Kuramkote
  • Michael Parker
  • Lambert Schaelicke
  • Leigh Stoller
  • Terry Tateyama
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1511)

Abstract

Because irregular applications have unpredictable memory access patterns, their performance is dominated by memory behavior. The Impulse configurable memory controller will enable significant performance improvements for irregular applications, because it can be configured to optimize memory accesses on an application-by-application basis. In this paper we describe the optimizations that the Impulse controller supports for sparse matrix-vector product, an important computational kernel, and outline the transformations that the compiler and runtime system must perform to exploit these optimizations.
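To make the irregular access pattern concrete, the following is a minimal sketch of a sparse matrix-vector product in C. The abstract does not say which storage format the paper uses, so the compressed sparse row (CSR) layout here is an assumption, and the names `smvp_csr`, `row_ptr`, `col_idx`, and `vals` are hypothetical. The sketch shows only the conventional kernel, not any Impulse-specific optimization.

```c
#include <stddef.h>

/*
 * Illustrative sparse matrix-vector product y = A * x, with A stored in
 * compressed sparse row (CSR) form. The CSR layout and all names are
 * assumptions made for this sketch, not taken from the paper.
 *
 *   row_ptr[i] .. row_ptr[i + 1] - 1  index the nonzeros of row i
 *   col_idx[k]                        column of the k-th nonzero
 *   vals[k]                           value of the k-th nonzero
 */
void smvp_csr(size_t n_rows,
              const size_t *row_ptr,
              const size_t *col_idx,
              const double *vals,
              const double *x,
              double *y)
{
    for (size_t i = 0; i < n_rows; i++) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
            /* vals and col_idx are read sequentially, but x is read
             * through col_idx: the address depends on the data, so the
             * accesses to x are scattered and hard to predict. */
            sum += vals[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

The indirect load `x[col_idx[k]]` is the kind of data-dependent access that the abstract describes as unpredictable, since consecutive iterations may touch unrelated cache lines of `x`.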

Keywords

Cache Line, Memory Controller, Physical Memory, Physical Address, Physical Page
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • John Carter (1)
  • Wilson Hsieh (1)
  • Mark Swanson (1)
  • Lixin Zhang (1)
  • Erik Brunvand (1)
  • Al Davis (1)
  • Chen-Chi Kuo (1)
  • Ravindra Kuramkote (1)
  • Michael Parker (1)
  • Lambert Schaelicke (1)
  • Leigh Stoller (1)
  • Terry Tateyama (1)

  1. Department of Computer Science, University of Utah, USA