An Analysis of Scalar Memory Accesses in Embedded and Multimedia Systems

  • Osman S. Unsal
  • Zhenlin Wang
  • Israel Koren
  • C. Mani Krishna
  • Csaba Andras Moritz

Abstract

In an earlier chapter about the FlexCache project [24], we described our vision of a multipartitioned cache where memory accesses are separated based on their static predictability and memory footprint, and managed with various compiler-controlled techniques supported by instruction set architecture extensions or with traditional hardware control.

In line with that vision, this paper describes our work in progress related to the memory performance and memory management of scalars. Our focus in this paper is embedded and multimedia architectures, but the methodology described can be applied to other classes of applications.

In particular, we establish the minimum size of a memory partition that would allow us to map and manage all scalar accesses in a program statically and describe compiler techniques to automate the extraction of this information. We evaluate the impact of register file size on the volume of scalar-related memory accesses and its impact on the applications’ overall cache performance. We study the cache behavior of scalar accesses for embedded architectures, including reduction in cache misses due to separation of scalars from other types of memory accesses. Additionally, we develop an energy-efficient data caching strategy for multimedia processors, based on our scalar partitioning approach.
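As a rough illustration of the first point, the sketch below bounds the scalar partition size by the deepest call chain in a program. It is a minimal sketch under stated assumptions, not the analysis used in the chapter: it assumes the compiler has already produced a per-function count of scalar bytes kept in memory and a call graph without recursion. The names `min_scalar_partition_bytes`, `frame_bytes`, and `calls`, and the function names and byte counts in the example, are hypothetical.

```python
def min_scalar_partition_bytes(frame_bytes, calls, entry="main"):
    """Worst-case scalar bytes live at once on any call chain from `entry`.

    frame_bytes: function name -> bytes of scalars that function keeps in memory
                 (assumed to be supplied by the compiler; hypothetical input)
    calls:       function name -> list of callees (assumed free of recursion)
    """
    memo = {}          # function -> worst-case bytes for it and everything below
    visiting = set()   # functions on the current call chain, to detect cycles

    def deepest(fn):
        if fn in memo:
            return memo[fn]
        if fn in visiting:
            # Recursion means the depth is not known statically in this model.
            raise ValueError(f"recursive call through {fn!r}: no static bound")
        visiting.add(fn)
        # Only one callee is active at a time, so take the largest subtree.
        below = max((deepest(c) for c in calls.get(fn, ())), default=0)
        visiting.discard(fn)
        memo[fn] = frame_bytes.get(fn, 0) + below
        return memo[fn]

    return deepest(entry)


# Illustrative numbers only; they are not measurements from the chapter.
example_frames = {"main": 16, "decode": 24, "idct": 40, "emit": 8}
example_calls = {"main": ["decode", "emit"], "decode": ["idct"]}
partition_size = min_scalar_partition_bytes(example_frames, example_calls)
```

With these illustrative inputs the deepest chain is main, decode, idct, so the bound is the sum of those three frames. A partition of at least that size could hold every scalar statically in this simplified model; programs with recursion or indirect calls would need a different treatment.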

Keywords

Memory Access, Basic Block, Cache Size, Data Cache

References

  1. Appel AW, George L (2001) Optimal Spilling for CISC Machines with Few Registers, In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 243–253.
  2. Albonesi DH (1999) Selective Cache Ways: On-Demand Cache Resource Allocation, In: Proceedings of the 32nd International Symposium on Microarchitecture, MICRO32, pp. 248–258.
  3. Belady LA (1966) A Study of Replacement Algorithms for a Virtual-Storage Computer, IBM Systems Journal, 5(2):78–101.
  4. Benini L, Macii A, Poncino M (2000) A Recursive Algorithm for Low-Power Memory Partitioning, In: Proceedings of the International Symposium on Low Power Electronics and Design, ISLPED’00, Rapallo, Italy, pp. 78–83.
  5. Bhandarkar DP (1996) Alpha Implementations and Architecture, Complete Reference Guide, Digital Press, Newton, MA, pp. 42–43.
  6. Bishop B, Kelliher T, Irwin N (1999) A Detailed Analysis of MediaBench, In: Proceedings of the IEEE Workshop on Signal Processing Systems, Taipei, Taiwan, IEEE, New York.
  7. Burger D, Austin TD (1997) The SimpleScalar Tool Set, Version 2.0, University of Wisconsin-Madison Computer Sciences Department Technical Report #1342.
  8. Brooks D, Tiwari V, Martonosi M (2000) Wattch: A Framework for Architectural-Level Power Analysis and Optimizations, In: Proceedings of the 27th International Symposium on Computer Architecture, ISCA’00, Vancouver, Canada, pp. 83–94.
  9. Burlin J (2000) Optimizing Stack Frame Layout for Embedded Systems, Masters Thesis, Computing Science Department, Uppsala University, Uppsala, Sweden.
  10. Chiou D, Jain P, Rudolph L, Devadas S (2000) Application-Specific Memory Management for Embedded Systems Using Software-Controlled Caches, In: Proceedings of the 37th Design Automation Conference, DAC’00, Los Angeles, CA, pp. 416–419.
  11. Cooper KD, Harvey TJ (1998) Compiler-Controlled Memory, In: Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), pp. 2–11.
  12. Delaluz V, Kandemir M, Vijaykrishnan N, Irwin MJ (2000) Energy-Oriented Compiler Optimizations for Partitioned Memory Architectures, In: Proceedings of the International Conference on Compilers, Architectures, and Synthesis for Embedded Systems, CASES’00, San Jose, CA, pp. 138–147.
  13. Engblom J (1999) Why SpecInt95 Should Not Be Used to Benchmark Embedded Systems Tools, In: Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems (LCTES’99), pp. 96–103.
  14. Fritts J, Wolf W, Liu B (1999) Understanding Multimedia Application Characteristics for Designing Programmable Media Processors, In: Proceedings of SPIE, Multimedia Hardware Architectures, San Jose, CA, pp. 2–13.
  15. Huang M, Renau J, Torrellas J (2001) L1 Cache Decomposition for Energy Efficient Processors, In: Proceedings of the International Symposium on Low-Power Electronics and Design, ISLPED’01, Huntington Beach, CA, pp. 10–15.
  16. Kin J, Gupta M, Mangione-Smith WH (1997) The Filter Cache: An Energy Efficient Memory Structure, In: Proceedings of the 30th Annual Symposium on Microarchitecture, MICRO30, pp. 184–193.
  17. Kulkarni C, Catthoor F, De Man H (2000) Advanced Data Layout Organization for Multi-Media Applications, In: Workshop on Parallel and Distributed Computing in Image Processing, Video Processing, and Multimedia (PDIVM 2000), Cancun, Mexico.
  18. Lee C, Potkonjak M, Mangione-Smith WH (1997) MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems, In: Proceedings of the 30th Annual International Symposium on Microarchitecture, MICRO30, pp. 330–335.
  19. Lee HS, Tyson GS (2000) Region-Based Caching: An Energy Delay Efficient Memory Architecture for Embedded Processors, In: Proceedings of PACM (CASES’00), San Jose, CA, pp. 120–127.
  20.
  21. Memik G, Kandemir M, Haldar M, Choudhary A (1999) A Selective Hardware/Compiler Approach for Improving Cache Locality, Northwestern University Technical Report CPDC-TR-9909-016.
  22. Milutinovich V, Tomasevic M, Markovic B, Tremblay M (1996) The Split Temporal/Spatial Cache: Initial Performance Analysis, In: Proceedings of SCIzzL-5, Santa Clara, CA, pp. 63–69.
  23. Moritz CA, Frank M, Amarasinghe S (2000) FlexCache: A Framework for Compiler Generated Data Caching, In: Proceedings of the Second Workshop on Intelligent Memory Systems, IRAM’00, Held in Conjunction with ASPLOS-IX, Cambridge, MA.
  24. Moritz CA, Frank M, Amarasinghe S (2001) FlexCache: A Framework for Compiler Generated Data Caching, Lecture Notes in Computer Science, Springer-Verlag, Berlin.
  25. Mueller F (1995) Compiler Support for Software-Based Cache Partitioning, In: Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Real-Time Systems, La Jolla, CA, pp. 125–133.
  26. O’Boyle M, Knijnenburg P (1996) Non-Singular Data Transformations: Definition, Validity, Applications, In: Proceedings of the 6th Workshop on Compilers for Parallel Computers (CPC’96), Aachen, Germany, pp. 287–297.
  27. Ranganathan P, Adve S, Jouppi NP (2000) Reconfigurable Caches and Their Application to Media Processing, In: Proceedings of the 27th International Symposium on Computer Architecture (ISCA-27), pp. 214–224.
  28.

Copyright information

© Springer Science+Business Media New York 2004

Authors and Affiliations

  • Osman S. Unsal (1)
  • Zhenlin Wang (2)
  • Israel Koren (1)
  • C. Mani Krishna (1)
  • Csaba Andras Moritz (1)

  1. Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, USA
  2. Department of Computer Science, University of Massachusetts, Amherst, USA
