High Performance Memory Systems, pp. 199–212
An Analysis of Scalar Memory Accesses in Embedded and Multimedia Systems
Abstract
In an earlier chapter about the FlexCache project [24], we described our vision of a multipartitioned cache where memory accesses are separated based on their static predictability and memory footprint, and managed with various compiler-controlled techniques supported by instruction set architecture extensions or with traditional hardware control.
In line with that vision, this paper describes our work in progress related to the memory performance and memory management of scalars. Our focus in this paper is embedded and multimedia architectures, but the methodology described can be applied to other classes of applications.
In particular, we establish the minimum size of a memory partition that would allow us to map and manage all scalar accesses in a program statically, and we describe compiler techniques to automate the extraction of this information. We evaluate the impact of register file size on the volume of scalar-related memory accesses and, in turn, on the applications’ overall cache performance. We study the cache behavior of scalar accesses for embedded architectures, including the reduction in cache misses obtained by separating scalars from other types of memory accesses. Additionally, we develop an energy-efficient data caching strategy for multimedia processors, based on our scalar partitioning approach.
Keywords
Memory Access, Basic Block, Cache Size, Data Cache
References
- 1. Appel AW, George L (2001) Optimal Spilling for CISC Machines with Few Registers, In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 243–253.
- 2. Albonesi DH (1999) Selective Cache Ways: On-Demand Cache Resource Allocation, In: Proceedings of the 32nd International Symposium on Microarchitecture, MICRO32, pp. 248–258.
- 3. Belady LA (1966) A Study of Replacement Algorithms for a Virtual-Storage Computer, IBM Systems Journal, 5(2):78–101.
- 4. Benini L, Macii A, Poncino M (2000) A Recursive Algorithm for Low-Power Memory Partitioning, In: Proceedings of the International Symposium on Low Power Electronics and Design, ISLPED’00, Rapallo, Italy, pp. 78–83.
- 5. Bhandarkar DP (1996) Alpha Implementations and Architecture, Complete Reference Guide, Digital Press, Newton, MA, pp. 42–43.
- 6. Bishop B, Kelliher T, Irwin N (1999) A Detailed Analysis of MediaBench, In: Proceedings of the IEEE Workshop on Signal Processing Systems, Taipei, Taiwan, IEEE, New York.
- 7. Burger D, Austin TM (1997) The SimpleScalar Tool Set, Version 2.0, University of Wisconsin-Madison Computer Sciences Department Technical Report #1342.
- 8. Brooks D, Tiwari V, Martonosi M (2000) Wattch: A Framework for Architectural-Level Power Analysis and Optimizations, In: Proceedings of the 27th International Symposium on Computer Architecture, ISCA’00, Vancouver, Canada, pp. 83–94.
- 9. Burlin J (2000) Optimizing Stack Frame Layout for Embedded Systems, Masters Thesis, Computing Science Department, Uppsala University, Uppsala, Sweden.
- 10. Chiou D, Jain P, Rudolph L, Devadas S (2000) Application-Specific Memory Management for Embedded Systems Using Software-Controlled Caches, In: Proceedings of the 37th Design Automation Conference, DAC’00, Los Angeles, CA, pp. 416–419.
- 11. Cooper KD, Harvey TJ (1998) Compiler-Controlled Memory, In: Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), pp. 2–11.
- 12. Delaluz V, Kandemir M, Vijaykrishnan N, Irwin MJ (2000) Energy-Oriented Compiler Optimizations for Partitioned Memory Architectures, In: Proceedings of the International Conference on Compilers, Architectures, and Synthesis for Embedded Systems, CASES’00, San Jose, CA, pp. 138–147.
- 13. Engblom J (1999) Why SPECint95 Should Not Be Used to Benchmark Embedded Systems Tools, In: Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems (LCTES’99), pp. 96–103.
- 14. Fritts J, Wolf W, Liu B (1999) Understanding Multimedia Application Characteristics for Designing Programmable Media Processors, In: Proceedings of SPIE, Multimedia Hardware Architectures, San Jose, CA, pp. 2–13.
- 15. Huang M, Renau J, Torrellas J (2001) L1 Cache Decomposition for Energy Efficient Processors, In: Proceedings of the International Symposium on Low-Power Electronics and Design, ISLPED’01, Huntington Beach, CA, pp. 10–15.
- 16. Kin J, Gupta M, Mangione-Smith WH (1997) The Filter Cache: An Energy Efficient Memory Structure, In: Proceedings of the 30th Annual Symposium on Microarchitecture, MICRO30, pp. 184–193.
- 17. Kulkarni C, Catthoor F, De Man H (2000) Advanced Data Layout Organization for Multi-Media Applications, In: Workshop on Parallel and Distributed Computing in Image Processing, Video Processing, and Multimedia (PDIVM 2000), Cancun, Mexico.
- 18. Lee C, Potkonjak M, Mangione-Smith WH (1997) MediaBench: A Tool for Evaluating and Synthesizing Multimedia and Communications Systems, In: Proceedings of the 30th Annual International Symposium on Microarchitecture, MICRO30, pp. 330–335.
- 19. Lee HS, Tyson GS (2000) Region-Based Caching: An Energy Delay Efficient Memory Architecture for Embedded Processors, In: Proceedings of PACM (CASES’00), San Jose, CA, pp. 120–127.
- 20.
- 21. Memik G, Kandemir M, Haldar M, Choudhary A (1999) A Selective Hardware/Compiler Approach for Improving Cache Locality, Northwestern University Technical Report CPDC-TR-9909-016.
- 22. Milutinovich V, Tomasevic M, Markovic B, Tremblay M (1996) The Split Temporal/Spatial Cache: Initial Performance Analysis, In: Proceedings of SCIzzL-5, Santa Clara, CA, pp. 63–69.
- 23. Moritz CA, Frank M, Amarasinghe S (2000) FlexCache: A Framework for Compiler Generated Data Caching, In: Proceedings of the Second Workshop on Intelligent Memory Systems, IRAM’00, Held in Conjunction with ASPLOS-IX, Cambridge, MA.
- 24. Moritz CA, Frank M, Amarasinghe S (2001) FlexCache: A Framework for Compiler Generated Data Caching, Lecture Notes in Computer Science, Springer-Verlag, Berlin.
- 25. Mueller F (1995) Compiler Support for Software-Based Cache Partitioning, In: Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Real-Time Systems, La Jolla, CA, pp. 125–133.
- 26. O’Boyle M, Knijnenburg P (1996) Non-Singular Data Transformations: Definition, Validity, Applications, In: Proceedings of the 6th Workshop on Compilers for Parallel Computers (CPC’96), Aachen, Germany, pp. 287–297.
- 27. Ranganathan P, Adve S, Jouppi NP (2000) Reconfigurable Caches and Their Application to Media Processing, In: Proceedings of the 27th International Symposium on Computer Architecture (ISCA-27), pp. 214–224.
- 28.