Decoupled Processors Architecture for Accelerating Data Intensive Applications using Scratch-Pad Memory Hierarchy
We present an architecture of decoupled processors with a memory hierarchy consisting only of scratch-pad memories and a main memory. The architecture exploits the more efficient prefetching of decoupled processors. These processors take advantage of the parallelism between address computation and application data processing, which is found mainly in streaming applications. Scratch-pad memories store data without conflict misses and at low energy per access, and together with efficient prefetching this contributes significantly to the system's performance. The application code is split into two parallel programs. The first runs on the Access processor and computes the addresses of the data in the memory hierarchy. The second processes the application data and runs on the Execute processor, whose address space is limited to the register-file addresses. Every transfer of a block through the memory hierarchy up to the Execute processor's register file is controlled by the Access processor and the DMA units. This strongly distinguishes the architecture from traditional uniprocessors and from existing decoupled processors with cache memory hierarchies. The architecture's performance is compared with that of (a) uniprocessors with scratch-pad memory hierarchies, (b) uniprocessors with cache memory hierarchies, and (c) existing decoupled architectures, and it achieves higher normalized performance. The gain comes from the efficient data transfers that the scratch-pad hierarchy provides, combined with the ability of the decoupled processors to eliminate memory-latency stalls by managing data transfers explicitly rather than relying on fixed prefetching methods. Experimental results show a speedup of up to almost 2× over uniprocessor architectures with scratch-pad memories and up to 3.7× over those with caches. The proposed architecture achieves this performance without an energy-delay product penalty.
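The split between address computation and data processing can be illustrated with a toy software model. The following is a minimal sketch, not the paper's hardware or toolchain. It assumes that two threads stand in for the Access and Execute processors, and that a bounded `queue.Queue` stands in for the decoupling queue feeding the Execute processor's register file. It also assumes that a list slice stands in for a DMA block transfer into a scratch-pad. `BLOCK`, `QUEUE_DEPTH`, `access_processor`, `execute_processor`, and `run_decoupled` are hypothetical names chosen for illustration.

```python
import queue
import threading

BLOCK = 4          # hypothetical scratch-pad block size (elements per DMA transfer)
QUEUE_DEPTH = 8    # hypothetical depth of the Access -> Execute load queue
_DONE = object()   # sentinel marking the end of the operand stream


def access_processor(main_mem, out_mem, load_q, store_q):
    """Owns all address computation: moves blocks into a scratch-pad and feeds operands."""
    n = len(main_mem)
    for base in range(0, n, BLOCK):
        # Model one DMA block transfer from main memory into the scratch-pad.
        scratchpad = main_mem[base:base + BLOCK]
        for offset in range(len(scratchpad)):
            # Address base + offset is resolved here, never on the Execute side.
            load_q.put(scratchpad[offset])  # blocks if the Execute side falls behind
    load_q.put(_DONE)
    # Results return in FIFO order; the Access side also owns the store addresses.
    for addr in range(n):
        out_mem[addr] = store_q.get()


def execute_processor(load_q, store_q, kernel):
    """Sees only operand values (its 'register file'), never memory addresses."""
    while True:
        value = load_q.get()
        if value is _DONE:
            break
        store_q.put(kernel(value))


def run_decoupled(data, kernel):
    out = [None] * len(data)
    load_q = queue.Queue(maxsize=QUEUE_DEPTH)
    store_q = queue.Queue()  # unbounded, so the Execute thread never blocks on stores
    access = threading.Thread(target=access_processor, args=(data, out, load_q, store_q))
    execute = threading.Thread(target=execute_processor, args=(load_q, store_q, kernel))
    access.start()
    execute.start()
    access.join()
    execute.join()
    return out
```

For example, `run_decoupled(list(range(10)), lambda x: x * x + 1)` returns `[1, 2, 5, 10, 17, 26, 37, 50, 65, 82]`. The key design point in this model is the bounded load queue. It lets the Access thread run ahead of the Execute thread by up to `QUEUE_DEPTH` operands, which is the mechanism by which decoupling can hide memory latency. The architecture described above realizes this in hardware with DMA units and a scratch-pad hierarchy rather than with threads.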
Journal of Signal Processing Systems (Springer US), Volume 59, Issue 3, pp. 281–296
Author affiliation: VLSI Design Lab., Electrical & Computer Engineering Department, University of Patras, Patras, Greece