Abstract
Microcoded customized IPs offer superior performance and direct programmability of micro-architectural structures compared to instruction-based processors, but at the cost of drastically enlarged code sizes. Code compression can reduce code size, but it must be applied with attention to performance so that the performance benefits of microcoded IPs are not squandered in the process. To attain this goal, we propose a fast code compression technique that exploits the fact that microcode contains a sizable number of unspecified bits. Although the values and positions of the specified bits are highly irregular, the proposed technique can still flexibly and precisely fill in these specified bits by utilizing a linear network. The linear property inherent in the compression strategy in turn enables the development of an extremely low-overhead decompression engine. At runtime, the decompressed code is generated by a fixed-bandwidth XOR network in such a way that all the specified bits are filled as required. The combination of the proposed flexible XOR-based network with a minimal two-level storage for highly specified fields, such as immediate values, offers utmost code compression at negligible performance and hardware overhead.
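The abstract does not give the network topology or the solver, so the following is only a minimal sketch of the general idea, under stated assumptions: each output bit of a hypothetical XOR network is the XOR of a fixed subset of seed bits, compression solves a linear system over GF(2) for the specified bits only, and decompression is a single pass through the network. The names `NETWORK`, `compress`, and `decompress`, and the 4-bit seed driving an 8-bit microcode word, are illustrative and do not come from the paper.

```python
# Hypothetical XOR network: output bit i is the XOR of the seed bits
# selected by NETWORK[i] (one bitmask per output bit). Assumed for illustration.
NETWORK = [0b0001, 0b0010, 0b0100, 0b1000, 0b0011, 0b0110, 0b1100, 0b1111]


def parity(value):
    """Return the XOR of all bits of a non-negative integer."""
    return bin(value).count("1") & 1


def decompress(network, seed):
    """Expand a seed into a full word: one XOR (parity) per output bit."""
    return [parity(mask & seed) for mask in network]


def compress(network, word):
    """Find a seed whose expansion matches every specified bit of `word`.

    `word` is a string over '0', '1', 'x', where 'x' marks an unspecified
    bit. Returns the seed as an int, or None if no seed satisfies the
    specified bits for this network.
    """
    # One linear equation over GF(2) per specified bit; 'x' bits are skipped.
    equations = [(mask, int(ch)) for mask, ch in zip(network, word) if ch != "x"]

    # Gaussian elimination, keeping one row per pivot (highest set bit).
    basis = {}
    for mask, rhs in equations:
        while mask:
            pivot = mask.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (mask, rhs)
                break
            other_mask, other_rhs = basis[pivot]
            mask ^= other_mask
            rhs ^= other_rhs
        else:
            # The row reduced to 0 = rhs; rhs == 1 means a contradiction.
            if rhs:
                return None

    # Back-substitution from the lowest pivot upward; free seed bits stay 0.
    seed = 0
    for pivot in sorted(basis):
        mask, rhs = basis[pivot]
        if rhs ^ parity(mask & seed):
            seed |= 1 << pivot
    return seed
```

For example, `compress(NETWORK, "1xx0x1xx")` has to satisfy only three equations, so a 4-bit seed is enough to reproduce the specified bits of the 8-bit word, and `decompress` then fills the unspecified positions with whatever values the network happens to produce. When the specified bits conflict under a given network, `compress` returns `None`; a real scheme would need some fallback for such words, which this sketch does not model.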
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Yang, C., Chen, M. & Orailoglu, A. Squashing code size in microcoded IPs while delivering high decompression speed. Des Autom Embed Syst 14, 265–284 (2010). https://doi.org/10.1007/s10617-010-9057-z
DOI: https://doi.org/10.1007/s10617-010-9057-z