International Journal of Parallel Programming

Volume 42, Issue 1, pp 140–164

Microcode Compression Using Structured-Constrained Clustering

  • Edson Borin
  • Guido Araujo
  • Mauricio Breternitz Jr.
  • Youfeng Wu

Abstract

Modern microprocessors have used microcode to implement legacy (rarely used) instructions, add new ISA features, and enable patches to an existing design. As more features are added to processors (e.g., protection and virtualization), the area and power costs associated with the microcode memory have increased significantly. A recent Intel internal design targeting low power and a small footprint estimated the cost of the microcode ROM to approach 20% of the total die area (and associated power consumption). Moreover, with the adoption of multicore architectures, the impact of microcode memory size on chip area has become relevant, forcing industry to revisit the microcode size problem. One solution to this problem is to store the microcode in compressed form and decompress it at runtime. This paper describes techniques for microcode compression that achieve significant area and power savings, and proposes a streamlined architecture that enables high throughput within the constraints of a high-performance CPU. The paper presents results for microcode compression on several commercial CPU designs, which demonstrate compression ratios ranging from 50 to 62%. In addition, it proposes techniques that enable the reuse of (pre-validated) hardware building blocks, which can considerably reduce the cost and design time of the microcode decompression engine in real-world designs.
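To make the idea of storing microcode in compressed form and decompressing it at runtime concrete, the following is a minimal sketch of a generic dictionary-based scheme. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the bit-field boundaries in `clusters` are chosen by hand here, the function names (`compress`, `decompress`, `compressed_bits`) are hypothetical, and microcode words are modelled as strings of `0` and `1`. Each cluster of bit columns gets its own table of unique patterns, and every microcode word is replaced by one small index per cluster.

```python
from math import ceil, log2

def compress(words, clusters):
    """Dictionary-encode each bit-field cluster of every microcode word.

    words    -- list of equal-length bit strings (hypothetical microcode ROM)
    clusters -- list of (start, end) column ranges; hand-picked assumption
    """
    dictionaries = []
    pointer_rows = [[] for _ in words]
    for start, end in clusters:
        table = {}  # pattern -> index, in first-seen order
        for row, word in zip(pointer_rows, words):
            field = word[start:end]
            row.append(table.setdefault(field, len(table)))
        dictionaries.append(list(table))
    return dictionaries, pointer_rows

def decompress(dictionaries, pointer_rows):
    """Rebuild each word by looking up one pattern per cluster."""
    return ["".join(d[i] for d, i in zip(dictionaries, row))
            for row in pointer_rows]

def compressed_bits(dictionaries, pointer_rows):
    """Storage cost: dictionary entries plus one pointer per word per cluster."""
    total = 0
    for d in dictionaries:
        if not d:
            continue
        pointer_width = max(1, ceil(log2(len(d))))
        total += len(d) * len(d[0]) + len(pointer_rows) * pointer_width
    return total

# Toy ROM of four 8-bit words, split into two 4-bit clusters.
words = ["00001111", "00001010", "11111111", "11111010"]
dictionaries, pointer_rows = compress(words, [(0, 4), (4, 8)])
original_bits = sum(len(w) for w in words)
size_ratio = compressed_bits(dictionaries, pointer_rows) / original_bits
```

In this toy case the size ratio is computed as compressed size over original size, which is an assumed convention and may differ from the definition behind the 50 to 62% figures above. How the columns are grouped into clusters determines how many unique patterns each table holds, so that grouping drives the achievable savings.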

Keywords

Microcode compression, Code compression, Microprocessor design, Compression algorithm, Micro-architecture, Micro-programming



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Edson Borin (1)
  • Guido Araujo (1)
  • Mauricio Breternitz Jr. (2)
  • Youfeng Wu (3)
  1. Institute of Computing, University of Campinas, Campinas, Brazil
  2. AMD, Sunnyvale, USA
  3. Programming System Lab, Intel Corporation, Santa Clara, USA
