An introduction to CPU and DSP design in China



In recent years, China has made considerable progress in producing domestically designed CPUs and DSPs. After fifteen years of sustained work that began in 2001, Chinese domestic CPUs and DSPs, represented primarily by the Loongson and ShenWei processors, have advanced significantly. Furthermore, some of the CPU design techniques are comparable to the world’s most advanced designs. A special issue of Scientia Sinica Informationis, published in April 2015, is dedicated to exhibiting the technical advancements in Chinese domestically designed CPUs and DSPs. The content of that issue describes the design and optimization of high-performance processors and the key technologies in processor development; these include high-performance micro-architecture design, many-core and multi-core design, radiation-hardening design, high-performance physical design, complex chip verification, and binary translation technology. We hope that the articles we collected will promote understanding of CPU/DSP progress in China. Moreover, we believe that the future of Chinese domestic CPU/DSP processors is quite promising.






Author information



Corresponding authors

Correspondence to Weiwu Hu or Jie Fu.


Cite this article

Hu, W., Zhang, Y. & Fu, J. An introduction to CPU and DSP design in China. Sci. China Inf. Sci. 59, 1–8 (2016).



  • Chinese domestic CPUs and DSPs
  • Loongson CPU
  • ShenWei CPU
  • 012101


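  • Domestic processor chips
  • Domestic digital signal processing chips
  • Loongson processor
  • ShenWei processor
  • YHFT digital signal processing chips
  • BW digital signal processing chips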