Frontiers of Computer Science, Volume 8, Issue 3, pp 345–356

MilkyWay-2 supercomputer: system and application

  • Xiangke Liao (corresponding author)
  • Liquan Xiao
  • Canqun Yang
  • Yutong Lu
Research Article


On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was ranked the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. It highlights the key architectural features of MilkyWay-2: neo-heterogeneous compute nodes that integrate commodity off-the-shelf processors and accelerators sharing a similar instruction set architecture; powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communication; a proprietary 16-core processor designed for scientific computing; and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform an extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed on the system.


Keywords: MilkyWay-2 supercomputer, petaflops computing, neo-heterogeneous architecture, interconnect network, heterogeneous programming model, system management, benchmark optimization, performance evaluation





Copyright information

© Higher Education Press and Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Xiangke Liao (1, 2), corresponding author
  • Liquan Xiao (2)
  • Canqun Yang (1, 2)
  • Yutong Lu (2)
  1. Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China
  2. College of Computer, National University of Defense Technology, Changsha, China
