Workload Characterization and Evolutionary Analyses of Tianhe-1A Supercomputer

  • Jinghua Feng
  • Guangming Liu
  • Jian Zhang
  • Zhiwei Zhang
  • Jie Yu
  • Zhaoning Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10860)

Abstract

Supercomputer systems currently face a variety of application challenges, including high-throughput, data-intensive, and stream-processing applications. At the same time, the commercial service model makes improving user satisfaction a growing challenge on supercomputers such as Tianhe-1A, Tianhe-2, and TaihuLight. Understanding HPC workloads and their evolution is important for informing future research and improving user satisfaction.

In this paper, we present a methodology for characterizing workloads on a commercial supercomputer (one whose users pay for usage), both over a particular period and as they evolve over time. We apply this methodology to the workloads of Tianhe-1A at the National Supercomputer Center in Tianjin. This paper is the first to present the concept of quota-constrained waiting time, which is significant for optimizing scheduling and enhancing user satisfaction on commercial supercomputers.

Keywords

HPC, Workload, Quota-constrained, Scheduling

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Jinghua Feng (1, 2)
  • Guangming Liu (1, 2)
  • Jian Zhang (2)
  • Zhiwei Zhang (2)
  • Jie Yu (1)
  • Zhaoning Zhang (1)
  1. College of Computer, National University of Defense Technology, Changsha, China
  2. National Supercomputer Center in Tianjin, Tianjin, China
