Capacity Estimation in HPC Systems: Simulation Approach

  • A. Anghelescu
  • R. B. Lenin
  • S. Ramaswamy
  • K. Yoshigoe
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6536)

Abstract

As HPC (high performance computing) systems are extensively employed for heavy computational problems throughout heterogeneous environments, the scale and complexity of applications raise the issue of capacity planning. The job scheduler is a cardinal factor in the efficiency of any HPC system. Job scheduling techniques can worsen or mitigate issues such as job starvation, increased queue time, and decreased system utilization. Since the impact of a scheduling technique depends on the workload of a supercomputer, this research proposes to analyze various scheduling disciplines on a given workload. By simulating an HPC system, for any given workload, we can find the discipline that yields the best performance, i.e. the one that minimizes the wait time of jobs in the queue while maximizing resource utilization. Furthermore, given a fixed configuration of an HPC system, this research can be used to determine an appropriate workload that optimizes the system’s performance. The development and implementation of such a complex simulation framework has not yet been reported in the HPC literature. The efficiency of the proposed simulation framework is illustrated through simulation results for performance measures such as average queuing time, average number of jobs in the queue, and system utilization. These results are verified against a mathematical model developed for job load characterization.
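The kind of comparison described in the abstract can be illustrated with a minimal discrete event sketch. This is not the authors' framework: the names `Job` and `simulate`, the two disciplines shown (first-come-first-served and shortest-job-first), and the rule that only the head of the queue may start (no backfilling, no preemption) are assumptions made for illustration only.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float  # time the job enters the queue
    runtime: float  # execution time once started
    nodes: int      # number of nodes the job occupies


def simulate(jobs, total_nodes, discipline="fcfs"):
    """Run one workload under one queuing discipline (hypothetical sketch)."""
    if not jobs:
        raise ValueError("workload is empty")
    if any(j.nodes > total_nodes for j in jobs):
        raise ValueError("a job requests more nodes than the system has")
    # Ordering rule for the waiting queue; both disciplines are non-preemptive.
    key = {"fcfs": lambda j: j.arrival, "sjf": lambda j: j.runtime}[discipline]

    pending = sorted(jobs, key=lambda j: j.arrival)
    queue, running = [], []          # running is a heap of (finish_time, nodes)
    free, now, i = total_nodes, 0.0, 0
    total_wait = busy_node_time = makespan = 0.0

    while i < len(pending) or queue:
        # Admit every job that has arrived by the current time.
        while i < len(pending) and pending[i].arrival <= now:
            queue.append(pending[i])
            i += 1
        # Start jobs in discipline order while the head of the queue fits.
        queue.sort(key=key)
        while queue and queue[0].nodes <= free:
            job = queue.pop(0)
            free -= job.nodes
            total_wait += now - job.arrival
            busy_node_time += job.nodes * job.runtime
            finish = now + job.runtime
            makespan = max(makespan, finish)
            heapq.heappush(running, (finish, job.nodes))
        # Advance the clock to the next arrival or completion event.
        next_arrival = pending[i].arrival if i < len(pending) else float("inf")
        next_finish = running[0][0] if running else float("inf")
        now = min(next_arrival, next_finish)
        if now == float("inf"):
            break
        # Release the nodes of every job that has finished by now.
        while running and running[0][0] <= now:
            free += heapq.heappop(running)[1]

    return {
        "avg_wait": total_wait / len(jobs),
        # Area under the queue-length curve equals the sum of the waits.
        "avg_queue_len": total_wait / makespan,
        "utilization": busy_node_time / (total_nodes * makespan),
    }


if __name__ == "__main__":
    workload = [Job(0.0, 10.0, 1), Job(1.0, 5.0, 1), Job(2.0, 1.0, 1)]
    for name in ("fcfs", "sjf"):
        print(name, simulate(workload, total_nodes=1, discipline=name))
```

Running the same workload under each discipline and comparing the three returned measures mirrors, in a very reduced form, the evaluation the abstract describes. A fuller model would also need backfilling, priorities, and a realistic workload trace.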

Keywords

queuing disciplines; job load characterization model; discrete event system simulation; average queuing time; utilization

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • A. Anghelescu (1)
  • R. B. Lenin (2)
  • S. Ramaswamy (3)
  • K. Yoshigoe (4)

  1. Department of Mathematics and Computer Science, Emory University, Atlanta, USA
  2. Department of Mathematics, University of Central Arkansas, Conway, USA
  3. Industrial Software Systems, ABB Corporate Research, Bangalore, India
  4. Department of Computer Science, University of Arkansas at Little Rock, Little Rock, USA
