Towards an Architecture for Management of Very Large Computing Systems

Buchholz, Jochen; Volk, Eugen

doi:10.1007/978-3-642-11851-7_2

Abstract

Managing very large computing systems with up to 100,000 nodes has become a very complex task. Existing tools reach their limits, especially for High Performance Computing (HPC) resources, because these differ in some respects from other compute resources. First, we introduce the HPC-specific obstacles and what we expect to be the challenges for supporting system management on future resources. We then propose the framework designed within the scope of the TIMaCS project (http://www.timacs.de). Assuming that a corresponding solution has been implemented, we show how it can change administration far beyond the current situation. This discussion is split into a technical part, describing how administration can be simplified and where new capabilities in resource provisioning can be added, and a business part, in which we outline the need for business-policy-based management and scheduling and show a possible approach to investigating these relationships. Finally, we show what might be possible far beyond the scope of the project.

Author information

Authors and Affiliations

High Performance Computing Center Stuttgart (HLRS), Nobelstr. 19, 70569, Stuttgart, Germany
Jochen Buchholz & Eugen Volk

Corresponding author

Correspondence to Jochen Buchholz.

Editor information

Editors and Affiliations

Höchstleistungsrechenzentrum Stuttgart (HLRS), Universität Stuttgart, Nobelstraße 19, 70569, Stuttgart, Germany
Michael Resch, Katharina Benkert & Xin Wang
NEC High Performance Computing Europe GmbH, Prinzenallee 11, 40459, Düsseldorf, Germany
Martin Galle & Wolfgang Bez
Cyberscience Center, Tohoku University, Aramaki-Aza-Aoba 4F, Sendai, 980-8578, Japan
Hiroaki Kobayashi
German Research School for Simulation Sciences, Schinkelstrasse 2a, 52062, Aachen, Germany
Sabine Roller

Cite this paper

Buchholz, J., Volk, E. (2010). Towards an Architecture for Management of Very Large Computing Systems. In: Resch, M., et al. High Performance Computing on Vector Systems 2010. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11851-7_2

DOI: https://doi.org/10.1007/978-3-642-11851-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11850-0
Online ISBN: 978-3-642-11851-7
eBook Packages: Mathematics and Statistics (R0)
