Abstract
Managing very large computing systems with up to 100,000 nodes has become a highly complex task. Existing tools reach their limits, especially for High Performance Computing (HPC) resources, because these differ slightly from other compute resources. First, we introduce the obstacles specific to HPC and what we expect to be the challenges in supporting system management on future resources. We then propose the framework designed within the scope of the TIMaCS project (http://www.timacs.de). Assuming such a solution has been implemented, we show how it can change administration far beyond the current situation. This discussion is divided into two parts: a technical part, describing how administration can be simplified and where new capabilities can be added to resource provisioning; and a business part, where we outline the need for business-policy-based management and scheduling and present a possible approach for investigating these relationships. Finally, we show what might be possible far beyond the scope of the project.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Buchholz, J., Volk, E. (2010). Towards an Architecture for Management of Very Large Computing Systems. In: Resch, M., et al. High Performance Computing on Vector Systems 2010. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11851-7_2
DOI: https://doi.org/10.1007/978-3-642-11851-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11850-0
Online ISBN: 978-3-642-11851-7
eBook Packages: Mathematics and Statistics (R0)