Towards Intelligent Management of Very Large Computing Systems
- Cite this paper as:
- Volk E. et al. (2011) Towards Intelligent Management of Very Large Computing Systems. In: Bischof C., Hegering HG., Nagel W., Wittum G. (eds) Competence in High Performance Computing 2010. Springer, Berlin, Heidelberg
The increasing complexity of current and future very large computing systems, with their rapidly growing number of cores and nodes, demands considerable human effort for the administration and maintenance of these systems. Existing monitoring tools are neither scalable nor capable of reducing the overwhelming flow of information to only the essential, high-value information. Current management tools lack scalability and the capability to process huge amounts of information intelligently, that is, to relate data and information from various sources to each other in order to make the right decisions on error and fault handling. To solve these problems, we present a solution designed within the TIMaCS project: a hierarchical, scalable, policy-based monitoring and management framework.