Autonomic Management of Large Clusters and Their Integration into the Grid

Röblitz, Thomas; Schintke, Florian; Reinefeld, Alexander; Bärring, Olof; Barroso Lopez, Maite; Cancio, German; Chapeland, Sylvain; Chouikh, Karim; Cons, Lionel; Poznański, Piotr; Defert, Philippe; Iven, Jan; Kleinwort, Thorsten; Panzer-Steindel, Bernd; Polok, Jaroslaw; Rafflin, Catherine; Silverman, Alan; Smith, Tim; Eldik, Jan; Front, David; Biasotto, Massimo; Aiftimiei, Cristina; Ferro, Enrico; Maron, Gaetano; Chierici, Andrea; Dell’agnello, Luca; Serra, Marco; Michelotto, Michele; Hess, Lord; Lindenstruth, Volker; Pister, Frank; Steinbeck, Timm Morten; Groep, David; Steenbakkers, Martijn; Koeroo, Oscar; Som de Cerff, Wim; Venekamp, Gerben; Anderson, Paul; Colles, Tim; Holt, Alexander; Scobie, Alastair; George, Michael; Washbrook, Andrew; García Leiva, Rafael A.

doi:10.1007/s10723-004-7647-3

Published: 02 March 2005

Journal of Grid Computing, Volume 2, pages 247–260 (2004)

Abstract

We present a framework for the co-ordinated, autonomic management of multiple clusters in a compute center and their integration into a Grid environment. Site autonomy and the automation of administrative tasks are prime aspects in this framework. The system behavior is continuously monitored in a steering cycle and appropriate actions are taken to resolve any problems.

All presented components were implemented in the course of the EU DataGrid project: the Lemon monitoring components, the FT fault-tolerance mechanism, the quattor system for software installation and configuration, the RMS job and resource management system, and the Gridification scheme that integrates clusters into the Grid.
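
As a rough illustration of the steering cycle mentioned in the abstract (monitor the system, detect problems, take corrective action), the following minimal Python sketch may help. It is not taken from the article and does not reflect the actual Lemon or FT interfaces: the `Rule` class, the metric names, and the repair actions are all hypothetical and chosen only to show the shape of one monitoring pass.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical rule: a metric name, a problem test, and a repair action."""
    metric: str
    is_problem: Callable[[float], bool]
    action: Callable[[str], str]


def steering_cycle(samples: Dict[str, Dict[str, float]],
                   rules: List[Rule]) -> List[str]:
    """Run one pass: check each node's metrics and collect the repair actions."""
    log: List[str] = []
    for node in sorted(samples):          # deterministic node order
        metrics = samples[node]
        for rule in rules:
            value = metrics.get(rule.metric)
            # Skip metrics that were not reported for this node.
            if value is not None and rule.is_problem(value):
                log.append(rule.action(node))
    return log


# Hypothetical repair actions; a real system would act on the node
# instead of returning a description.
def restart_daemon(node: str) -> str:
    return f"restart daemon on {node}"


def drain_node(node: str) -> str:
    return f"drain {node}"


rules = [
    Rule("daemon_alive", lambda v: v == 0, restart_daemon),
    Rule("disk_used_pct", lambda v: v > 95, drain_node),
]

# Assumed sample data standing in for values gathered by a monitoring system.
samples = {
    "node01": {"daemon_alive": 1, "disk_used_pct": 40.0},
    "node02": {"daemon_alive": 0, "disk_used_pct": 97.5},
}

actions = steering_cycle(samples, rules)
for line in actions:
    print(line)
```

In a running system such a pass would be repeated continuously, with fresh samples on each iteration. Keeping detection rules separate from repair actions is one possible design that lets a site adjust its own policies, in line with the site autonomy the abstract emphasizes.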

Author information

Authors and Affiliations

ZIB, Takustraße 7, D-14195, Berlin Dahlem, Germany
Thomas Röblitz, Florian Schintke & Alexander Reinefeld
CERN, CH-1211, Geneva-23, Switzerland
Olof Bärring, Maite Barroso Lopez, German Cancio, Sylvain Chapeland, Karim Chouikh, Lionel Cons, Piotr Poznański, Philippe Defert, Jan Iven, Thorsten Kleinwort, Bernd Panzer-Steindel, Jaroslaw Polok, Catherine Rafflin, Alan Silverman, Tim Smith & Jan Eldik
Weizmann Institute of Science, PO Box 26, Rehovot, 76100, Israel
David Front
INFN, Viale dell’Università 2, I-35020, Legnaro (Padova), Italy
Massimo Biasotto, Cristina Aiftimiei, Enrico Ferro & Gaetano Maron
INFN, Viale Berti Pichat 6/2, I-40127, Bologna, Italy
Andrea Chierici & Luca Dell’agnello
INFN, P. le Aldo Moro 2, I-00185, Roma, Italy
Marco Serra
INFN, Via Marzolo 8, I-35131, Padova, Italy
Michele Michelotto
Kirchhoff-Institut für Physik, University of Heidelberg, Im Neuenheimer Feld 227, D-69120, Heidelberg, Germany
Lord Hess, Volker Lindenstruth, Frank Pister & Timm Morten Steinbeck
NIKHEF, PO Box 41882, 1009 DB, Amsterdam, The Netherlands
David Groep, Martijn Steenbakkers, Oscar Koeroo, Wim Som de Cerff & Gerben Venekamp
Old College, University of Edinburgh, South Bridge, Edinburgh, EH8 9YL, UK
Paul Anderson, Tim Colles, Alexander Holt & Alastair Scobie
University of Liverpool, Oxford Street, Liverpool, L69 7ZE, UK
Michael George & Andrew Washbrook
Department of Theoretical Physics, Universidad Autónoma de Madrid, Cantoblanco, 28049, Madrid, Spain
Rafael A. García Leiva

Additional information

This work from the EU DataGrid project was funded by the European Commission grant IST-2000-25182.

About this article

Cite this article

Röblitz, T., Schintke, F., Reinefeld, A. et al. Autonomic Management of Large Clusters and Their Integration into the Grid. J Grid Computing 2, 247–260 (2004). https://doi.org/10.1007/s10723-004-7647-3

Issue Date: September 2004