Journal of Grid Computing, Volume 2, Issue 3, pp 247–260

Autonomic Management of Large Clusters and Their Integration into the Grid

  • Thomas Röblitz
  • Florian Schintke
  • Alexander Reinefeld
  • Olof Bärring
  • Maite Barroso Lopez
  • German Cancio
  • Sylvain Chapeland
  • Karim Chouikh
  • Lionel Cons
  • Piotr Poznański
  • Philippe Defert
  • Jan Iven
  • Thorsten Kleinwort
  • Bernd Panzer-Steindel
  • Jaroslaw Polok
  • Catherine Rafflin
  • Alan Silverman
  • Tim Smith
  • Jan Eldik
  • David Front
  • Massimo Biasotto
  • Cristina Aiftimiei
  • Enrico Ferro
  • Gaetano Maron
  • Andrea Chierici
  • Luca Dell’agnello
  • Marco Serra
  • Michele Michelotto
  • Lord Hess
  • Volker Lindenstruth
  • Frank Pister
  • Timm Morten Steinbeck
  • David Groep
  • Martijn Steenbakkers
  • Oscar Koeroo
  • Wim Som de Cerff
  • Gerben Venekamp
  • Paul Anderson
  • Tim Colles
  • Alexander Holt
  • Alastair Scobie
  • Michael George
  • Andrew Washbrook
  • Rafael A. García Leiva
Abstract

We present a framework for the co-ordinated, autonomic management of multiple clusters in a compute center and their integration into a Grid environment. Site autonomy and the automation of administrative tasks are prime aspects in this framework. The system behavior is continuously monitored in a steering cycle and appropriate actions are taken to resolve any problems.

All of the components presented here were implemented in the course of the EU DataGrid project: the Lemon monitoring components, the FT fault-tolerance mechanism, the quattor system for software installation and configuration, the RMS job and resource management system, and the Gridification scheme that integrates clusters into the Grid.

Keywords

autonomic computing, cluster computing, grid computing, system management

Copyright information

© Springer 2005

Authors and Affiliations

  • Thomas Röblitz (1)
  • Florian Schintke (1)
  • Alexander Reinefeld (1)
  • Olof Bärring (2)
  • Maite Barroso Lopez (2)
  • German Cancio (2)
  • Sylvain Chapeland (2)
  • Karim Chouikh (2)
  • Lionel Cons (2)
  • Piotr Poznański (2)
  • Philippe Defert (2)
  • Jan Iven (2)
  • Thorsten Kleinwort (2)
  • Bernd Panzer-Steindel (2)
  • Jaroslaw Polok (2)
  • Catherine Rafflin (2)
  • Alan Silverman (2)
  • Tim Smith (2)
  • Jan Eldik (2)
  • David Front (3)
  • Massimo Biasotto (4)
  • Cristina Aiftimiei (4)
  • Enrico Ferro (4)
  • Gaetano Maron (4)
  • Andrea Chierici (5)
  • Luca Dell’agnello (5)
  • Marco Serra (6)
  • Michele Michelotto (7)
  • Lord Hess (8)
  • Volker Lindenstruth (8)
  • Frank Pister (8)
  • Timm Morten Steinbeck (8)
  • David Groep (9)
  • Martijn Steenbakkers (9)
  • Oscar Koeroo (9)
  • Wim Som de Cerff (9)
  • Gerben Venekamp (9)
  • Paul Anderson (10)
  • Tim Colles (10)
  • Alexander Holt (10)
  • Alastair Scobie (10)
  • Michael George (11)
  • Andrew Washbrook (11)
  • Rafael A. García Leiva (12)
  1. ZIB, Berlin Dahlem, Germany
  2. CERN, Geneva-23, Switzerland
  3. Weizmann Institute of Science, Rehovot, Israel
  4. INFN, Legnaro (Padova), Italy
  5. INFN, Bologna, Italy
  6. INFN, Roma, Italy
  7. INFN, Padova, Italy
  8. Kirchhoff-Institut für Physik, University of Heidelberg, Heidelberg, Germany
  9. NIKHEF, Amsterdam, The Netherlands
  10. Old College, University of Edinburgh, Edinburgh, UK
  11. University of Liverpool, Liverpool L69, UK
  12. Department of Theoretical Physics, Universidad Autónoma de Madrid, Madrid, Spain
