Autonomic Management of Large Clusters and Their Integration into the Grid

Journal of Grid Computing

Abstract

We present a framework for the co-ordinated, autonomic management of multiple clusters in a compute center and their integration into a Grid environment. Site autonomy and the automation of administrative tasks are prime aspects in this framework. The system behavior is continuously monitored in a steering cycle and appropriate actions are taken to resolve any problems.

All presented components were implemented in the course of the EU DataGrid project: the Lemon monitoring components, the FT fault-tolerance mechanism, the quattor system for software installation and configuration, the RMS job and resource management system, and the Gridification scheme that integrates clusters into the Grid.

Additional information

This work from the EU DataGrid project was funded by the European Commission grant IST-2000-25182.

About this article

Cite this article

Röblitz, T., Schintke, F., Reinefeld, A. et al. Autonomic Management of Large Clusters and Their Integration into the Grid. J Grid Computing 2, 247–260 (2004). https://doi.org/10.1007/s10723-004-7647-3
