TDP_SHELL: An Interoperability Framework for Resource Management Systems and Run-Time Monitoring Tools
Resource management systems and tool support are two important factors for efficiently developing applications on large clusters. On the one hand, management systems (in the form of batch queue systems) are responsible for all issues related to executing jobs on the existing machines. On the other hand, run-time tools (in the form of debuggers, tracers, performance analyzers, etc.) are used to guarantee the correctness and efficiency of execution. Executing an application under the control of both a resource management system and a run-time tool is still a challenging problem in most cases. In typical environments, the restrictions imposed by resource managers can make run-time tools difficult or even impossible to use. We propose TDP-Shell as a framework that provides the mechanisms needed to enable and simplify the use of run-time tools under a specific resource management system. We have analyzed the essential interactions between common run-time tools and resource management systems and implemented a pilot TDP-Shell. The paper describes the main components of TDP-Shell and illustrates its use with some examples.
Keywords: Resource Management, Attribute Space, Resource Management System, Tuple Space, Remote Machine