Abstract
This paper presents the results and experiences of adapting and improving the Many-Task Computing (MTC) framework Kestrel for use with bag-of-tasks applications, and the STAR experiment in particular. Kestrel is a lightweight, highly available job scheduling framework for Virtual Organization Clusters (VOCs) constructed in the cloud. Kestrel uses the Extensible Messaging and Presence Protocol (XMPP) to increase MTC platform scalability and to mitigate faults in Wide Area Network (WAN) communications. Kestrel’s architecture is based upon pilot job frameworks used extensively in Grid computing, with fault-tolerant communications inspired by command-and-control botnets. The extensibility of XMPP has allowed the development of protocols for identifying manager nodes, discovering the capabilities of worker agents, and distributing tasks. Presence notifications provided by XMPP allow Kestrel to monitor the global state of the pool and to dispatch tasks based on worker availability. Since its inception, Kestrel has been modified based on its performance in managing operational scientific workloads from the STAR group at Brookhaven National Laboratory. STAR provided a virtual machine image with applications for simulating proton collisions using PYTHIA and GEANT3. A Kestrel-based Virtual Organization Cluster, created on top of Clemson University’s Palmetto cluster, CERN, and Amazon EC2, provided over 400,000 CPU hours of computation over the course of a month using an average of 800 virtual machine instances every day, generating nearly seven terabytes of data and completing the largest PYTHIA production run that STAR has achieved to date.
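The presence-driven dispatching described in the abstract can be illustrated with a minimal sketch. It does not use an XMPP library and is not Kestrel's actual implementation: the `Manager` class, its method names, and the status strings `available`, `busy`, and `offline` are hypothetical, and presence updates are simulated as plain method calls. It assumes that a worker announcing itself as available has finished any task it held, and that a task held by a worker that goes offline should be requeued.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Manager:
    """Toy manager: tracks worker presence and dispatches queued tasks."""
    queue: deque = field(default_factory=deque)   # tasks waiting for a worker
    available: set = field(default_factory=set)   # workers currently idle
    running: dict = field(default_factory=dict)   # worker id -> task in progress

    def submit(self, task):
        """Queue a task and try to dispatch it immediately."""
        self.queue.append(task)
        self._dispatch()

    def on_presence(self, worker, status):
        """Handle a simulated presence update from a worker (hypothetical statuses)."""
        if status == "available":
            # Assumption: an available worker has finished its previous task.
            self.running.pop(worker, None)
            self.available.add(worker)
        else:
            self.available.discard(worker)
            if status == "offline" and worker in self.running:
                # The worker disappeared mid-task: put the task back at the front.
                self.queue.appendleft(self.running.pop(worker))
        self._dispatch()

    def _dispatch(self):
        """Match queued tasks to idle workers until one side runs out."""
        while self.queue and self.available:
            worker = min(self.available)  # deterministic choice for the sketch
            self.available.remove(worker)
            self.running[worker] = self.queue.popleft()


manager = Manager()
manager.submit("task-1")
manager.submit("task-2")
manager.on_presence("worker-a", "available")  # task-1 goes to worker-a
manager.on_presence("worker-a", "offline")    # task-1 is requeued
manager.on_presence("worker-b", "available")  # task-1 goes to worker-b
```

In this toy model the manager never polls workers; every scheduling decision is triggered by a task submission or a presence change, which is the property the abstract attributes to XMPP presence notifications.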
Cite this article
Stout, L., Walker, M., Lauret, J. et al. Using Kestrel and XMPP to Support the STAR Experiment in the Cloud. J Grid Computing 11, 249–264 (2013). https://doi.org/10.1007/s10723-013-9253-8
DOI: https://doi.org/10.1007/s10723-013-9253-8