Using Kestrel and XMPP to Support the STAR Experiment in the Cloud

Journal of Grid Computing

Abstract

This paper presents the results and experiences of adapting and improving the Many-Task Computing (MTC) framework Kestrel for use with bag-of-tasks applications, and with the STAR experiment in particular. Kestrel is a lightweight, highly available job scheduling framework for Virtual Organization Clusters (VOCs) constructed in the cloud. Kestrel uses the Extensible Messaging and Presence Protocol (XMPP) to increase MTC platform scalability and to mitigate faults in Wide Area Network (WAN) communications. Kestrel’s architecture is based upon the pilot job frameworks used extensively in Grid computing, with fault-tolerant communications inspired by command-and-control botnets. The extensibility of XMPP has allowed the development of protocols for identifying manager nodes, discovering the capabilities of worker agents, and distributing tasks. Presence notifications provided by XMPP allow Kestrel to monitor the global state of the pool and to dispatch tasks based on worker availability. Since its inception, Kestrel has been modified based on its performance in managing operational scientific workloads from the STAR group at Brookhaven National Laboratory. STAR provided a virtual machine image with applications for simulating proton collisions using PYTHIA and GEANT3. A Kestrel-based Virtual Organization Cluster, created on top of Clemson University’s Palmetto cluster, CERN, and Amazon EC2, provided over 400,000 CPU hours of computation over the course of a month using an average of 800 virtual machine instances every day, generating nearly seven terabytes of data and the largest PYTHIA production run that STAR has achieved to date.
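
The presence-based dispatching described in the abstract can be illustrated with a small sketch. This is not Kestrel's implementation and uses no XMPP library: the `Manager` class, its method names, and the three status strings are hypothetical, chosen only to show how availability updates might drive task assignment and how the task of a worker that goes offline might be requeued.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Manager:
    """Toy manager (hypothetical, not Kestrel's code) driven by presence updates."""
    pending: deque = field(default_factory=deque)  # tasks waiting for a worker
    idle: set = field(default_factory=set)         # workers known to be available
    running: dict = field(default_factory=dict)    # worker id -> task in progress

    def submit(self, task):
        """Queue a task and try to place it immediately."""
        self.pending.append(task)
        self._dispatch()

    def on_presence(self, worker, status):
        """Handle a presence update: 'available', 'busy', or 'offline' (assumed values)."""
        if status == "available":
            # Assumption: a worker that reports itself available has finished
            # (or never had) its task, so it can accept new work.
            self.running.pop(worker, None)
            self.idle.add(worker)
        elif status == "busy":
            self.idle.discard(worker)
        elif status == "offline":
            self.idle.discard(worker)
            # Requeue the task of a worker that disappeared mid-run.
            lost = self.running.pop(worker, None)
            if lost is not None:
                self.pending.appendleft(lost)
        self._dispatch()

    def _dispatch(self):
        """Assign pending tasks to idle workers until one of the two runs out."""
        while self.pending and self.idle:
            worker = min(self.idle)  # deterministic pick, for the example only
            self.idle.remove(worker)
            self.running[worker] = self.pending.popleft()


manager = Manager()
manager.submit("job-1")
manager.submit("job-2")
manager.on_presence("worker-a@example.org", "available")  # job-1 goes to worker-a
manager.on_presence("worker-a@example.org", "offline")    # job-1 is requeued
manager.on_presence("worker-b@example.org", "available")  # job-1 goes to worker-b
```

In this sketch the manager never polls workers; it reacts only to presence changes, which is the property the abstract attributes to XMPP presence notifications. A real deployment would also need task completion reports and capability matching, which are omitted here.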

Author information

Corresponding author

Correspondence to Michael A. Murphy.

About this article

Cite this article

Stout, L., Walker, M., Lauret, J. et al. Using Kestrel and XMPP to Support the STAR Experiment in the Cloud. J Grid Computing 11, 249–264 (2013). https://doi.org/10.1007/s10723-013-9253-8

  • DOI: https://doi.org/10.1007/s10723-013-9253-8
