Journal of Grid Computing, Volume 11, Issue 3, pp 341–360

JETS: Language and System Support for Many-Parallel-Task Workflows

  • Justin M. Wozniak
  • Michael Wilde
  • Daniel S. Katz
Article

Abstract

Many-task computing is a well-established paradigm for implementing loosely coupled applications (tasks) on large-scale computing systems. However, few of the model’s existing implementations provide efficient, low-latency support for executing tasks that are themselves tightly coupled multiprocessing applications. As a result, a vast array of parallel applications cannot readily be used within many-task workloads. In this work, we present JETS, a middleware component that provides high-performance support for many-parallel-task computing (MPTC). JETS is based on a highly concurrent approach to parallel task dispatch and on new capabilities now available in the MPICH2 MPI implementation and the ZeptoOS Linux operating system. JETS represents an advance over the few known examples of multilevel many-parallel-task scheduling systems: it schedules and launches many short-duration parallel application invocations more efficiently; it overcomes the challenges of coupling the user processes of each multiprocessing application invocation via the messaging fabric; and it concurrently manages many application executions in various stages. We report here on the JETS architecture and its performance on both synthetic benchmarks and an MPTC application in molecular dynamics.
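The abstract does not spell out the JETS dispatch algorithm, so the following is only a rough illustration of the problem MPTC scheduling addresses: many parallel tasks, each needing several nodes at once, share one fixed allocation. The sketch simulates strict first-in, first-out dispatch onto a node pool. The function name `dispatch` and the `(name, nodes_required, duration)` task format are hypothetical, and this sequential simulation is far simpler than the highly concurrent dispatcher the paper describes.

```python
import heapq
from collections import deque
from typing import List, Tuple

# Hypothetical task description: (name, nodes_required, duration_seconds).
Task = Tuple[str, int, float]


def dispatch(tasks: List[Task], total_nodes: int) -> List[Tuple[str, float, float, List[int]]]:
    """Simulate strict FIFO dispatch of multi-node tasks onto a fixed node pool.

    Returns (name, start_time, end_time, node_ids) for each task, in launch order.
    This is an illustrative sketch, not the JETS scheduler.
    """
    for name, need, _ in tasks:
        if need > total_nodes:
            raise ValueError(f"task {name} needs {need} nodes but only {total_nodes} exist")

    free = list(range(total_nodes))  # ids of idle nodes, kept sorted
    queue = deque(tasks)             # pending tasks in submission order
    running = []                     # min-heap of (end_time, seq, node_ids)
    schedule = []
    now = 0.0
    seq = 0                          # tie-breaker so heap never compares node lists

    while queue or running:
        # Launch head-of-queue tasks while enough nodes are idle (no backfilling).
        while queue and queue[0][1] <= len(free):
            name, need, duration = queue.popleft()
            nodes, free = free[:need], free[need:]
            end = now + duration
            heapq.heappush(running, (end, seq, nodes))
            seq += 1
            schedule.append((name, now, end, nodes))
        # Advance the clock to the next completion and return its nodes to the pool.
        if running:
            now, _, nodes = heapq.heappop(running)
            free = sorted(free + nodes)

    return schedule


if __name__ == "__main__":
    jobs = [("a", 2, 10.0), ("b", 2, 5.0), ("c", 3, 4.0)]
    for entry in dispatch(jobs, total_nodes=4):
        print(entry)
```

On four nodes, tasks `a` and `b` start together, while the three-node task `c` waits until `a` finishes at t = 10 even though two nodes are idle from t = 5. Idle gaps like this, together with per-launch overhead for short parallel runs, are the kind of inefficiency a dedicated many-parallel-task dispatcher aims to reduce.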

Keywords

MPI, MTC, MPTC, Swift, JETS, NAMD, Workflow


Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  • Justin M. Wozniak (1)
  • Michael Wilde (1)
  • Daniel S. Katz (1)
  1. Argonne National Laboratory, Argonne, USA
