Cluster Computing, Volume 13, Issue 3, pp 315–333

Parameterized specification, configuration and execution of data-intensive scientific workflows

  • Vijay S. Kumar
  • Tahsin Kurc
  • Varun Ratnakar
  • Jihie Kim
  • Gaurang Mehta
  • Karan Vahi
  • Yoonju Lee Nelson
  • P. Sadayappan
  • Ewa Deelman
  • Yolanda Gil
  • Mary Hall
  • Joel Saltz

Abstract

Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multidimensional parameter space consisting of input performance parameters to the applications that are known to affect their execution times. While some performance parameters such as grouping of workflow components and their mapping to machines do not affect the accuracy of the analysis, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework which is capable of supporting performance optimizations along multiple such parameters. Using two real-world applications in the spatial, multidimensional data analysis domain, we present an experimental evaluation of the proposed framework.
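The search formulation in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the parameter names (`chunk_size`, `num_nodes`, `resolution`), the toy cost model in `estimate`, and the exhaustive strategy in `search` are hypothetical and are not taken from the paper's framework. The sketch enumerates a small parameter space and returns the fastest configuration whose estimated output quality meets a user-supplied threshold.

```python
from itertools import product

# Hypothetical parameter space; names and values are illustrative only.
PARAM_SPACE = {
    "chunk_size": [256, 512, 1024],  # affects performance only
    "num_nodes": [2, 4, 8],          # affects performance only
    "resolution": [0.5, 1.0],        # trades output quality for speed
}


def estimate(config):
    """Toy cost model (an assumption): return (time, quality) for a config."""
    work = (config["resolution"] ** 2) * 1000.0
    overhead = 2.0 * config["num_nodes"] + 4096.0 / config["chunk_size"]
    time = work / config["num_nodes"] + overhead
    quality = config["resolution"]
    return time, quality


def search(space, min_quality):
    """Exhaustively search the space for the fastest config meeting min_quality.

    Returns (config, time), or None if no configuration is acceptable.
    """
    names = sorted(space)
    best = None
    for values in product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        time, quality = estimate(config)
        if quality < min_quality:
            continue  # violates the quality requirement
        if best is None or time < best[1]:
            best = (config, time)
    return best


if __name__ == "__main__":
    print(search(PARAM_SPACE, min_quality=1.0))
    print(search(PARAM_SPACE, min_quality=0.5))
```

Relaxing `min_quality` enlarges the set of admissible configurations, which is the quality-for-performance trade-off the abstract refers to. Exhaustive enumeration is used here only because the toy space is tiny; a real parameter space would likely call for a guided search.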

Keywords

Scientific workflow; Performance parameters; Semantic representations; Grid; Application QoS

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Vijay S. Kumar (1)
  • Tahsin Kurc (4)
  • Varun Ratnakar (2)
  • Jihie Kim (2)
  • Gaurang Mehta (2)
  • Karan Vahi (2)
  • Yoonju Lee Nelson (2)
  • P. Sadayappan (1)
  • Ewa Deelman (2)
  • Yolanda Gil (2)
  • Mary Hall (3)
  • Joel Saltz (4)

  1. Dept. of Computer Science and Engineering, Ohio State University, Columbus, USA
  2. Information Sciences Institute, University of Southern California, Marina del Rey, USA
  3. School of Computing, University of Utah, Salt Lake City, USA
  4. Center for Comprehensive Informatics, Emory University, Atlanta, USA
