
Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows


Abstract

Mapping applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline or fork graphs. Several antagonistic criteria should be optimized for workflow applications, such as throughput and latency (or a combination of both). In this paper, we consider a simplified model with no communication cost, and we provide an exhaustive list of complexity results for different problem instances. Pipeline or fork stages can be replicated in order to increase the throughput, by sending consecutive data sets to different processors. In some cases, stages can also be data-parallelized, i.e., the computation of a single data set is shared among several processors; this decreases the latency and increases the throughput. Some instances of this simple model are shown to be NP-hard, thereby exposing the inherent complexity of the mapping problem, while polynomial algorithms are provided for other instances. Altogether, these results lay solid theoretical foundations for the study of mono-criterion and bi-criteria mapping optimization problems.
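To make the two criteria concrete, here is a minimal sketch (in Python) of how the period, i.e., the inverse of the throughput, and the latency of a pipeline mapping can be computed under a no-communication-cost model. The cost rules encoded below are assumptions made for illustration, not the paper's exact formulation: a replicated stage is assumed to process consecutive data sets round-robin across its processors, and a data-parallel stage is assumed to split one data set across them; the Stage class and the period/latency helpers are hypothetical names introduced for this example.

# Minimal sketch: period and latency of a pipeline mapping with no
# communication cost. The cost rules are assumptions for illustration:
#   - a replicated stage on k processors handles consecutive data sets
#     round-robin, so it contributes w / (k * s_min) to the period, but each
#     single data set is still processed by one replica (worst-case latency
#     contribution w / s_min);
#   - a data-parallel stage splits one data set across its processors, so it
#     contributes w / sum(speeds) to both period and latency.
from dataclasses import dataclass


@dataclass
class Stage:
    work: float            # computation weight w of the stage
    speeds: list[float]    # speeds of the processors assigned to the stage
    data_parallel: bool    # True: data-parallel stage, False: replicated stage


def period(stages: list[Stage]) -> float:
    """Period of the mapping (throughput = 1 / period), set by the slowest stage."""
    def stage_period(st: Stage) -> float:
        if st.data_parallel:
            return st.work / sum(st.speeds)
        # Round-robin replication: all replicas advance at the pace of the slowest.
        return st.work / (len(st.speeds) * min(st.speeds))
    return max(stage_period(st) for st in stages)


def latency(stages: list[Stage]) -> float:
    """Time for one data set to traverse the whole pipeline."""
    def stage_latency(st: Stage) -> float:
        if st.data_parallel:
            return st.work / sum(st.speeds)
        # A single data set is handled by one replica; worst case is the slowest one.
        return st.work / min(st.speeds)
    return sum(stage_latency(st) for st in stages)


# Example: stage 1 replicated on two unit-speed processors,
# stage 2 data-parallelized on two unit-speed processors.
mapping = [Stage(work=4.0, speeds=[1.0, 1.0], data_parallel=False),
           Stage(work=4.0, speeds=[1.0, 1.0], data_parallel=True)]
print(period(mapping), latency(mapping))  # 2.0 and 6.0 under these assumptions

Under these assumed cost rules, replication improves only the period, whereas data-parallelism improves both the period and the latency, which mirrors the trade-off described in the abstract.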



Author information

Correspondence to Anne Benoit.


About this article

Cite this article

Benoit, A., Robert, Y. Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows. Algorithmica 57, 689–724 (2010). https://doi.org/10.1007/s00453-008-9229-4
