Abstract
Mapping applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline or fork graphs. Workflow applications call for the optimization of several antagonistic criteria, such as throughput and latency, or a combination of the two. In this paper, we consider a simplified model with no communication cost, and we provide an exhaustive list of complexity results for different problem instances. Pipeline or fork stages can be replicated in order to increase the throughput by sending consecutive data sets onto different processors. In some cases, stages can also be data-parallelized, i.e., the computation of a single data set is shared among several processors, which decreases the latency and increases the throughput. Some instances of this simple model are shown to be NP-hard, thereby exposing the inherent complexity of the mapping problem. We provide polynomial algorithms for other problem instances. Altogether, we provide solid theoretical foundations for the study of mono-criterion or bi-criteria mapping optimization problems.
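The difference between replication and data-parallelism described above can be illustrated with a minimal sketch. It is not the paper's formal model: it assumes a single stage, identical processors, perfectly divisible work, and no communication cost, and the function names `replicated` and `data_parallel` are hypothetical. The period is the time between two consecutive outputs (the inverse of the throughput), and the latency is the time one data set spends in the stage.

```python
from fractions import Fraction

def replicated(work, speed, k):
    """Stage replicated on k identical processors (assumed model).

    Consecutive data sets go to different processors in round-robin
    order, so each data set is still processed by a single processor.
    """
    latency = Fraction(work, speed)   # one processor handles the whole data set
    period = latency / k              # k data sets are processed concurrently
    return period, latency

def data_parallel(work, speed, k):
    """Stage data-parallelized on k identical processors (assumed model).

    The computation of one data set is split evenly among the k
    processors, so both the latency and the period shrink.
    """
    latency = Fraction(work, speed * k)
    period = latency                  # the next data set starts when this one ends
    return period, latency

if __name__ == "__main__":
    # Hypothetical stage: 12 units of work, processors of speed 2, 3 processors.
    print("replicated   :", replicated(12, 2, 3))
    print("data-parallel:", data_parallel(12, 2, 3))
```

Under these assumptions both strategies reach the same period, but only data-parallelism reduces the latency, which matches the trade-off stated in the abstract.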
About this article
Cite this article
Benoit, A., Robert, Y. Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows. Algorithmica 57, 689–724 (2010). https://doi.org/10.1007/s00453-008-9229-4
DOI: https://doi.org/10.1007/s00453-008-9229-4