Computing the Throughput of Probabilistic and Replicated Streaming Applications

Benoit, Anne; Gallet, Matthieu; Gaujal, Bruno; Robert, Yves

doi:10.1007/s00453-013-9768-1

Published: 27 March 2013

Algorithmica, volume 69, pages 925–957 (2014)

Abstract

In this paper, we investigate how to compute the throughput of probabilistic and replicated streaming applications. We are given (i) a streaming application whose dependence graph is a linear chain; (ii) a one-to-many mapping of the application onto a fully heterogeneous target platform, where a processor is assigned at most one application stage, but where a stage can be replicated onto a set of processors; and (iii) a set of random variables modeling the computation and communication times in the mapping. We show how to compute the throughput of the application, i.e., the rate at which data sets can be processed. The problem is easy when application stages are not replicated, i.e., each application stage is assigned to a single processor: in that case the throughput is dictated by the critical hardware resource. However, when stages are replicated, i.e., each application stage may be assigned to several processors, the problem becomes surprisingly complicated: even in the deterministic case, the optimal throughput may be lower than the smallest internal resource throughput.
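The non-replicated case can be illustrated with a minimal sketch. It assumes deterministic times and that computations and communications overlap, so that every processor and every link acts as an independent resource. The function name `chain_throughput` and the sample values are hypothetical and are not taken from the paper.

```python
def chain_throughput(compute_times, comm_times):
    """Throughput of a non-replicated linear chain (illustrative sketch).

    compute_times[i]: time processor i spends on one data set.
    comm_times[i]:    time the link between stages i and i+1 spends on one data set.
    Assumption: deterministic times and full computation/communication overlap.
    """
    if len(comm_times) != len(compute_times) - 1:
        raise ValueError("expected exactly one link between consecutive stages")
    # The slowest resource (the critical resource) dictates the period.
    period = max(list(compute_times) + list(comm_times))
    return 1.0 / period


rate = chain_throughput([2.0, 4.0, 1.0], [0.5, 3.0])
print(rate)  # 0.25: the second processor (4 time units per data set) is critical
```

With replication this simple maximum is no longer sufficient, which is the difficulty the abstract points to.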

The first contribution of the paper is to provide a general method to compute the throughput when computation and communication times, also called stage parameters, are constant or follow I.I.D. exponential laws. The second contribution is to provide bounds for the throughput when stage parameters (a) form associated random sequences, which captures the correlation between communication and processing times of a given data set on the different application stages (a data set that takes a long time on the first stage is likely to be large, and to take a long time on the next stages), and (b) are N.B.U.E. (New Better than Used in Expectation) variables (if an operation has already been processed for some duration, its expected remaining time is no larger than the expected processing time of a fresh operation). In that case, the throughput is bounded from below by the exponential case and from above by the deterministic case. An extensive set of simulations allows us to assess the quality of the model and to observe the actual behavior of several distributions.
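The N.B.U.E. condition can be made concrete with closed-form mean residual lives. The sketch below is only an illustration: the helper names and parameter values are hypothetical. It checks the inequality E[X − t | X > t] ≤ E[X] for an exponential law, where equality holds for every t, and for a uniform law on [0, b].

```python
def exp_mean_residual(rate, t):
    """E[X - t | X > t] for X ~ Exp(rate): memoryless, so always 1/rate."""
    return 1.0 / rate


def uniform_mean_residual(b, t):
    """E[X - t | X > t] for X ~ Uniform[0, b], with 0 <= t < b."""
    if not 0 <= t < b:
        raise ValueError("t must lie in [0, b)")
    # Given X > t, X is uniform on [t, b], so the remaining time averages (b - t) / 2.
    return (b - t) / 2.0


def is_nbue(mean, residual, ts):
    """Check the N.B.U.E. inequality on a finite grid of elapsed times."""
    return all(residual(t) <= mean + 1e-12 for t in ts)


grid = [0.0, 0.5, 1.0, 1.5]
print(is_nbue(0.5, lambda t: exp_mean_residual(2.0, t), grid))      # True
print(is_nbue(1.0, lambda t: uniform_mean_residual(2.0, t), grid))  # True
```

The exponential law satisfies the inequality with equality, so it sits at the boundary of the N.B.U.E. class, while a constant time is its least variable member.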

Notes

The product ⊗ is defined as: $(V \otimes M)_k = \max_i (V_i + M_{ik})$.
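As a minimal sketch of this max-plus product, the following function applies the definition directly to a vector and a matrix given as nested lists. The function name `maxplus_vec_mat` and the sample values are hypothetical; using negative infinity for an absent entry follows the usual max-plus convention and is an assumption here, not something stated in the note.

```python
NEG_INF = float("-inf")  # neutral element of max: marks an absent dependency


def maxplus_vec_mat(v, m):
    """Return V (x) M, where (V (x) M)_k = max_i (V_i + M_ik)."""
    if not m or len(v) != len(m):
        raise ValueError("v must have one entry per row of a non-empty m")
    n_cols = len(m[0])
    return [max(v[i] + m[i][k] for i in range(len(v))) for k in range(n_cols)]


v = [0.0, 2.0]
m = [[1.0, NEG_INF],
     [3.0, 4.0]]
print(maxplus_vec_mat(v, m))  # [5.0, 6.0]
```

Entry 0 is max(0 + 1, 2 + 3) = 5 and entry 1 is max(0 + (−∞), 2 + 4) = 6.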

References

Baccelli, F., Cohen, G., Gaujal, B.: Evolution equations of timed Petri nets. In: Proceedings of the 30th IEEE Conference on Decision and Control, 1991, vol. 2, pp. 1139–1144 (1991). doi:10.1109/CDC.1991.261523
Baccelli, F., Cohen, G., Olsder, G.J., Quadrat, J.-P.: Synchronization and Linearity. Wiley, New York (1992)
Benoit, A., Robert, Y.: Mapping pipeline skeletons onto heterogeneous platforms. J. Parallel Distrib. Comput. 68(6), 790–808 (2008)
Benoit, A., Gallet, M., Gaujal, B., Robert, Y.: Computing the throughput of replicated workflows on heterogeneous platforms. In: Proceedings of ICPP’2009, the 38th International Conference on Parallel Processing (2009)
Benoit, A., Dufossé, F., Gallet, M., Gaujal, B., Robert, Y.: Computing the throughput of probabilistic and replicated streaming applications. In: Proceedings of SPAA 2010, the 22nd ACM Symposium on Parallelism in Algorithms and Architectures. ACM, New York (2010)
Beynon, M.D., Kurc, T., Sussman, A., Saltz, J.: Optimizing execution of component-based applications using group instances. Future Gener. Comput. Syst. 18(4), 435–448 (2002)
Casanova, H., Legrand, A., Quinson, M.: SimGrid: a generic framework for large-scale distributed experiments. In: Proceedings of UKSim, the 10th EUROS/UKSim International Conference on Computer Modelling and Simulation, pp. 126–131 (2008)
Chiola, G., Franceschinis, G., Gaeta, R., Ribaudo, M.: GreatSPN: graphical editor and analyzer for timed and stochastic Petri nets. Perform. Eval. 24(1–2), 47–68 (1995)
Cole, M.: Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming. Parallel Comput. 30(3), 389–406 (2004)
DataCutter Project. Middleware for filtering large archival scientific datasets in a grid environment. http://www.cs.umd.edu/projects/hpsl/ResearchAreas/DataCutter.htm
Esary, J.D., Proschan, F., Walkup, D.W.: Association of random variables, with applications. Ann. Math. Stat. 38(5), 1466–1474 (1967)
Gaujal, B., Vincent, J.-M.: Comparisons of stochastic task-resource systems. In: Introduction to Scheduling. CRC Press, Boca Raton (2009)
Häggström, O.: Finite Markov Chains and Algorithmic Applications. Cambridge University Press, Cambridge (2002)
Hillion, H., Proth, J.-M.: Performance evaluation of job shop systems using timed event graphs. IEEE Trans. Autom. Control 34(1), 3–9 (1989)
Jean-Marie, A.: ERS: a tool set for performance evaluation of discrete event systems. http://www-sop.inria.fr/mistral/soft/ers.html
Kamburowski, J.: Bounding the distribution of project duration in PERT networks. Oper. Res. Lett. 12, 17–22 (1992)
Knuth, D.E.: The Art of Computer Programming, vol. 3, 2nd edn. Addison-Wesley, Reading (1998)
Kumazawa, Y.: Tests for new better than used in expectation with randomly censored data. Seq. Anal. 5(1), 85–92 (1986)
Spencer, M., Ferreira, R., Beynon, M., Kurc, T., Catalyurek, U., Sussman, A., Saltz, J.: Executing multiple pipelined data analysis operations in the grid. In: Proceedings of Supercomputing’02, the 2002 ACM/IEEE Conference on Supercomputing, pp. 1–18. IEEE Comput. Soc., Los Alamitos (2002)
Subhlok, J., Vondran, G.: Optimal mapping of sequences of data parallel tasks. In: Proceedings of PPoPP’95, the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 134–143. ACM, New York (1995)
Subhlok, J., Vondran, G.: Optimal latency-throughput tradeoffs for data parallel pipelines. In: Proceedings of SPAA’96, the 8th ACM Symposium on Parallel Algorithms and Architectures, pp. 62–71. ACM, New York (1996)
Taura, K., Chien, A.: A heuristic algorithm for mapping communicating tasks on heterogeneous resources. In: Proceedings of HCW’00, the 9th Heterogeneous Computing Workshop, pp. 102–115. IEEE Comput. Soc., Los Alamitos (2000)
Velho, P., Legrand, A.: Accuracy study and improvement of network simulation in the SimGrid framework. In: Proceedings of Simutools’09, the 2nd International Conference on Simulation Tools and Techniques, ICST, pp. 1–10 (2009)
Vydyanathan, N., Çatalyurek, Ü.V., Kurc, T., Saddayappan, P., Saltz, J.: Toward optimizing latency under throughput constraints for application workflows on clusters. In: Proceedings of Euro-Par’07. LNCS, vol. 4641, pp. 173–183. Springer, Berlin (2007)
Vydyanathan, N., Çatalyurek, Ü.V., Kurc, T., Saddayappan, P., Saltz, J.: A duplication based algorithm for optimizing latency under throughput constraints for streaming workflows. In: Proceedings of ICPP’2008, the 37th International Conference on Parallel Processing, pp. 254–261. IEEE Comput. Soc., Los Alamitos (2008)
Wu, Q., Gu, Y.: Supporting distributed application workflows in heterogeneous computing environments. In: Proceedings of ICPADS’08, the 14th IEEE International Conference on Parallel and Distributed Systems, pp. 3–10. IEEE Comput. Soc., Los Alamitos (2008)

Acknowledgements

The authors thank the reviewers for their numerous comments and suggestions, which greatly improved the final version of the paper.

Author information

Authors and Affiliations

ENS Lyon, LIP Laboratory and INRIA, CNRS, UCBL, Lyon, France
Anne Benoit, Matthieu Gallet & Yves Robert
INRIA, Grenoble, France
Bruno Gaujal

Corresponding author

Correspondence to Anne Benoit.

Additional information

Part of this work has appeared in ICPP’09 and SPAA’10. Anne Benoit and Yves Robert are with the Institut Universitaire de France. This work was supported in part by the ANR StochaGrid and RESCUE projects, and by the Inria ALEAE project.

About this article

Cite this article

Benoit, A., Gallet, M., Gaujal, B. et al. Computing the Throughput of Probabilistic and Replicated Streaming Applications. Algorithmica 69, 925–957 (2014). https://doi.org/10.1007/s00453-013-9768-1

Received: 16 February 2011
Accepted: 09 March 2013
Published: 27 March 2013
Issue Date: August 2014
DOI: https://doi.org/10.1007/s00453-013-9768-1