Resource-Constrained Scheduling Algorithms for Stochastic Independent Tasks With Unknown Probability Distribution

Gao, Yiqin; Robert, Yves; Vivien, Frédéric

doi:10.1007/s00453-023-01100-8

Published: 17 February 2023

Algorithmica, volume 85, pages 2363–2394 (2023)

Abstract

This work introduces scheduling algorithms to maximize the expected number of independent tasks that can be executed on a parallel platform within a given budget and under a deadline constraint. The main motivation for this problem comes from imprecise computations, where each job has a mandatory part and an optional part, and the objective is to maximize the number of optional parts that are successfully executed, in order to improve the accuracy of the results. The optional parts of the jobs represent the independent tasks of our problem. Task execution times are not known before execution; instead, the only information available to the scheduler is that they obey some (unknown) probability distribution. The scheduler needs to acquire some information before deciding on a cutting threshold: instead of allowing all tasks to run until completion, one may want to interrupt long-running tasks at some point. In addition, the cutting threshold may be re-evaluated as new information is acquired when the execution progresses further. This work presents several algorithms to determine a good cutting threshold, and to decide when to re-evaluate it. In particular, we use the Kaplan-Meier estimator to account for tasks that are still running when making a decision. The efficiency of our algorithms is assessed through an extensive set of simulations with various budget and deadline values, and ranging over 13 probability distributions. In particular, the AutoPerSurvival(40%,0.005) strategy is shown to achieve 77% of the upper bound even in the worst case. This shows the robustness of our strategy.
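
To illustrate how tasks that are still running can be taken into account, here is a minimal sketch of the textbook Kaplan-Meier product-limit estimator applied to task durations. It is not the authors' implementation: the function name `kaplan_meier` and the sample observations are hypothetical, and a running task is simply treated as a right-censored observation.

```python
from collections import Counter

def kaplan_meier(durations, completed):
    """Textbook Kaplan-Meier estimate of S(t) = P(execution time > t).

    durations[i] is the time observed so far for task i; completed[i] is True
    if the task finished at that time and False if it was still running
    (right-censored). Returns (t, S(t)) pairs, one per distinct completion time.
    """
    finished = Counter(d for d, c in zip(durations, completed) if c)
    curve, survival = [], 1.0
    for t in sorted(finished):
        # Tasks observed for at least t time units are "at risk" at time t.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - finished[t] / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical observations: three tasks finished, two are still running.
curve = kaplan_meier([1.0, 2.0, 2.0, 3.0, 4.0],
                     [True, True, False, True, False])
for t, s in curve:
    print(f"S({t}) = {s:.3f}")
```

The estimated probability that a task completes before a candidate threshold t is then 1 - S(t), which is the kind of quantity a threshold-selection rule needs.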

Acknowledgements

We would like to thank the reviewers for their comments and suggestions, which greatly helped improve the final version of the paper. The work of Yiqin Gao was supported by the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program “Investissements d’Avenir” (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR).

Author information

Authors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Yiqin Gao
Laboratoire LIP, ENS Lyon & Inria, Lyon, France
Yves Robert & Frédéric Vivien
University of Tennessee Knoxville, Knoxville, USA
Yves Robert

Corresponding author

Correspondence to Yves Robert.

Ethics declarations

Conflict of interest

There is no conflict of interest for this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of Theorem 2

We start by recalling some classical results for a unimodal exponential distribution of rate $\lambda $, whose probability density function and cumulative distribution function are respectively denoted by f(x) and F(x) (here we do not add any constant or, in other words, $\delta = 0$):

- Probability that the random variable is no greater than l: $P(X \le l) = F(l) = 1-e^{-\lambda l}$.
- Contribution to the mean of the values that are no greater than l (a partial expectation, not a conditional one): $ {\mathbb {E}}(X \le l) = \int _{x=0}^l x f(x) dx = -l e^{-\lambda l} + \frac{1}{\lambda } \left( 1-e^{-\lambda l}\right) $.
- Average time spent executing a task whose execution time follows an exponential distribution of parameter $\lambda $ when the execution is cut at time l if the task has not completed by that time:
  $$\begin{aligned} \left( -l e^{-\lambda l} + \frac{1}{\lambda } \left( 1-e^{-\lambda l}\right) \right) +l(1-F(l)) = \frac{1}{\lambda } \left( 1-e^{-\lambda l}\right) . \end{aligned}$$
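
The closed form for the average time spent on a task that is cut at time l can be checked numerically. The sketch below compares it with a midpoint-rule evaluation of the integral plus the contribution of the tasks that get cut; the function names and the values of `lam` and `l` are arbitrary choices made for illustration.

```python
import math

def expected_time_with_cut(lam, l):
    """Closed form from the text: average time spent when cutting at l."""
    return (1.0 - math.exp(-lam * l)) / lam

def expected_time_numeric(lam, l, steps=100_000):
    """Midpoint-rule integral of x f(x) over [0, l], plus l * P(X > l)."""
    width = l / steps
    partial = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        partial += x * lam * math.exp(-lam * x) * width
    return partial + l * math.exp(-lam * l)

lam, l = 0.5, 3.0  # arbitrary illustrative values
closed = expected_time_with_cut(lam, l)
numeric = expected_time_numeric(lam, l)
print(closed, numeric)
```

Both values agree up to the discretization error of the quadrature, which confirms that the two terms in $l e^{-\lambda l}$ cancel.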

When the cutting threshold l is smaller than or equal to the constant time $\delta $, the yield is obviously null: ${\mathcal {Y}} (l)=0$. Therefore, we only consider the yield for cutting thresholds greater than or equal to $\delta $. To ease the writing, let $y(l)={\mathcal {Y}} (l+\delta )$. Using Eq. 2, we derive the expression of the yield when the cutting threshold is set at $l+\delta $:

$$\begin{aligned} y(l)&= {\mathcal {Y}} (l+\delta )\\&= \frac{p \left( 1-e^{-\lambda l}\right) + (1-p) \left( 1-e^{-\mu l}\right) }{\delta +p\left( \frac{1}{\lambda } \left( 1-e^{-\lambda l}\right) \right) +(1-p)\left( \frac{1}{\mu } \left( 1-e^{-\mu l}\right) \right) }\\&= \frac{p \left( 1-e^{-\lambda l}\right) + (1-p) \left( 1-e^{-\mu l}\right) }{\delta +\frac{p}{\lambda } \left( 1-e^{-\lambda l}\right) +\frac{(1-p)}{\mu } \left( 1-e^{-\mu l}\right) }\\\end{aligned}$$
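
The expression of y(l) translates directly into code. The following is a minimal sketch, with a hypothetical function name `yield_rate` and arbitrary parameter values; it assumes $l \ge 0$, and $l > 0$ when $\delta = 0$ so that the denominator is not zero.

```python
import math

def yield_rate(l, p, lam, mu, delta):
    """y(l) = Y(l + delta) for a mix of two exponentials shifted by delta.

    Numerator: probability that a task completes before the cut.
    Denominator: average time spent on a task, cut or not.
    """
    done = p * (1 - math.exp(-lam * l)) + (1 - p) * (1 - math.exp(-mu * l))
    spent = (delta
             + p / lam * (1 - math.exp(-lam * l))
             + (1 - p) / mu * (1 - math.exp(-mu * l)))
    return done / spent

# Arbitrary illustrative parameters with mu > lam.
p, lam, mu, delta = 0.5, 1.0, 10.0, 0.1
for l in (0.05, 0.2, 1.0, 50.0):
    print(l, yield_rate(l, p, lam, mu, delta))
```

With these values the yield is not monotone in l, which is why the sign of the derivative is studied next.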

In order to identify the maximum of the function y(l), we differentiate it:

$$\begin{aligned} y'(l)&\scriptstyle = \frac{ \left( \lambda p e^{-\lambda l} + \mu (1-p)e^{-\mu l}\right) \left( \delta +\frac{p}{\lambda } \left( 1-e^{-\lambda l}\right) +\frac{(1-p)}{\mu } \left( 1-e^{-\mu l}\right) \right) }{\left( \delta +\frac{p}{\lambda } \left( 1-e^{-\lambda l}\right) +\frac{(1-p)}{\mu } \left( 1-e^{-\mu l}\right) \right) ^2}\\&\scriptstyle - \frac{\left( p \left( 1-e^{-\lambda l}\right) + (1-p)\left( 1-e^{-\mu l}\right) \right) \left( p e^{-\lambda l} + (1-p)e^{-\mu l} \right) }{\left( \delta +\frac{p}{\lambda } \left( 1-e^{-\lambda l}\right) +\frac{(1-p)}{\mu } \left( 1-e^{-\mu l}\right) \right) ^2}\\\end{aligned}$$

Because we are only interested in the sign of $ y'(l)$, we focus on its numerator, which we denote by g(l):

$$\begin{aligned} g(l)&\scriptstyle = \left( \lambda p e^{-\lambda l} + \mu (1-p)e^{-\mu l}\right) \left( \delta +\frac{p}{\lambda } \left( 1-e^{-\lambda l}\right) +\frac{(1-p)}{\mu } \left( 1-e^{-\mu l}\right) \right) \\&\scriptstyle \quad - \left( p \left( 1-e^{-\lambda l}\right) + (1-p)\left( 1-e^{-\mu l}\right) \right) \left( p e^{-\lambda l} + (1-p)e^{-\mu l} \right) \\&\scriptstyle = \delta \left( \lambda p e^{-\lambda l} + \mu (1-p)e^{-\mu l}\right) \\&\scriptstyle \quad + p(1-p)\left( \left( \frac{\lambda }{\mu }-1\right) e^{-\lambda l} \left( 1-e^{-\mu l}\right) + \left( \frac{\mu }{\lambda }-1\right) e^{-\mu l} \left( 1-e^{-\lambda l}\right) \right) \\&\scriptstyle = e^{-\lambda l} \left( \delta \left( \lambda p + \mu (1-p)e^{-(\mu -\lambda )l}\right) \right. \\&\scriptstyle \qquad \qquad \qquad \left. + p(1-p)\frac{\mu -\lambda }{\lambda \mu }\left( -\lambda \left( 1-e^{-\mu l}\right) + \mu e^{-\mu l} \left( e^{\lambda l}-1\right) \right) \right) . \end{aligned}$$

We focus on the expression

$$\begin{aligned} h(l)&= -\lambda \left( 1-e^{-\mu l}\right) + \mu e^{-\mu l} \left( e^{\lambda l}-1\right) \\&= -\lambda +\lambda e^{-\mu l} + \mu e^{-(\mu -\lambda ) l} - \mu e^{-\mu l}. \end{aligned}$$

We differentiate h (with respect to l):

$$\begin{aligned} h'(l)&= -\lambda \mu e^{-\mu l} - \mu (\mu -\lambda ) e^{-(\mu -\lambda ) l} + \mu ^2 e^{-\mu l}\\&= \mu e^{-\mu l} (\mu -\lambda )\left( 1 - e^{\lambda l} \right) . \end{aligned}$$

Because $\mu -\lambda $, $\lambda $ and l are all nonnegative, $1 - e^{\lambda l} \le 0$ and $h'(l)$ is nonpositive. Therefore, h(l) is a non-increasing function.

Clearly, g(l) and $k(l) = g(l) e^{\lambda l}$ have the same sign. From the above, k(l) is the sum of two non-increasing functions, namely $\delta (\lambda p + \mu (1-p)e^{-(\mu -\lambda )l})$ and $p(1-p)\frac{\mu -\lambda }{\lambda \mu } h(l)$. The maximum of k(l) is thus reached for $l=0$ and its infimum is its limit when l tends toward $+\infty $:

$$\begin{aligned} k(0)&= \delta \left( \lambda p + \mu (1-p)\right) ; \\ \lim _{l \rightarrow +\infty } k(l)&= \delta \lambda p + p(1-p)\frac{\mu -\lambda }{\lambda \mu }\left( -\lambda \right) \\&= \delta \lambda p - p(1-p)\frac{\mu -\lambda }{\mu }\\&= p\left( \delta \lambda - (1-p)\frac{\mu -\lambda }{\mu }\right) . \end{aligned}$$
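
The monotonicity of k(l) and its two extreme values can be checked on a grid. This is only a numerical sanity check under assumed parameter values (with $\mu > \lambda $); the helper names `h` and `k` mirror the notation of the proof, and h is written in its expanded form so that no growing exponential is evaluated.

```python
import math

def h(l, lam, mu):
    """Expanded form of h(l); every exponent is nonpositive when mu >= lam."""
    return (-lam + lam * math.exp(-mu * l)
            + mu * math.exp(-(mu - lam) * l) - mu * math.exp(-mu * l))

def k(l, p, lam, mu, delta):
    """k(l) = g(l) * exp(lam * l), read off the factored form of g(l)."""
    first = delta * (lam * p + mu * (1 - p) * math.exp(-(mu - lam) * l))
    second = p * (1 - p) * (mu - lam) / (lam * mu) * h(l, lam, mu)
    return first + second

p, lam, mu, delta = 0.5, 1.0, 10.0, 0.1  # arbitrary illustrative values
samples = [k(0.05 * i, p, lam, mu, delta) for i in range(401)]
k_at_zero = delta * (lam * p + mu * (1 - p))
k_limit = p * (delta * lam - (1 - p) * (mu - lam) / mu)
# Small tolerance to absorb floating-point noise once k has converged.
is_non_increasing = all(b <= a + 1e-12 for a, b in zip(samples, samples[1:]))
print(samples[0], k_at_zero, samples[-1], k_limit, is_non_increasing)
```

For these values k starts positive and ends negative, so the sign of $y'(l)$ changes exactly once.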

To conclude, we have several cases to consider:

1. $\delta =0$ (arbitrarily small execution times are allowed). Then, $k(0)=0$ and $\lim _{l \rightarrow +\infty } k(l) \le 0$. We have two subcases to consider:
   (a) $p(1-p)(\mu -\lambda ) = 0$. In this case, we have a monomodal exponential distribution. Then, $y'(l)$ is null, and ${\mathcal {Y}} (l)=y(l)$ is constant and equal to the rate of that distribution ($\lambda $, or $\mu $ in the degenerate case $p=0$). In other words, in this case the yield is optimal whatever the value chosen for the threshold.
   (b) $p(1-p)(\mu -\lambda ) \ne 0$. In this case, ${\mathcal {Y}} (l)=y(l)$ is a decreasing function, its maximum is approached when l tends to 0, and the optimum yield is then $\lim _{l \rightarrow 0} {\mathcal {Y}} (l)$. To compute this limit we use the first-order expansion of $e^{-x}$ near 0, which is $1-x$. Since $\delta = 0$, we obtain:
   $$\begin{aligned} {\mathcal {Y}} (l)&= \frac{p \left( 1-e^{-\lambda l}\right) + (1-p) \left( 1-e^{-\mu l}\right) }{\delta +\frac{p}{\lambda } \left( 1-e^{-\lambda l}\right) +\frac{(1-p)}{\mu } \left( 1-e^{-\mu l}\right) }\\&\approx \frac{p \left( \lambda l\right) + (1-p)\left( \mu l\right) }{\frac{p}{\lambda } \lambda l+\frac{(1-p)}{\mu } \mu l}\\&= p \lambda + (1-p) \mu . \end{aligned}$$
   Remark: this counter-intuitive result means that the shorter the threshold, the better. Taken literally, the scheduler should stop each task as soon as it is started, which is obviously not achievable in practice. This peculiar property is a consequence of allowing execution times to be arbitrarily small, as the remainder of this case study will illustrate.
2. $\delta > 0$. Once again, we have two subcases to consider:
   (a) $p(\delta \lambda - (1-p)\frac{\mu -\lambda }{\mu }) \ge 0$. Then k(l), and thus g(l) and $y'(l)$, are nonnegative for all values of l. y and ${\mathcal {Y}} $ are thus increasing and we should never abort the execution of a running task (threshold = $+\infty $). The optimum yield in this case is then:
   $$\begin{aligned} \lim _{l \rightarrow +\infty } {\mathcal {Y}} (l)&= \frac{1}{\delta +\frac{p}{\lambda } +\frac{(1-p)}{\mu } }. \end{aligned}$$
   (b) $p(\delta \lambda - (1-p)\frac{\mu -\lambda }{\mu }) < 0$, which can be rewritten $\delta < (1-p)\frac{\mu -\lambda }{\lambda \mu }$. In this subcase, k(l) is a non-increasing function with $k(0) >0$ and $\lim _{l \rightarrow +\infty } k(l) < 0$. Since $y'(l)$ has the same sign as k(l), $y'(l)$ is positive for small values of l, and $y'(l)$ and ${\mathcal {Y}} '(l)$ are negative when l is sufficiently large. Therefore, ${\mathcal {Y}} (l+\delta )$ is first increasing and then decreasing, and has a unique maximum. This maximum is achieved for l satisfying $k(l) = 0$, an equation that can only be solved numerically.
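
For the case $\delta > 0$, the case analysis above can be turned into a small numerical procedure. The sketch below uses plain bisection on k(l), which is one possible way to solve $k(l) = 0$ and not necessarily the method used by the authors; the names `best_threshold`, `k` and `yield_rate` and all parameter values are assumptions made for illustration, with $\mu > \lambda $.

```python
import math

def k(l, p, lam, mu, delta):
    """k(l) = g(l) * exp(lam * l), with h(l) in expanded form."""
    h = (-lam + lam * math.exp(-mu * l)
         + mu * math.exp(-(mu - lam) * l) - mu * math.exp(-mu * l))
    return (delta * (lam * p + mu * (1 - p) * math.exp(-(mu - lam) * l))
            + p * (1 - p) * (mu - lam) / (lam * mu) * h)

def yield_rate(l, p, lam, mu, delta):
    """y(l) = Y(l + delta), as defined in the proof."""
    a, b = 1 - math.exp(-lam * l), 1 - math.exp(-mu * l)
    return (p * a + (1 - p) * b) / (delta + p / lam * a + (1 - p) / mu * b)

def best_threshold(p, lam, mu, delta, tol=1e-10):
    """Return the l maximising y(l) for delta > 0; math.inf means never cut."""
    k_limit = p * (delta * lam - (1 - p) * (mu - lam) / mu)
    if k_limit >= 0:
        return math.inf           # y is increasing: never abort a task
    lo, hi = 0.0, 1.0
    while k(hi, p, lam, mu, delta) > 0:
        hi *= 2                   # widen the bracket until k changes sign
    while hi - lo > tol:          # bisection: k(lo) > 0 >= k(hi)
        mid = (lo + hi) / 2
        if k(mid, p, lam, mu, delta) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p, lam, mu = 0.5, 1.0, 10.0       # arbitrary illustrative values
l_star = best_threshold(p, lam, mu, delta=0.1)
print(l_star + 0.1, yield_rate(l_star, p, lam, mu, 0.1))
print(best_threshold(p, lam, mu, delta=1.0))
```

The cutting threshold itself is $l + \delta $, since y(l) is defined as ${\mathcal {Y}} (l+\delta )$. With the larger value of $\delta $, the condition of the first subcase holds and the function reports that tasks should never be interrupted.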

1.2 Additional Graphs and Statistics

We report here the theoretical threshold and the performance for the distributions that were not illustrated in the core of the article. We also report the performance of heuristics when the budget is large with respect to the average task execution time ($b =1000$).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Gao, Y., Robert, Y. & Vivien, F. Resource-Constrained Scheduling Algorithms for Stochastic Independent Tasks With Unknown Probability Distribution. Algorithmica 85, 2363–2394 (2023). https://doi.org/10.1007/s00453-023-01100-8


Received: 23 December 2021
Accepted: 19 January 2023
Published: 17 February 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00453-023-01100-8
