
An optimization framework for the capacity allocation and admission control of MapReduce jobs in cloud systems

The Journal of Supercomputing

Abstract

Nowadays we live in a Big Data world, and many sectors of our economy are guided by data-driven decision processes. Big Data and Business Intelligence applications are facilitated by the MapReduce programming model, while, at the infrastructure layer, cloud computing provides flexible and cost-effective solutions for provisioning large clusters on demand. Capacity allocation in such systems, i.e., the problem of providing computational power to support concurrent MapReduce applications in a cost-effective fashion, is a challenge of paramount importance. In this paper we lay the foundation for a solution implementing admission control and capacity allocation for MapReduce jobs with a priori deadline guarantees. In particular, we target shared Hadoop 2.x clusters supporting batch and/or interactive jobs. We formulate a linear programming model that minimizes cloud resource costs and rejection penalties for the execution of jobs belonging to multiple classes with deadline guarantees. Scalability analyses demonstrate that the proposed method determines the global optimal solution of the linear problem for systems including up to 10,000 classes in less than 1 s.



Notes

  1. A scheduler is defined to be work conserving if it never lets a processor lie idle, while there are runnable tasks in the system, i.e., Application Masters in a queue can borrow containers from other empty queues.

  2. During the Shuffle phase data from the mapper tasks are moved to the nodes where the reducer tasks will run. As in [14] we distinguish between the first and the typical Shuffle, since they are characterized by significantly different performance.

  3. Note that in Hadoop 1.x, each node's resources can be partitioned between slots assigned to Map tasks and slots assigned to Reduce tasks. In Hadoop 2.x, the resource capacity configured for each container is suitable for both Map and Reduce tasks and cannot be partitioned anymore [39]. The maximum number of concurrent mappers and reducers (the slot count) is calculated by YARN based on administrator settings [40]. A node is eligible to run a task when its available memory and CPU can satisfy the task's resource requirements. Under the hypothesis above, we assume that the configuration settings are such that any combination of Map and Reduce tasks can be executed within a container and no vCPU remains idle because of a wrong setting of these parameters.

  4. For example, the TPC-DS benchmark, designed to be representative of real data warehouse systems, includes 99 queries that, in the worst case, can be modeled as individual job classes.

  5. http://www.tpc.org/tpcds/.

References

  1. Jagadish HV, Gehrke J, Labrinidis A, Papakonstantinou Y, Patel JM, Ramakrishnan R, Shahabi C (2014) Big data and its technical challenges. Commun ACM 57(7):86–94


  2. Chen CP, Zhang C-Y (2014) Data-intensive applications, challenges, techniques and technologies: a survey on big data. Inf Sci 275:314–347


  3. Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers AH (2012) Big data: the next frontier for innovation, competition, and productivity. McKinsey Global Institute, New York


  4. Lee K-H, Lee Y-J, Choi H, Chung YD, Moon B (2012) Parallel data processing with mapreduce: a survey. SIGMOD Rec 40(4):11–20


  5. Yan F, Cherkasova L, Zhang Z, Smirni E (2014) Optimizing power and performance trade-offs of MapReduce job processing with heterogeneous multi-core processors. In: CLOUD

  6. Kambatla K, Kollias G, Kumar V, Grama A (2014) Trends in big data analytics. J Parallel Distrib Comput 74(7):2561–2573


  7. Elastic compute cloud (ec2). http://aws.amazon.com/ec2. Accessed 25 May 2018

  8. Microsoft Azure. http://azure.microsoft.com/en-us/services/hdinsight/. Accessed 25 May 2018

  9. The digital universe in 2020. https://www.idc.com/getdoc.jsp?containerId=prUS43511618. Accessed 25 May 2018

  10. Polo J, Carrera D, Becerra Y, Torres J, Ayguad E, Steinder M, Whalley I (2010) Performance-driven task co-scheduling for mapreduce environments. In: NOMS

  11. Rao BT, Reddy LSS (2012) Survey on improved scheduling in hadoop MapReduce in cloud environments. CoRR, abs/1207.0780

  12. Zhang Z, Cherkasova L, Loo BT (2015) Exploiting cloud heterogeneity to optimize performance and cost of MapReduce processing. SIGMETRICS Perform Eval Rev 42(4):38–50


  13. Zhang Z, Cherkasova L, Verma A, Loo BT (2012) Automated profiling and resource management of pig programs for meeting service level objectives. In: ICAC

  14. Verma A, Cherkasova L, Campbell RH (2011) ARIA: automatic resource inference and allocation for Mapreduce environments. In: ICAC

  15. Lin M, Zhang L, Wierman A, Tan V (2013) Joint optimization of overlapping phases in MapReduce. SIGMETRICS Perform Eval Rev 41(3):16–18


  16. Chang R (ed) (2012) 2012 IEEE Fifth International Conference on Cloud Computing, Honolulu, HI, USA, 24–29 June 2012. IEEE Computer Society

  17. Delimitrou C, Bambos N, Kozyrakis C (2013) Qos-aware admission control in heterogeneous datacenters. In: Kephart JO, Pu C, Zhu X (eds) 10th International Conference on Autonomic Computing, ICAC’13, San Jose, CA, USA, 26–28 June 2013. USENIX Association, pp 291–296

  18. Khazaei H, Misic JV, Misic V, Rashwand S (2013) Analysis of a pool management scheme for cloud computing centers. IEEE Trans Parallel Distrib Syst 24(5):849–861


  19. Konstanteli K, Cucinotta T, Psychas K, Varvarigou TA (2012) Admission control for elastic cloud services. In: Chang (2012), pp 41–48

  20. Wu L, Garg SK, Buyya R (2012) Sla-based admission control for a software-as-a-service provider in cloud computing environments. J Comput Syst Sci 78(5):1280–1299


  21. Xiong P, Chi Y, Zhu S, Tatemura J, Pu C, Hacigümüs H (2011) Activesla: a profit-oriented admission control framework for database-as-a-service providers. In: Chase JS, Abbadi AE (eds) ACM Symposium on Cloud Computing in Conjunction with SOSP 2011, SOCC ’11, Cascais, Portugal, 26–28 Oct 2011. ACM, p 15

  22. Cerf S, Berekmeri M, Robu B, Marchand N, Bouchenak S (2016) Towards control of MapReduce performance and availability. In: Fast Abstract in the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks

  23. Dhok J, Maheshwari N, Varma V (2010) Learning based opportunistic admission control algorithm for mapreduce as a service. In: Padmanabhuni S, Aggarwal SK, Bellur U (eds) Proceeding of the 3rd Annual India Software Engineering Conference, ISEC 2010, Mysore, India, 25–27 Feb 2010. ACM, pp 153–160

  24. Capacity scheduler. http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html. Accessed 25 May 2018

  25. Malekimajd M, Rizzi AM, Ardagna D, Ciavotta M, Passacantando M, Movaghar A (2014) Optimal capacity allocation for executing MapReduce jobs in cloud systems. Politecnico di Milano Technical Report n. 2014.11. http://home.deib.polimi.it/ardagna/MapReduceTechReport2014-11.pdf. Accessed 25 May 2018

  26. Malekimajd M, Rizzi AM, Ardagna D, Ciavotta M, Passacantando M, Movaghar A (2014) Optimal capacity allocation for executing mapreduce jobs in cloud systems. In: Winkler F, Negru V, Ida T, Jebelean T, Petcu D, Watt SM, Zaharie D (eds) 16th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2014, Timisoara, Romania, 22–25 Sept 2014. IEEE Computer Society, pp 385–392

  27. Hadoop Yarn

  28. Curino C, Difallah DE, Douglas C, Krishnan S, Ramakrishnan R, Rao S (2014) Reservation-based scheduling: if you’re late don’t blame us! In: SoCC

  29. Zhang W, Rajasekaran S, Duan S, Wood T, Zhu M (2015) Minimizing interference and maximizing progress for hadoop virtual machines. SIGMETRICS Perform Eval Rev 42(4):62–71


  30. Ciavotta M, Ardagna D (2016) Optimization Tools–Initial Version. http://wp.doc.ic.ac.uk/dice-h2020/wp-content/uploads/sites/75/2016/08/D3.8_DICE-optimization-tools-Initial-version.pdf. Accessed 25 May 2018

  31. Apache tez

  32. Ardagna D, Squillante MS (2015) Special issue on performance and resource management in big data applications. SIGMETRICS Perform Eval Rev 42(4):2


  33. Herodotou H, Lim V, Luo G, Borisov N, Dong L, Cetin FB, Babu S (2011) Starfish: a self-tuning system for big data analytics. In: CIDR

  34. Poggi N, Carrera D, Call A, Mendoza S, Becerra Y, Torres J, Ayguad E, Gagliardi F, Labarta J, Reinauer R, Vujic N, Green D, Blakeley JA (2014) Aloja: a systematic study of hadoop deployment variables to enable automated characterization of cost-effectiveness. In: BigData Conference, pp 905–913

  35. Tian F, Chen K (2011) Towards optimal resource provisioning for running MapReduce programs in public clouds. In: CLOUD

  36. Verma A, Cherkasova L, Campbell RH (2014) Profiling and evaluating hardware choices for mapreduce environments: an application-aware approach. Perform Eval 79:328–344


  37. Amazon elastic mapreduce

  38. Microsoft hdinsight

  39. Getting MapReduce 2 Up to Speed. http://blog.cloudera.com/blog/2014/02/getting-mapreduce-2-up-to-speed/. Accessed 25 May 2018

  40. Apache Hadoop YARN: Avoiding 6 Time-Consuming “Gotchas”. http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/. Accessed 25 May 2018

  41. Castiglione A, Gribaudo M, Iacono M, Palmieri F (2014) Exploiting mean field analysis to model performances of big data architectures. Future Gener Comput Syst 37:203–211


  42. Ardagna D, Bernardi S, Gianniti E, Aliabadi SK, Perez-Palacin D, Requeno JI (2016) Modeling performance of hadoop applications: a journey from queueing networks to stochastic well formed nets. In: Carretero J, Blas JG, Ko RKL, Mueller P, Nakano K (eds) Algorithms and Architectures for Parallel Processing—16th International Conference, ICA3PP 2016, Granada, Spain, 14–16 Dec 2016, Proceedings, volume 10048 of Lecture Notes in Computer Science. Springer, pp 599–613

  43. Gómez A, Merseguer J, Nitto ED, Tamburri DA (2016) Towards a UML profile for data intensive applications. In: Ardagna D, Casale G, van Hoorn A, Willnecker F (eds) Proceedings of the 2nd International Workshop on Quality-aware DevOps, QUDOS@ISSTA 2016, Saarbrücken, Germany, 21 July 2016. ACM, pp 18–23

  44. Vavilapalli VK, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R, Graves T, Lowe J, Shah H, Seth S, Saha B, Curino C, O’Malley O, Radia S, Reed B, Baldeschwieler E (2013) Apache hadoop YARN: yet another resource negotiator. In: Lohman GM (ed) ACM Symposium on Cloud Computing, SOCC ’13, Santa Clara, CA, USA, 1–3 Oct 2013. ACM, pp 5:1–5:16

  45. Lazowska ED, Zahorjan J, Graham GS, Sevcik KC (1984) Quantitative system performance—computer system analysis using queueing network models. Prentice Hall, Upper Saddle River


  46. Ardagna D, Panicucci B, Passacantando M (2013) Generalized nash equilibria for the service provisioning problem in cloud systems. IEEE Trans Serv Comput 6(4):429–442


  47. Zhang Q, Zhu Q, Zhani MF, Boutaba R (2012) Dynamic service placement in geographically distributed clouds. In: ICDCS

  48. Anselmi J, Ardagna D, Passacantando M (2014) Generalized Nash equilibria for SaaS/PaaS clouds. Eur J Oper Res 236(1):326–339


  49. Verma A, Cherkasova L, Campbell RH (2011) Resource provisioning framework for MapReduce jobs with performance goals. In: Middleware

  50. Gianniti E, Rizzi AM, Barbierato E, Gribaudo M, Ardagna D (2017) Fluid petri nets for the performance evaluation of mapreduce and spark applications. SIGMETRICS Perform Eval Rev 44(4):23–36


  51. Herodotou H, Dong F, Babu S (2011) No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics. In: SOCC

  52. Bardhan S, Menascé DA (2012) Queuing network models to predict the completion time of the map phase of mapreduce jobs. In: International of the CMG Conference

  53. Vianna E, Comarela G, Pontes T, Almeida JM, Almeida VAF, Wilkinson K, Kuno HA, Dayal U (2013) Analytical performance models for mapreduce workloads. Int J Parallel Program 41(4):495–525


  54. Tan J, Wang Y, Yu W, Zhang L (2014) Non-work-conserving effects in MapReduce: diffusion limit and criticality. In: SIGMETRICS

  55. Castiglione A, Gribaudo M, Iacono M, Palmieri F (2014) Exploiting mean field analysis to model performances of big data architectures. Future Gener Comput Syst 37:203–211


  56. Phan LTX, Zhang Z, Zheng Q, Loo BT, Lee I (2011) An empirical analysis of scheduling techniques for real-time cloud-based data processing. In: SOCA

  57. Morton K, Balazinska M, Grossman D (2010) Paratimer: a progress indicator for MapReduce dags. In: SIGMOD

  58. Morton K, Friesen A, Balazinska M, Grossman D (2010) Estimating the progress of MapReduce pipelines. In: ICDE

  59. Tian W, Li G, Yang W, Buyya R (2016) HScheduler: an optimal approach to minimize the makespan of multiple MapReduce jobs. J Supercomput 72(6):2376–2393


  60. Zhang Z, Cherkasova L, Loo BT (2014) Exploiting cloud heterogeneity for optimized cost/performance MapReduce processing. In: CloudDP

  61. Beltrán M (2015) Automatic provisioning of multi-tier applications in cloud computing environments. J Supercomput 71(6):2221–2250


  62. Xiong PP, Chi Y, Zhu S, Tatemura J, Pu C, Hacigümüs H (2011) ActiveSLA: a profit-oriented admission control framework for database-as-a-service providers. In: SOCC

  63. Impala admission control. https://www.cloudera.com/documentation/enterprise/5-12-x/topics/impala_admission.html. Accessed 25 May 2018

  64. Yao Y, Lin J, Wang J, Mi N, Sheng B (2015) Admission control in YARN clusters based on dynamic resource reservation. In: Badonnel R, Xiao J, Ata S, Turck FD, Groza V, dos Santos CRP (eds) IFIP/IEEE International Symposium on Integrated Network Management, IM 2015, Ottawa, ON, Canada, 11–15 May 2015. IEEE, pp 838–841

  65. Baranwal G, Vidyarthi DP (2016) Admission control in cloud computing using game theory. J Supercomput 72(1):317–346


  66. Yan F, Cherkasova L, Zhang Z, Smirni E (2014) Heterogeneous cores for MapReduce processing: opportunity or challenge? In: NOMS


Author information


Corresponding author

Correspondence to M. Malekimajd.

Additional information

The simulations and numerical analyses have been performed under the Windows Azure Research Pass 2013 Grant. The work of Marzieh Malekimajd has been supported by the European Commission Grant No. FP7-ICT-2011-8-318484 (MODAClouds). The work of Danilo Ardagna, Michele Ciavotta, and Eugenio Gianniti has been supported by the Horizon 2020 research and innovation program under Grant Agreement No. 644869 (DICE). The work of Mauro Passacantando has been supported by the National Research Program No. PRIN/2015B5F27W_001 (Nonlinear and Combinatorial Aspects of Complex Networks).

Appendices

Makespan bounds

In the following we report the results we presented in [32], providing an approximate formula for the estimation of MapReduce job execution times. We consider a MapReduce system with up to \(s_i^\text {M}\) and \(s_i^\text {R}\) containers devoted to the Map and Reduce phases, respectively, using the Capacity Scheduler.

Following the results in [14], the lower and upper bounds on the duration of the entire Map stage can be estimated as follows:

$$\begin{aligned} \begin{aligned} T_i^{\text {M}, \text {low}} =&\frac{N_i^\text {M} M_i^\text {avg}}{s_i^\text {M}} h_i \\ T_i^{\text {M}, \text {up}} =&\frac{N_i^\text {M} M_i^\text {avg} - 2M_i^\text {max}}{s_i^\text {M}} h_i+2M_i^\text {max} \end{aligned} \end{aligned}$$

where \(M_i^\text {avg}\), \(M_i^\text {max}\), \(R_i^\text {avg}\), \(R_i^\text {max}\), \(S_i^{1, \text {avg}}\), \(S_i^{1, \text {max}}\), \(S_i^\text {avg}\), and \(S_i^\text {max}\) denote the average and maximum duration of Map, Reduce, first Shuffle and typical Shuffle phases, respectively, while \(N_i^\text {M}\) and \(N_i^\text {R}\) are the number of Map and Reduce tasks (see Sect. 2). According to the results discussed in [14], we distinguish the non-overlapping portion of the first shuffle and the task durations in the typical shuffle. In the following bounds for the shuffle stage, this consideration affects the formula for \(T_i^{\text {S}, \text {low}}\), where we subtract one wave:

$$\begin{aligned} \begin{aligned} T_i^{\text {S}, \text {low}} =&\left( \frac{N_i^\text {R}}{s_i^\text {R}} h_i - 1 \right) S_i^\text {avg} , \\ T_i^{\text {S}, \text {up}} =&\frac{N_i^\text {R} S_i^\text {avg} - 2 S_i^\text {max}}{s_i^\text {R}} h_i+2S_i^\text {max} . \end{aligned} \end{aligned}$$

\(S_i^{1, \text {avg}}\) and \(S_i^{1, \text {max}}\), the average and maximum execution time of the first shuffle phase, are estimated directly from the execution profile of class i.

Summing up all parts we get:

$$\begin{aligned} \begin{aligned} T_i^\text {low}&= T_i^{\text {M}, \text {low}} + S_i^{1, \text {avg}} + T_i^{\text {S}, \text {low}} + T_i^{\text {R}, \text {low}} , \\ T_i^\text {up}&= T_i^{\text {M}, \text {up}} + S_i^{1, \text {max}} + T_i^{\text {S}, \text {up}} + T_i^{\text {R}, \text {up}} . \end{aligned} \end{aligned}$$

\(T_i^\text {low}\) and \(T_i^\text {up}\) represent, respectively, an optimistic and a pessimistic prediction of class i job completion time. We also define \(T_i^\text {avg} = \left( T_i^\text {up} + T_i^\text {low} \right) \!{/} 2\). Hence, the execution time of a class i job is at least:

$$\begin{aligned} T_i^\text {low} = \frac{\xi _i^{\text {M}, \text {low}} h_i}{s_i^\text {M}} + \frac{\xi _i^{\text {R}, \text {low}} h_i}{s_i^\text {R}} + \xi _i^{0, \text {low}} , \end{aligned}$$
(7a)

where

$$\begin{aligned} \xi _i^{\text {M}, \text {low}}&= N_i^\text {M} M_i^\text {avg} , \end{aligned}$$
(7b)
$$\begin{aligned} \xi _i^{\text {R}, \text {low}}&= N_i^\text {R} \left( S_i^\text {avg} + R_i^\text {avg} \right) , \end{aligned}$$
(7c)
$$\begin{aligned} \xi _i^{0, \text {low}}&= S_i^{1, \text {avg}}- S_i^\text {avg} . \end{aligned}$$
(7d)

In the same way, the execution time of a job of class i is at most:

$$\begin{aligned} T_i^\text {up} = \frac{\xi _i^{\text {M}, \text {up}} h_i}{s_i^\text {M}} + \frac{\xi _i^{\text {R}, \text {up}} h_i}{s_i^\text {R}} + \xi _i^{0, \text {up}} , \end{aligned}$$
(8a)

where

$$\begin{aligned} \xi _i^{\text {M}, \text {up}}&= N_i^\text {M} M_i^\text {avg} - 2 M_i^\text {max} , \end{aligned}$$
(8b)
$$\begin{aligned} \xi _i^{\text {R}, \text {up}}&= N_i^\text {R} S_i^\text {avg} - 2 S_i^\text {max} + N_i^\text {R} R_i^\text {avg} - 2 R_i^\text {max} , \end{aligned}$$
(8c)
$$\begin{aligned} \xi _i^{0, \text {up}}&= 2S_i^\text {max} + S_i^{1, \text {max}} + 2 M_i^\text {max}+2R_i^\text {max}. \end{aligned}$$
(8d)

Depending on the guarantees required, it is possible to adopt either a conservative approach to meeting deadlines with \(T_i^\text {up}\) or a less resource-demanding one with \(T_i^\text {avg}\).
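As an illustration, the bounds above can be evaluated directly from a job profile. The following sketch (all profile values are hypothetical, not taken from the paper's experiments) computes the coefficients of Eqs. (7) and (8) and the resulting \(T_i^\text {low}\), \(T_i^\text {up}\), and \(T_i^\text {avg}\):

```python
# Sketch: makespan bounds for one job class from its execution profile,
# following Eqs. (7a)-(8d). All numeric profile values are hypothetical.

def xi_coefficients(NM, NR, M_avg, M_max, R_avg, R_max,
                    S1_avg, S1_max, S_avg, S_max):
    """Return the (xi_M, xi_R, xi_0) triples for lower and upper bounds."""
    low = (NM * M_avg,                                      # xi^{M,low}
           NR * (S_avg + R_avg),                            # xi^{R,low}
           S1_avg - S_avg)                                  # xi^{0,low}
    up = (NM * M_avg - 2 * M_max,                           # xi^{M,up}
          NR * S_avg - 2 * S_max + NR * R_avg - 2 * R_max,  # xi^{R,up}
          2 * S_max + S1_max + 2 * M_max + 2 * R_max)       # xi^{0,up}
    return low, up

def makespan(xi, h, sM, sR):
    """T_i = xi_M h / s_M + xi_R h / s_R + xi_0."""
    xi_M, xi_R, xi_0 = xi
    return xi_M * h / sM + xi_R * h / sR + xi_0

low, up = xi_coefficients(NM=64, NR=32, M_avg=20, M_max=28,
                          R_avg=15, R_max=22, S1_avg=30, S1_max=40,
                          S_avg=25, S_max=35)
T_low = makespan(low, h=4, sM=16, sR=8)   # optimistic prediction
T_up = makespan(up, h=4, sM=16, sR=8)     # pessimistic prediction
T_avg = (T_low + T_up) / 2
```

With these (made-up) numbers the optimistic and pessimistic predictions bracket the approximation \(T_i^\text {avg}\), as expected.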

In both cases (upper bounds or approximation), the formulae above reduce to constraints (P1b) and (P2b), by defining \(\zeta _i^{0, D} \triangleq \xi _i^0 - D_i < 0\):

$$\begin{aligned} T_i = \frac{\xi _i^\text {M} h_i}{s_i^\text {M}} + \frac{\xi _i^\text {R} h_i}{s_i^\text {R}} + \xi _i^0 \le D_i \Rightarrow T_i = \frac{\xi _i^\text {M} h_i}{s_i^\text {M}} + \frac{\xi _i^\text {R} h_i}{s_i^\text {R}} + \zeta _i^{0, D} \le 0. \end{aligned}$$

Proofs

Proof of Theorem 1

Since all the constraints of problem (P2) are convex and the Slater constraints qualification holds, the KKT conditions are necessary for optimality. The Lagrangian function of problem (P2) is given by:

$$\begin{aligned} L&= \delta d + \rho r - \sum _{i\in {\mathcal {U}}}\frac{p_i}{\varPsi _i} +\sum _{i\in {\mathcal {U}}} \phi _i \left( \frac{\xi _i^\text {M}}{s_i^\text {M} \varPsi _i } + \frac{\xi _i^\text {R}}{s_i^\text {R} \varPsi _i } + \zeta _i^{0, D} \right) \\&\quad + \mu _r (r - {\bar{r}}) +\nu \left[ \sum _{i\in {\mathcal {U}}} \left( \frac{s_i^\text {M}}{c_i^\text {M}} + \frac{s_i^\text {R}}{c_i^\text {R}}\right) -r-d\right] \\&\quad +\sum _{i\in {\mathcal {U}}} \left[ \mu _i(\varPsi _i - \varPsi ^\text {up}_i)+\lambda _i(-\varPsi _i + \varPsi ^\text {low}_i)\right] \\&\quad - \lambda _r r - \lambda _d d - \sum _{i\in {\mathcal {U}}} \left( \omega _i\, s_i^\text {M} +\chi _i\, s_i^\text {R} \right) . \end{aligned}$$

Therefore, the KKT conditions are the following:

$$\begin{aligned} \delta - \nu - \lambda _d = 0,&\end{aligned}$$
(9)
$$\begin{aligned} \rho - \nu + \mu _r - \lambda _r = 0,&\end{aligned}$$
(10)
$$\begin{aligned} \frac{\nu }{c_i^\text {M}} - \phi _i \frac{\xi _i^\text {M}}{({s_i^\text {M}})^2 \varPsi _i } -\omega _i = 0,&\quad \forall i \in {\mathcal {U}}, \end{aligned}$$
(11)
$$\begin{aligned} \frac{\nu }{c_i^\text {R}} - \phi _i \frac{\xi _i^\text {R}}{({s_i^\text {R}})^2 \varPsi _i } -\chi _i = 0,&\quad \forall i \in {\mathcal {U}}, \end{aligned}$$
(12)
$$\begin{aligned} \frac{p_i}{\varPsi _i^2} - \frac{\phi _i}{\varPsi _i^2} \left( \frac{\xi _i^\text {M}}{s_i^\text {M} } + \frac{\xi _i^\text {R}}{s_i^\text {R}} \right) + \mu _i - \lambda _i = 0,&\quad \forall i \in {\mathcal {U}}, \end{aligned}$$
(13)
$$\begin{aligned} \phi _i\left( \frac{\xi _i^\text {M}}{s_i^\text {M} \varPsi _i} + \frac{\xi _i^\text {R}}{s_i^\text {R} \varPsi _i} + \zeta _i^{0, D}\right) =0,&\quad \phi _i \ge 0 , \quad \forall i \in {\mathcal {U}}, \end{aligned}$$
(14)
$$\begin{aligned} \mu _r(r-{\bar{r}}) = 0,&\quad \mu _r \ge 0, \end{aligned}$$
(15)
$$\begin{aligned} \nu \left[ \sum _{i\in {\mathcal {U}}} \left( \frac{s_i^\text {M}}{c_i^\text {M}} + \frac{s_i^\text {R}}{c_i^\text {R}} \right) - r - d \right] = 0,&\quad \nu \ge 0, \end{aligned}$$
(16)
$$\begin{aligned} \mu _i (\varPsi _i - \varPsi ^\text {up}_i) = 0,&\quad \mu _i \ge 0, \quad \forall i \in {\mathcal {U}}, \end{aligned}$$
(17)
$$\begin{aligned} \lambda _i ( -\varPsi _i + \varPsi ^\text {low}_i ) =0,&\quad \lambda _i \ge 0, \quad \forall i \in {\mathcal {U}}, \end{aligned}$$
(18)
$$\begin{aligned} \lambda _r\, r =0,&\quad \lambda _r \ge 0, \end{aligned}$$
(19)
$$\begin{aligned} \lambda _d\, d =0,&\quad \lambda _d \ge 0, \end{aligned}$$
(20)
$$\begin{aligned} \omega _i\, s_i^\text {M} =0,&\quad \omega _i \ge 0, \quad \forall i \in {\mathcal {U}}, \end{aligned}$$
(21)
$$\begin{aligned} \chi _i\, s_i^\text {R} =0,&\quad \chi _i \ge 0, \quad \forall i \in {\mathcal {U}}. \end{aligned}$$
(22)

Constraints (P2b) imply that \(s_i^\text {M}\) and \(s_i^\text {R}\) are positive; hence, multipliers \(\omega _i\) and \(\chi _i\) are equal to zero. As the adoption of reserved instances is favored, being cheaper than the on-demand ones, we obtain \(r>0\) and \(\lambda _r=0\). Furthermore, we have \(\nu = \rho + \mu _r \ge \rho >0\). Therefore, Eq. (11) guarantees that \(\phi _i > 0\) for all \(i \in {\mathcal {U}}\); hence, constraints (P2b) hold as equalities.

Finally, we can use Eqs. (P2b), (11) and (12) to compute \(s_i^\text {M}\) and \(s_i^\text {R}\) as a function of \(\varPsi _i\). First, we calculate the relation between \(s_i^\text {M}\) and \(s_i^\text {R}\) by using conditions (11) and (12) as follows:

$$\begin{aligned} \frac{\xi _i^\text {M}}{({s_i^\text {M}})^2 \varPsi _i } {c_i^\text {M}} =\frac{\xi _i^\text {R}}{({s_i^\text {R}})^2 \varPsi _i } {c_i^\text {R}} \ \ \Longleftrightarrow \ \ s_i^\text {M} = s_i^\text {R}\sqrt{\frac{\xi _i^\text {M}\, c_i^\text {M}}{\xi _i^\text {R}\, c_i^\text {R}}}. \end{aligned}$$

Then, we can replace \(s_i^\text {M}\) by \(\displaystyle s_i^\text {R}\sqrt{\frac{\xi _i^\text {M}\, c_i^\text {M}}{\xi _i^\text {R}\, c_i^\text {R}}}\) into (P2b) to derive an explicit formulation for \(s_i^\text {R}\), as follows:

$$\begin{aligned} s_i^\text {R} = - \frac{1}{\zeta _i^{0, D} \varPsi _i} \left( {\sqrt{\frac{\xi _i^\text {M}\, \xi _i^\text {R} c_i^\text {R}}{c_i^\text {M}}} } + \xi _i^\text {R} \right) , \end{aligned}$$

and along the same lines we can express \(s_i^\text {M}\) in closed form as follows:

$$\begin{aligned} s_i^\text {M} = - \frac{1}{\zeta _i^{0, D} \varPsi _i} \left( \sqrt{\frac{\xi _i^\text {M}\, \xi _i^\text {R}\, c_i^\text {M}}{c_i^\text {R}}} + \xi _i^\text {M} \right) . \end{aligned}$$

\(\square \)
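The closed forms at the end of the proof can be checked numerically. The sketch below (all coefficients hypothetical; \(\varPsi _i\) denotes the variable of problem (P2)) implements Theorem 1's expressions and verifies the ratio \(s_i^\text {M}/s_i^\text {R} = \sqrt{\xi _i^\text {M}\, c_i^\text {M}/(\xi _i^\text {R}\, c_i^\text {R})}\) that follows from conditions (11) and (12):

```python
import math

# Sketch of the closed-form container counts derived in the proof of
# Theorem 1. All numeric inputs below are hypothetical.

def optimal_containers(xi_M, xi_R, c_M, c_R, zeta_0D, psi):
    """Closed-form s_i^M, s_i^R for a given Psi_i.

    Requires zeta_0D = xi_0 - D_i < 0 (deadline exceeding the serial
    fraction of the job), which makes both counts positive."""
    assert zeta_0D < 0, "deadline must exceed the serial fraction"
    sR = -(math.sqrt(xi_M * xi_R * c_R / c_M) + xi_R) / (zeta_0D * psi)
    sM = -(math.sqrt(xi_M * xi_R * c_M / c_R) + xi_M) / (zeta_0D * psi)
    return sM, sR

sM, sR = optimal_containers(xi_M=1224, xi_R=1166, c_M=2, c_R=2,
                            zeta_0D=210 - 1200, psi=0.25)
# The ratio s_M / s_R matches the relation obtained by combining the
# KKT conditions (11) and (12).
assert abs(sM / sR - math.sqrt(1224 * 2 / (1166 * 2))) < 1e-9
```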

Proof of Theorem 2

Theorem 1 implies that the variables \(s_i^\text {M}\) and \(s_i^\text {R}\) can be written as in Eqs. (3) and (4). Hence, constraint (P3b) is obtained by replacing \(s_i^\text {M}\) and \(s_i^\text {R}\) in constraint (P2d). Moreover, constraints (P2b) can be dropped since they have been used to derive the value of \(s_i^\text {M}\) and \(s_i^\text {R}\). Hence, problem (P3) is equivalent to problem (P2). \(\square \)

Proof of Theorem 3

First, we notice that the KKT conditions of problem (P3) are necessary and sufficient for optimality since the problem is linear. The KKT conditions can be written as follows:

$$\begin{aligned} \rho - \nu + \mu _r - \lambda _r = 0,&\end{aligned}$$
(23)
$$\begin{aligned} \delta - \nu - \lambda _d = 0,&\end{aligned}$$
(24)
$$\begin{aligned} -p_i + \gamma _i\,\nu + \mu _i - \lambda _i = 0,&\quad \forall \ i\ \in {\mathcal {U}}, \end{aligned}$$
(25)
$$\begin{aligned} \nu \, \left( \sum _{i\in {\mathcal {U}}} \gamma _i\,h^*_i - r^* - d^* \right) = 0,&\end{aligned}$$
(26)
$$\begin{aligned} \lambda _r \, r^* = 0,&\end{aligned}$$
(27)
$$\begin{aligned} \mu _r \, (r^* - {\bar{r}}) = 0,&\end{aligned}$$
(28)
$$\begin{aligned} \lambda _d \, d^* = 0,&\end{aligned}$$
(29)
$$\begin{aligned} \lambda _i \, (h^*_i-H_i^\text {low}) = 0,&\quad \forall \ i\ \in {\mathcal {U}}, \end{aligned}$$
(30)
$$\begin{aligned} \mu _i \, (h^*_i-H_i^\text {up}) = 0,&\quad \forall \ i\ \in {\mathcal {U}}, \end{aligned}$$
(31)
$$\begin{aligned} \nu , \lambda _r, \mu _r, \lambda _d \ge 0,&\end{aligned}$$
(32)
$$\begin{aligned} \lambda _i, \mu _i \ge 0,&\quad \forall \ i\ \in {\mathcal {U}}. \end{aligned}$$
(33)
  1. Let us assume, by contradiction, that \(r^* = 0\). Hence \(d^* \ge \sum _{i\in {\mathcal {U}}} \gamma _i \, h^*_i \ge \sum _{i\in {\mathcal {U}}} \gamma _i \, H_i^\text {low} > 0,\) thus \(\lambda _d=0\) and \(\nu =\delta \). On the other hand, (28) implies that \(\mu _r=0\) and \(\lambda _r = \rho -\nu =\rho -\delta <0\), which is impossible.

  2. Since \(r^*>0\), we have \(\lambda _r=0\), hence (23) implies \(\nu = \rho + \mu _r \ge \rho >0\), thus constraint (P3b) is active at \((r^*,d^*,h^*)\).

  3. It follows from (24) that \(\nu =\delta - \lambda _d \le \delta \); hence, we have \(\mu _i = \lambda _i + p_i - \gamma _i\,\nu \ge p_i - \gamma _i\,\nu \ge p_i - \gamma _i\,\delta > 0.\) Therefore \(h^*_i=H_i^\text {up}\).

  4. Since \(\nu \ge \rho \), we get \(\lambda _i = \mu _i + \gamma _i\,\nu - p_i \ge \gamma _i\,\nu - p_i \ge \gamma _i\,\rho - p_i >0,\) hence \(h^*_i=H_i^\text {low}\).

  5. We have \(r^* = \sum _{i\in {\mathcal {U}}} \gamma _i\,h^*_i - d^* \le \sum _{i\in {\mathcal {U}}} \gamma _i\,H_i^\text {up} < {\bar{r}}, \) thus \(\mu _r=0\) and \(\nu =\rho \). Therefore, \(\lambda _d = \delta - \rho >0\) implies \(d^*=0\).

  6. We have \(d^* = \sum _{i\in {\mathcal {U}}} \gamma _i\,h^*_i - r^* \ge \sum _{i\in {\mathcal {U}}} \gamma _i\,H_i^\text {low} - {\bar{r}} > 0, \) hence \(\lambda _d=0\) and \(\nu =\delta \). Therefore, \(\mu _r = \delta - \rho >0\) implies \(r^*={\bar{r}}\). \(\square \)
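Properties 3 and 4 of Theorem 3 establish a threshold structure: whenever the penalty-to-capacity ratio \(p_i/\gamma _i\) falls outside \([\rho , \delta ]\), the optimal concurrency \(h_i^*\) is pinned at one of its bounds. The function below is a hypothetical illustration of this rule (it does not replace the full LP, which is still needed for the intermediate classes):

```python
# Sketch of the threshold structure from Theorem 3 (properties 3-4):
# the admitted concurrency h_i^* is driven by the ratio p_i / gamma_i
# compared with the reserved price rho and on-demand price delta.
# All numeric values below are hypothetical.

def admitted_concurrency(p, gamma, H_low, H_up, rho, delta):
    ratio = p / gamma
    if ratio > delta:      # rejection penalty exceeds even on-demand cost
        return H_up        # property 3: admit as much as possible
    if ratio < rho:        # not worth even reserved capacity
        return H_low       # property 4: admit the bare minimum
    return None            # intermediate classes require solving the LP

rho, delta = 0.10, 0.35
assert admitted_concurrency(p=4.0, gamma=10.0, H_low=1, H_up=8,
                            rho=rho, delta=delta) == 8   # 0.40 > delta
assert admitted_concurrency(p=0.5, gamma=10.0, H_low=1, H_up=8,
                            rho=rho, delta=delta) == 1   # 0.05 < rho
```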

Proof of the closed-form optimal solution for the two-class case (Table 5)

First, we note that \(\nu \in [\rho , \delta ]\), according to the properties (2) and (3) of Theorem 3. The following implications follow directly from the system (23)–(33):

$$\begin{aligned} \nu < \delta \quad \Longrightarrow \quad d^*=0, \\ \nu> \rho \quad \Longrightarrow \quad r^* = {\bar{r}}, \\ d^* > 0 \quad \Longrightarrow \quad r^* = {\bar{r}}. \end{aligned}$$

The proof is divided into six cases:

  1. Let \(p_1/\gamma _1< p_2/\gamma _2 <\rho \). Theorem 3 implies \(h^*_1=H_1^\text {low}\) and \(h^*_2=H_2^\text {low}\). If \({\bar{r}}<\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low}\), then Theorem 3 guarantees that \(r^*={\bar{r}}\) and \(d^* = \gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low} - {\bar{r}}\). If \({\bar{r}}>\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low}\), then

    $$\begin{aligned} r^* = \gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low} - d^* \le \gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low} < {\bar{r}}, \end{aligned}$$

    thus \(d^*=0\) and \(r^*=\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low}\).

  2. Let \(p_1/\gamma _1< \rho< p_2/\gamma _2 <\delta \). Theorem 3 implies \(h^*_1=H_1^\text {low}\). We distinguish three cases.

     (a) If \(\nu \in [\rho , p_2/\gamma _2)\), then \(\mu _2 \ge p_2-\gamma _2\,\nu >0\) hence \(d^*=0\), \(h^*_2=H_2^\text {up}\) and \(r^*=\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up} \le {\bar{r}}\).

     (b) If \(\nu =p_2/\gamma _2\), then \(d^*=0\), \(r^*={\bar{r}}\) and \(h^*_2=\left( {\bar{r}}-\gamma _1\,H_1^\text {low}\right) /\gamma _2\). In particular, we have \(\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low} \le {\bar{r}} \le \gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up}.\)

     (c) If \(\nu \in (p_2/\gamma _2 , \delta ]\), then \(\lambda _2 \ge \gamma _2\,\nu - p_2 >0\) thus \(r^*={\bar{r}}\), \(h^*_2=H_2^\text {low}\) and \(d^*=\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low} - {\bar{r}}\). In particular, we have \({\bar{r}} \le \gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low}\).

  3. Let \(p_1/\gamma _1< \rho< \delta < p_2/\gamma _2\). It is similar to case 1. We have \(h^*_1=H_1^\text {low}\) and \(h^*_2=H_2^\text {up}\). If \({\bar{r}}<\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up}\), then \( d^* = \gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up} - r^* >0\), thus \(r^*={\bar{r}}\) and \(d^*=\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up} - {\bar{r}}\). If \({\bar{r}}>\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up}\), then \(r^* = \gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up} - d^* < {\bar{r}}\), thus \(d^*=0\) and \(r^*=\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up}\).

  4. Let \(\rho< p_1/\gamma _1< p_2/\gamma _2 < \delta \). We distinguish five cases.

     (a) If \(\nu \in [\rho , p_1/\gamma _1)\), then \(d^*=0\). Furthermore, \(\mu _1 \ge p_1-\gamma _1\,\nu >0\) and \(\mu _2 \ge p_2-\gamma _2\,\nu >0\), hence \(h^*_1=H_1^\text {up}\) and \(h^*_2=H_2^\text {up}\). Thus, \(r^*=\gamma _1\,H_1^\text {up}+\gamma _2\,H_2^\text {up} \le {\bar{r}}\).

     (b) If \(\nu =p_1/\gamma _1\), then \(d^*=0\) and \(r^*={\bar{r}}\). Since \(\mu _2 \ge p_2-\gamma _2\,\nu >0\), we get \(h^*_2=H_2^\text {up}\) and \(h^*_1 = [{\bar{r}}-\gamma _2\,H_2^\text {up}]/\gamma _1\). In particular, we have \(\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up} \le {\bar{r}} \le \gamma _1\,H_1^\text {up}+\gamma _2\,H_2^\text {up}\).

     (c) If \(\nu \in (p_1/\gamma _1 ,p_2/\gamma _2)\), then \(d^*=0\) and \(r^*={\bar{r}}\). Since \(\lambda _1>0\) and \(\mu _2>0\), we have \(h^*_1=H_1^\text {low}\) and \(h^*_2=H_2^\text {up}\). Thus, \({\bar{r}}=\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up}\).

     (d) If \(\nu =p_2/\gamma _2\), then \(d^*=0\) and \(r^*={\bar{r}}\). Since \(\lambda _1 > 0\), we get \(h^*_1=H_1^\text {low}\) and \(h^*_2 = \left( {\bar{r}}-\gamma _1\,H_1^\text {low}\right) /\gamma _2\). In particular, we have \(\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low} \le {\bar{r}} \le \gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up}\).

     (e) If \(\nu \in (p_2/\gamma _2 , \delta ]\), then \(r^*={\bar{r}}\). Furthermore, \(\lambda _1 >0\) and \(\lambda _2 >0\) thus \(h^*_1=H_1^\text {low}\), \(h^*_2=H_2^\text {low}\) and \(d^*=\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low} - {\bar{r}}\). In particular, we have \({\bar{r}} \le \gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {low}\).

  5. Let \(\rho< p_1/\gamma _1< \delta < p_2/\gamma _2\). It is similar to case 2. We have \(h^*_2=H_2^\text {up}\) and distinguish three cases:

     (a) If \(\nu \in [\rho , p_1/\gamma _1)\), then \(d^*=0\), \(h^*_1=H_1^\text {up}\) and \(r^*=\gamma _1\,H_1^\text {up}+\gamma _2\,H_2^\text {up} \le {\bar{r}}\).

     (b) If \(\nu =p_1/\gamma _1\), then \(d^*=0\), \(r^*={\bar{r}}\) and \(h^*_1=\left( {\bar{r}}-\gamma _2\,H_2^\text {up}\right) /\gamma _1\). In particular, we have \(\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up} \le {\bar{r}} \le \gamma _1\,H_1^\text {up}+\gamma _2\,H_2^\text {up}\).

     (c) If \(\nu \in (p_1/\gamma _1 , \delta ]\), then \(r^*={\bar{r}}\), \(h^*_1=H_1^\text {low}\) and \(d^*=\gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up} - {\bar{r}}\). In particular, we have \({\bar{r}} \le \gamma _1\,H_1^\text {low}+\gamma _2\,H_2^\text {up}\).

  6. Let \(\delta< p_1/\gamma _1 < p_2/\gamma _2\). It is similar to case 1. We have \(h^*_1=H_1^\text {up}\) and \(h^*_2=H_2^\text {up}\). If \({\bar{r}}<\gamma _1\,H_1^\text {up}+\gamma _2\,H_2^\text {up}\), then \(d^* = \gamma _1\,H_1^\text {up}+\gamma _2\,H_2^\text {up} - r^* >0\), thus \(r^*={\bar{r}}\) and \(d^*=\gamma _1\,H_1^\text {up}+\gamma _2\,H_2^\text {up}-{\bar{r}}\). If \({\bar{r}}>\gamma _1\,H_1^\text {up}+\gamma _2\,H_2^\text {up}\), then \(r^* = \gamma _1\,H_1^\text {up}+\gamma _2\,H_2^\text {up} - d^* < {\bar{r}}\), thus \(d^*=0\) and \(r^*=\gamma _1\,H_1^\text {up}+\gamma _2\,H_2^\text {up}\). \(\square \)
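Case 1 (and, symmetrically, case 6 with the upper bounds in place of the lower ones) can be sanity-checked with a short sketch; all numeric values below are hypothetical:

```python
# Sketch checking case 1 of the two-class closed form: when both
# penalty ratios p_i / gamma_i fall below rho, each class is throttled
# to H_i^low, and on-demand capacity d* is used only when the reserved
# budget rbar cannot cover the load. Numbers are hypothetical.

def two_class_case1(gamma1, gamma2, H1_low, H2_low, rbar):
    load = gamma1 * H1_low + gamma2 * H2_low
    if rbar < load:                    # reserved budget saturated
        return rbar, load - rbar       # (r*, d*): spill to on-demand
    return load, 0.0                   # reserved capacity suffices

assert two_class_case1(2.0, 3.0, 1, 1, rbar=10.0) == (5.0, 0.0)
assert two_class_case1(2.0, 3.0, 1, 1, rbar=4.0) == (4.0, 1.0)
```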


About this article


Cite this article

Malekimajd, M., Ardagna, D., Ciavotta, M. et al. An optimization framework for the capacity allocation and admission control of MapReduce jobs in cloud systems. J Supercomput 74, 5314–5348 (2018). https://doi.org/10.1007/s11227-018-2426-2
