Scheduling Distributed Clusters of Parallel Machines: Primal-Dual and LP-based Approximation Algorithms


The Map-Reduce computing framework rose to prominence with datasets of such size that dozens of machines on a single cluster were needed for individual jobs. As datasets approach the exabyte scale, a single job may need distributed processing not only on multiple machines, but on multiple clusters. We consider a scheduling problem to minimize weighted average completion time of n jobs on m distributed clusters of parallel machines. In keeping with the scale of the problems motivating this work, we assume that (1) each job is divided into m “subjobs” and (2) distinct subjobs of a given job may be processed concurrently. When each cluster is a single machine, this is the NP-Hard concurrent open shop problem. A clear limitation of such a model is that a serial processing assumption sidesteps the issue of how different tasks of a given subjob might be processed in parallel. Our algorithms explicitly model clusters as pools of resources and effectively overcome this issue. Under a variety of parameter settings, we develop two constant factor approximation algorithms for this problem. The first algorithm uses an LP relaxation tailored to this problem from prior work. This LP-based algorithm provides strong performance guarantees. Our second algorithm exploits a surprisingly simple mapping to the special case of one machine per cluster. This mapping-based algorithm is combinatorial and extremely fast. These are the first constant factor approximations for this problem.


  1. Where we write “decreasing”, we mean “non-increasing.” Where we write “increasing”, we mean “non-decreasing”.

  2. A problem \(\alpha |\beta |\gamma \) implies a particular environment \(\alpha \), objective function \(\gamma \), and optional constraints \(\beta \).

  3. A permutation of the authors’ names: Mastrolilli, Queyranne, Schulz, Svensson, and Uhan.

  4. We call such schedules “single-\(\sigma \) schedules.” As we will see later on, CC-TSPT serves as a constructive proof of existence of near-optimal single-\(\sigma \) schedules for all instances of \(CC||\sum w_j C_j\), including those instances for which single-\(\sigma \) schedules are strictly sub-optimal. This is addressed in Sect. 6.

  5. Here, our machine is machine \(\ell \) on cluster i.

  6. \(+p_{xit^*}\)”; see associated proofs.

  7. We omit the customary \(\star \) to avoid clutter in notation.


  1. Amazon Web Services, Inc.: AWS Lambda - Serverless Compute (2016). Accessed 3 Apr 2016

  2. Chen, Zhi-Long, Hall, Nicholas G.: Supply chain scheduling: assembly systems. Working paper (2000). doi:10.1007/978-3-8349-8667-2

  3. Garg, Naveen, Kumar, Amit, Pandit, Vinayaka: Order scheduling models: hardness and algorithms. FSTTCS 2007: Found Softw Technol Theor Comput Sci 4855, 96–107 (2007). doi:10.1007/978-3-540-77050-3_8

  4. Gonzalez, Teofilo, Ibarra, Oscar, Sahni, Sartaj: Bounds for LPT schedules on uniform processors. SIAM J Comput 6(1), 155–166 (1977)

  5. Graham, Ronald L., Lawler, Eugene L., Lenstra, Jan Karel, Rinnooy Kan, A.H.G.: Optimization and approximation in deterministic sequencing and scheduling: a survey. Ann Disc Math 5, 287–326 (1979)

  6. Hajjat, Mohammad, Shankaranarayanan, P.N., Maltz, David, Rao, Sanjay, Sripanidkulchai, Kunwadee: Dealer: application-aware request splitting for interactive cloud applications. CoNEXT 2012, 157–168 (2012)

  7. Hung, Chien-Chun, Golubchik, Leana, Yu, Minlan: Scheduling jobs across geo-distributed datacenters. In: Proceedings of the Sixth ACM Symposium on Cloud Computing (ACM), 111–124 (2015)

  8. Leung, J.Y.T., Li, Haibing, Pinedo, Michael: Scheduling orders for multiple product types to minimize total weighted completion time. Disc Appl Math 155(8), 945–970 (2007). doi:10.1016/j.dam.2006.09.012

  9. Mastrolilli, Monaldo, Queyranne, Maurice, Schulz, Andreas S., Svensson, Ola, Uhan, Nelson A.: Minimizing the sum of weighted completion times in a concurrent open shop. Oper Res Lett 38(5), 390–395 (2010). doi:10.1016/j.orl.2010.04.011

  10. Microsoft: Azure Service Fabric (2016). Accessed 3 Apr 2016

  11. Queyranne, Maurice: Structure of a simple scheduling polyhedron. Math Progr 58(1–3), 263–285 (1993). doi:10.1007/BF01581271

  12. Sachdeva, Sushant, Saket, Rishi: Optimal inapproximability for scheduling problems via structural hardness for hypergraph vertex cover. In: IEEE Conference on Computational Complexity (IEEE), 219–229 (2013)

  13. Schulz, Andreas S.: Polytopes and scheduling. Ph.D. Thesis (1996)

  14. Schulz, Andreas S.: From linear programming relaxations to approximation algorithms for scheduling problems: a tour d’horizon. Working paper; available upon request (2012)

  15. Sriskandarajah, C., Wagneur, E.: Openshops with jobs overlap. Europ J Oper Res 71, 366–378 (1993)

  16. Zhang, Qiang, Wu, Weiwei, Li, Minming: Resource scheduling with supply constraint and linear cost. COCOA 2012 conference (2012). doi:10.1007/3-540-68339-9_34


Special thanks to Andreas Schulz for sharing some of his recent work with us [14]. His thorough analysis of a linear program for \(P||\sum w_j C_j\) drives the LP-based results in this paper. Thanks also to Chien-Chun Hung and Leana Golubchik for sharing [7] while it was under review, and to Ioana Bercea and Manish Purohit for their insights on SWAG’s performance. Lastly, our sincere thanks to William Gasarch for organizing the REU which led to this work, and to the 2015 CAAR-REU cohort for making the experience an unforgettable one; in the words of Rick Sanchez, “wubalubadubdub!”

Author information


Corresponding author

Correspondence to Riley Murray.

Additional information

All authors conducted this work at the University of Maryland, College Park. This work was made possible by the National Science Foundation, REU Grant CCF 1262805, and the Winkler Foundation. This work was also partially supported by NSF Grant CCF 1217890.

Remark - A shorter version of this paper (one that omitted several proofs) appeared in the proceedings of the 2016 European Symposium on Algorithms.

Appendix: A Reduction for Minimizing Total Weighted Lateness on Identical Parallel Machines

The problem of minimizing total weighted lateness on a bank of identical parallel machines is typically denoted \(P || \sum w_jL_j\), where the lateness of a job with deadline \(d_j\) is \(L_j \doteq \max {\{C_j - d_j, 0\}}\). The reduction we offer below shows that \(P || \sum w_j L_j\) can be stated in terms of \(CC || \sum w_jC_j\) at optimality. Thus, while a \(\Delta \)-approximation to \(CC || \sum w_jC_j\) does not imply a \(\Delta \)-approximation to \(P || \sum w_j L_j\), the reduction below nevertheless provides new insights on the structure of \(P || \sum w_j L_j\).

Definition 17

(Total Weighted Lateness Reduction) Let \(I = (p, d, w, m)\) denote an instance of \(P || \sum w_j L_j\), where p is the set of processing times, d is the set of deadlines, w is the set of weights, and m is the number of identical parallel machines. Given these inputs, we transform \(I \in \Omega _{P || \sum w_j L_j}\) to \(I' \in \Omega _{CC}\) in the following way.

Create a total of \(n + 1\) clusters. Cluster 0 has m machines. Job j has processing time \(p_j\) on this cluster, and \(|T_{j0}| = 1\). Clusters 1 through n each consist of a single machine. Job j has processing time \(d_j\) on cluster j, and zero on all clusters other than cluster 0 and cluster j. Denote this problem \(I'\).

We refer the reader to Fig. 2 for an example output of this reduction.
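The construction in Definition 17 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the container `CCInstance` and the function `lateness_to_cc` are hypothetical names that do not come from the paper, jobs are indexed from 0 (so job j's private cluster is cluster j + 1), and "zero processing time" on a cluster is represented by an empty task list.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class CCInstance:
    """Hypothetical container for an instance of CC || sum w_j C_j."""
    machines: list[int]          # machines[i] = number of machines in cluster i
    tasks: list[list[list]]      # tasks[j][i] = task lengths of job j on cluster i
    weights: list


def lateness_to_cc(p, d, w, m):
    """Build I' from I = (p, d, w, m) following Definition 17.

    Cluster 0 has m machines and holds one task of length p[j] per job.
    Cluster j + 1 is a single machine holding one task of length d[j]
    for job j and nothing for every other job (assumed encoding of
    "zero processing time").
    """
    n = len(p)
    machines = [m] + [1] * n
    tasks = []
    for j in range(n):
        row = [[p[j]]] + [[] for _ in range(n)]  # single task on cluster 0
        row[j + 1] = [d[j]]                      # deadline cluster of job j
        tasks.append(row)
    return CCInstance(machines=machines, tasks=tasks, weights=list(w))


# Two jobs on two identical machines become an instance with three clusters.
instance = lateness_to_cc(p=[3, 2], d=[4, 1], w=[1, 2], m=2)
```

The output has n + 1 clusters regardless of m, so the size of \(I'\) is quadratic in n when the zero entries are stored explicitly.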

Theorem 18

Let I be an instance of \(P || \textstyle \sum w_j L_j\). Let \(I'\) be an instance of \(CC|| \sum w_j C_j\) resulting from the transformation described above. Any list schedule \(\sigma \) that is optimal for \(I'\) is also optimal for I.


Proof

If we restrict the solution space of \(I'\) to single permutations (which we may do without loss of generality), then any schedule \(\sigma \) for I or \(I'\) produces the same value of \(\sum _{j \in N} w_j(C_j - d_j)^+\) for I and \(I'\), where \(C_j\) denotes the completion time of job j on cluster 0. The additional clusters we added for \(I'\) ensure that job j completes in \(I'\) no earlier than \(d_j\), namely at time \(\max \{C_j, d_j\} = d_j + (C_j - d_j)^+\). Given this, the objective for \(I'\) can be written as \(\sum _{j \in N} \left( w_j d_j + w_j(C_j - d_j)^+ \right) \). Because \(\sum _{j \in N} w_j d_j\) is a constant, any permutation that solves \(I'\) optimally also minimizes \(\sum _{j \in N} w_j (C_j - d_j)^+\). Since \(\sum _{j \in N} w_j (C_j - d_j)^+ = \sum _{j \in N} w_j L_j\), we have the desired result. \(\square \)
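The identity behind this argument can be checked numerically on a small example. The sketch below is not the paper's algorithm: it assumes that a list schedule places jobs, in permutation order, on the earliest-available machine of cluster 0, and that job j's completion time in \(I'\) is the larger of that time and \(d_j\). The function names and the data are illustrative.

```python
import heapq
from itertools import permutations


def list_schedule(p, order, m):
    """Completion times when jobs are placed, in `order`, on the
    earliest-free of m identical machines (assumed list-scheduling rule)."""
    free = [0] * m                      # next free time of each machine
    heapq.heapify(free)
    completion = [0] * len(p)
    for j in order:
        start = heapq.heappop(free)
        completion[j] = start + p[j]
        heapq.heappush(free, completion[j])
    return completion


def lateness_objective(p, d, w, order, m):
    """Objective of I: sum of w_j * max(C_j - d_j, 0)."""
    C = list_schedule(p, order, m)
    return sum(wj * max(cj - dj, 0) for wj, cj, dj in zip(w, C, d))


def cc_objective(p, d, w, order, m):
    """Objective of I': job j finishes at max(C_j, d_j), because its
    private single-machine cluster is busy with it until time d_j."""
    C = list_schedule(p, order, m)
    return sum(wj * max(cj, dj) for wj, cj, dj in zip(w, C, d))


# Illustrative data (not from the paper).
p, d, w, m = [3, 2, 4], [4, 1, 5], [1, 2, 1], 2
constant = sum(wj * dj for wj, dj in zip(w, d))

# The two objectives differ by the same constant for every permutation,
# so the same permutation minimizes both.
for order in permutations(range(len(p))):
    assert cc_objective(p, d, w, order, m) == constant + lateness_objective(p, d, w, order, m)
```

Because the gap is the constant \(\sum _{j \in N} w_j d_j\), ranking permutations by either objective gives the same order. The approximation ratio does not transfer, since that constant can dominate the \(I'\) objective.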

Cite this article

Murray, R., Khuller, S. & Chao, M. Scheduling Distributed Clusters of Parallel Machines: Primal-Dual and LP-based Approximation Algorithms. Algorithmica 80, 2777–2798 (2018).


  • Approximation algorithms
  • Distributed computing
  • Machine scheduling
  • LP relaxations
  • Primal-dual algorithms

Mathematics Subject Classification

  • F.2.2 Nonnumerical Algorithms and Problems