An Algorithmic Framework for Geo-Distributed Analytics

  • Srikanth Kandula
  • Ishai Menache
  • Joseph (Seffi) Naor
  • Erez Timnat
Conference paper
Part of the Static & Dynamic Game Theory: Foundations & Applications book series (SDGTFA)


Large-scale cloud enterprises operate tens to hundreds of datacenters, running a variety of services that produce enormous amounts of data, such as search clicks and infrastructure operation logs. A recent research direction in both academia and industry is to process this “big data” across multiple datacenters, since the alternative of centralized processing might be too slow and costly (e.g., due to transferring all the data to a single location). Running such geo-distributed analytics jobs at scale gives rise to key resource management decisions, such as: Where should each computation take place? Accordingly, which data should be moved to which location, and when? Which network paths should be used for moving the data? These decisions are difficult not only because they involve scheduling multiple types of resources (e.g., compute and network), but also because of the complicated internal data flow of the jobs, which are typically structured as a DAG of tens of stages, each with up to thousands of tasks. Recent work [17, 22, 25] has dealt with the resource management problem by abstracting away certain aspects of the problem, such as the physical network connecting the datacenters, the DAG structure of the jobs, and/or the compute capacity constraints at the (possibly heterogeneous) datacenters. In this paper, we provide the first analytical model that includes all aspects of the problem, with the objective of minimizing the makespan of multiple geo-distributed jobs. We provide exact and approximate algorithms for certain practical scenarios and suggest principled heuristics for other scenarios of interest.


  1. Hadoop YARN Project.
  2. Seattle department of transportation live traffic videos.
  3. TPC-H Benchmark.
  4. TPC-DS Benchmark, 2012.
  5. A. Greenberg, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: A Scalable and Flexible Data Center Network. In SIGCOMM, 2009.
  6. Sameer Agarwal, Srikanth Kandula, Nico Bruno, Ming-Chuan Wu, Ion Stoica, and Jingren Zhou. Re-optimizing data parallel computing. In NSDI, 2012.
  7. Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. A scalable, commodity data center network architecture. In SIGCOMM, 2008.
  8. Michael Armbrust et al. Spark SQL: Relational data processing in Spark. In SIGMOD, 2015.
  9. Peter Bodík, Ishai Menache, Joseph Seffi Naor, and Jonathan Yaniv. Brief announcement: deadline-aware scheduling of big-data processing jobs. In SPAA, pages 211–213, 2014.
  10. Ronnie Chaiken et al. SCOPE: Easy and Efficient Parallel Processing of Massive Datasets. In VLDB, 2008.
  11. Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. In OSDI, 2004.
  12. Pierre-François Dutot, Grégory Mounié, and Denis Trystram. Scheduling parallel tasks approximation algorithms. In Handbook of Scheduling - Algorithms, Models, and Performance Analysis. 2004.
  13. Ronald L. Graham. Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 1969.
  14. Robert Grandl, Mosharaf Chowdhury, Aditya Akella, and Ganesh Ananthanarayanan. Altruistic scheduling in multi-resource clusters. In OSDI, 2016.
  15. Chi-Yao Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, Mohan Nanduri, and Roger Wattenhofer. Achieving high utilization with software-driven WAN. In SIGCOMM, 2013.
  16. Chien-Chun Hung, Ganesh Ananthanarayanan, Leana Golubchik, Minlan Yu, and Mingyang Zhang. Wide-area analytics with multiple resources. In EuroSys, 2018.
  17. Chien-Chun Hung, Leana Golubchik, and Minlan Yu. Scheduling jobs across geo-distributed datacenters. In SOCC, 2015.
  18. IDC. Network video surveillance: Addressing storage challenges, 2012.
  19. Michael Isard. Autopilot: Automatic Data Center Management. OSR, 41(2), 2007.
  20. Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, Min Zhu, et al. B4: Experience with a globally-deployed software defined WAN. In SIGCOMM, 2013.
  21. Klaus Jansen and Hu Zhang. Scheduling malleable tasks with precedence constraints. J. Comput. Syst. Sci., 78(1):245–259, 2012.
  22. Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, and Ion Stoica. Low latency geo-distributed analytics. In SIGCOMM, 2015.
  23. Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: Flexible, scalable schedulers for large compute clusters. In EuroSys, 2013.
  24. Ashish Thusoo et al. Hive - a warehousing solution over a map-reduce framework. In VLDB, 2009.
  25. Ashish Vulimiri, Carlo Curino, P. Brighten Godfrey, Thomas Jungblut, Jitu Padhye, and George Varghese. Global analytics in the face of bandwidth and regulatory constraints. In NSDI, 2015.
  26. M. Zaharia et al. Spark: Cluster computing with working sets. Technical Report UCB/EECS-2010-53, EECS Department, University of California, Berkeley, 2010.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Srikanth Kandula (1)
  • Ishai Menache (1)
  • Joseph (Seffi) Naor (2)
  • Erez Timnat (3)
  1. Microsoft Research, Redmond, USA
  2. Technion – Israel Institute of Technology, Haifa, Israel
  3. Google, Tel Aviv, Israel
