Partial Clones for Stragglers in MapReduce

  • Jia Li
  • Changjian Wang
  • Dongsheng Li
  • Zhen Huang
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 503)

Abstract

Stragglers can delay jobs and seriously reduce cluster efficiency. Much research has addressed this problem, including Blacklist [8], speculative execution [1, 6], and Dolly [8]. In this paper, we put forward a new approach for mitigating stragglers in MapReduce, named Hummer. It starts task clones only for tasks at high risk of delay. Experiments show that it decreases the risk of job delay with lower resource consumption. For small jobs, Hummer also improves job completion time by 48% and 10% compared to LATE and Dolly, respectively.
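
The idea of cloning only a subset of tasks can be illustrated with a minimal sketch. It is not Hummer's algorithm: the abstract does not say how delay risk is estimated, so the `delay_risk` field, the `risk_threshold` and `clone_budget` parameters, and both function names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    task_id: str
    # Hypothetical estimate in [0, 1] that this task will straggle.
    # How such a value would be computed is NOT given in the abstract.
    delay_risk: float


def select_tasks_to_clone(
    tasks: List[Task],
    risk_threshold: float = 0.5,
    clone_budget: Optional[int] = None,
) -> List[str]:
    """Return IDs of tasks to clone, highest estimated risk first.

    Only tasks whose estimated risk reaches the (assumed) threshold are
    cloned; an optional budget caps the number of extra copies.
    """
    risky = sorted(
        (t for t in tasks if t.delay_risk >= risk_threshold),
        key=lambda t: t.delay_risk,
        reverse=True,
    )
    if clone_budget is not None:
        risky = risky[:clone_budget]
    return [t.task_id for t in risky]


def effective_duration(copy_durations: List[float]) -> float:
    """A task with clones finishes when its fastest copy finishes."""
    return min(copy_durations)


tasks = [Task("m1", 0.1), Task("m2", 0.8), Task("m3", 0.6)]
to_clone = select_tasks_to_clone(tasks)
```

Cloning every task, as full-cloning schemes do, corresponds to `risk_threshold=0.0` in this sketch; raising the threshold trades extra resource use against the chance that an uncloned task straggles.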

Keywords

MapReduce, mitigating stragglers, task clones

References

  1. Ananthanarayanan, G., Kandula, S., Greenberg, A., Stoica, I., Harris, E., Saha, B.: Reining in the Outliers in Map-Reduce Clusters using Mantri. In: Proc. of the USENIX OSDI (2010)
  2. Kwon, Y., Balazinska, M., Howe, B., Rolia, J.: SkewTune: Mitigating skew in MapReduce applications. In: Proc. of the SIGMOD Conf., pp. 25–36 (2012)
  3. Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: Proc. of the USENIX OSDI (2004)
  4. Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R., Stoica, I.: Improving MapReduce Performance in Heterogeneous Environments. In: Proc. of the USENIX OSDI (2008)
  5. Kwon, Y., Balazinska, M., Howe, B., Rolia, J.: SkewTune in action (demonstration). Proc. of the VLDB Endowment 5(12), 1934–1937 (2012)
  6. Ananthanarayanan, G., Ghodsi, A., Shenker, S., Stoica, I.: Effective Straggler Mitigation: Attack of the Clones. In: Proc. of the USENIX NSDI (2013)
  7. Ananthanarayanan, G., Hung, M.C.-C., Ren, X., Stoica, I., Wierman, A., Yu, M.: GRASS: Trimming Stragglers in Approximation Analytics. In: Proc. of the 11th USENIX NSDI (2014)
  8. Kwon, Y., Balazinska, M., Howe, B., Rolia, J.: A Study of Skew in MapReduce Applications. In: Proc. of the Open Cirrus Summit (2011)
  9. Chen, Y., Alspaugh, S., Borthakur, D., Katz, R.: Energy Efficiency for Large-Scale MapReduce Workloads with Significant Interactive Analysis. In: Proc. of the ACM EuroSys (2012)
  10. Barroso, L.A.: Warehouse-scale computing: Entering the teenage decade. In: Proc. of the ISCA (2011)
  11. Resnick, S.: Heavy-tail phenomena: probabilistic and statistical modeling. Springer (2007)
  12. Cirne, W., Paranhos, D., Brasileiro, F., Goes, L.F.W., Voorsluys, W.: On the Efficacy, Efficiency and Emergent Behavior of Task Replication in Large Distributed Systems. Parallel Computing 33(3), 213–234 (2007)
  13.
  14. Ananthanarayanan, G., Ghodsi, A., Shenker, S., Stoica, I.: Why Let Resources Idle? Aggressive Cloning of Jobs with Dolly. In: Proc. of the HotCloud (2012)
  15. Ousterhout, K., Wendell, P., Zaharia, M., Stoica, I.: Sparrow: Distributed, Low-Latency Scheduling. In: Proc. of the SOSP (2013)
  16. Ghodsi, A., Zaharia, M., Shenker, S., Stoica, I.: Choosy: Max-Min Fair Sharing for Datacenter Jobs with Constraints. In: Proc. of the EuroSys (2013)
  17. Zaharia, M., Borthakur, D., Sarma, J.S., Elmeleegy, K., Shenker, S., Stoica, I.: Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. In: EuroSys 2010: Proceedings of the 5th European Conference on Computer Systems, pp. 265–278. ACM, New York (2010)
  18. Gittins, J.C.: Bandit Processes and Dynamic Allocation Indices. Journal of the Royal Statistical Society. Series B (Methodological) (1979)
  19. Sonin, I.: A Generalized Gittins Index for a Markov Chain and Its Recursive Calculation. Statistics & Probability Letters (2008)
  20. Dean, J.: Achieving Rapid Response Times in Large Online Services. http://research.google.com/People/jeff/latency.html
  21. Ren, K., Kwon, Y., Balazinska, M., Howe, B.: Hadoop’s Adolescence: An Analysis of Hadoop Usage in Scientific Workloads. In: Proc. of the VLDB (2013)

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Jia Li (1)
  • Changjian Wang (1)
  • Dongsheng Li (1)
  • Zhen Huang (1)
  1. National Laboratory for Parallel and Distributed Processing, School of Computer Science, National University of Defense Technology, Changsha, China
