Cluster Computing, Volume 21, Issue 2, pp 1243–1260

AEGEUS++: an energy-aware online partition skew mitigation algorithm for mapreduce in cloud

  • Vimalkumar Kumaresan
  • R. Baskaran
  • P. Dhavachelvan


This paper investigates the partition skew problem in the reduce phase of MapReduce jobs. Our study surveys skew mitigation in both offline and online settings. The offline approach is heuristic: it waits for the map tasks to complete and incurs computation overhead to estimate partition sizes. In the online approach, overloaded tasks are redistributed across other nodes, which requires extra split and merge operations. These extra operations, together with ineffective resource utilization, hamper the performance of the entire system. We propose Aegeus++ to address skew mitigation and adaptive data sampling for MapReduce jobs; it builds an online prediction model with improved accuracy and minimal waiting time. In addition, we propose near-linear skew detection and fine-grained resource allocation algorithms that identify skewed partitions and allocate appropriate resources to reducers based on partition size. Finally, our energy-aware opportunistic frequency tuning algorithm improves the performance of the reducer container on the fly, so that skewed data is processed faster with minimal energy consumption. We evaluated Aegeus++ in a cloud setup using benchmark datasets and compared its performance with native Hadoop and other existing approaches. In our experiments, Aegeus++ outperforms native Hadoop by 44% in overall application performance and reduces energy consumption by 37.67% compared with existing approaches.
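The abstract names skew detection and size-based resource allocation but does not spell out either algorithm. The sketch below only illustrates the general idea and is not the Aegeus++ method: it assumes predicted partition sizes are already available, flags a partition as skewed when its size exceeds the mean by `k` standard deviations (a single linear pass over the sizes), and splits a fixed budget of resource units in proportion to partition size. The function names, the threshold rule, and the `k` and `total_units` parameters are all hypothetical.

```python
from statistics import mean, pstdev


def detect_skewed(sizes, k=1.0):
    """Return indices of partitions larger than mean + k * stddev.

    Assumption: a simple deviation-based rule stands in for the
    paper's near-linear skew detection, which is not given here.
    """
    threshold = mean(sizes) + k * pstdev(sizes)
    return [i for i, size in enumerate(sizes) if size > threshold]


def allocate(sizes, total_units):
    """Give each reducer a share of total_units proportional to its
    partition size, with a floor of one unit per reducer.

    Because of rounding and the floor, the shares are not guaranteed
    to sum exactly to total_units.
    """
    total = sum(sizes)
    return [max(1, round(total_units * size / total)) for size in sizes]


if __name__ == "__main__":
    # Hypothetical predicted partition sizes (e.g. in MB), one per reducer.
    predicted = [100, 110, 90, 105, 400]
    print("skewed partitions:", detect_skewed(predicted))
    print("resource units:", allocate(predicted, total_units=16))
```

A proportional split is the simplest choice here; a real scheduler would also have to respect per-container minimums and maximums imposed by the cluster.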


MapReduce, Partition skew, Energy-awareness, Load balancing, Resource allocation, Cloud computing



This work is supported by the Anna Centenary Research Fellowship (CFR/ACRF/2015/15), which is funded by Anna University. Special thanks to Microsoft for providing the Microsoft Azure Sponsorship Award for conducting our research.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. College of Engineering Guindy, Anna University, Chennai, India
  2. Department of Computer Science, Pondicherry University, Pondicherry, India
