Journal of Grid Computing, Volume 16, Issue 2, pp 285–298

A New Data Layout Scheme for Energy-Efficient MapReduce Processing Tasks

  • Xuan T. Tran
  • Tien Van Do (Email author)
  • Csaba Rotter
  • Dosam Hwang
Article

Abstract

Yet Another Resource Negotiator (YARN) is a framework that manages resource requests and allocates resources for applications that process big data stored in HDFS. However, dynamic power management methods are not efficient when YARN manages applications that process big data stored in the default data layout of HDFS. In this paper, we propose a new data layout scheme that can be implemented for HDFS. A comparison between our proposal and the existing HDFS data layout scheme shows that the new data layout algorithm significantly reduces energy consumption at the expense of a slight increase in the mean response time of jobs.

Keywords

Hadoop, Layout, Big data processing, Data locality, Dynamic power management

Notes

Acknowledgements

The research has been partially supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.2-16-2017-00013).

This research was partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (2017R1A2B4009410).

The authors are grateful to the anonymous reviewers and guest editors for comments that helped to improve the quality of the paper.

Copyright information

© Springer Science+Business Media B.V., part of Springer Nature 2018

Authors and Affiliations

  • Xuan T. Tran (2)
  • Tien Van Do (1, 2), Email author
  • Csaba Rotter (3)
  • Dosam Hwang (4)
  1. Division of Knowledge and System Engineering for ICT, Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  2. Department of Networked Systems and Services, Budapest University of Technology and Economics, Budapest, Hungary
  3. Nokia Bell Labs, Budapest, Hungary
  4. Department of Computer Engineering, Yeungnam University, Gyeongsan, Korea
