Modulo Based Data Placement Algorithm for Energy Consumption Optimization of MapReduce System

Journal of Grid Computing

Abstract

With the explosion of data production, the efficiency of data management and analysis has become a concern for both industry and academia. Meanwhile, more and more energy is consumed by IT infrastructure, especially large-scale distributed systems. In this paper, a novel idea for optimizing the Energy Consumption (EC for short) of a MapReduce system is proposed. We argue that fair data placement helps to save energy. We then propose three goals of data placement and a modulo based Data Placement Algorithm (DPA for short) that achieves these goals. The correctness of the proposed DPA is then established from both theoretical and experimental perspectives. Three different systems that implement the MapReduce model with different DPAs are compared in our experiments. The results show that our algorithm optimizes EC effectively, without introducing additional costs or delaying data loading. With the help of our DPA, the EC for WordCount (https://src/examples/org/apache/hadoop/examples/), Sort (https://src/examples/org/apache/hadoop/examples/sort) and MRBench (https://src/examples/org/apache/hadoop/mapred/) can be reduced by 10.9 %, 8.3 % and 17 % respectively, and time consumption is reduced by 7 %, 6.3 % and 7 % respectively.
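The abstract does not spell out the DPA itself, so the following is only a minimal sketch of the general idea of modulo based placement, not the authors' algorithm. It assumes that data blocks carry consecutive integer IDs and that block i is assigned to node i mod n; the names `place_blocks` and `load_per_node` and the node labels are hypothetical and chosen for illustration.

```python
from collections import Counter


def place_blocks(num_blocks, nodes):
    """Map each block ID to a node using a plain modulo rule.

    Assumption for illustration: block i goes to nodes[i % len(nodes)].
    The paper's DPA may differ in its details.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    return {block_id: nodes[block_id % len(nodes)]
            for block_id in range(num_blocks)}


def load_per_node(placement, nodes):
    """Count how many blocks each node holds, including empty nodes."""
    counts = Counter(placement.values())
    return {node: counts.get(node, 0) for node in nodes}


# Hypothetical three-node cluster holding ten blocks.
nodes = ["node-0", "node-1", "node-2"]
placement = place_blocks(10, nodes)
loads = load_per_node(placement, nodes)
print(loads)  # {'node-0': 4, 'node-1': 3, 'node-2': 3}
```

Under this rule the block counts of any two nodes differ by at most one, which is one simple reading of the "fair" placement that the abstract argues helps to save energy.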



Author information

Corresponding author

Correspondence to Jie Song.


About this article


Cite this article

Song, J., He, H., Wang, Z. et al. Modulo Based Data Placement Algorithm for Energy Consumption Optimization of MapReduce System. J Grid Computing 16, 409–424 (2018). https://doi.org/10.1007/s10723-016-9370-2


  • DOI: https://doi.org/10.1007/s10723-016-9370-2
