Abstract
With the explosion of data production, the efficiency of data management and analysis has become a concern for both industry and academia. Meanwhile, IT infrastructure, especially large-scale distributed systems, consumes more and more energy. This paper proposes a novel approach to optimizing the Energy Consumption (EC) of MapReduce systems. We argue that fair data placement helps save energy, propose three goals for data placement, and present a modulo-based Data Placement Algorithm (DPA) that achieves these goals. The correctness of the proposed DPA is then established both theoretically and experimentally. Our experiments compare three systems that implement the MapReduce model with different DPAs. The results show that our algorithm optimizes EC effectively without introducing additional costs or delaying data loading. With our DPA, the EC of WordCount (https://src/examples/org/apache/hadoop/examples/), Sort (https://src/examples/org/apache/hadoop/examples/sort) and MRBench (https://src/examples/org/apache/hadoop/mapred/) is reduced by 10.9 %, 8.3 % and 17 % respectively, and time consumption is reduced by 7 %, 6.3 % and 7 % respectively.
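The abstract does not spell out the DPA itself, so the following is only a generic illustration of what modulo-based placement can look like, not the authors' algorithm. It assumes the simplest possible rule (block `i` goes to node `i mod n`) and checks that the resulting per-node load differs by at most one block, which is one plausible reading of "fair" placement. The function names `place_blocks` and `load_per_node` are hypothetical.

```python
from collections import Counter


def place_blocks(num_blocks: int, num_nodes: int) -> list[int]:
    """Assign each block to a node with a plain modulo rule (an assumption,
    not the paper's DPA): block i is stored on node i % num_nodes."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return [block_id % num_nodes for block_id in range(num_blocks)]


def load_per_node(placement: list[int], num_nodes: int) -> list[int]:
    """Count how many blocks each node holds, including empty nodes."""
    counts = Counter(placement)
    return [counts.get(node, 0) for node in range(num_nodes)]


if __name__ == "__main__":
    placement = place_blocks(num_blocks=10, num_nodes=4)
    loads = load_per_node(placement, num_nodes=4)
    print("placement:", placement)
    print("loads:", loads)
    # Under this rule no node holds more than one block above any other.
    print("imbalance:", max(loads) - min(loads))
```

Under this assumed rule, every node receives either the floor or the ceiling of `num_blocks / num_nodes` blocks, so no node sits idle while another still has a backlog of local data to process.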
References
Labrinidis, A., Jagadish, H.V.: Challenges and opportunities with big data. PVLDB 5(12), 2032–2033 (2012)
Dean, J., Ghemawat, S.: MapReduce: a flexible data processing tool. Commun. ACM (CACM) 53(1), 72–77 (2010)
Song, J., Liu, X., Zhu, Z., Zhao, D., Yu, G.: A Novel Task Scheduling Approach for Reducing Energy Consumption of MapReduce Cluster. IETE Tech. Rev. 31(1), 65–74 (2014)
Elnozahy, E.N., Kistler, M., Rajamony, R.: Energy-efficient server clusters. PACS 2002, 179–196
Lee, K.G., Bharadwaj, V., Sivakumar, V.: Design of fast and efficient energy-aware gradient-based scheduling algorithms for heterogeneous embedded multiprocessor systems. IEEE Trans. Parallel Distrib. Syst. (TPDS) 20(1), 1–12 (2009)
Da Costa, G., Dias de Assunção, M., Gelas, J-P, Georgiou, Y., Lefèvre, L., Orgerie, A., Pierson, J-M, Olivier, R., Sayah, A.: Multi-facet approach to reduce energy consumption in clouds and grids: the GREEN-NET framework. e-Energy 2010, 95–104
Lang, W., Patel, J.M.: Energy management for MapReduce clusters. PVLDB 3(1), 129–139 (2010)
Maheshwari, N., Nanduri, R., Varma, V.: Dynamic energy efficient data placement and cluster reconfiguration algorithm for MapReduce framework. Future Generation Comp. Syst. (FGCS) 28(1), 119–127 (2012)
Xiong, W., Kansal, A.: Energy efficient data intensive distributed computing. IEEE Data Eng. Bull. (DEBU) 34(1), 24–33 (2011)
Palanisamy, B., Singh, A., Liu, L., Jain, B.: Purlieus: locality-aware resource allocation for MapReduce in a cloud. In: SC, pp. 58:1–58:11 (2011)
Pinheiro, E., Bianchini, R., Carrera, E.V., Heath, T.: Load balancing and unbalancing for power and performance in cluster-based systems. In: Workshop on compilers and operating systems for low power, pp. 182–195 (2001)
Chen, Y., Keys, L., Katz, R.H.: Towards Energy Efficient MapReduce. Technical Report of EECS Department University of California, Berkeley. Available via http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-109.pdf (2008)
Pinheiro, E., Bianchini, R., Carrera, E.V., Heath, T.: Dynamic cluster reconfiguration for power and performance: Compilers and operating systems for low power (book), Kluwer Academic Publishers Norwell, ISBN:1-4020-7573-1, 75–93 (2003)
Chen, Y., Alspaugh, S., Borthakur, D., Katz, R.: Energy Efficiency for Large-Scale MapReduce Workloads with Significant Interactive Analysis. In: 7th ACM European Conference on Computer Systems, pp. 43–56 (2012)
Yigitbasi, N., Datta, K., Jain, N., Willke, T.: Energy efficient scheduling of MapReduce workloads on heterogeneous clusters. In: Green Computing Middleware (ACM), pp. 1–6 (2011)
Pinheiro, E., Bianchini, R.: Energy conservation techniques for disk array-based servers. ICS 2004, 68–78
Kaushik, R.T., Abdelzaher, T.F., Egashira, R., Nahrstedt, K.: Predictive data and energy management in GreenHDFS. IGCC 2011, 1–9
Colarelli, D., Grunwald, D.: Massive arrays of idle disks for storage archives. SC 2002, 1–11
Karger, D.R., Lehman, E., Leighton, F.T., Panigrahy, R., Levine, M.S., Lewin, D.: Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. STOC 1997, 654–663
Brinkmann, A., Salzwedel, K., Scheideler, C.: Efficient, distributed data placement strategies for storage area networks (extended abstract). SPAA 2000, 119–128
Tao, C., Nong, X., Fang, L., et al.: Clustering-based and consistent hashing-aware data placement algorithm. J. Softw. 21(12), 3175–3185 (2010). (in Chinese)
Yuan, D., Yang, Y., Liu, X., Chen, J.: A data placement strategy in scientific cloud workflows. Future Generation Comp. Syst. (FGCS) 26(8), 1200–1214 (2010)
Profiling Energy Usage for Efficient Consumption. https://msdn.microsoft.com/en-us/library/dd393312.aspx. 2016/6/1
Liu, Z.: Efficient, balanced data placement algorithm in scalable storage clusters. Journal of Communication and Computer (7), 8–17 (2007)
Graham, R.L.: Bounds on multiprocessing timing anomalies. SIAM J. Appl. Math. (SIAMAM) 17(2), 416–429 (1969)
WordCount program: Available in Hadoop source distribution: https://src/examples/org/apache/hadoop/examples/ WordCount
Sort program: Available in Hadoop source distribution: https://src/examples/org/apache/hadoop/examples/sort
MRBench program: Available in Hadoop source distribution: https://src/examples/org/apache/hadoop/mapred/ MRBench
Jie, S., Li, T., Zhi, W., Zhiliang, Z.: Study on energy-consumption regularities of cloud computing systems by a novel evaluation model. Computing, 1–19 (2013)
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM (CACM) 51(1), 107–113 (2008)
Cite this article
Song, J., He, H., Wang, Z. et al. Modulo Based Data Placement Algorithm for Energy Consumption Optimization of MapReduce System. J Grid Computing 16, 409–424 (2018). https://doi.org/10.1007/s10723-016-9370-2
DOI: https://doi.org/10.1007/s10723-016-9370-2