
Cluster Computing, Volume 17, Issue 4, pp 1171–1183

A dynamic block device reconfiguration algorithm in virtual MapReduce cluster

  • Kwonyong Lee
  • Yoonsung Nam
  • Taekhee Kim
  • Sungyong Park (corresponding author)
  • Hyuk-Jun Lee
  • Jihoon Yang
Article

Abstract

With advances in cloud computing and virtualization technologies, running MapReduce applications over clouds has attracted growing attention in recent years. A fundamental problem, however, is that the performance of MapReduce applications can be severely degraded by the overhead of I/O virtualization and by resource competition among virtual machines. In this paper, we propose a dynamic block device reconfiguration algorithm for virtual MapReduce clusters, which reduces the data transfer time between virtual machines and thereby improves the performance of MapReduce applications running on clouds. The proposed algorithm uses a block device reconfiguration scheme, in which a block device attached to one virtual machine can be dynamically detached and reattached to another virtual machine at runtime. This scheme allows files to be moved across virtual machines without any network transfer between them. The algorithm is also dynamic in the sense that it estimates the total data transfer time between virtual machines using multiple regression analysis based on CPU utilization and data size, and adaptively determines a least-cost data transfer path between a mapper virtual machine and a reducer virtual machine. We have implemented our algorithm in Hadoop MapReduce. The benchmarking results show that the overhead incurred by transferring data from mapper virtual machines to reducer virtual machines is minimized and that the execution times of MapReduce applications are shortened by up to 14%.
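The path-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain linear model, time ≈ b0 + b1·CPU utilization + b2·data size, fitted by ordinary least squares, and it assumes exactly two candidate paths. The path names, the sample format, and all function names are hypothetical and are introduced only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample format: (cpu_utilization_percent, data_size_mb, observed_seconds)
Sample = Tuple[float, float, float]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: samples do not determine the model")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_transfer_model(samples: Sequence[Sample]) -> List[float]:
    """Fit [b0, b1, b2] for time = b0 + b1*cpu + b2*size via the normal equations."""
    rows = [(1.0, cpu, size) for cpu, size, _ in samples]
    ys = [t for _, _, t in samples]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def predict(coeffs: Sequence[float], cpu: float, size: float) -> float:
    """Estimated transfer time in seconds for one path."""
    b0, b1, b2 = coeffs
    return b0 + b1 * cpu + b2 * size


def choose_path(models: Dict[str, Sequence[float]], cpu: float, size: float) -> str:
    """Return the name of the path with the smallest estimated transfer time."""
    return min(models, key=lambda name: predict(models[name], cpu, size))
```

In such a sketch, one model would be fitted per candidate path (for example, a hypothetical "network" path and a hypothetical "block_device" path) from measured transfers, and `choose_path` would be called with the current CPU utilization and the size of the mapper output. The regression form and the cost terms actually used by the algorithm are defined in the body of the paper, not here.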

Keywords

MapReduce, Cloud, Virtual cluster, Xen, Block device reconfiguration

Notes

Acknowledgments

This research was supported by the Ministry of Science, ICT and Future Planning (MSIP), Korea, under the Information Technology Research Center (ITRC) support program (NIPA-2014-H0301-14-1001) supervised by the National IT Industry Promotion Agency (NIPA), and by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2012M3C4A7033348).

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Kwonyong Lee (1)
  • Yoonsung Nam (1)
  • Taekhee Kim (1)
  • Sungyong Park (1), corresponding author
  • Hyuk-Jun Lee (1)
  • Jihoon Yang (1)

  1. Department of Computer Science and Engineering, Sogang University, Seoul, Korea
