Abstract
Although most companies are expected to be running 1000-node Hadoop clusters by the end of 2020, Hadoop deployments are still accompanied by many challenges, such as security, fault tolerance, and flexibility. Hadoop is a software framework for handling big data, and it includes a distributed file system, the Hadoop Distributed File System (HDFS). HDFS achieves fault tolerance through a data replication technique: each data block is copied to multiple DataNodes, which provides reliability and availability. Although the replication technique works well, it wastes considerable time because it transfers replicas through a single pipeline. The proposed approach improves the performance of HDFS by transferring data blocks through multiple pipelines instead of a single pipeline. In addition, each DataNode updates its reliability value after each round and sends the updated value to the NameNode, which sorts the DataNodes according to that value. When a client submits a request to upload a data block, the NameNode replies with a list of highly reliable DataNodes, which yields high performance. The proposed approach is fully implemented, and the experimental results show that it improves the performance of HDFS write operations.
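The reliability-ranked DataNode selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the `report`/`select_datanodes` methods, and the numeric reliability values are all assumptions made for the example.

```python
import heapq

class NameNode:
    """Hypothetical sketch of the reliability-ranked DataNode selection
    described in the abstract. Names and structure are illustrative
    assumptions, not the paper's actual code."""

    def __init__(self):
        # DataNode id -> latest reported reliability value (higher is better)
        self.reliability = {}

    def report(self, datanode_id, value):
        # Each DataNode re-reports its reliability after every round.
        self.reliability[datanode_id] = value

    def select_datanodes(self, replication_factor=3):
        # Reply to a client's block-upload request with the most
        # reliable DataNodes, one target per replica pipeline.
        return heapq.nlargest(replication_factor,
                              self.reliability,
                              key=self.reliability.get)

nn = NameNode()
for node, r in [("dn1", 0.92), ("dn2", 0.75), ("dn3", 0.98), ("dn4", 0.60)]:
    nn.report(node, r)
print(nn.select_datanodes())  # → ['dn3', 'dn1', 'dn2']
```

With the replicas assigned to the top-ranked nodes, each replica can then be streamed over its own pipeline rather than chained through a single one.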
Rights and permissions
This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).
About this article
Cite this article
Elkawkagy, M., Elbeh, H. High Performance Hadoop Distributed File System. Int J Netw Distrib Comput 8, 119–123 (2020). https://doi.org/10.2991/ijndc.k.200515.007