Abstract
Hadoop is a popular framework designed to process very large data sets. Files in Hadoop are typically large, ranging from gigabytes to terabytes, and large Hadoop clusters store millions of such files. HDFS writes data into blocks through a pipeline process: the NameNode sends the client a list of available DataNodes, and the pipeline is built from the DataNodes that have free blocks. The DataNode replacement policy applied when a DataNode fails during the pipeline process can be customized through configuration parameters. Under the existing behaviour, the write operation resumes even when fewer DataNodes remain, down to a single DataNode. In the single-DataNode case the data is at risk, because only one copy of it exists. This paper addresses the single-DataNode situation in the write operation by pausing the write until enough DataNodes are again available for the pipeline; such a pause is worthwhile compared with losing valuable data if that last DataNode fails while the write is in progress.
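As context for the replacement policy mentioned above, the sketch below shows how an HDFS client can set the existing client-side parameters dfs.client.block.write.replace-datanode-on-failure.enable, .policy and .best-effort before writing a file. It is a minimal Java illustration of the standard Hadoop API, not the reformation proposed in this paper; the output path /tmp/pipeline-demo.txt is only an example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PipelineWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Keep three replicas per block (the HDFS default).
        conf.set("dfs.replication", "3");
        // Let the client replace a failed DataNode in the write pipeline.
        conf.set("dfs.client.block.write.replace-datanode-on-failure.enable", "true");
        // ALWAYS requests a replacement whenever a pipeline DataNode fails;
        // DEFAULT applies heuristics based on the replication factor and pipeline size.
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "ALWAYS");
        // With best-effort set to false, the write fails if no replacement DataNode is found,
        // instead of continuing with fewer nodes.
        conf.set("dfs.client.block.write.replace-datanode-on-failure.best-effort", "false");

        // Write a small file through the pipeline using the configured client settings.
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/tmp/pipeline-demo.txt"))) {
            out.writeUTF("data written through the HDFS pipeline");
        }
    }
}

With best-effort left at false, the client raises an error rather than silently continuing on a shrinking pipeline; the reformation proposed in this paper instead pauses the write until replacement DataNodes become available.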