Elastic HDFS: interconnected distributed architecture for availability–scalability enhancement of large-scale cloud storages

The Journal of Supercomputing

Abstract

This paper presents an interconnected distributed architecture for storing data and metadata in large-scale cloud storage systems. The primary goal of the architecture is to improve the scalability of the namespace directory in large-scale file systems. Two key ideas underpin the design: a structural shift from a distinguished distributed model to an interconnected distributed model, and effective coordination among file servers for namespace management. To this end, a coordination protocol is designed that governs communication among file servers and maintains user transparency across the various file system actions and reactions. Experimental results, obtained through emulation under different network conditions and cloud storage sizes, show improvements of up to 43.9% in availability and 37.8% in connection throughput, with negligible storage overhead, compared with the latest released version of the Hadoop distributed file system (HDFS).
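The abstract does not describe the coordination protocol itself, so the following Python sketch is purely illustrative and is not the authors' design. It assumes that namespace metadata is hash-partitioned across file servers by parent directory, that each entry is copied to a fixed number of consecutive servers, and that any live server can accept a client request and forward it to the owning servers, which is one simple way to keep the owner transparent to the user. All names (`owner_index`, `FileServer`, `InterconnectedCluster`) are hypothetical.

```python
# Illustrative sketch only; not the coordination protocol from the paper.
import hashlib
from dataclasses import dataclass, field


def owner_index(path: str, num_servers: int) -> int:
    """Map a path to the index of the server that owns its parent directory.

    Hashing the parent directory (an assumption of this sketch) keeps all
    entries of one directory on the same server.
    """
    parent = path.rsplit("/", 1)[0] or "/"
    digest = hashlib.sha1(parent.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


@dataclass
class FileServer:
    server_id: int
    alive: bool = True
    namespace: dict = field(default_factory=dict)  # path -> metadata


class InterconnectedCluster:
    """Hypothetical cluster in which every live server can accept any request."""

    def __init__(self, num_servers: int, replicas: int = 2):
        self.servers = [FileServer(i) for i in range(num_servers)]
        # Number of servers holding each metadata entry (assumed parameter).
        self.replicas = min(replicas, num_servers)

    def _replica_chain(self, path: str):
        # Owner first, then the next servers in index order as backups.
        n = len(self.servers)
        start = owner_index(path, n)
        return [self.servers[(start + k) % n] for k in range(self.replicas)]

    def _check_entry(self, entry_server: int):
        # The client may contact any server; if that one is down it
        # would simply retry against another live server.
        if not self.servers[entry_server].alive:
            raise ConnectionError(f"server {entry_server} is down")

    def create(self, entry_server: int, path: str, meta: dict) -> int:
        """Forward a create to the owner chain; return the copies written."""
        self._check_entry(entry_server)
        written = 0
        for server in self._replica_chain(path):
            if server.alive:
                server.namespace[path] = meta
                written += 1
        if written == 0:
            raise RuntimeError(f"no live replica for {path}")
        return written

    def lookup(self, entry_server: int, path: str):
        """Return metadata from the first live replica that has it, else None."""
        self._check_entry(entry_server)
        for server in self._replica_chain(path):
            if server.alive and path in server.namespace:
                return server.namespace[path]
        return None


if __name__ == "__main__":
    cluster = InterconnectedCluster(num_servers=4, replicas=2)
    cluster.create(0, "/logs/a.txt", {"size": 10})
    owner = owner_index("/logs/a.txt", 4)
    cluster.servers[owner].alive = False  # simulate failure of the owner
    print(cluster.lookup((owner + 1) % 4, "/logs/a.txt"))
```

In this toy model, a lookup still succeeds after the owning server fails because the second copy answers, which is the kind of availability gain an interconnected model aims for. A real protocol would also have to handle membership changes, consistency between copies, and rebalancing, none of which this sketch attempts.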

Acknowledgements

The present study was supported by Golestan University (Grant 981871), Gorgan, Iran.

Author information

Correspondence to M. Maghsoudloo.

About this article

Cite this article

Maghsoudloo, M., Khoshavi, N. Elastic HDFS: interconnected distributed architecture for availability–scalability enhancement of large-scale cloud storages. J Supercomput 76, 174–203 (2020). https://doi.org/10.1007/s11227-019-03017-y

  • DOI: https://doi.org/10.1007/s11227-019-03017-y
