Abstract
This paper presents an interconnected distributed architecture for storing data and metadata in large-scale cloud storage systems. The primary goal of the proposed architecture is to enhance the scalability of the namespace directory in large-scale file systems. The architecture rests on two key solutions. The first is a structural shift from the distinguished distributed model to an interconnected distributed model. The second is effective coordination among file servers for namespace management. To this end, a coordination protocol is designed for communication among file servers that also maintains user transparency across different file system actions and reactions. Experimental results were obtained via emulation under different network conditions and cloud storage sizes. Compared with the latest released version of the Hadoop distributed file system, the architecture improves availability by up to 43.9% and connection throughput by up to 37.8%, with negligible storage overhead.
Acknowledgements
The present study was supported by Golestan University (Grant 981871), Gorgan, Iran.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Maghsoudloo, M., Khoshavi, N. Elastic HDFS: interconnected distributed architecture for availability–scalability enhancement of large-scale cloud storages. J Supercomput 76, 174–203 (2020). https://doi.org/10.1007/s11227-019-03017-y
DOI: https://doi.org/10.1007/s11227-019-03017-y