Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases

  • Salman Niazi
  • Mahmoud Ismail
  • Seif Haridi
  • Jim Dowling
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_146-1

Definition

Modern NewSQL database systems can store fully normalized metadata for distributed hierarchical file systems while providing high throughput and low operational latency for file system operations.

Introduction

For many years, researchers have investigated the use of database technology to manage file system metadata, with the goal of providing extensible typed metadata and support for fast, rich metadata search. Previous attempts failed, mainly because adding database operations to the file system’s critical path reduced performance. Recent improvements in the performance of distributed in-memory online transaction processing databases (NewSQL databases), however, led us to reinvestigate the possibility of using a database to manage file system metadata, this time for a distributed, hierarchical file system, the Hadoop file system (HDFS). The single-host metadata service of HDFS is a well-known bottleneck for both the size of HDFS...

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Salman Niazi (1)
  • Mahmoud Ismail (1)
  • Seif Haridi (1)
  • Jim Dowling (1)
  1. KTH – Royal Institute of Technology, Stockholm, Sweden

Section editors and affiliations

  • Asterios Katsifodimos (1)
  • Pramod Bhatotia (2)
  1. Delft University of Technology, Delft, Netherlands
  2. School of Informatics, University of Edinburgh, Edinburgh, United Kingdom