
A New Scalable Approach for Distributed Metadata in HPC

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2016)

Abstract

In recent years, not only has data-intensive storage grown, but compute-intensive workloads also require high computing power and a high degree of parallelism, together with good performance and scalability. Many distributed filesystems have focused on how to distribute data across multiple processing nodes, but one of the main problems to solve is managing the ever-growing number of metadata requests. In fact, some studies have identified optimized metadata management as a key factor in achieving good performance. Applications in high performance computing usually require filesystems capable of sustaining a very large number of operations per second to reach the required level of performance. Although metadata occupies far less storage than data, metadata operations consume many CPU cycles, so a single metadata server is no longer sufficient. In this paper we define a completely distributed method that provides efficient metadata management and adapts seamlessly to both general-purpose and scientific-computing filesystem workloads. Throughput is measured with a metadata benchmark and compared against several distributed filesystems. The results show high scalability for create operations on a single directory accessed by multiple clients.
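The abstract does not describe the distribution scheme itself, so the following is only a generic illustration of why spreading metadata across servers helps the single-directory create workload it mentions. It is a minimal sketch assuming plain hash partitioning of directory entries over a fixed number of metadata servers; the function names, the SHA-1 hash, and the client/server counts are hypothetical, and this is not the authors' method. It compares placing every entry on the server that owns its parent directory with placing each entry by hashing its full (parent, name) key.

```python
import hashlib
from collections import Counter


def server_by_directory(parent_dir: str, name: str, num_servers: int) -> int:
    """Hypothetical policy: every entry lives on the server owning its parent directory."""
    digest = hashlib.sha1(parent_dir.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def server_by_entry(parent_dir: str, name: str, num_servers: int) -> int:
    """Hypothetical policy: each entry is placed by hashing its full key (parent + name)."""
    key = f"{parent_dir}/{name}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def simulate_creates(place, num_clients: int, files_per_client: int,
                     num_servers: int, parent: str = "/shared") -> Counter:
    """Count create requests per server when all clients create files in one directory."""
    load = Counter()
    for client in range(num_clients):
        for i in range(files_per_client):
            name = f"client{client}.file{i}"
            load[place(parent, name, num_servers)] += 1
    return load


if __name__ == "__main__":
    for policy in (server_by_directory, server_by_entry):
        load = simulate_creates(policy, num_clients=16, files_per_client=1000, num_servers=8)
        print(policy.__name__, "busiest server handles",
              max(load.values()), "of", sum(load.values()), "creates")
```

With per-directory placement all 16,000 creates land on a single server, while per-entry hashing spreads them roughly evenly over all eight. The trade-off of this simple scheme is that listing a directory must now query every server.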

A.F. Díaz—This work has been partially supported by European Union FEDER and the Spanish Ministry of Economy and Competitiveness TIN2015-67020-P, FPA2015-65150-C3-3-P, and PROMEP/103.5/13/6475 UAEH-146. The authors would like to thank FCSCL (Fundación Centro de Supercomputación de Castilla y León) for providing access to a cluster of its supercomputer Calendula.



Author information

Corresponding author: Antonio F. Díaz.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Rodríguez-Quintana, C., Díaz, A.F., Ortega, J., Palacios, R.H., Ortiz, A. (2016). A New Scalable Approach for Distributed Metadata in HPC. In: Carretero, J., Garcia-Blas, J., Ko, R., Mueller, P., Nakano, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2016. Lecture Notes in Computer Science, vol. 10048. Springer, Cham. https://doi.org/10.1007/978-3-319-49583-5_8


  • DOI: https://doi.org/10.1007/978-3-319-49583-5_8


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49582-8

  • Online ISBN: 978-3-319-49583-5

  • eBook Packages: Computer Science, Computer Science (R0)
