A New Scalable Approach for Distributed Metadata in HPC

  • Cristina Rodríguez-Quintana
  • Antonio F. Díaz (Email author)
  • Julio Ortega
  • Raúl H. Palacios
  • Andrés Ortiz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10048)


In recent years, data-intensive storage has grown, and compute-intensive workloads have also come to demand high computing power and a high degree of parallelism, together with good performance and scalability. Many distributed filesystems have focused on how to distribute data across multiple processing nodes, but one of the main problems still to be solved is managing the ever-growing number of metadata requests. In fact, some studies have identified optimized metadata management as a key factor in achieving good performance. High-performance computing applications usually require filesystems that can sustain a very large number of operations per second to reach the required level of performance. Although metadata takes up less storage than data, metadata operations consume many CPU cycles, so a single metadata server is no longer sufficient. In this paper we define a completely distributed method that provides efficient metadata management and adapts seamlessly to general-purpose and scientific-computing filesystem workloads. Throughput is measured with a metadata benchmark and compared with that of several distributed filesystems. The results show great scalability for create operations on a single directory accessed by multiple clients.


Distributed filesystems, HPC, Metadata management



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Cristina Rodríguez-Quintana (1, 2)
  • Antonio F. Díaz (1, 2), Email author
  • Julio Ortega (1, 2)
  • Raúl H. Palacios (1, 2)
  • Andrés Ortiz (1, 2)
  1. Department of Computer Architecture and Technology, University of Granada, Granada, Spain
  2. Communications Engineering Department, University of Málaga, Málaga, Spain
