Scalability of Replicated Metadata Services in Distributed File Systems

  • Dimokritos Stamatakis
  • Nikos Tsikoudis
  • Ourania Smyrnaki
  • Kostas Magoutis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7272)


There has been considerable recent interest in highly available configuration management services based on the Paxos family of algorithms as a way to address long-standing problems in managing large-scale heterogeneous distributed systems. These problems include distributed locking, group membership, leader election, and the management of configuration parameters, among others. While such services are finding their way into the management of distributed middleware systems and data centers in general, some areas of applicability remain largely unexplored. One such area is the management of metadata in distributed file systems. In this paper, we show that a Paxos-based approach to building metadata services in distributed file systems can achieve high availability without incurring a performance penalty. Moreover, we demonstrate that such an approach is easy to retrofit to existing systems (such as PVFS and HDFS) that currently use different approaches to availability. Our overall approach is based on a general-purpose Paxos-compatible component (the embedded Oracle Berkeley database) along with a methodology for making it interoperate with existing distributed file system metadata services.





Copyright information

© IFIP International Federation for Information Processing 2012

Authors and Affiliations

  • Dimokritos Stamatakis (1)
  • Nikos Tsikoudis (1)
  • Ourania Smyrnaki (1)
  • Kostas Magoutis (1)

  1. Institute of Computer Science (ICS), Foundation for Research and Technology Hellas (FORTH), Heraklion, Greece
