The Component of Searchable Storage: Semantic-Aware Namespace

Searchable Storage in Cloud Computing

Abstract

The explosive growth in data volume and complexity poses great challenges for file systems. Meeting these challenges requires a new namespace management scheme that makes data access both easy and efficient. In almost all of today's file systems, namespace management is based on hierarchical directory trees. This tree-based scheme is prone to severe performance bottlenecks and often fails to provide real-time responses to complex data lookups. We propose a Semantic-Aware Namespace scheme, called SANE, which provides dynamic and adaptive namespace management for ultra-large storage systems with billions of files. SANE introduces a new naming methodology based on the notion of a semantic-aware per-file namespace. It exploits semantic correlations among files to dynamically aggregate correlated files into small, flat, and readily manageable groups, which enables fast and accurate lookups. SANE is implemented as middleware in conventional file systems and works orthogonally to hierarchical directory trees. The semantic correlations and file groups identified by SANE can also be used to facilitate file prefetching and data de-duplication, among other system-level optimizations. Extensive trace-driven experiments on our prototype implementation validate the efficacy and efficiency of SANE (© 2014 IEEE. Reprinted, with permission, from Ref. [1].).
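The idea of a flat, per-file namespace built from correlated files can be illustrated with a minimal sketch. It is not the SANE implementation: the attribute vectors, the cosine similarity measure, the function names, and the parameter `k` are all assumptions made for illustration, and the brute-force comparison would not scale to billions of files.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_namespaces(files, k=2):
    """Map each file to its k most correlated files (a flat per-file group).

    `files` maps a file name to a hypothetical vector of metadata attributes.
    Every pair is compared, so the cost is quadratic in the number of files.
    """
    namespaces = {}
    for name, vec in files.items():
        scored = [
            (cosine(vec, other_vec), other)
            for other, other_vec in files.items()
            if other != name
        ]
        # Highest similarity first; ties are broken by file name.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        namespaces[name] = [other for _, other in scored[:k]]
    return namespaces

# Hypothetical attribute vectors, invented for this example.
files = {
    "report_v1.doc": [1.0, 0.9, 0.1],
    "report_v2.doc": [1.0, 1.0, 0.1],
    "trace.log": [0.1, 0.0, 1.0],
    "trace_old.log": [0.0, 0.1, 0.9],
}
ns = build_namespaces(files, k=1)
```

Here a lookup starting from `report_v1.doc` only has to inspect its small group rather than walk a directory tree. A system at the scale described in the abstract would need an approximate index in place of the pairwise comparison.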

References

  1. Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Xu, SANE: semantic-aware namespace in ultra-large-scale file systems. IEEE Trans. Parallel Distrib. Syst. (TPDS) 25(5), 1328–1338 (2014)

  2. I. Gorton, P. Greenfield, A. Szalay, R. Williams, Data-intensive computing in the 21st century. Computer 41(4), 30–32 (2008)

  3. International Data Corporation (IDC), 2010 Digital Universe Study: A Digital Universe Decade - Are You Ready? http://gigaom.files.wordpress.com/2010/05/2010-digital-universe-iview (2010)

  4. Symantec, 2010 State of the Data Center Global Data, http://www.symantec.com/content/en/us/about/media/pdfs/Symantec_DataCenter10_Report_Global.pdf (2010)

  5. M. Seltzer, N. Murphy, Hierarchical file systems are dead, in Proceedings of the HotOS (2009)

  6. R. Daley, P. Neumann, A general-purpose file system for secondary storage, in Proceedings of the Fall Joint Computer Conference, Part I (1965), pp. 213–229

  7. N. Agrawal, W. Bolosky, J. Douceur, J. Lorch, A five-year study of file-system metadata, in Proceedings of the USENIX FAST (2007)

  8. A.W. Leung, M. Shao, T. Bisson, S. Pasupathy, E.L. Miller, Spyglass: fast, scalable metadata search for large-scale storage systems, in Proceedings of the FAST (2009)

  9. S. Doraimani, A. Iamnitchi, File grouping for scientific data management: lessons from experimenting with real traces, in Proceedings of the HPDC (2008)

  10. A. Leung, S. Pasupathy, G. Goodson, E. Miller, Measurement and analysis of large-scale network file system workloads, in Proceedings of the USENIX ATC (2008)

  11. A. Ames, C. Maltzahn, N. Bobb, E. Miller, S. Brandt, A. Neeman, A. Hiatt, D. Tuteja, Richer file system metadata using links and attributes, in Proceedings of the Mass Storage Systems and Technologies (MSST) (2005)

  12. S. Weil, S.A. Brandt, E.L. Miller, D.D.E. Long, C. Maltzahn, Ceph: a scalable, high-performance distributed file system, in Proceedings of the OSDI (2006)

  13. H. Huang, N. Zhang, W. Wang, G. Das, A. Szalay, Just-in-time analytics on large file systems, in Proceedings of the FAST (2011)

  14. K. Veeraraghavan, J. Flinn, E.B. Nightingale, B. Noble, quFiles: the right file at the right time, in Proceedings of the USENIX Conference File and Storage Technologies (FAST) (2010)

  15. Z. Zhang, C. Karamanolis, Designing a robust namespace for distributed file services, in Proceedings of the SRDS (2001), pp. 162–173

  16. Y. Hua, H. Jiang, Y. Zhu, D. Feng, L. Tian, SmartStore: a new metadata organization paradigm with semantic-awareness for next-generation file systems, in Proceedings of the ACM/IEEE Supercomputing Conference (SC) (2009)

  17. D. Beaver, S. Kumar, H. Li, J. Sobel, P. Vajgel, Finding a needle in Haystack: Facebook's photo storage, in Proceedings of the OSDI (2010)

  18. S. Sinnamohideen, R. Sambasivan, J. Hendricks, L. Liu, G. Ganger, A transparently-scalable metadata service for the Ursa Minor storage system, in Proceedings of the USENIX Annual Technical Conference (2010)

  19. D. Hildebrand, P. Honeyman, Exporting storage systems in a scalable manner with pNFS, in Proceedings of the MSST (2005)

  20. PVFS2. Parallel Virtual File System, Version 2, http://www.pvfs2.org

  21. S. Ghemawat, H. Gobioff, S. Leung, The Google file system, in Proceedings of the SOSP (2003)

  22. Hadoop Project, http://hadoop.apache.org/

  23. P. Indyk, R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, in Proceedings of the STOC (1998)

  24. P. Gu, Y. Zhu, H. Jiang, J. Wang, Nexus: a novel weighted-graph-based prefetching algorithm for metadata servers in petabyte-scale storage systems, in Proceedings of the CCGrid (2006)

  25. P. Xia, D. Feng, H. Jiang, L. Tian, F. Wang, FARMER: a novel approach to file access correlation mining and evaluation reference model for optimizing peta-scale file systems performance, in Proceedings of the HPDC (2008)

  26. E. Riedel, M. Kallahalla, R. Swaminathan, A framework for evaluating storage system security, in Proceedings of the FAST (2002), pp. 15–30

  27. S. Kavalanekar, B. Worthington, Q. Zhang, V. Sharda, Characterization of storage workload traces from production Windows servers, in Proceedings of the IEEE International Symposium on Workload Characterization (IISWC) (2008)

  28. D. Ellard, J. Ledlie, P. Malkani, M. Seltzer, Passive NFS tracing of email and research workloads, in Proceedings of the USENIX Conference File and Storage Technologies (FAST) (2003), pp. 203–216

  29. J.L. Hellerstein, Google cluster data, http://googleresearch.blogspot.com/2010/01/google-cluster-data.html (2010)

  30. A. Andoni, P. Indyk, Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51(1), 117–122 (2008)

  31. M. Datar, N. Immorlica, P. Indyk, V. Mirrokni, Locality-sensitive hashing scheme based on p-stable distributions, in Proceedings of the Annual Symposium on Computational Geometry (2004), pp. 253–262

  32. A. Guttman, R-trees: a dynamic index structure for spatial searching, in Proceedings of the ACM SIGMOD (1984), pp. 47–57

  33. D. Hitz, J. Lau, M. Malcolm, File system design for an NFS file server appliance, in Proceedings of the USENIX Winter Technical Conference (1994), pp. 235–246

  34. N.C. Hutchinson, S. Manley, M. Federwisch, G. Harris, D. Hitz, S. Kleiman, S. O’Malley, Logical versus physical file system backup. Oper. Syst. Rev. 33, 239–250 (1998)

  35. Q. Lv, W. Josephson, Z. Wang, M. Charikar, K. Li, Multi-probe LSH: efficient indexing for high-dimensional similarity search, in Proceedings of the VLDB (2007), pp. 950–961

  36. A. Traeger, E. Zadok, N. Joukov, C. Wright, A nine year study of file system and storage benchmarking. ACM Trans. Storage 2, 1–56 (2008)

  37. D.K. Gifford, P. Jouvelot, M.A. Sheldon, J.W. O’Toole Jr., Semantic file systems, in Proceedings of the SOSP (1991)

  38. C. Maltzahn, E. Molina-Estolano, A. Khurana, A.J. Nelson, S.A. Brandt, S. Weil, Ceph as a scalable alternative to the Hadoop distributed file system, in ;login: The USENIX Magazine (2010)

  39. S. Patil, G. Gibson, Scale and concurrency of GIGA+: file system directories with millions of files, in Proceedings of the FAST (2011)

  40. J. Xing, J. Xiong, N. Sun, J. Ma, Adaptive and scalable metadata management to support a trillion files, in Proceedings of the ACM/IEEE Supercomputing Conference (SC) (2009)

  41. S. Weil, K. Pollack, S. Brandt, E. Miller, Dynamic metadata management for petabyte-scale file systems, in Proceedings of the ACM/IEEE Supercomputing (2004)

  42. S. Deerwester, S. Dumais, G. Furnas, T. Landauer, R. Harshman, Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 391–407 (1990)

  43. C. Papadimitriou, P. Raghavan, H. Tamaki, S. Vempala, Latent semantic indexing: a probabilistic analysis. J. Comput. Syst. Sci. 61(2), 217–235 (2000)

Author information

Corresponding author

Correspondence to Yu Hua.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Hua, Y., Liu, X. (2019). The Component of Searchable Storage: Semantic-Aware Namespace. In: Searchable Storage in Cloud Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-2721-6_3

  • DOI: https://doi.org/10.1007/978-981-13-2721-6_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-2720-9

  • Online ISBN: 978-981-13-2721-6

  • eBook Packages: Computer Science, Computer Science (R0)
