Adventures in NoSQL for Metadata Management

  • Conference paper
High Performance Computing (ISC High Performance 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11887)

Abstract

This paper describes an attempt to use a NoSQL database engine to manage custom metadata, using a rich query interface as a motivating and descriptive example of the kind of functionality desired. While the difficulties were numerous, the effort revealed a number of important considerations for how and when to use this alternative technology, along with some initial performance numbers showing the impact of those choices.

Author information

Corresponding author

Correspondence to Jay Lofstead.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lofstead, J., Ryan, A., Lawson, M. (2019). Adventures in NoSQL for Metadata Management. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_19

  • DOI: https://doi.org/10.1007/978-3-030-34356-9_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34355-2

  • Online ISBN: 978-3-030-34356-9

  • eBook Packages: Computer Science, Computer Science (R0)
