Adventures in NoSQL for Metadata Management

Lofstead, Jay; Ryan, Ashleigh; Lawson, Margaret

doi:10.1007/978-3-030-34356-9_19

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11887)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

This paper describes an attempt to use a NoSQL database engine to manage custom metadata, using a rich query interface as a motivating and descriptive example of the kind of functionality desired. While the difficulties were numerous, the work revealed a number of important considerations for how and when to use this alternative technology, along with some initial performance numbers showing the impact of those choices.

Author information

Authors and Affiliations

Sandia National Laboratories, Albuquerque, NM, USA
Jay Lofstead, Ashleigh Ryan & Margaret Lawson
Georgia Institute of Technology, Atlanta, GA, USA
Ashleigh Ryan
University of Illinois, Urbana-Champaign, IL, USA
Margaret Lawson

Corresponding author

Correspondence to Jay Lofstead.

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, UK
Michèle Weiland
Helmholtz-Zentrum Dresden-Rossendorf, Dresden, Sachsen, Germany
Guido Juckeland
Swiss National Supercomputing Centre, Lugano, Ticino, Switzerland
Sadaf Alam
University of Tennessee at Knoxville, Knoxville, TN, USA
Heike Jagode

About this paper

Cite this paper

Lofstead, J., Ryan, A., Lawson, M. (2019). Adventures in NoSQL for Metadata Management. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_19

DOI: https://doi.org/10.1007/978-3-030-34356-9_19
Published: 03 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9
eBook Packages: Computer Science, Computer Science (R0)
