MetaStore: an adaptive metadata management framework for heterogeneous metadata models


In this paper, we present MetaStore, a metadata management framework for scientific data repositories. Scientific experiments generate a deluge of data, and handling the associated metadata is critical because it enables the discovery, analysis, reuse, and sharing of scientific data. Moreover, metadata produced by scientific experiments are heterogeneous and subject to frequent changes, which demands a flexible data model. Existing metadata management systems provide a broad range of features for handling scientific metadata. However, their principal limitation is an architecture restricted to a single or, at most, a few standard metadata models. These systems cannot handle different types of metadata models, i.e., administrative, descriptive, structural, and provenance metadata, nor can they incorporate community-specific metadata models. To address this challenge, we present MetaStore, an adaptive metadata management framework based on a NoSQL database and an RDF triple store. MetaStore provides a set of core functionalities to handle heterogeneous metadata models by automatically generating the necessary software code (services) and extending the functionality of the framework on the fly. To handle dynamic metadata and to control metadata quality, MetaStore also provides an extended set of functionalities: annotation of images and text through integration of the Web Annotation Data Model, discipline-specific vocabularies that communities define using the Simple Knowledge Organization System (SKOS), and advanced search and analytical capabilities through integration of Elasticsearch. To maximize the utilization of the data models supported by NoSQL databases, MetaStore automatically segregates the different categories of metadata into their corresponding data models. Complex provenance graphs and dynamic metadata are modeled and stored in an RDF triple store, whereas static metadata is stored in a NoSQL database. To enable large-scale harvesting (sharing) of metadata using the METS standard over the OAI-PMH protocol, MetaStore is designed to be OAI-compliant. Finally, to show the practical usability of the MetaStore framework and that the requirements of the research communities have been met, we describe our experience in adopting MetaStore for three communities.
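
The segregation of metadata categories described above can be illustrated with a minimal sketch. This is not MetaStore's implementation: the classes `DocumentStore` and `TripleStore`, the function `route`, and the key layout are hypothetical in-memory stand-ins, and the assignment of administrative, descriptive, and structural metadata to the "static" group and of annotations to the "dynamic" group is an assumption made only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Assumption: these categories are treated as static metadata.
STATIC_CATEGORIES = {"administrative", "descriptive", "structural"}
# Assumption: provenance and annotations are treated as graph-shaped or dynamic.
GRAPH_CATEGORIES = {"provenance", "annotation"}


@dataclass
class DocumentStore:
    """Hypothetical in-memory stand-in for a NoSQL document database."""
    documents: dict[str, dict[str, Any]] = field(default_factory=dict)

    def put(self, key: str, document: dict[str, Any]) -> None:
        self.documents[key] = document


@dataclass
class TripleStore:
    """Hypothetical in-memory stand-in for an RDF triple store."""
    triples: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))


def route(record_id: str, category: str, payload: dict[str, Any],
          docs: DocumentStore, graph: TripleStore) -> str:
    """Store one metadata record in the backend that matches its category."""
    if category in STATIC_CATEGORIES:
        # Static metadata is kept as one document per record and category.
        docs.put(f"{record_id}/{category}", payload)
        return "document"
    if category in GRAPH_CATEGORIES:
        # Graph-shaped metadata is flattened into subject-predicate-object triples.
        for predicate, value in payload.items():
            graph.add(record_id, predicate, str(value))
        return "triple"
    raise ValueError(f"unknown metadata category: {category}")
```

For example, calling `route("dataset-1", "descriptive", {"title": "Sample"}, docs, graph)` places the record in the document store, while a `"provenance"` payload such as `{"prov:wasGeneratedBy": "run-7"}` ends up as a triple. A real deployment would replace both stand-ins with client libraries for the chosen databases.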


This research is supported by the Portfolio Extension of Helmholtz Association “Large Scale Data Management and Analysis” and DFG (German Research Foundation) MASi Project (STO 397/4-1).

Author information

Corresponding author

Correspondence to Ajinkya Prabhune.

About this article

Cite this article

Prabhune, A., Stotzka, R., Sakharkar, V. et al. MetaStore: an adaptive metadata management framework for heterogeneous metadata models. Distrib Parallel Databases 36, 153–194 (2018).

  • MetaStore
  • NoSQL database
  • Automated code generation
  • Annotations