A Lightweight Method of Metadata and Data Management with DataNet

  • Daniel Harężlak
  • Marek Kasztelnik
  • Maciej Pawlik
  • Bartosz Wilk
  • Marian Bubak
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8500)

Abstract

Scientific computation produces many large data sets, which are often structured in a non-interoperable manner. Data and metadata are stored in databases or files, either on computing infrastructures or on local computers. As a result, published results backed by such data are hard to discover and verify. Access to the data is also difficult to manage with the permission mechanisms offered by the available file systems or databases, and access from external systems is limited by security restrictions imposed by storage facilities. In this paper we present a novel method for managing scientific data that addresses these issues in three ways: it provides a web-based data model management interface, which supports the design of metadata structures and their relation to data stored in files; it exposes REST-based repositories for data recording; and it provides easy access level configuration to limit data visibility during the publication process. The method is implemented by the DataNet tools and exploits one of the available PaaS platforms. We present a typical use case scenario and provide an evaluation of DataNet deployment in the PL-Grid Infrastructure.
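To make the idea of recording metadata through a REST-based repository more concrete, the following is a minimal sketch in Python. It is not DataNet's API: the model fields, the entity name `simulation_run`, and the URL `https://repository.example.org` are all hypothetical, chosen only for illustration. The sketch checks a record against a simple metadata model and prepares, without sending, the HTTP request that would store it.

```python
import json
from urllib.request import Request

# Hypothetical metadata model: field name -> expected Python type.
# These names are illustrative and do not come from DataNet.
SIMULATION_RUN_MODEL = {
    "experiment": str,
    "temperature_k": float,
    "result_file": str,  # reference to a data file stored elsewhere
}


def validate(record: dict, model: dict) -> list:
    """Return a list of problems; an empty list means the record fits the model."""
    problems = []
    for field, expected in model.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    for field in record:
        if field not in model:
            problems.append(f"unexpected field: {field}")
    return problems


def build_record_request(base_url: str, entity: str, record: dict) -> Request:
    """Prepare (but do not send) a POST that would record one metadata entry."""
    return Request(
        url=f"{base_url.rstrip('/')}/{entity}",
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


record = {"experiment": "run-42", "temperature_k": 300.0, "result_file": "run-42.h5"}
problems = validate(record, SIMULATION_RUN_MODEL)
request = build_record_request("https://repository.example.org", "simulation_run", record)
```

Validating before building the request keeps malformed records out of the repository; a real deployment would more likely express the model in a schema language and enforce it on the server side as well.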

Keywords

data management, metadata recording, storage infrastructures, data sharing, discoverability, reproducibility

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Daniel Harężlak (1)
  • Marek Kasztelnik (1)
  • Maciej Pawlik (1)
  • Bartosz Wilk (1)
  • Marian Bubak (1, 2)

  1. ACC Cyfronet AGH, AGH University of Science and Technology, Kraków, Poland
  2. Department of Computer Science, AGH University of Science and Technology, Kraków, Poland