Publishing Without Publishers: A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data

  • Tobias Kuhn
  • Christine Chichester
  • Michael Krauthammer
  • Michel Dumontier
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9366)

Abstract

Making available and archiving scientific results is for the most part still considered the task of classical publishing companies, despite the fact that classical forms of publishing centered around printed narrative articles no longer seem well-suited in the digital age. In particular, there exist currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. Here we propose to design scientific data publishing as a Web-based bottom-up process, without top-down control of central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used for the Semantic Web in general. Evaluation of the current small network shows that this system is efficient and reliable.
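The idea of retrieving data from a decentralized network and verifying it independently of the server that delivered it can be illustrated with a small sketch. It is not the system's implementation: it assumes, purely for illustration, that an identifier ends with a URL-safe SHA-256 hash of the raw bytes, whereas a real scheme for RDF would hash a normalized form of the graph. The names `content_id`, `verify`, `retrieve`, and the in-memory "servers" are hypothetical.

```python
import base64
import hashlib


def content_id(data: bytes) -> str:
    """Return a URL-safe SHA-256 digest of the raw bytes (assumed scheme)."""
    digest = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def verify(identifier: str, data: bytes) -> bool:
    """Check that the bytes match the hash that ends the identifier."""
    return identifier.endswith(content_id(data))


def retrieve(identifier, servers):
    """Ask each server in turn; return the first copy that verifies."""
    for fetch in servers:
        data = fetch(identifier)
        if data is not None and verify(identifier, data):
            return data
    return None


# Toy record and identifier; the example.org namespace is a placeholder.
record = b"<ex:geneA> <ex:isAssociatedWith> <ex:diseaseB> ."
identifier = "http://example.org/np/" + content_id(record)


# In-memory stand-ins for network nodes (hypothetical, no real network calls).
def offline_server(requested):
    return None  # node is unreachable or does not hold the record


def tampering_server(requested):
    return record + b" # altered"  # node returns modified content


def honest_server(requested):
    return record if requested == identifier else None


result = retrieve(identifier, [offline_server, tampering_server, honest_server])
print(result == record)
```

Because the check depends only on the identifier and the received bytes, a client in this sketch does not need to trust any individual server: an unavailable or tampering node is skipped and the next one is tried.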

Keywords

Server Network, Triple Store, Decentralize Approach, SPARQL Endpoint, Artifact Code
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Tobias Kuhn (1, 2), corresponding author
  • Christine Chichester (3)
  • Michael Krauthammer (4)
  • Michel Dumontier (5)

  1. Department of Humanities, Social and Political Sciences, ETH Zurich, Zürich, Switzerland
  2. Department of Computer Science, VU University Amsterdam, Amsterdam, The Netherlands
  3. Swiss Institute of Bioinformatics, Geneva, Switzerland
  4. Yale University School of Medicine, New Haven, USA
  5. Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, USA
