Reliable Granular References to Changing Linked Data

Part of the Lecture Notes in Computer Science book series (LNISA, volume 10587)

Abstract

Nanopublications are a concept to represent Linked Data in a granular and provenance-aware manner, which has been successfully applied to a number of scientific datasets. We demonstrated in previous work how we can establish reliable and verifiable identifiers for nanopublications and sets thereof. Further adoption of these techniques, however, was probably hindered by the fact that nanopublications can lead to an explosion in the number of triples due to auxiliary information about the structure of each nanopublication and repetitive provenance and metadata. We demonstrate here that this significant overhead disappears once we take the version history of nanopublication datasets into account, calculate incremental updates, and allow users to deal with the specific subsets they need. We show that the total size and overhead of evolving scientific datasets is reduced, and typical subsets that researchers use for their analyses can be referenced and retrieved efficiently with optimized precision, persistence, and reliability.


Notes

  1. https://github.com/tkuhn/bel2nanopub.

  2. See e.g. [33] and https://www.w3.org/TR/dwbp/#dataVersioning.

  3. https://www.w3.org/wiki/FollowYourNose.

  4. http://purl.org/nanopub/monitor.

  5. https://github.com/Nanopublication/nanopub-java.

  6. https://github.com/wikipathways/nanopublications.

  7. See https://doi.org/10.6084/m9.figshare.5230639 and https://bitbucket.org/tkuhn/nanodiff-exp/.

References

  1. Auer, S., Herre, H.: A versioning and evolution framework for RDF knowledge bases. In: Virbitskaite, I., Voronkov, A. (eds.) PSI 2006. LNCS, vol. 4378, pp. 55–69. Springer, Heidelberg (2007). doi:10.1007/978-3-540-70881-0_8

  2. Banda, J.M., Kuhn, T., Shah, N.H., Dumontier, M.: Provenance-centered dataset of drug-drug interactions. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9367, pp. 293–300. Springer, Cham (2015). doi:10.1007/978-3-319-25010-6_18

  3. Belleau, F., Nolin, M.-A., Tourigny, N., Rigault, P., Morissette, J.: Bio2RDF: towards a mashup to build bioinformatics knowledge systems. J. Biomed. Inf. 41(5), 706–716 (2008)

  4. Bohler, A., Wu, G., Kutmon, M., Pradhana, L.A., Coort, S.L., Hanspers, K., Haw, R., Pico, A.R., Evelo, C.T.: Reactome from a WikiPathways perspective. PLoS Comput. Biol. 12(5), e1004941 (2016)

  5. Chard, K., D’Arcy, M., Heavner, B., Foster, I., Kesselman, C., Madduri, R., Rodriguez, A., Soiland-Reyes, S., Goble, C., Clark, K., et al.: I’ll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets. In: IEEE International Conference on Big Data, pp. 319–328. IEEE (2016)

  6. Chichester, C., Karch, O., Gaudet, P., Lane, L., Mons, B., Bairoch, A.: Converting nextprot into linked data and nanopublications. Semant. Web 6(2), 147–153 (2015)

  7. Cohen, J.P., Lo, H.Z.: Academic torrents: A community-maintained distributed repository. In: Proceedings of XSEDE 2014, p. 2. ACM (2014)

  8. Fabregat, A., et al.: The reactome pathway knowledgebase. Nucleic Acids Res. 44(D1), D481–D487 (2016)

  9. Fernández, J.D., Polleres, A., Umbrich, J.: Towards efficient archiving of dynamic linked open data. In: DIACRON@ESWC, pp. 34–49 (2015)

  10. Frommhold, M., Piris, R.N., Arndt, N., Tramp, S., Petersen, N., Martin, M.: Towards versioning of arbitrary RDF data. In: Proceedings of the 12th International Conference on Semantic Systems, pp. 33–40. ACM (2016)

  11. Graube, M., Hensel, S., Urbas, L.: R43ples: revisions for triples. In: Proceedings of the 1st Workshop on Linked Data Quality. Citeseer (2014)

  12. Groth, P., Gibson, A., Velterop, J.: The anatomy of a nanopublication. Inf. Serv. Use 30(1–2), 51–56 (2010)

  13. Käfer, T., Abdelrahman, A., Umbrich, J., O’Byrne, P., Hogan, A.: Observing linked data dynamics. In: Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S. (eds.) ESWC 2013. LNCS, vol. 7882, pp. 213–227. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38288-8_15

  14. Kuhn, T.: Nanopub-java: a java library for nanopublications. In: Proceedings of the 5th Workshop on Linked Science (LISC 2015) (2015)

  15. Kuhn, T., Barbano, P.E., Nagy, M.L., Krauthammer, M.: Broadening the scope of nanopublications. In: Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S. (eds.) ESWC 2013. LNCS, vol. 7882, pp. 487–501. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38288-8_33

  16. Kuhn, T., Chichester, C., Krauthammer, M., Dumontier, M.: Publishing Without publishers: a decentralized approach to dissemination, retrieval, and archiving of data. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 656–672. Springer, Cham (2015). doi:10.1007/978-3-319-25007-6_38

  17. Kuhn, T., Chichester, C., Krauthammer, M., Queralt-Rosinach, N., Verborgh, R., Giannakopoulos, G., Ngomo, A.-C.N., Viglianti, R., Dumontier, M.: Decentralized provenance-aware publishing with nanopublications. PeerJ Comput. Sci. 2, e78 (2016)

  18. Kuhn, T., Dumontier, M.: Trusty URIs: verifiable, immutable, and permanent digital artifacts for linked data. In: Presutti, V., d’Amato, C., Gandon, F., d’Aquin, M., Staab, S., Tordai, A. (eds.) ESWC 2014. LNCS, vol. 8465, pp. 395–410. Springer, Cham (2014). doi:10.1007/978-3-319-07443-6_27

  19. Kuhn, T., Dumontier, M.: Making digital artifacts on the web verifiable and reliable. IEEE Trans. Knowl. Data Eng. 27(9), 2390–2400 (2015)

  20. Kutmon, M., et al.: WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res. 44(D1), D488–D494 (2016)

  21. Meinhardt, P., Knuth, M., Sack, H.: TailR: a platform for preserving history on the web of data. In: Proceedings of the 11th International Conference on Semantic Systems, pp. 57–64. ACM (2015)

  22. Miller, A., Juels, A., Shi, E., Parno, B., Katz, J.: Permacoin: repurposing Bitcoin work for data preservation. In: Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 475–490. IEEE (2014)

  23. Mons, B., et al.: The value of data. Nat. Genet. 43(4), 281–283 (2011)

  24. Moreau, L., Groth, P.: Provenance: an introduction to PROV. Synth. Lect. Semant. Web Theor. Technol. 3(4), 1–129 (2013)

  25. Nanopubs extracted from DisGeNET v2.1.0.0, incremental dataset. Nanopublication index, 9 May 2017. http://purl.org/np/RADYX-ia_TZYAw_eZD0-2oGGA7gnMxOnVj-Gh8wdJgAzI

  26. Nanopubs extracted from DisGeNET v3.0.0.0, incremental dataset. Nanopublication index, 9 May 2017. http://purl.org/np/RAufQaKzv1pZlMhZo2eBuZtx9vuugLBJsrs4ZkvR53xzw

  27. Nanopubs extracted from DisGeNET v4.0.0.0, incremental dataset. Nanopublication index, 9 May 2017. http://purl.org/np/RAu0PUrg-M8HxkOiYRXkTg7r9fgOIzFZNINj8q7ywNrdM

  28. Nanopublications extracted from WikiPathways, incremental dataset, 20170510. Nanopublication index, 11 May 2017. http://purl.org/np/RAKz0OQ3Dq8dDWqF7SIY4TgYcZRX4d2TnmLUEbOwnaGmQ

  29. Task Group on Data Citation Standards and Practices: Out of cite, out of mind: The current state of practice, policy, and technology for the citation of data. Data Sci. J. 12, CIDCR1–CIDCR75 (2013)

  30. Piñero, J., Bravo, À., Queralt-Rosinach, N., Gutiérrez-Sacristán, A., Deu-Pons, J., Centeno, E., García-García, J., Sanz, F., Furlong, L.I.: DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. 45, D833–D839 (2016)

  31. Queralt-Rosinach, N., Kuhn, T., Chichester, C., Dumontier, M., Sanz, F., Furlong, L.I.: Publishing DisGeNET as nanopublications. Semant. Web 7(5), 519–528 (2016)

  32. Queralt-Rosinach, N., Piñero, J., Bravo, À., Sanz, F., Furlong, L.I.: DisGeNET-RDF: harnessing the innovative power of the semantic web to explore the genetic basis of diseases. Bioinformatics 32, 2236–2238 (2016)

  33. Rauber, A., Asmi, A., van Uytvanck, D., Pröll, S.: Identification of reproducible subsets for data citation, sharing and re-use. Bull. IEEE Tech. Comm. Digit. Libr. 12(1), 6–15 (2016)

  34. Schandl, B.: Replication and versioning of partial RDF graphs. In: Aroyo, L., Antoniou, G., Hyvönen, E., ten Teije, A., Stuckenschmidt, H., Cabral, L., Tudorache, T. (eds.) ESWC 2010. LNCS, vol. 6088, pp. 31–45. Springer, Heidelberg (2010). doi:10.1007/978-3-642-13486-9_3

  35. Silvello, G.: A methodology for citing linked open data subsets. D-Lib Magazine, 21(1/2) (2015)

  36. Tzitzikas, Y., Theoharis, Y., Andreou, D.: On storage policies for semantic web repositories that support versioning. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 705–719. Springer, Heidelberg (2008). doi:10.1007/978-3-540-68234-9_51

  37. Van de Sompel, H., Sanderson, R., Nelson, M.L., Balakireva, L.L., Shankar, H., Ainsworth, S.: An HTTP-based versioning mechanism for linked data (2010). arXiv:1003.3661

  38. Vander Sande, M., Colpaert, P., Verborgh, R., Coppens, S., Mannens, E., Van de Walle, R.: R&Wbase: git for triples. In: LDOW (2013)

  39. Völkel, M., Winkler, W., Sure, Y., Kruk, S.R., Synak, M.: SemVersion: A versioning system for RDF and ontologies. In: Proceedings of ESWC (2005)

  40. Waagmeester, A., Kutmon, M., Riutta, A., Miller, R., Willighagen, E.L., Evelo, C.T., Pico, A.R.: Using the semantic web for rapid integration of WikiPathways with other biological online data resources. PLoS Comput. Biol. 12(6), e1004989 (2016)

  41. Wilkinson, M.D., Dumontier, M., et al.: The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016)

Acknowledgments

We would like to thank Javier D. Fernández for valuable input and discussions on RDF versioning. L.I. Furlong and E. Centeno received support from ISCIII-FEDER (PI13/00082, CP10/00524, CPII16/00026), the EU H2020 Programme 2014-2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (Elixir-Excelerate).

Author information

Corresponding author

Correspondence to Tobias Kuhn.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Kuhn, T., Willighagen, E., Evelo, C., Queralt-Rosinach, N., Centeno, E., Furlong, L.I. (2017). Reliable Granular References to Changing Linked Data. In: , et al. The Semantic Web – ISWC 2017. ISWC 2017. Lecture Notes in Computer Science, vol 10587. Springer, Cham. https://doi.org/10.1007/978-3-319-68288-4_26

  • DOI: https://doi.org/10.1007/978-3-319-68288-4_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68287-7

  • Online ISBN: 978-3-319-68288-4

  • eBook Packages: Computer Science, Computer Science (R0)