Abstract
Provenance information about the digital objects maintained by digital libraries and archives is crucial for authenticity assessment, reproducibility and accountability. Such information is commonly stored as metadata in Metadata Repositories (MRs) or Knowledge Bases (KBs). Nevertheless, in many settings, storing the complete provenance of every digital object is prohibitive because of the storage space it requires. In this paper, we introduce provenance-based inference rules as a means to complete the provenance information, to reduce the amount of provenance information that has to be stored, and to ease quality control (e.g., corrections). Specifically, we show how provenance information can be propagated by identifying a number of basic inference rules over a core conceptual model for representing provenance. The propagation of provenance concerns fundamental modelling concepts such as actors, activities, events, devices and information objects, and their associations. However, since an MR/KB is not static but changes over time due to several factors, the question that arises is how update requests can be satisfied while still supporting these inference rules. To this end, we specify the required add/delete operations, consider two different semantics for the deletion of information, and provide the corresponding update algorithms. Finally, we report extensive comparative results for different repository policies regarding the derivation of new knowledge, on datasets containing up to one million RDF triples. The results clarify the trade-offs that the use of inference rules entails for storage space and for the performance of queries and updates.
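The storage argument above can be illustrated with a small sketch: only explicit provenance triples are stored, and the rest is derived by forward-chaining an inference rule to a fixpoint. The rule, the property names (borrowed from CIDOC CRM: P14 carried out by, P9i forms part of) and the instance identifiers are illustrative assumptions, not the paper's actual rules or data.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Property names borrowed from CIDOC CRM for readability; the rule itself is an
# illustrative assumption, not necessarily one of the paper's inference rules.
PART_OF = "P9i_forms_part_of"
CARRIED_OUT_BY = "P14_carried_out_by"


def apply_rule(facts: Set[Triple]) -> Set[Triple]:
    """One pass of: (a forms part of b) and (b carried out by x) => (a carried out by x)."""
    actors: Dict[str, Set[str]] = {}
    for s, p, o in facts:
        if p == CARRIED_OUT_BY:
            actors.setdefault(s, set()).add(o)
    derived = {
        (child, CARRIED_OUT_BY, actor)
        for child, p, parent in facts
        if p == PART_OF
        for actor in actors.get(parent, set())
    }
    return derived - facts  # only genuinely new triples


def closure(explicit: Set[Triple]) -> Set[Triple]:
    """Forward-chain the rule until no new triple is derived (fixpoint)."""
    facts = set(explicit)
    new = apply_rule(facts)
    while new:
        facts |= new
        new = apply_rule(facts)
    return facts


# Hypothetical example: the actor is recorded once, on the top-level activity.
explicit = {
    ("scan_campaign", CARRIED_OUT_BY, "lab_team"),
    ("capture_session", PART_OF, "scan_campaign"),
    ("photo_shot_1", PART_OF, "capture_session"),
}
full = closure(explicit)
print(len(explicit), len(full))  # 3 5
```

Here three stored triples yield five after inference, because the actor propagates down the part-of chain. Whether the derived triples are materialized in the repository or computed at query time is exactly the kind of repository policy whose storage and performance trade-offs the paper compares.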
Notes
It was initially defined during the EU Project CASPAR (http://www.casparpreserves.eu/) (FP6-2005-IST-033572) and its evolution continued during the EU Project IST IP 3D-COFORM (http://www.3d-coform.eu/).
Acknowledgments
This work was carried out in the context of the following European projects: APARSEN (Alliance Permanent Access to the Records of Science in Europe Network, FP7 Network of Excellence, project number: 269977, duration: 2011–2014), DIACHRON (Managing the Evolution and Preservation of the Data Web, FP7 IP, project number 601043, duration: 2013–2016), 3D-COFORM IST IP (Tools and Expertise for 3D Collection Formation, project number: 231809, duration: 2008–2012) and PlanetData (FP7 Network of Excellence, project number: 257641, duration: 2010–2014).
Appendices
Appendix A: Set of update operations
Since we focus on three inference rules, we define operations for satisfying update requests related to these rules. The signatures of the required change operations are listed in Table 1.
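The signatures of Table 1 are not reproduced here. As a hedged illustration of why deletion needs explicit semantics in the presence of inference rules, the following sketch contrasts two plausible deletion behaviours over the same illustrative rule used for propagation. All function names (`add`, `delete_explicit_only`, `delete_underivable`), the rule and the data are hypothetical and are not the paper's operations or algorithms.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical names throughout; the rule is an illustrative one:
# (a forms part of b) and (b carried out by x) => (a carried out by x).
PART_OF = "P9i_forms_part_of"
CARRIED_OUT_BY = "P14_carried_out_by"


def closure(explicit: Set[Triple]) -> Set[Triple]:
    """Forward-chain the illustrative rule to a fixpoint."""
    facts = set(explicit)
    while True:
        actors: Dict[str, Set[str]] = {}
        for s, p, o in facts:
            if p == CARRIED_OUT_BY:
                actors.setdefault(s, set()).add(o)
        new = {
            (c, CARRIED_OUT_BY, a)
            for c, p, parent in facts
            if p == PART_OF
            for a in actors.get(parent, set())
        } - facts
        if not new:
            return facts
        facts |= new


def add(explicit: Set[Triple], triple: Triple) -> Set[Triple]:
    """Addition: store the triple explicitly (a no-op if already stored)."""
    return explicit | {triple}


def delete_explicit_only(explicit: Set[Triple], triple: Triple) -> Set[Triple]:
    """Deletion semantics 1 (hypothetical): drop the stored triple; it may still be inferred."""
    return explicit - {triple}


def delete_underivable(explicit: Set[Triple], triple: Triple) -> Set[Triple]:
    """Deletion semantics 2 (hypothetical): also drop premises so the triple is no longer inferable."""
    remaining = explicit - {triple}
    child, prop, actor = triple
    if prop != CARRIED_OUT_BY:
        return remaining  # the rule never derives any other property
    facts = closure(remaining)
    # Part-of triples are never derived, so cutting every stored part-of edge from
    # `child` to a parent carried out by `actor` removes all derivations of `triple`.
    cut = {
        (c, p, parent)
        for c, p, parent in remaining
        if c == child and p == PART_OF and (parent, CARRIED_OUT_BY, actor) in facts
    }
    return remaining - cut


explicit = {
    ("scan_campaign", CARRIED_OUT_BY, "lab_team"),
    ("capture_session", PART_OF, "scan_campaign"),
    ("capture_session", CARRIED_OUT_BY, "lab_team"),  # redundant: also derivable
}
target = ("capture_session", CARRIED_OUT_BY, "lab_team")
s1 = delete_explicit_only(explicit, target)
s2 = delete_underivable(explicit, target)
print(target in closure(s1), target in closure(s2))  # True False
```

Under the first behaviour the deleted triple reappears through inference; under the second it is gone, but cutting the part-of edge also removes any other provenance derived through it. This side effect is why the choice of deletion semantics matters when inference rules are in place.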
Appendix B: Algorithms of update operations
Below, we list the algorithms for the set of update operations presented in Appendix A.
About this article
Cite this article
Strubulis, C., Flouris, G., Tzitzikas, Y. et al. A case study on propagating and updating provenance information using the CIDOC CRM. Int J Digit Libr 15, 27–51 (2014). https://doi.org/10.1007/s00799-014-0125-z
DOI: https://doi.org/10.1007/s00799-014-0125-z