Empowering Provenance in Data Integration

  • Haridimos Kondylakis
  • Martin Doerr
  • Dimitris Plexousakis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5739)


The provenance of data has recently been recognized as central to the trust one places in data. This paper presents a novel framework for empowering provenance in a mediator-based data integration system. We use a simple mapping language, capable of carrying provenance information, to map schema constructs between an ontology and relational sources. This language extends the traditional data exchange setting by translating our mapping specifications into source-to-target tuple-generating dependencies (s-t tgds). We then formally define the provenance information we want to retrieve, namely annotation, source, and tuple provenance. We provide three algorithms that retrieve provenance information using the information stored in the mappings and the sources. We show the feasibility of our solution and the advantages of our framework.
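
For readers unfamiliar with the term, the general shape of an s-t tgd in the data exchange literature can be sketched as follows. The first formula is the standard form; the second is only an illustration, and its relation and predicate names (Painting, Artwork, hasTitle, createdBy, hasName) are hypothetical and not taken from the paper.

```latex
% General form: \varphi_S is a conjunction of atoms over the source schema,
% \psi_T is a conjunction of atoms over the target schema.
\forall \mathbf{x}\,\bigl(\varphi_S(\mathbf{x}) \rightarrow \exists \mathbf{y}\;\psi_T(\mathbf{x},\mathbf{y})\bigr)

% Hypothetical instance: a relational source table Painting(id, title, artist)
% mapped to ontology-style target predicates. The existential variable p
% stands for an artist individual that the source does not identify explicitly.
\forall i, t, a\,\bigl(\mathit{Painting}(i,t,a) \rightarrow
  \exists p\;(\mathit{Artwork}(i) \wedge \mathit{hasTitle}(i,t)
  \wedge \mathit{createdBy}(i,p) \wedge \mathit{hasName}(p,a))\bigr)
```

In such a dependency the left-hand side records which source relations contribute to a target fact, which is the kind of information a provenance-aware mapping would need to retain.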


Keywords: Data Integration, Provenance, Mappings





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Haridimos Kondylakis (1)
  • Martin Doerr (1)
  • Dimitris Plexousakis (1)
  1. Information Systems Laboratory, FORTH-ICS, Computer Science Department, University of Crete, Greece
