An annotation management system for relational databases

Abstract

We present an annotation management system for relational databases. In this system, every piece of data in a relation may have zero or more annotations associated with it, and annotations are propagated from the source to the output as data is transformed by a query. Such a system can be used to track the provenance (also known as lineage) of data, to record who has seen or edited a piece of data, or to describe the quality of data. These are useful functions for applications that integrate scientific and biological data.
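The Python sketch below illustrates what it means for annotations to travel with data through a query. It assumes a toy in-memory representation in which each attribute value carries a set of annotations. The `Cell` class and the `select` and `project` helpers are hypothetical and are not part of the system described here. Selection and projection copy cells unchanged, so each output value keeps the annotations of the cell it was copied from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """An attribute value together with its annotations (hypothetical toy model)."""
    value: object
    notes: frozenset = frozenset()


def select(rows, predicate):
    """Keep rows whose plain values satisfy `predicate`; cells are not modified,
    so their annotations travel with them."""
    return [row for row in rows
            if predicate({col: cell.value for col, cell in row.items()})]


def project(rows, columns):
    """Keep only `columns`; each output cell is copied from one input cell
    and therefore carries that cell's annotations."""
    return [{col: row[col] for col in columns} for row in rows]


# A tiny annotated relation; the annotation strings are made up for illustration.
rows = [
    {"id": Cell(1, frozenset({"curated"})), "seq": Cell("MKT", frozenset({"low quality"}))},
    {"id": Cell(2), "seq": Cell("GAV", frozenset({"edited by bob"}))},
]

result = project(select(rows, lambda r: r["id"] == 1), ["seq"])
print(result[0]["seq"].notes)  # frozenset({'low quality'})
```

The annotation on `id` does not reach the output, because no output value is copied from that column.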

We present pSQL, an extension of a fragment of SQL that supports three annotation propagation schemes, each suited to a different purpose. The default scheme propagates annotations according to where data is copied from. The default-all scheme propagates annotations according to where data is copied from in all equivalent formulations of a given query. The custom scheme lets the user specify how annotations should propagate. We present a storage scheme for the annotations and describe algorithms that translate a pSQL query under each propagation scheme into one or more SQL queries that retrieve the relevant annotations according to that scheme. For the default-all scheme, we also show how to generate a finite set of queries that simulates the annotation propagation behavior of the set of all equivalent queries, which may be infinite. We have implemented the algorithms, and a set of experiments demonstrates the feasibility of the system.
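The difference between the default and default-all schemes appears when equivalent queries copy the same value from different places. The Python sketch below assumes two hypothetical single-column relations R(A) and S(A), and two equivalent join queries: one returns R.A and the other returns S.A. Under the default scheme, each query propagates only the annotations of the relation it copies from. Default-all is approximated here by taking the union of the two results. This is only an illustration. It hand-picks two equivalent formulations and does not implement the finite query-generation algorithm mentioned above.

```python
from collections import defaultdict

# Hypothetical annotated relations R(A) and S(A): (value, annotations) pairs.
R = [(10, {"r1"}), (20, {"r2"})]
S = [(10, {"s1"}), (30, {"s3"})]


def q1_default(R, S):
    """SELECT DISTINCT R.A FROM R, S WHERE R.A = S.A under the default scheme:
    the output value is copied from R.A, so only R's annotations propagate."""
    out = defaultdict(set)
    for r_val, r_notes in R:
        for s_val, _ in S:
            if r_val == s_val:
                out[r_val] |= r_notes  # DISTINCT merges duplicates, so union their notes
    return dict(out)


def q2_default(R, S):
    """SELECT DISTINCT S.A FROM R, S WHERE R.A = S.A: same answer as q1,
    but the value is copied from S.A, so S's annotations propagate instead."""
    out = defaultdict(set)
    for r_val, _ in R:
        for s_val, s_notes in S:
            if r_val == s_val:
                out[s_val] |= s_notes
    return dict(out)


def default_all(*results):
    """Union the annotations propagated by each equivalent formulation."""
    merged = defaultdict(set)
    for res in results:
        for val, notes in res.items():
            merged[val] |= notes
    return dict(merged)


r1 = q1_default(R, S)
r2 = q2_default(R, S)
assert r1.keys() == r2.keys()  # equivalent queries return the same values
print(r1)  # {10: {'r1'}}
print(r2)  # {10: {'s1'}}
print({v: sorted(n) for v, n in default_all(r1, r2).items()})  # {10: ['r1', 's1']}
```

The two queries return the same answer but carry different annotations under the default scheme. This sensitivity to how a query is written is the motivation for a scheme that considers all equivalent formulations.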
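The abstract does not spell out the storage scheme. As one possible illustration, the Python sketch below assumes that annotations are kept in a separate, hypothetical table `R_ann` keyed by tuple identifier and column name. It then shows how a query such as `SELECT DISTINCT A FROM R WHERE B = 'human'` under the default scheme might be rewritten into ordinary SQL that also fetches the annotations on `R.A`. The table names, columns, and rewriting are assumptions made for this example. They are not the paper's storage layout or translation algorithm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data table with a tuple identifier.
    CREATE TABLE R (tid INTEGER PRIMARY KEY, A TEXT, B TEXT);
    -- Hypothetical annotation table: one row per (tuple, column, annotation).
    CREATE TABLE R_ann (tid INTEGER, col TEXT, note TEXT);
    INSERT INTO R VALUES (1, 'P100', 'human'), (2, 'P200', 'mouse'), (3, 'P300', 'human');
    INSERT INTO R_ann VALUES
        (1, 'A', 'checked by curator'),
        (1, 'B', 'species inferred'),
        (2, 'A', 'possible typo');
""")

# Rewritten SQL: the output column A is copied from R.A, so under the default
# scheme only annotations attached to column 'A' of qualifying tuples are fetched.
TRANSLATED = """
    SELECT R.A, ann.note
    FROM R LEFT JOIN R_ann AS ann
      ON ann.tid = R.tid AND ann.col = 'A'
    WHERE R.B = ?
    ORDER BY R.tid
"""


def run(species):
    """Return {A value: set of annotations}; grouping by value plays the role of DISTINCT."""
    result = {}
    for a, note in conn.execute(TRANSLATED, (species,)):
        notes = result.setdefault(a, set())
        if note is not None:  # LEFT JOIN yields NULL when a value has no annotations
            notes.add(note)
    return result


print(run("human"))  # {'P100': {'checked by curator'}, 'P300': set()}
```

The LEFT JOIN keeps values that have no annotations. The filter on `ann.col` prevents the annotation on column B from leaking into the output for column A.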

Author information

Correspondence to Laura Chiticariu.

About this article

Cite this article

Bhagwat, D., Chiticariu, L., Tan, WC. et al. An annotation management system for relational databases. The VLDB Journal 14, 373–396 (2005). https://doi.org/10.1007/s00778-005-0156-6

Keywords

  • Data provenance
  • Lineage
  • Annotation propagation
  • Metadata