How to Trace and Revise Identities

  • Julien Gaugaz
  • Jakub Zakrzewski
  • Gianluca Demartini
  • Wolfgang Nejdl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5554)

Abstract

The Entity Name System (ENS) is a service that aims to provide globally unique URIs for all kinds of real-world entities, such as persons, locations and products, based on descriptions of those entities. The ENS decides on entity identity (do two entity descriptions refer to the same real-world entity?) from the descriptions available to it. Because these descriptions change over time, the system has to revise its past decisions when it turns out that one entity has been given two different URIs or that two entities have been assigned the same URI. The question we investigate in this context is: how do we propagate entity decision revisions to the clients that use the URIs provided by the ENS?

In this paper we propose a solution that relies on labelling the IDs with additional history information. These labels allow clients to locally detect deprecated URIs they are using, and to merge IDs referring to the same real-world entity, without consulting the ENS. Making update requests to the ENS only for the IDs detected as deprecated considerably reduces the number of update requests, at the cost of a decrease in uniqueness quality. We investigate how much the number of update requests decreases with ID history labelling, and how this affects the uniqueness of the IDs on the client. For the experiments we use both artificially generated entity revision histories and a real case study based on the revision histories of the Dutch and Simple English Wikipedias.
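The idea of history labels can be illustrated with a minimal sketch. It is not the labelling scheme of the paper, which the abstract does not specify. It assumes, purely for illustration, that a label is the plain set of URIs an ID has superseded and that only merges occur. The names LabelledId, merge, find_deprecated and same_entity, and the ens: URIs, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledId:
    """An ID plus a history label (assumed format: set of superseded URIs)."""
    uri: str
    history: frozenset = frozenset()


def merge(a: LabelledId, b: LabelledId, new_uri: str) -> LabelledId:
    """Server side: replace two IDs found to denote one entity by a new ID.

    The new label records both old URIs and everything they superseded.
    """
    return LabelledId(new_uri, a.history | b.history | {a.uri, b.uri})


def find_deprecated(local_ids: list) -> set:
    """Client side: URIs that appear in the label of another local ID."""
    superseded = set()
    for ident in local_ids:
        superseded |= ident.history
    return {ident.uri for ident in local_ids if ident.uri in superseded}


def same_entity(a: LabelledId, b: LabelledId) -> bool:
    """Client side: decide locally whether two IDs can be merged."""
    return a.uri == b.uri or a.uri in b.history or b.uri in a.history


# Usage: the client holds an old ID and later receives the merged one.
a = LabelledId("ens:1")
b = LabelledId("ens:2")
c = merge(a, b, "ens:3")
print(find_deprecated([a, c]))  # {'ens:1'}
print(same_entity(a, c))        # True
```

In this sketch a client notices that an ID is deprecated only once it holds a newer ID whose label mentions the old one. Until then it keeps using the stale URI, which is one plausible way to read the trade-off between fewer update requests and lower uniqueness quality.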

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Julien Gaugaz (1)
  • Jakub Zakrzewski (1)
  • Gianluca Demartini (1)
  • Wolfgang Nejdl (1)
  1. L3S Research Center, Leibniz Universität Hannover, Hannover, Germany
