How to Trace and Revise Identities
The Entity Name System (ENS) is a service that aims to provide globally unique URIs for all kinds of real-world entities, such as persons, locations and products, based on descriptions of those entities. The entity descriptions available to the ENS for deciding on entity identity (do two entity descriptions refer to the same real-world entity?) change over time, so the system has to revise its past decisions when it turns out that one entity has been given two different URIs or that two entities have been assigned the same URI. The question we investigate in this context is: how do we propagate entity decision revisions to the clients that use the URIs provided by the ENS?
In this paper we propose a solution that relies on labelling the IDs with additional history information. These labels allow clients to detect locally which of the URIs they use are deprecated, and to merge IDs referring to the same real-world entity, without consulting the ENS. Making update requests to the ENS only for the IDs detected as deprecated considerably reduces the number of update requests, at the cost of a decrease in uniqueness quality. We investigate how much ID history labelling reduces the number of update requests and how it affects the uniqueness of the IDs on the client. For the experiments we use both artificially generated entity revision histories and a real case study based on the revision histories of the Dutch and Simple English Wikipedias.
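To make the idea of history labels concrete, the following is a minimal sketch and not the labelling scheme of the paper, whose format is not given in this abstract. It assumes a hypothetical `LabelledId` whose label is simply the set of URIs it has superseded, and a hypothetical `resolve` function that a client could run over the IDs it holds. Under these assumptions, a deprecated URI with exactly one live successor can be merged locally, while a URI with several live successors (a split) is flagged for an update request to the ENS.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LabelledId:
    """Hypothetical labelled ID: a URI plus the set of URIs it superseded."""
    uri: str
    history: frozenset = field(default_factory=frozenset)


def resolve(ids):
    """Detect deprecated URIs among the client's IDs using only their labels.

    Returns (merged, needs_update):
      merged       maps a deprecated URI to its single live successor,
      needs_update holds deprecated URIs that cannot be resolved locally.
    """
    held = {i.uri for i in ids}

    # A held URI is deprecated if it appears in another held ID's history.
    deprecated = set()
    for i in ids:
        deprecated |= i.history & held

    merged, needs_update = {}, set()
    for old in deprecated:
        # Live successors: IDs that list `old` and are not deprecated themselves.
        successors = [
            i.uri for i in ids
            if old in i.history and i.uri not in deprecated
        ]
        if len(successors) == 1:
            merged[old] = successors[0]   # merge decided locally
        else:
            needs_update.add(old)         # ambiguous: ask the ENS
    return merged, needs_update


# Example with made-up URIs: ens:1 was merged into ens:2; ens:3 was split.
client_ids = [
    LabelledId("ens:1"),
    LabelledId("ens:2", frozenset({"ens:1"})),
    LabelledId("ens:3"),
    LabelledId("ens:4", frozenset({"ens:3"})),
    LabelledId("ens:5", frozenset({"ens:3"})),
]
merged, needs_update = resolve(client_ids)
```

In this toy run only `ens:3` would trigger an update request, which illustrates the trade-off described above: fewer requests to the ENS, with the client's view of uniqueness limited to what the labels it happens to hold can reveal.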