System Support for Name Authority Control Problem in Digital Libraries: OpenDBLP Approach
In maintaining Digital Libraries, keeping bibliographic data up to date is critical, yet even minor irregularities can leave information isolated. Documents can be identified through various unique ID systems (e.g., DOI, ISBN), but other bibliographic entities such as authors and publication venues have no unique IDs. Tracking such entities in current Digital Libraries is therefore not trivial. For instance, suppose a scholar changes her last name from A to B. A user who searches for her publications under the new name B will then miss the older publications that appeared under A, although they are by the same person. Since A and B denote the same person, it would be desirable for Digital Libraries to track the two names as a single identity. In this paper, we investigate this problem, known as name authority control, and present our system-oriented solution. We first identify three core building blocks that underlie the phenomenon and present a taxonomy of the different combinations in which these building blocks can occur. Then, we consider how systems can support name authority control in two common functions of Digital Libraries: Update and Search. Finally, we present our test-bed, OpenDBLP, in which the suggested solution is fully implemented as a proof of concept.
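The name-change scenario above can be illustrated with a minimal sketch. It assumes one simple design, in which every name variant maps to an internal author ID and publications are filed under that ID instead of the name string. The class `AuthorityIndex` and its methods are hypothetical names chosen for illustration; they are not taken from OpenDBLP, and the paper's actual solution may differ.

```python
from collections import defaultdict


class AuthorityIndex:
    """Toy name-authority index (hypothetical, not the OpenDBLP design)."""

    def __init__(self):
        self._id_by_name = {}                 # name variant -> author ID
        self._pubs_by_id = defaultdict(list)  # author ID -> publication titles
        self._next_id = 1

    def _resolve(self, name):
        """Return the author ID for a name, creating one if the name is new."""
        if name not in self._id_by_name:
            self._id_by_name[name] = self._next_id
            self._next_id += 1
        return self._id_by_name[name]

    def add_publication(self, author_name, title):
        """Update: file the publication under the author ID, not the name."""
        self._pubs_by_id[self._resolve(author_name)].append(title)

    def link_names(self, old_name, new_name):
        """Record that both names denote the same person (e.g. a name change)."""
        old_id = self._resolve(old_name)
        new_id = self._id_by_name.get(new_name)
        if new_id is not None and new_id != old_id:
            # The new name already has its own records: merge them into old_id
            # and repoint every name variant that used new_id.
            self._pubs_by_id[old_id].extend(self._pubs_by_id.pop(new_id, []))
            for name, author_id in self._id_by_name.items():
                if author_id == new_id:
                    self._id_by_name[name] = old_id
        self._id_by_name[new_name] = old_id

    def search(self, author_name):
        """Search: return all publications of the person behind this name."""
        author_id = self._id_by_name.get(author_name)
        if author_id is None:
            return []
        return list(self._pubs_by_id.get(author_id, []))


index = AuthorityIndex()
index.add_publication("A", "Early paper")   # published under the old name
index.link_names("A", "B")                  # the scholar now publishes as B
index.add_publication("B", "Recent paper")
print(index.search("B"))                    # both papers, old and new
```

Because both names resolve to the same ID, a search under either A or B returns the full publication list, which is the behavior the abstract asks Digital Libraries to support.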