SyGAR – A Synthetic Data Generator for Evaluating Name Disambiguation Methods

  • Anderson A. Ferreira
  • Marcos André Gonçalves
  • Jussara M. Almeida
  • Alberto H. F. Laender
  • Adriano Veloso
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5714)

Abstract

Name ambiguity in bibliographic citations is one of the hardest problems currently faced by the digital library community. Several disambiguation methods have been proposed in the literature, but none of them solves the problem completely. More importantly, nearly all of these methods were tested only in limited and restricted scenarios, which raises concerns about their practical applicability. In this work, we address these limitations by proposing SyGAR, a synthetic generator of ambiguous authorship records. The generator was validated against a gold standard collection of disambiguated records and applied to evaluate three disambiguation methods in a relevant scenario.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Anderson A. Ferreira (1)
  • Marcos André Gonçalves (1)
  • Jussara M. Almeida (1)
  • Alberto H. F. Laender (1)
  • Adriano Veloso (1)
  1. Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte-MG, Brazil
