Did They Notice? – A Case-Study on the Community Contribution to Data Quality in DBLP

  • Florian Reitz
  • Oliver Hoffmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6966)

Abstract

Defective metadata is a significant problem in digital libraries. So far, automatic error detectors have been the focus of research. However, recent public projects have shown that patrons are willing to invest time in reporting errors if they are asked to contribute. In this case study, we analyze the community contribution to error detection for DBLP, a public bibliographic collection. Our study is based on e-mails sent to the project between January 2007 and November 2010. We identify error reports both manually and automatically and analyze their contribution to corrections of the DBLP collection. We show that users frequently report certain types of defects while ignoring others. The detection of homonym-name inconsistencies in particular strongly depends on user input. We also discuss who sends the reports and which communities are particularly active in this matter.

Keywords

Digital Library · Error Report · Community Contribution · Publication List · Author Citation
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Florian Reitz (1)
  • Oliver Hoffmann (1, 2)

  1. University of Trier, Trier, Germany
  2. Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Germany