Did They Notice? – A Case-Study on the Community Contribution to Data Quality in DBLP

  • Conference paper
Research and Advanced Technology for Digital Libraries (TPDL 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6966)

Abstract

Defective metadata is a significant problem for digital libraries. So far, research has focused on automatic error detectors. However, recent public projects have shown that patrons are willing to invest time in reporting errors when they are asked to contribute. In this case-study, we analyze the community contribution to error detection for DBLP, a public bibliographic collection. Our study is based on e-mails sent to the project between January 2007 and November 2010. We identify error reports both manually and automatically and analyze their contribution to corrections of the DBLP collection. We show that users frequently report certain types of defects while others are ignored. The detection of homonym-name inconsistencies in particular depends strongly on user input. We also discuss who sends the reports and which communities are particularly active in this matter.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Reitz, F., Hoffmann, O. (2011). Did They Notice? – A Case-Study on the Community Contribution to Data Quality in DBLP. In: Gradmann, S., Borri, F., Meghini, C., Schuldt, H. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2011. Lecture Notes in Computer Science, vol 6966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24469-8_22

  • DOI: https://doi.org/10.1007/978-3-642-24469-8_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24468-1

  • Online ISBN: 978-3-642-24469-8

  • eBook Packages: Computer Science, Computer Science (R0)
