Citation Enrichment Improves Deduplication of Primary Evidence

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9441)

Abstract

Objective: To automatically detect duplicate citations in a bibliographical database.

Background: Citations retrieved from multiple search databases take different forms, making manual and automatic detection of duplicates difficult. Existing methods rely on fuzzy-similarity measures, which are error-prone.

Methods: We analysed four pairs of original search results from MEDLINE and EMBASE that were used to create systematic reviews. An automatic tool deduplicated citations by first enriching citations with Digital Object Identifiers (DOI), and/or other unique identifiers. Duplication of records was then determined by comparing these unique identifiers. We compared our method with the duplicate detection function of a popular citation management desktop application in several configurations.
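The identifier-comparison step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' tool: the `Citation` class, the `normalise_doi` and `deduplicate` helpers, and the sample DOIs are all hypothetical, and the enrichment step that looks up a DOI for each citation is assumed to have already run.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Citation:
    title: str
    source: str                 # e.g. "MEDLINE" or "EMBASE"
    doi: Optional[str] = None   # assumed to be filled in by an earlier enrichment step


def normalise_doi(doi: str) -> str:
    """Lower-case a DOI and strip common prefixes so equal DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi.strip()


def deduplicate(citations: List[Citation]) -> Tuple[List[Citation], List[Tuple[Citation, Citation]]]:
    """Keep the first citation seen per DOI; citations without a DOI are kept as-is."""
    seen = {}
    unique, duplicates = [], []
    for c in citations:
        if c.doi is None:
            unique.append(c)            # cannot be compared by identifier
            continue
        key = normalise_doi(c.doi)
        if key in seen:
            duplicates.append((seen[key], c))
        else:
            seen[key] = c
            unique.append(c)
    return unique, duplicates


# Made-up records for illustration only (the DOIs are not real).
records = [
    Citation("Fluid therapy trial", "MEDLINE", "10.1000/example.1"),
    Citation("Fluid therapy trial.", "EMBASE", "https://doi.org/10.1000/EXAMPLE.1"),
    Citation("Antibiotic course study", "EMBASE", "10.1000/example.2"),
    Citation("Record with no identifier found", "MEDLINE"),
]
unique, duplicates = deduplicate(records)
```

In this sketch, records for which no identifier was found are passed through untouched, which is one place a fuzzy-matching fallback could be attached.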

Results: Citation Enrichment identified 93 % (range 86 %–100 %) of the duplicates indexed online and erroneously marked 3 % (range 0 %–6 %) of documents as duplicates. The citation management application found 68 % (range 64 %–72 %) of duplicates without error using its default settings. When set for highest deduplication, the citation management application found 94 % of duplicates (range 77 %–100 %) with 4 % error (range 0 %–8 %).

Conclusion: Citation enrichment using unique identifiers enhances automatic deduplication. On its own, the approach seems slightly superior to tools that compare citations without enrichment. Methods that combine citation enrichment with existing fuzzy-matching may substantially reduce resource requirements of evidence synthesis.

Abbreviations

DOI: Digital Object Identifiers

FN: False Negatives

FP: False Positives

HTML: HyperText Markup Language

MAS: Microsoft Academic Search

MASID: MAS unique identifiers

P: Precision

PDF: Portable Document Format

R: Recall

SR: Systematic Reviews

TP: True Positives
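The abbreviations TP, FP, FN, P and R suggest the usual precision and recall measures. The preview does not give the formulas, so the sketch below assumes the standard definitions, P = TP / (TP + FP) and R = TP / (TP + FN); the function names and the counts are hypothetical and are not taken from the paper.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of records flagged as duplicates that really are duplicates."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of true duplicates that were flagged."""
    actual = tp + fn
    return tp / actual if actual else 0.0


# Hypothetical counts, for illustration only.
tp, fp, fn = 40, 2, 10
p = precision(tp, fp)
r = recall(tp, fn)
```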

Author information

Corresponding author

Correspondence to Miew Keen Choong.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Choong, M.K., Thorning, S., Tsafnat, G. (2015). Citation Enrichment Improves Deduplication of Primary Evidence. In: Li, XL., Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D. (eds) Trends and Applications in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, vol 9441. Springer, Cham. https://doi.org/10.1007/978-3-319-25660-3_20

  • DOI: https://doi.org/10.1007/978-3-319-25660-3_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25659-7

  • Online ISBN: 978-3-319-25660-3

  • eBook Packages: Computer Science, Computer Science (R0)
