Large Scale Citation Matching Using Apache Hadoop

  • Mateusz Fedoryszak
  • Dominika Tkaczyk
  • Łukasz Bolikowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8092)

Abstract

Citation matching is the process of creating links from bibliography entries to the publications they reference. Such links indicate topical similarity between the linked texts, are used to assess the impact of the referenced document, and improve navigation in the user interfaces of digital libraries. In this paper we present a citation matching method and show how to scale it up to handle large amounts of data using appropriate indexing and the MapReduce paradigm in the Hadoop environment.
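
The general idea of combining an index with a map/reduce step can be illustrated with a toy, in-memory sketch. This is not the method presented in the paper: the blocking heuristic (any token of five or more characters), the Jaccard score, the threshold, the function names and the sample records are all assumptions invented for illustration, and a real Hadoop job would distribute the map, shuffle and reduce phases across a cluster.

```python
import re
from collections import defaultdict


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def map_phase(records):
    """Emit (key, record) pairs; a key is any token of 5+ characters (assumed heuristic)."""
    for kind, rec_id, text in records:
        for tok in tokens(text):
            if len(tok) >= 5:
                yield tok, (kind, rec_id, text)


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Within each key group, score every citation against every document (Jaccard)."""
    for values in groups.values():
        citations = [v for v in values if v[0] == "citation"]
        documents = [v for v in values if v[0] == "document"]
        for _, cid, ctext in citations:
            for _, did, dtext in documents:
                a, b = tokens(ctext), tokens(dtext)
                yield cid, did, len(a & b) / len(a | b)


def match(records, threshold=0.5):
    """Keep, for each citation, the best-scoring document at or above the threshold."""
    best = {}
    for cid, did, score in reduce_phase(shuffle(map_phase(records))):
        if score >= threshold and score > best.get(cid, (None, 0.0))[1]:
            best[cid] = (did, score)
    return {cid: did for cid, (did, _) in best.items()}


# Invented toy data: two document records and two citation strings.
records = [
    ("document", "d1", "Doe Roe Scalable record linkage with inverted indexes 2011"),
    ("document", "d2", "Smith Jones Approximate string indexes for linkage 2010"),
    ("citation", "c1", "J. Doe, R. Roe. Scalable record linkage with inverted indexes, 2011."),
    ("citation", "c2", "A. Nobody. Unrelated title about gardening. 1999"),
]
result = match(records)
print(result)
```

Grouping by key means a citation is only compared with documents that share at least one key with it, which is what keeps the number of pairwise comparisons manageable as the collection grows.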

Keywords

citation matching, approximate indexing, MapReduce, Hadoop, CRF, SVM

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Mateusz Fedoryszak (1)
  • Dominika Tkaczyk (1)
  • Łukasz Bolikowski (1)
  1. Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Poland
