Large Scale Citation Matching Using Apache Hadoop
During citation matching, links are created from bibliography entries to the publications they reference. Such links indicate topical similarity between the linked texts, are used to assess the impact of the referenced document, and improve navigation in the user interfaces of digital libraries. In this paper we present a citation matching method and show how to scale it up to handle large amounts of data using appropriate indexing and the MapReduce paradigm in the Hadoop environment.
Keywords: citation matching, approximate indexing, MapReduce, Hadoop, CRF, SVM
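The general idea of scaling matching with an index plus MapReduce can be illustrated with a minimal single-process sketch. It assumes a crude token-based approximate index for candidate generation and a plain Jaccard similarity for scoring; both are stand-ins chosen for illustration, not the paper's method (the keywords suggest the authors rely on trained CRF and SVM models). All function names, the stopword list, and the threshold are hypothetical. The map step emits one key per index token, the shuffle groups records sharing a key (in Hadoop the framework does this), and the reduce step pairs citations with documents found under the same key.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical stopword list; a real index would use rarer, field-aware keys.
STOPWORDS = {"the", "and", "for", "with"}


def tokens(text):
    """Lowercased alphanumeric tokens of length >= 3, minus stopwords."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return {t for t in cleaned.split() if len(t) >= 3 and t not in STOPWORDS}


def map_phase(kind, rec_id, text):
    # Emit (index_key, record) pairs: a crude approximate index on tokens.
    for tok in tokens(text):
        yield tok, (kind, rec_id)


def shuffle(pairs):
    # Group values by key, as Hadoop's shuffle/sort step would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    # Every citation/document pair sharing a key becomes a candidate.
    cits = [rid for kind, rid in values if kind == "cit"]
    docs = [rid for kind, rid in values if kind == "doc"]
    for c in cits:
        for d in docs:
            yield c, d


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def match_citations(citations, documents, threshold=0.3):
    mapped = chain(
        (p for cid, text in citations.items() for p in map_phase("cit", cid, text)),
        (p for did, text in documents.items() for p in map_phase("doc", did, text)),
    )
    candidates = set()
    for values in shuffle(mapped).values():
        candidates.update(reduce_phase(values))
    # Score only the candidate pairs; keep the best document per citation.
    # Sorting makes tie-breaking deterministic (smallest document id wins).
    best = {}
    for cid, did in sorted(candidates):
        score = jaccard(tokens(citations[cid]), tokens(documents[did]))
        if score >= threshold and score > best.get(cid, (None, -1.0))[1]:
            best[cid] = (did, score)
    return {cid: did for cid, (did, _) in best.items()}


citations = {
    "c1": "Smith J. Large scale citation matching. 2012",
    "c2": "Doe A. Conditional random fields for parsing, 2010",
    "c3": "Completely different text",
}
documents = {
    "d1": "Large Scale Citation Matching Using Apache Hadoop, Smith 2012",
    "d2": "Doe: Conditional Random Fields for Reference Parsing 2010",
    "d3": "Unrelated paper on graph theory",
}
print(match_citations(citations, documents))
```

The point of the index is that only pairs sharing at least one key are ever scored, so "c3" is never compared with any document. With very common keys the reduce step degenerates toward all-pairs comparison, which is why key selection matters at scale.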