String Distance Metrics for Reference Matching and Search Query Correction

  • Jakub Piskorski
  • Marcin Sydow
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4439)

Abstract

String distance metrics have been widely used in applications that process textual data. This paper explores their usability for the reference matching task and for the automatic correction of misspelled search engine queries in the context of highly inflective languages, focusing in particular on Polish. The results of numerous experiments in different scenarios are presented; they reveal some preferred metrics. Surprisingly good results were observed for correcting misspelled search engine queries. Nevertheless, a more in-depth analysis is necessary to achieve improvements. The work reported here constitutes a good point of departure for further research on this topic.

Keywords

string distance metrics, reference matching, search engine query correction, information retrieval, inflective languages

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Jakub Piskorski (1)
  • Marcin Sydow (2)
  1. Joint Research Center of the European Commission, Web and Language Technology Group of IPSC, T.P. 267, Via Fermi 1, 21020 Ispra (VA), Italy
  2. Polish-Japanese Institute of Information Technology (PJIIT), Department of Intelligent Systems, Koszykowa 86, 02-008 Warsaw, Poland
