Comparison of String Distance Metrics for Lemmatisation of Named Entities in Polish

  • Jakub Piskorski
  • Marcin Sydow
  • Karol Wieloch
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5603)

Abstract

This paper presents the results of recent experiments on the application of string distance metrics to the problem of named-entity lemmatisation in Polish. It extends our work in [1] by introducing new results for organisation names. Furthermore, the results presented here and in [2,3], which address the same topic, were used to carry out a comparative study of the average usefulness of the numerous string distance metrics examined for lemmatising Polish named entities of various types. In particular, we focus on the lemmatisation of country names, organisation names and person names.
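The general idea of metric-based lemmatisation can be illustrated with a small sketch. An inflected surface form such as "Polski" (the genitive of "Polska") is mapped to the base form in a lexicon that is closest to it under some string distance metric. The abstract does not say which metrics or lexicons the study used, so everything below is an assumption made for illustration. That includes the two metrics (plain Levenshtein edit distance and a simple longest-common-prefix score), the toy country-name lexicon, and the function names `levenshtein`, `prefix_distance` and `lemmatise`. This is not the authors' implementation.

```python
from typing import Callable, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def prefix_distance(a: str, b: str) -> float:
    """Illustrative (assumed) prefix-based distance: 1 - lcp^2 / (|a| * |b|).
    It rewards a long shared stem, which suits suffix-based inflection."""
    if not a or not b:
        return 1.0
    lcp = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        lcp += 1
    return 1.0 - (lcp * lcp) / (len(a) * len(b))


def lemmatise(form: str, lemmas: Sequence[str],
              distance: Callable[[str, str], float]) -> str:
    """Return the candidate base form closest to the inflected form
    (case-insensitive); ties go to the earliest candidate."""
    return min(lemmas, key=lambda lemma: distance(form.lower(), lemma.lower()))


# Hypothetical toy lexicon of Polish country names (base forms).
COUNTRIES = ["Polska", "Niemcy", "Francja", "Portugalia"]

if __name__ == "__main__":
    for inflected in ["Polski", "Niemczech", "Francji"]:
        print(inflected,
              lemmatise(inflected, COUNTRIES, levenshtein),
              lemmatise(inflected, COUNTRIES, prefix_distance))
```

On this toy lexicon both metrics map "Polski", "Niemczech" and "Francji" to "Polska", "Niemcy" and "Francja". The two metrics can still rank candidates differently on harder inputs, such as multi-word organisation names or forms with stem alternations. That kind of divergence is why a comparative study across metrics and entity types is informative.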

Keywords

  • named entities lemmatisation
  • string distance metrics
  • highly inflective languages


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jakub Piskorski (1)
  • Marcin Sydow (2)
  • Karol Wieloch (3)
  1. Joint Research Centre of the European Commission, Web Mining and Intelligence of IPSC, T.P. 267, Ispra, Italy
  2. Web Mining Lab, Intelligent Systems Dept., Polish-Japanese Institute of Information Technology, Warsaw, Poland
  3. Department of Information Systems, Poznań University of Economics, Poznań, Poland
