Generalized Mongue-Elkan Method for Approximate Text String Comparison

  • Sergio Jimenez
  • Claudia Becerra
  • Alexander Gelbukh
  • Fabio Gonzalez
Conference paper

DOI: 10.1007/978-3-642-00382-0_45

Part of the Lecture Notes in Computer Science book series (LNCS, volume 5449)
Cite this paper as:
Jimenez S., Becerra C., Gelbukh A., Gonzalez F. (2009) Generalized Mongue-Elkan Method for Approximate Text String Comparison. In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg

Abstract

The Monge-Elkan method is a general text string comparison method that combines an internal character-based similarity measure (e.g., edit distance) with a token-level (i.e., word-level) similarity measure. We propose a generalization of this method that replaces the simple average in the Monge-Elkan expression with the generalized arithmetic mean. Experiments carried out with 12 well-known name-matching data sets show that the proposed approach outperforms the original Monge-Elkan method when character-based measures are used to compare tokens.
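
The abstract does not spell out the formula, so the following is only a minimal sketch of the idea under stated assumptions: each token of the first string is matched to its most similar token in the second string, and the best-match scores are combined with a generalized (power) mean of exponent m, where m = 1 reduces to the simple average of the original method. The function names, the default m = 2, whitespace tokenization, and the use of Python's `difflib.SequenceMatcher` as the character-based measure are illustrative choices, not taken from the paper.

```python
from difflib import SequenceMatcher


def char_sim(a: str, b: str) -> float:
    """Stand-in character-based similarity in [0, 1] (an assumption,
    not the measure used in the paper)."""
    return SequenceMatcher(None, a, b).ratio()


def generalized_monge_elkan(a_tokens, b_tokens, m=2.0, inner=char_sim):
    """Power mean (exponent m > 0) of each a-token's best match in b_tokens.

    With m == 1 this is the simple average, i.e. the original
    Monge-Elkan combination; larger m gives more weight to the
    highly similar token pairs.
    """
    if m <= 0:
        raise ValueError("this sketch assumes m > 0")
    if not a_tokens or not b_tokens:
        return 0.0
    # Best match in b_tokens for every token of a_tokens.
    best = [max(inner(a, b) for b in b_tokens) for a in a_tokens]
    # Generalized mean of the best-match scores.
    return (sum(s ** m for s in best) / len(best)) ** (1.0 / m)


if __name__ == "__main__":
    a = "Sergio Jimenez".split()
    b = "Sergio Gimenez".split()
    print(generalized_monge_elkan(a, b, m=1.0))  # original (simple average)
    print(generalized_monge_elkan(a, b, m=2.0))  # generalized, m = 2
```

Note that the measure is asymmetric: it averages over the tokens of the first argument only, so swapping the two strings can change the score.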

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Sergio Jimenez (1)
  • Claudia Becerra (1)
  • Alexander Gelbukh (2)
  • Fabio Gonzalez (1)
  1. Intelligent Systems Laboratory (LISI), Systems and Industrial Engineering Department, National University of Colombia, Colombia
  2. Natural Language Laboratory, Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico
