Result Aggregation for Knowledge-Intensive Multicultural Name Matching

  • Keith J. Miller
  • Mark Arehart
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5603)

Abstract

In this paper, we describe a metasearch tool resulting from experiments in aggregating the results of different name matching algorithms on a knowledge-intensive multicultural name matching task. Three retrieval engines that match romanized names were tested on a noisy and predominantly Arabic dataset. One is based on a generic string matching algorithm; another is designed specifically for Arabic names; and the third makes use of culturally specific matching strategies for multiple cultures. We show that even a relatively naïve method for aggregating results significantly increases effectiveness over each of the individual algorithms: it nearly triples the F-score of the worst-performing algorithm included in the aggregate and yields a 6-point improvement in F-score over the single best-performing algorithm included.
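The abstract does not say which aggregation method was used, so the following is only a minimal sketch of one naïve possibility: keep a candidate record when at least a chosen number of engines return it, then score the combined list with the balanced F-score. The function names, the `min_votes` threshold, the engine labels, and the record IDs are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Set


def aggregate_by_votes(engine_results: Dict[str, Set[str]], min_votes: int = 2) -> Set[str]:
    """Keep every candidate record returned by at least `min_votes` engines.

    Assumption: simple vote counting stands in for the paper's
    (unspecified) aggregation method.
    """
    votes: Counter = Counter()
    for returned in engine_results.values():
        # Each engine's results are a set, so it casts at most one vote per record.
        votes.update(returned)
    return {record for record, n in votes.items() if n >= min_votes}


def f_score(returned: Set[str], relevant: Set[str]) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    true_positives = len(returned & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(returned)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Made-up results for a single query; these are not data from the paper.
results = {
    "generic_string": {"r1", "r2", "r5"},
    "arabic_specific": {"r1", "r3"},
    "multicultural": {"r1", "r2", "r3", "r4"},
}
relevant = {"r1", "r2", "r3"}

combined = aggregate_by_votes(results, min_votes=2)
for name, returned in results.items():
    print(f"{name}: F = {f_score(returned, relevant):.3f}")
print(f"aggregate: F = {f_score(combined, relevant):.3f}")
```

In this toy data the vote threshold removes records that only one engine returned, which is one way an aggregate can outscore each of its components; whether that holds on real data depends on how the engines' errors overlap.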

Keywords

Information Retrieval; Name Matching; System Combination

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Keith J. Miller (1)
  • Mark Arehart (1)
  1. MITRE Corporation, McLean, USA
