On Foreign Name Search

  • Conference paper
Advances in Information Retrieval (ECIR 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)

Abstract

We address foreign name search in a highly diverse user community. User sophistication ranges from highly experienced archivists to apprehensive users who shy away from technology; apprehensive users dominate system use. Thus, all system interfaces must depend as little as possible on the user.

Our foreign name search approach, called Segments, is language-independent; thus, there is no need to determine the language of origin from the diverse candidate set of thirteen languages. We compare Segments against traditional n-gram and Soundex-based solutions. Actual and synthetic queries are used to search a names data set held at the United States Holocaust Memorial Museum. We also search a subset of the 1990 United States Census Bureau Surnames data set to evaluate the performance of Segments on a predominantly language-specific (English) collection. Our results demonstrate statistically significant performance gains over both traditional approaches. The described approach supports search efforts at the United States Holocaust Memorial Museum.
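
For readers unfamiliar with the two baselines named above, the following is a minimal sketch of what n-gram and Soundex name matching typically look like. It does not implement Segments, which this abstract does not describe, and it is not the authors' experimental code. The function names (`soundex`, `ngrams`, `dice`), the `#` padding character, the choice of bigrams, and the Dice coefficient are all assumptions made for illustration.

```python
# Illustrative baselines only; names, padding, and scoring are assumptions.

# Standard American Soundex letter groups.
_SOUNDEX_GROUPS = (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
)
_CODES = {ch: digit for letters, digit in _SOUNDEX_GROUPS for ch in letters}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w do not separate letters that share a code.
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels (empty code) reset prev, so a repeated code is kept.
        prev = code
    return (result + "000")[:4]


def ngrams(name: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams of a name, padded with '#'."""
    pad = "#" * (n - 1)
    padded = f"{pad}{name.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the two names' n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    for query, candidate in [("Meyer", "Meier"), ("Robert", "Rupert")]:
        print(query, candidate,
              soundex(query), soundex(candidate),
              round(dice(query, candidate), 3))
```

Soundex reduces each name to a coarse, English-oriented phonetic key, so two spellings either collide or they do not, whereas the n-gram score gives a graded similarity that can be thresholded or ranked.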

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Soo, J., Frieder, O. (2010). On Foreign Name Search. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_42

  • DOI: https://doi.org/10.1007/978-3-642-12275-0_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12274-3

  • Online ISBN: 978-3-642-12275-0

  • eBook Packages: Computer Science, Computer Science (R0)