A Case Study of Using Web Search Statistics: Case Restoration

  • Silviu Cucerzan
Conference paper

DOI: 10.1007/978-3-642-12116-6_17

Part of the Lecture Notes in Computer Science book series (LNCS, volume 6008)
Cite this paper as:
Cucerzan S. (2010) A Case Study of Using Web Search Statistics: Case Restoration. In: Gelbukh A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg

Abstract

We investigate the use of Web search engine statistics for the task of case restoration. Because most engines are case insensitive, an approach based on search hit counts, as employed in previous work on natural language ambiguity resolution, is not applicable to this task. Consequently, we study the use of statistics computed from the snippets generated by a Web search engine, and we show that such statistics can achieve performance similar to corpus-based approaches. We also note that the top few results returned by a search engine may not be the most representative for modeling phenomena in a language.
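To make the idea of snippet-derived statistics concrete, the following is a minimal sketch, not the method from the paper. It assumes the snippets have already been retrieved as plain strings, and it simply picks, for each lowercased token, the case variant seen most often in those snippets. The function names `case_variant_counts` and `restore_case` and the sample snippets are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def case_variant_counts(snippets):
    """Count the surface forms (case variants) of each token, keyed by its lowercase form."""
    counts = {}
    for snippet in snippets:
        # Assumption: tokens are plain alphabetic runs; real text needs a better tokenizer.
        for token in re.findall(r"[A-Za-z]+", snippet):
            counts.setdefault(token.lower(), Counter())[token] += 1
    return counts


def restore_case(text, snippets):
    """Replace each token with its most frequent case variant in the snippets."""
    counts = case_variant_counts(snippets)
    restored = []
    for token in text.split():
        variants = counts.get(token.lower())
        # Tokens never seen in the snippets are left unchanged.
        restored.append(variants.most_common(1)[0][0] if variants else token)
    return " ".join(restored)


# Hypothetical snippets standing in for search engine output.
snippets = [
    "Microsoft Research is located in Redmond.",
    "Researchers at Microsoft Research in Redmond study language.",
]
print(restore_case("microsoft research in redmond", snippets))
```

A unigram majority vote like this ignores context and is biased by sentence-initial capitalization in the snippets, so it should be read only as an illustration of where the counts come from.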

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Silviu Cucerzan
    • 1
  1. Microsoft Research, Redmond, USA
