A Case Study of Using Web Search Statistics: Case Restoration

Abstract

We investigate the use of Web search engine statistics for the task of case restoration. Because most engines are case insensitive, an approach based on search hit counts, as employed in previous work on natural language ambiguity resolution, is not applicable to this task. Consequently, we study the use of statistics computed from the snippets generated by a Web search engine, and we show that such statistics can achieve performance similar to that of corpus-based approaches. We also note that the top few results returned by a search engine may not be the most representative for modeling phenomena in a language.
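
The snippet-based idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the snippets have already been retrieved and are passed in as plain strings, and it simply picks the most frequent surface form of each token. The names `case_variant_counts`, `restore_case`, and `snippets_for` are hypothetical, and no real search engine API is called. The sketch also ignores sentence-initial capitalization, which a practical system would need to handle.

```python
import re
from collections import Counter


def case_variant_counts(token, snippets):
    """Count each cased surface form of `token` found in the snippets.

    Matching is case-insensitive and restricted to whole words, so the
    counter keys are the variants exactly as they appear in the text.
    """
    pattern = re.compile(r"\b" + re.escape(token) + r"\b", re.IGNORECASE)
    counts = Counter()
    for snippet in snippets:
        counts.update(pattern.findall(snippet))
    return counts


def restore_case(tokens, snippets_for):
    """Replace each token with its most frequent cased variant.

    `snippets_for` is an assumed callable that maps a token to a list of
    snippet strings (for example, cached search results). Tokens with no
    snippet evidence are left unchanged.
    """
    restored = []
    for token in tokens:
        counts = case_variant_counts(token, snippets_for(token))
        restored.append(counts.most_common(1)[0][0] if counts else token)
    return restored


# Toy data standing in for snippets returned by a search engine.
toy_snippets = {
    "paris": [
        "Flights to Paris from London",
        "Hotels in Paris, France",
        "cheap paris deals",
    ],
    "is": ["This is a test", "What is it"],
}

print(restore_case(["paris", "is", "nice"], lambda t: toy_snippets.get(t, [])))
# prints ['Paris', 'is', 'nice']
```

Counting variants inside the snippet text, rather than comparing hit counts for each variant, is what sidesteps the case insensitivity of the engine: the query itself cannot distinguish "paris" from "Paris", but the returned text can.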