Computational Linguistics and Intelligent Text Processing

Volume 6008 of the series Lecture Notes in Computer Science, pp. 199-211

A Case Study of Using Web Search Statistics: Case Restoration

  • Silviu Cucerzan (Microsoft Research)


We investigate the use of Web search engine statistics for the task of case restoration. Because most engines are case insensitive, an approach based on search hit counts, as employed in previous work in natural language ambiguity resolution, is not applicable for this task. Consequently, we study the use of statistics computed from the snippets generated by a Web search engine, and we show that such statistics can achieve performance similar to corpus-based approaches. We also note that the top few results returned by a search engine may not be the most representative for modeling phenomena in a language.
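
To make the snippet-based idea concrete, here is a minimal sketch in Python. It is an illustration only, not the method evaluated in the paper: it assumes the snippets have already been retrieved and are passed in as plain strings, and it simply picks the most frequent surface form of a word among them. The function names `case_variant_counts` and `restore_case`, and the sample snippets, are hypothetical.

```python
import re
from collections import Counter


def case_variant_counts(snippets, word):
    """Count the surface forms in `snippets` that match `word` ignoring case."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    counts = Counter()
    for snippet in snippets:
        # findall returns the matched text with its original casing
        counts.update(pattern.findall(snippet))
    return counts


def restore_case(snippets, word):
    """Return the most frequent casing of `word` seen in the snippets.

    Falls back to the input unchanged when the word never occurs.
    """
    counts = case_variant_counts(snippets, word)
    if not counts:
        return word
    return counts.most_common(1)[0][0]


# Hypothetical snippets, standing in for search engine output.
snippets = [
    "Researchers at Microsoft Research presented new work.",
    "the microsoft research lab in redmond",
    "Microsoft announced a new product today.",
]

restored = restore_case(snippets, "microsoft")
print(restored)
```

Unlike hit counts, which a case-insensitive engine reports identically for every casing of a query, the snippet text preserves the original capitalization, so the counts above can distinguish the variants. A simple majority vote ignores context such as sentence-initial capitalization, which a realistic system would need to handle.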