Retrieving Customary Web Language to Assist Writers

  • Benno Stein
  • Martin Potthast
  • Martin Trenkmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5993)


This paper introduces Netspeak, a Web service that assists writers in finding adequate expressions. To provide statistically relevant suggestions, the service indexes more than 1.8 billion n-grams, n ≤ 5, along with their occurrence frequencies on the Web. If in doubt about a wording, a user can specify a query with wildcards inserted at the positions where she feels uncertain.

Queries define patterns for which a ranked list of matching n-grams, along with usage examples, is retrieved. The ranking reflects the occurrence frequencies of the n-grams and shows both their absolute and their relative usage. Given this choice of customary wordings, one can easily select the most appropriate one. Second-language speakers in particular can learn about style conventions and language usage.
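The retrieval step described here can be illustrated with a minimal sketch. It is not Netspeak's implementation: the `?` wildcard (assumed to match exactly one word), the function names, and the toy counts are all hypothetical and chosen only to show how a pattern yields a ranked list with absolute and relative frequencies.

```python
from typing import Dict, List, Tuple

# Toy n-gram counts; the numbers are invented for illustration only.
NGRAM_COUNTS: Dict[str, int] = {
    "waiting for response": 750,
    "waiting on response": 200,
    "waiting the response": 50,
    "waiting for a response": 400,  # 4-gram, cannot match a 3-token query
    "looking for response": 10,     # first literal word differs
}


def matches(query_tokens: List[str], ngram_tokens: List[str]) -> bool:
    """True if the n-gram fits the pattern; '?' stands for exactly one word."""
    if len(query_tokens) != len(ngram_tokens):
        return False
    return all(q == "?" or q == w for q, w in zip(query_tokens, ngram_tokens))


def retrieve(query: str, counts: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Return (n-gram, absolute count, relative share) sorted by descending count."""
    query_tokens = query.split()
    hits = [(ngram, count) for ngram, count in counts.items()
            if matches(query_tokens, ngram.split())]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    total = sum(count for _, count in hits)
    # An empty hit list yields an empty result, so no division by zero occurs.
    return [(ngram, count, count / total) for ngram, count in hits]


if __name__ == "__main__":
    for ngram, count, share in retrieve("waiting ? response", NGRAM_COUNTS):
        print(f"{ngram:25s} {count:6d} {share:6.1%}")
```

A linear scan over a dictionary is only workable for a toy vocabulary; a collection of more than a billion n-grams requires a dedicated index.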

To guarantee response times within milliseconds, we have developed an index that considers occurrence probabilities, allowing for biased sampling during retrieval. Our analysis shows that the extreme speedup obtained with this strategy (a factor of 68) comes without significant loss in retrieval quality.
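The abstract does not spell out how the index is organized, so the following is only one possible reading of a frequency-biased lookup, not the authors' algorithm. It assumes a posting list that is already sorted by descending frequency and reads just the shortest prefix covering a chosen share of the total frequency mass; the function name, the `mass` parameter, and the data are hypothetical.

```python
from typing import List, Tuple

Posting = Tuple[str, int]  # (n-gram, occurrence frequency)


def head_of_postlist(postlist: List[Posting], mass: float = 0.8) -> List[Posting]:
    """Return the shortest prefix of a frequency-sorted posting list whose
    frequencies add up to at least `mass` (a share in [0, 1]) of the total.

    Assumes `postlist` is sorted by descending frequency, so the skipped tail
    holds only the rarest entries.
    """
    total = sum(freq for _, freq in postlist)
    head: List[Posting] = []
    covered = 0
    for ngram, freq in postlist:
        if total == 0 or covered >= mass * total:
            break
        head.append((ngram, freq))
        covered += freq
    return head


if __name__ == "__main__":
    # Invented frequencies for illustration only.
    postlist = [("a", 700), ("b", 200), ("c", 60), ("d", 30), ("e", 10)]
    print(head_of_postlist(postlist, mass=0.8))
```

Because frequent n-grams dominate the mass, a short prefix can cover most of it, which is the intuition behind trading a small loss in completeness for a large gain in speed.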


Retrieval Time; Retrieval Quality; Statistical Natural Language Processing; Literal Word; Relevant Suggestion
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Benno Stein (1)
  • Martin Potthast (1)
  • Martin Trenkmann (1)
  1. Bauhaus-Universität Weimar, Germany
