Chapter

The People’s Web Meets NLP

Part of the series Theory and Applications of Natural Language Processing pp 121-160

A Survey of NLP Methods and Resources for Analyzing the Collaborative Writing Process in Wikipedia

  • Oliver Ferschke (Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt)
  • Johannes Daxenberger (Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt)
  • Iryna Gurevych (Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt; German Institute for Educational Research and Educational Information)

Abstract

With the rise of Web 2.0, participatory and collaborative content production have largely replaced the traditional ways of information sharing and have created the novel genre of collaboratively constructed language resources. A vast untapped potential lies in the dynamic aspects of these resources, which cannot be unleashed with traditional methods designed for static corpora. In this chapter, we focus on Wikipedia as the most prominent instance of collaboratively constructed language resources. In particular, we discuss the significance of Wikipedia’s revision history for applications in Natural Language Processing (NLP) and the unique prospects of the user discussions, a new resource that has just begun to be mined. While the body of research on processing Wikipedia’s revision history is dominated by works that use the revision data as the basis for practical applications such as spelling correction or vandalism detection, most of the work focused on user discussions uses NLP for analyzing and understanding the data itself.