Automatic Vandalism Detection in Wikipedia

  • Martin Potthast
  • Benno Stein
  • Robert Gerling
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4956)


We present results of a new approach to detecting destructive article revisions, so-called vandalism, in Wikipedia. Vandalism detection is a one-class classification problem, where vandalism edits are the target to be identified among all revisions. Interestingly, vandalism detection has not been addressed in the Information Retrieval literature so far. In this paper we discuss the characteristics of vandalism as humans recognize it and develop features that cast vandalism detection as a machine learning task. We compiled a large number of vandalism edits in a corpus, which allows for the comparison of existing and new detection approaches. Using logistic regression, we achieve 83% precision at 77% recall with our model. Compared to the rule-based methods that are currently applied in Wikipedia, our approach increases the F-Measure performance by 49% and is faster as well.
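To relate the reported precision and recall to a single F-Measure value, the following minimal sketch computes the weighted harmonic mean of the two. It assumes the balanced variant (beta = 1); the abstract does not state which weighting was used, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta).

    beta = 1.0 gives the balanced F1 score. This weighting is an
    assumption; the abstract does not say which variant was used.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both inputs are zero: define the score as zero.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Precision and recall as reported in the abstract.
reported = f_measure(0.83, 0.77)
print(f"{reported:.3f}")  # prints 0.799
```

Under this assumption, 83% precision at 77% recall corresponds to an F-Measure of roughly 0.80.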


IEEE Computer Society; Class Imbalance; Spam Detection; Class Imbalance Problem; Retrieval Literature
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Martin Potthast (1)
  • Benno Stein (1)
  • Robert Gerling (1)
  1. Faculty of Media, Bauhaus University Weimar, Weimar, Germany
