Automatic Vandalism Detection in Wikipedia with Active Associative Classification

  • Conference paper
Theory and Practice of Digital Libraries (TPDL 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7489)

Abstract

Wikipedia and other free editing services for collaboratively generated content have quickly grown in popularity. However, the lack of editing control has made these services vulnerable to various types of malicious actions, such as vandalism. State-of-the-art vandalism detection methods are based on supervised techniques and therefore rely on the availability of large and representative training collections. Building such collections, often with the help of crowdsourcing, is very costly, both because the available data is naturally skewed, with very few vandalism examples, and because vandalism patterns are dynamic. To reduce the cost of building such collections, we present a new active sampling technique coupled with an on-demand associative classification algorithm for Wikipedia vandalism detection. We show that our classifier, enhanced with a simple undersampling technique for building the training set, outperforms state-of-the-art classifiers such as SVMs and kNNs. Furthermore, by applying active sampling, we are able to reduce the need for training data by almost 96% with only a small impact on detection results.
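
The abstract does not spell out the undersampling scheme, so the following is only a minimal sketch of generic random undersampling, one common way to counter the skew towards few vandalism examples. The function name `undersample`, the `ratio` parameter, and the (features, label) tuple layout are illustrative assumptions, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation of a labelled edit:
# (feature vector, label), where label 1 = vandalism and 0 = regular edit.
Edit = Tuple[Sequence[float], int]


def undersample(edits: List[Edit], ratio: float = 1.0, seed: int = 0) -> List[Edit]:
    """Keep every minority-class edit and a random subset of the majority class.

    `ratio` is the desired number of majority examples per minority example
    (an assumed knob; the paper's actual setting is not given in the abstract).
    """
    rng = random.Random(seed)
    vandalism = [e for e in edits if e[1] == 1]
    regular = [e for e in edits if e[1] == 0]

    # Identify which class is rarer; in Wikipedia data this is usually vandalism.
    if len(vandalism) <= len(regular):
        minority, majority = vandalism, regular
    else:
        minority, majority = regular, vandalism

    # Never ask for more majority examples than actually exist.
    keep = min(len(majority), int(round(ratio * len(minority))))
    balanced = minority + rng.sample(majority, keep)
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Toy data: 10 vandalism edits among 200 edits in total.
    toy = [([float(i)], 1 if i % 20 == 0 else 0) for i in range(200)]
    balanced = undersample(toy, ratio=1.0, seed=42)
    print(len(balanced), sum(label for _, label in balanced))
```

With `ratio=1.0` the resulting training set contains as many regular edits as vandalism edits. Whether a balanced set or some other ratio matches the setting used in the paper is not stated in the abstract.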

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sumbana, M., Gonçalves, M.A., Silva, R., Almeida, J., Veloso, A. (2012). Automatic Vandalism Detection in Wikipedia with Active Associative Classification. In: Zaphiris, P., Buchanan, G., Rasmussen, E., Loizides, F. (eds) Theory and Practice of Digital Libraries. TPDL 2012. Lecture Notes in Computer Science, vol 7489. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33290-6_15

  • DOI: https://doi.org/10.1007/978-3-642-33290-6_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33289-0

  • Online ISBN: 978-3-642-33290-6

  • eBook Packages: Computer Science (R0)
