Predicting Low-Quality Wikipedia Articles Using User’s Judgements

Zhang, Ning; Ruan, Lingyun; Si, Luo

doi:10.1007/978-3-319-05467-4_6


Part of the book series: Computational Social Sciences (CSS)


Abstract

Wikipedia has become the most popular online encyclopedia. Millions of users rely on it for knowledge, so it is both important and practical to model the quality of Wikipedia articles and to predict inferior content that may bother or even mislead readers. Identifying low-quality articles manually is possible, but it requires too much manpower and time. In this paper, we use article ratings from Wikipedia users, for the first time, to assess article quality. We define “low-quality” based on those ratings and design automatic methods to identify potentially low-quality articles. More specifically, we formulate the task as a set of binary classification problems and label each article according to whether it is “low-quality”. We compare two baseline algorithms with a Logistic Regression algorithm, and the results indicate that designing effective and efficient automatic solutions for the task is promising. We believe that our work is important for ensuring the quality of Wikipedia as well as that of other knowledge markets.
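The formulation in the abstract (rating-based labels, one binary classifier per problem, Logistic Regression against simpler baselines) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the chapter: the rating dimension names, the 1 to 5 rating scale, the 2.5 cutoff, the single article feature, the toy data, and the majority-class baseline, which merely stands in for the two baselines the authors do not name here. The model is fitted with plain batch gradient descent.

```python
import math

# Hypothetical rating dimensions; the abstract does not name them.
DIMENSIONS = ["dimension_a", "dimension_b"]
# Assumed cutoff on a 1-5 mean user rating; not taken from the chapter.
LOW_QUALITY_THRESHOLD = 2.5


def label_low_quality(mean_rating, threshold=LOW_QUALITY_THRESHOLD):
    """Return 1 ("low-quality") if the mean rating is below the threshold."""
    return 1 if mean_rating < threshold else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    """Linear score; the last entry of w is the bias term."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def train_logistic_regression(X, y, lr=1.0, epochs=2000):
    """Fit weights by batch gradient descent on the mean log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Predict 1 ("low-quality") when the estimated probability is >= 0.5."""
    return 1 if sigmoid(score(w, x)) >= 0.5 else 0


def majority_baseline(y_train):
    """Stand-in baseline: always predict the most common training label."""
    return 1 if 2 * sum(y_train) >= len(y_train) else 0


def train_per_dimension(features, ratings):
    """Train one binary classifier per rating dimension."""
    models = {}
    for dim in DIMENSIONS:
        labels = [label_low_quality(r) for r in ratings[dim]]
        models[dim] = train_logistic_regression(features, labels)
    return models


# Toy data: one made-up feature per article and made-up mean ratings.
TOY_FEATURES = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
TOY_RATINGS = {
    "dimension_a": [1.5, 2.0, 2.2, 3.5, 4.0, 4.5],
    "dimension_b": [2.0, 1.0, 2.4, 4.0, 3.0, 4.8],
}

if __name__ == "__main__":
    models = train_per_dimension(TOY_FEATURES, TOY_RATINGS)
    for dim, w in models.items():
        print(dim, [predict(w, x) for x in TOY_FEATURES])
```

Training a separate model per dimension keeps each problem a plain binary classification, which matches the "set of binary classification problems" wording. In practice a library implementation of logistic regression and a held-out test set would replace the hand-rolled fitting and the training-set predictions used here.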


Author information

Authors and Affiliations

Department of Computer Science, Purdue University, West Lafayette, IN, USA
Ning Zhang
Google, Mountain View, CA, USA
Lingyun Ruan
Department of Computer Science, Purdue University, West Lafayette, IN, USA
Luo Si

Corresponding author

Correspondence to Ning Zhang.

Editor information

Editors and Affiliations

Computer Science, Purdue University, West Lafayette, Indiana, USA
Elisa Bertino
Brian Lamb School of Communication, Purdue University, West Lafayette, Indiana, USA
Sorin Adam Matei

About this chapter

Cite this chapter

Zhang, N., Ruan, L., Si, L. (2015). Predicting Low-Quality Wikipedia Articles Using User’s Judgements. In: Bertino, E., Matei, S. (eds) Roles, Trust, and Reputation in Social Media Knowledge Markets. Computational Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-05467-4_6

DOI: https://doi.org/10.1007/978-3-319-05467-4_6
Published: 03 September 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05466-7
Online ISBN: 978-3-319-05467-4
eBook Packages: Physics and Astronomy (R0)
