Abstract
The success of Wikipedia is largely due to the free availability of high-quality articles across many areas of expertise. Although most of these collaborations between authoritative users may constitute referenceable sources, Wikipedia is not immune to well-identified article quality problems, e.g., the reputability of third-party sources and vandalism. Because of the huge number of articles and the high edit rate, manually evaluating the content quality of every article is not feasible. In this paper, we tackle the problem of modeling and predicting the quality of articles in collaborative platforms. We propose a quality model that integrates both temporal and structural features captured from the implicit peer review process enabled by Wikipedia. We develop a generic HITS-like framework that captures both the quality of the content and the authority of the associated authors. Notably, a mutual reinforcement principle between article quality and author authority is exploited to take advantage of the collaborative graph generated by the users. Experiments conducted on a representative set of Wikipedia data show the effectiveness of the computed indicators in both unsupervised and supervised scenarios.
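The mutual reinforcement principle mentioned in the abstract can be illustrated with a minimal HITS-style iteration on a bipartite author-article graph. This is only a sketch under stated assumptions and not the chapter's actual model: the edge weights (here, a hypothetical measure of how much each author contributed to each article), the function name `mutual_reinforcement`, and the toy data are all illustrative, and the temporal features described in the abstract are not modelled.

```python
import math


def mutual_reinforcement(edits, n_iter=50, tol=1e-9):
    """HITS-like iteration on a bipartite author-article graph.

    edits: dict mapping (author, article) -> non-negative weight.
           The weight is a hypothetical contribution measure, chosen
           for illustration only.
    Returns (quality, authority): two dicts of L2-normalised scores.
    """
    authors = sorted({a for a, _ in edits})
    articles = sorted({p for _, p in edits})
    authority = {a: 1.0 for a in authors}
    quality = {p: 1.0 for p in articles}

    def normalise(scores):
        # Rescale to unit L2 norm so scores stay bounded across iterations.
        norm = math.sqrt(sum(v * v for v in scores.values()))
        if norm == 0.0:
            return dict(scores)
        return {k: v / norm for k, v in scores.items()}

    for _ in range(n_iter):
        # An article's quality is the weighted sum of its authors' authority.
        new_quality = {p: 0.0 for p in articles}
        for (a, p), w in edits.items():
            new_quality[p] += w * authority[a]
        new_quality = normalise(new_quality)

        # An author's authority is the weighted sum of their articles' quality.
        new_authority = {a: 0.0 for a in authors}
        for (a, p), w in edits.items():
            new_authority[a] += w * new_quality[p]
        new_authority = normalise(new_authority)

        # Stop once the article scores have stabilised.
        delta = max(
            (abs(new_quality[p] - quality[p]) for p in articles), default=0.0
        )
        quality, authority = new_quality, new_authority
        if delta < tol:
            break

    return quality, authority


# Toy data (hypothetical): alice contributes heavily to A and a little to B.
edits = {
    ("alice", "A"): 3.0,
    ("alice", "B"): 1.0,
    ("bob", "B"): 1.0,
    ("carol", "C"): 1.0,
}
quality, authority = mutual_reinforcement(edits)
```

In this toy graph the article that receives the most weight from the most authoritative author ends up with the highest score, and that author's authority is reinforced in turn. In the unsupervised scenario such scores could be used directly as a ranking; in the supervised scenario they could serve as input features to a classifier.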
Keywords
- Random Forest Model
- Quality Article
- Good Article
- High Quality Article
- Unsupervised Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
de La Robertie, B., Pitarch, Y., Teste, O. (2017). Structure-Based Features for Predicting the Quality of Articles in Wikipedia. In: Kawash, J., Agarwal, N., Özyer, T. (eds) Prediction and Inference from Social Networks and Social Media. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-51049-1_6
DOI: https://doi.org/10.1007/978-3-319-51049-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51048-4
Online ISBN: 978-3-319-51049-1
eBook Packages: Computer Science, Computer Science (R0)