Structure-Based Features for Predicting the Quality of Articles in Wikipedia

  • Baptiste de La Robertie
  • Yoann Pitarch
  • Olivier Teste
Part of the Lecture Notes in Social Networks book series (LNSN)


Wikipedia owes much of its success to the free availability of high-quality articles across many areas of expertise. Although most of these collaborations between authoritative users can serve as citable sources, Wikipedia is not immune to well-identified article quality problems, e.g., the reputability of third-party sources and vandalism. Given the huge number of articles and the intensive edit rate, manually evaluating the content quality of each article is not feasible. In this paper, we tackle the problem of modeling and predicting the quality of articles in collaborative platforms. We propose a quality model that integrates both temporal and structural features captured from the implicit peer review process enabled by Wikipedia. We develop a generic HITS-like framework that captures both the quality of the content and the authority of the associated authors. Notably, a mutual reinforcement principle between article quality and author authority is exploited to take advantage of the collaborative graph generated by the users. Experiments conducted on a representative set of Wikipedia data show the effectiveness of the computed indicators in both unsupervised and supervised scenarios.
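The mutual reinforcement principle mentioned above can be illustrated with a minimal HITS-style sketch. This is not the authors' model: the temporal and structural features are omitted, and the `weights` matrix, the function name `mutual_reinforcement`, and the toy data are all hypothetical, chosen only to show how article quality and author authority can be updated from one another on an author-article graph.

```python
import math


def mutual_reinforcement(weights, iterations=50, tol=1e-9):
    """HITS-style iteration on a bipartite author-article graph.

    weights[i][j] is an assumed non-negative weight describing how much
    author i contributed to article j (hypothetical input format).
    Returns (quality per article, authority per author), each L2-normalized.
    """
    n_authors = len(weights)
    n_articles = len(weights[0]) if weights else 0
    authority = [1.0] * n_authors
    quality = [1.0] * n_articles

    def normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector] if norm > 0 else vector

    for _ in range(iterations):
        # An article is good if authoritative authors contributed to it.
        new_quality = normalize([
            sum(weights[i][j] * authority[i] for i in range(n_authors))
            for j in range(n_articles)
        ])
        # An author is authoritative if they contributed to good articles.
        new_authority = normalize([
            sum(weights[i][j] * new_quality[j] for j in range(n_articles))
            for i in range(n_authors)
        ])
        # Largest change in any score since the previous iteration.
        delta = max(
            max((abs(a - b) for a, b in zip(new_quality, quality)), default=0.0),
            max((abs(a - b) for a, b in zip(new_authority, authority)), default=0.0),
        )
        quality, authority = new_quality, new_authority
        if delta < tol:
            break
    return quality, authority


# Toy data: 3 authors (rows) and 3 articles (columns), invented for illustration.
weights = [
    [3.0, 1.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
]
quality, authority = mutual_reinforcement(weights)
```

On this toy graph the first article, which receives the heaviest contributions, ends up with the highest quality score, and the author who contributed most to it ends up with the highest authority. The normalization at each step keeps the scores bounded, as in the standard HITS iteration.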





Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Baptiste de La Robertie (1), email author
  • Yoann Pitarch (1)
  • Olivier Teste (1)
  1. Université de Toulouse, IRIT UMR5505, Toulouse, France
