Using Morphological and Semantic Features for the Quality Assessment of Russian Wikipedia

  • Włodzimierz Lewoniewski
  • Nina Khairova
  • Krzysztof Węcel
  • Nataliia Stratiienko
  • Witold Abramowicz
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 756)

Abstract

The assessment of the quality and credibility of Wikipedia articles is becoming increasingly important. We propose using morphological and semantic features to estimate the quality of Wikipedia articles in the Russian language. We identified over 150 linguistic features and divided them into four groups. Within these groups, we considered features of encyclopedic style, readability, and subjectivism of the article’s text. Using Random Forest as the classification algorithm, we show the most important linguistic features that affect the quality of Russian Wikipedia articles. We also compare the classification results for each of the four feature groups separately. We achieved an F-measure of 89.75%.
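For readers unfamiliar with the metric, the F-measure is commonly defined as the harmonic mean of precision and recall. The abstract does not state which variant (binary, macro-averaged, or weighted) was used, so the following is only a minimal sketch of the standard F-beta formula. The function name `f_measure` and the counts are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true-positive, false-positive and false-negative counts.

    With beta = 1 this is the harmonic mean of precision and recall (F1).
    """
    if tp == 0:
        # No correct positive predictions: the score is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
score = f_measure(tp=90, fp=10, fn=10)
print(f"F-measure: {score:.2%}")
```

A single number like this hides the balance between precision and recall, which is why the two components are often reported alongside it.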

Keywords

Quality assessment of texts; Morphological and semantic features; Russian Wikipedia articles; Random forests classification; Encyclopedic; Readability; Subjectivism

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Włodzimierz Lewoniewski (1)
  • Nina Khairova (2)
  • Krzysztof Węcel (1)
  • Nataliia Stratiienko (2)
  • Witold Abramowicz (1)

  1. Poznań University of Economics and Business, Poznań, Poland
  2. National Technical University “Kharkiv Polytechnic Institute”, Kharkiv, Ukraine
