Using Morphological and Semantic Features for the Quality Assessment of Russian Wikipedia

  • Conference paper
Information and Software Technologies (ICIST 2017)

Abstract

Assessing the quality and credibility of Wikipedia articles is becoming increasingly important. We propose using morphological and semantic features to estimate the quality of Wikipedia articles in the Russian language. We identified over 150 linguistic features and divided them into four groups. Within these groups, we considered features of encyclopedic style, readability, and subjectivity of the article's text. Using Random Forest as the classification algorithm, we show the most important linguistic features affecting the quality of Russian Wikipedia articles. We also compare the classification results of our four linguistic feature groups separately. We achieved an F-measure of 89.75%.
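For readers unfamiliar with the F-measure reported above, the following is a minimal sketch of how it is commonly computed for a binary quality classifier, as the harmonic mean of precision and recall. The function name `f_measure`, the label encoding, and the toy labels are illustrative assumptions; the abstract does not state which F-measure variant (binary, macro, or weighted) the authors used.

```python
def f_measure(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels (not data from the paper): 1 = high-quality article, 0 = other.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = f_measure(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Comparing feature groups, as the abstract describes, would amount to computing this score once per group on held-out articles.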

Notes

  1. https://meta.wikimedia.org/wiki/List_of_Wikipedias.

  2. https://analytics.wikimedia.org.

  3. http://www.alexa.com/siteinfo/wikipedia.org.

  4. http://wikirank.net.

  5. https://en.wikipedia.org/wiki/Wikipedia:Featured_article_criteria.

  6. http://pymorphy2.readthedocs.io.

  7. http://opencorpora.org.

  8. http://www.ruscorpora.ru/en/.

  9. Detailed definitions of the simple and complex facts are given in [13].

References

  1. Michael, B.: Wikipedia Or Encyclopædia Britannica: Which Has More Bias? Forbes (2015). http://www.forbes.com/sites/hbsworkingknowledge/2015/01/20/wikipedia-or-encyclopaedia-britannica-which-has-more-bias. Accessed 15 June 2017

  2. Xu, Y., Luo, T.: Measuring article quality in Wikipedia: Lexical clue model. In: 2011 3rd Symposium on Web Society (SWS), pp. 141–146. IEEE (2011)

  3. Anderka, M.: Analyzing and predicting quality flaws in user-generated content: the case of Wikipedia. Ph.D., Bauhaus-Universitaet Weimar, Germany (2013)

  4. Kittur, A., Kraut, R.E.: Harnessing the wisdom of crowds in Wikipedia: quality through coordination. In: Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, pp. 37–46. ACM (2008)

  5. Velázquez, C.G., Cagnina, L.C., Errecalde, M.L.: On the feasibility of external factual support as Wikipedia’s quality metric. Procesamiento del Lenguaje Natural 58, 93–100 (2017)

  6. Lipka, N., Stein, B.: Identifying featured articles in Wikipedia: writing style matters. In: Proceedings of the 19th International Conference on World Wide Web, pp. 1147–1148 (2010)

  7. Khairova, N., Petrasova, S., Gautam, A.: The logical-linguistic model of fact extraction from English texts. In: International Conference on Information and Software Technologies, CCIS 2016, Communications in Computer and Information Science, pp. 625–635 (2016)

  8. Warncke-Wang, M., Cosley, D., Riedl, J.: Tell me more: an actionable quality model for Wikipedia. In: Proceedings of the 9th International Symposium on Open Collaboration (2013)

  9. Giles, G.: Internet encyclopaedias go head to head. Nature 438, 900–901 (2005)

  10. Panicheva, P., Ledovaya, Y., Bogolyubova, O.: Lexical, morphological and semantic correlates of the dark triad personality traits in Russian Facebook texts. In: Artificial Intelligence and Natural Language Conference (AINL), pp. 1–8. IEEE (2016)

  11. Lenzner, T.: Are readability formulas valid tools for assessing survey question difficulty? Sociol. Methods Res. 43(4), 677–698 (2014)

  12. Sharoff, S., Umanskaya, E., Wilson, J.: A frequency dictionary of Russian: core vocabulary for learners. Routledge (2014)

  13. Khairova, N., Lewoniewski, W., Wecel, K.: Estimating the quality of articles in Russian Wikipedia using the logical-linguistic model of fact extraction. In: International Conference on Business Information Systems, pp. 28–42 (2017)

  14. Węcel, K., Lewoniewski, W.: Modelling the quality of attributes in Wikipedia infoboxes. In: Abramowicz, W. (ed.) BIS 2015. LNBIP, vol. 228, pp. 308–320. Springer, Cham (2015). doi:10.1007/978-3-319-26762-3_27

  15. Lewoniewski, W., Węcel, K., Abramowicz, W.: Quality and importance of Wikipedia articles in different languages. In: Dregvaite, G., Damasevicius, R. (eds.) ICIST 2016. CCIS, vol. 639, pp. 613–624. Springer, Cham (2016). doi:10.1007/978-3-319-46254-7_50

  16. Rebuschat, P.E., Detmar, M., McEnery, T.: Language learning research at the intersection of experimental, computational and corpus-based approaches. Language Learning (2017)

  17. Wu, G., Harrigan, M., Cunningham, P.: Characterizing Wikipedia pages using edit network motif profiles. In: Proceedings of the 3rd International Workshop on Search and Mining User-Generated Contents, pp. 45–52. ACM (2011)

  18. Lex, E., Voelske, M., Errecalde, M., Ferretti, E., Cagnina, L., Horn, C., Granitzer, M.: Measuring the quality of web content using factual information. In: Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality, pp. 7–10. ACM (2012)

Author information

Corresponding author

Correspondence to Włodzimierz Lewoniewski.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lewoniewski, W., Khairova, N., Węcel, K., Stratiienko, N., Abramowicz, W. (2017). Using Morphological and Semantic Features for the Quality Assessment of Russian Wikipedia. In: Damaševičius, R., Mikašytė, V. (eds) Information and Software Technologies. ICIST 2017. Communications in Computer and Information Science, vol 756. Springer, Cham. https://doi.org/10.1007/978-3-319-67642-5_46

  • DOI: https://doi.org/10.1007/978-3-319-67642-5_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67641-8

  • Online ISBN: 978-3-319-67642-5

  • eBook Packages: Computer Science, Computer Science (R0)
