Abstract
We present a method for estimating the quality of articles in Russian Wikipedia that is based on counting the number of facts in an article. To count the facts, we use our logical-linguistic model of fact extraction. The basic mathematical means of the model are logical-algebraic equations of the algebra of finite predicates. The model allows the extraction of simple and complex types of facts from Russian sentences. We experimentally compare the effect of the density of these types of facts on the quality of articles in Russian Wikipedia. Better articles tend to have a higher density of facts.
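As a rough illustration of what "density of facts" could mean in practice, the sketch below normalizes a fact count by article length in words and ranks articles by the result. The normalization by word count, the function name `fact_density`, and the example counts are assumptions made for illustration; they are not taken from the paper and do not reproduce its fact-extraction model.

```python
def fact_density(num_facts: int, num_words: int) -> float:
    """Return the number of facts per word (0.0 for an empty article)."""
    if num_facts < 0 or num_words < 0:
        raise ValueError("counts must be non-negative")
    if num_words == 0:
        return 0.0
    return num_facts / num_words


# Hypothetical (facts, words) counts for two articles; not real data.
articles = {
    "article_a": (120, 3000),
    "article_b": (40, 2500),
}

# Compute the density for each article and rank from densest to sparsest.
densities = {name: fact_density(f, w) for name, (f, w) in articles.items()}
ranked = sorted(densities, key=densities.get, reverse=True)
```

Under this assumed definition, the fact counts themselves would come from the extraction model, and only the normalization step is shown here.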
Notes
- 3. We use ‘Subject’, ‘Object’ and ‘Predicate’ with initial upper-case letters to denote the elements of a fact triplet Subject \(\rightarrow\) Predicate \(\rightarrow\) Object.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Khairova, N., Lewoniewski, W., Węcel, K. (2017). Estimating the Quality of Articles in Russian Wikipedia Using the Logical-Linguistic Model of Fact Extraction. In: Abramowicz, W. (eds) Business Information Systems. BIS 2017. Lecture Notes in Business Information Processing, vol 288. Springer, Cham. https://doi.org/10.1007/978-3-319-59336-4_3
DOI: https://doi.org/10.1007/978-3-319-59336-4_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59335-7
Online ISBN: 978-3-319-59336-4
eBook Packages: Computer Science, Computer Science (R0)