Enrichment of Information in Multilingual Wikipedia Based on Quality Analysis

Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 303)


Although Wikipedia is one of the most popular sources of information in the world, it is often criticized for the poor quality of its content. In this online encyclopaedia, articles on the same topic can be created and edited independently in different languages. Some of these language versions can provide valuable information on specific topics. Wikipedia articles may include an infobox, which is used to collect and present a subset of important information about the article's subject. This study presents a method for assessing the quality of Wikipedia articles and of the information contained in their infoboxes. Choosing the best language versions of a particular article will allow information to be enriched in the less developed language editions of that article.
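The enrichment idea in the last sentence can be illustrated with a minimal sketch. It is not the method of the paper, which the abstract does not detail. It assumes that each language version of an article already has a numeric quality score and an infobox represented as a dictionary. The function name `enrich_infobox`, the score values, and the placeholder attributes are all hypothetical.

```python
from typing import Dict


def enrich_infobox(infoboxes: Dict[str, Dict[str, str]],
                   quality: Dict[str, float],
                   target: str) -> Dict[str, str]:
    """Fill attributes missing from the target-language infobox.

    Assumption: `quality` holds one precomputed score per language
    version; how such a score is obtained is outside this sketch.
    """
    # Start from a copy so the original infobox is left unchanged.
    enriched = dict(infoboxes[target])
    # Visit the other language versions from highest to lowest score.
    donors = sorted((lang for lang in infoboxes if lang != target),
                    key=lambda lang: quality[lang], reverse=True)
    for lang in donors:
        for attr, value in infoboxes[lang].items():
            # Existing values are kept; only missing attributes are added.
            enriched.setdefault(attr, value)
    return enriched


# Toy, made-up data: three language versions of one article.
infoboxes = {
    "en": {"attr_a": "a-from-en", "attr_b": "b-from-en"},
    "pl": {"attr_a": "a-from-pl"},
    "de": {"attr_b": "b-from-de", "attr_c": "c-from-de"},
}
quality = {"en": 0.9, "de": 0.7, "pl": 0.4}  # hypothetical scores

result = enrich_infobox(infoboxes, quality, target="pl")
```

In this sketch the target version keeps its own values, and when two donor versions offer the same missing attribute, the one with the higher score is used. A real system would also need to map attribute names and value formats across languages, which is not attempted here.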


Wikipedia, Article quality, Infobox, DBpedia



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Poznań University of Economics and Business, Poznań, Poland
