What Makes a Good Collaborative Knowledge Graph: Group Composition and Quality in Wikidata

  • Alessandro Piscopo
  • Chris Phethean
  • Elena Simperl
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10539)


Wikidata is a community-driven knowledge graph which has drawn much attention from researchers and practitioners since its inception in 2012. The large user pool behind this project has been able to produce information spanning several domains, which is openly released and can be reused to feed any information-based application. Collaborative production processes in Wikidata have not yet been explored. Understanding them is key to preventing potentially harmful community dynamics and ensuring the sustainability of the project in the long run. We performed a regression analysis to investigate how the contribution of different types of users, i.e. bots and human editors, registered or anonymous, influences outcome quality in Wikidata. Moreover, we looked at the effects of tenure and interest diversity among registered users. Our findings show that a balanced contribution of bots and human editors positively influences outcome quality, whereas higher numbers of anonymous edits may hinder performance. Tenure and interest diversity within groups also lead to higher quality. These results may help to identify and address groups that are likely to underperform in Wikidata. Further work should analyse in detail the respective contributions of bots and registered users.
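
As a rough illustration of the kind of group-composition variables the abstract refers to, the sketch below computes the share of edits by editor type for a single item, plus one possible measure of tenure diversity among registered editors. The edit log, the function names `edit_shares` and `tenure_variation`, and the use of the coefficient of variation are all assumptions made for illustration; the abstract does not state how the authors operationalised these variables.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical edit log for one item: (editor_type, editor_tenure_in_days).
# Tenure is None for bots and anonymous editors in this toy data.
edits = [
    ("bot", None),
    ("bot", None),
    ("registered", 400),
    ("registered", 100),
    ("anonymous", None),
]


def edit_shares(edits):
    """Return the fraction of edits made by each editor type."""
    counts = Counter(kind for kind, _ in edits)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


def tenure_variation(edits):
    """Coefficient of variation of tenure among registered editors.

    Assumed measure of tenure diversity; returns 0.0 when it is undefined
    (fewer than two tenure values, or a mean of zero).
    """
    tenures = [t for kind, t in edits if kind == "registered" and t is not None]
    if len(tenures) < 2 or mean(tenures) == 0:
        return 0.0
    return pstdev(tenures) / mean(tenures)


shares = edit_shares(edits)
cv = tenure_variation(edits)
print(shares)
print(cv)
```

Values like these, computed per item, could then serve as predictors in a regression against a quality label, which is the general shape of analysis the abstract describes.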


Keywords: Wikidata · Collaborative knowledge graphs · Group composition



This project is supported by funding received from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 642795 (WDAqua ITN).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Alessandro Piscopo (1)
  • Chris Phethean (1)
  • Elena Simperl (1)
  1. University of Southampton, Southampton, UK
