The Evolution of Power and Standard Wikidata Editors: Comparing Editing Behavior over Time to Predict Lifespan and Volume of Edits

  • Cristina Sarasua
  • Alessandro Checco
  • Gianluca Demartini
  • Djellel Difallah
  • Michael Feldman
  • Lydia Pintscher


Knowledge bases are becoming a key asset for many types of applications on the Web, from search engines that present ‘entity cards’ in response to a query to virtual personal assistants powered by structured knowledge-base data. Wikidata is an open, general-interest knowledge base that is collaboratively developed and maintained by a community of thousands of volunteers. One of the major challenges faced by such a crowdsourcing project is attaining a high level of editor engagement. In order to intervene and encourage editors to be more committed to editing Wikidata, it is important to be able to predict, at an early stage, whether or not an editor will become an engaged editor. In this paper, we investigate this problem and study how the editing behaviour of editors with different levels of engagement evolves over time. We measure an editor’s engagement in terms of (i) the volume of edits provided by the editor and (ii) their lifespan (i.e. the length of time for which an editor is present at Wikidata). The large-scale longitudinal data analysis that we perform covers Wikidata edits over almost 4 years. We monitor evolution on a session-by-session and a monthly basis, observing how the participation, the volume and the diversity of edits made by Wikidata editors change. Using the findings of our exploratory analysis, we define and implement prediction models that use these multiple evolution indicators.
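The two engagement measures and the evolution indicators named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input format (a list of `(timestamp, edit_type)` pairs per editor), the function name `engagement_indicators`, the one-hour inactivity gap used to split sessions, and the use of Shannon entropy over edit types as a diversity measure are all assumptions made for illustration only.

```python
from collections import Counter
from datetime import datetime, timedelta
from math import log2

# Assumption: a new session starts after more than one hour of inactivity.
SESSION_GAP = timedelta(hours=1)


def engagement_indicators(edits):
    """Summarise one editor's activity.

    edits: non-empty list of (timestamp, edit_type) tuples (hypothetical format).
    """
    if not edits:
        raise ValueError("at least one edit is required")
    edits = sorted(edits)
    times = [t for t, _ in edits]

    # (i) volume: total number of edits by the editor.
    volume = len(edits)
    # (ii) lifespan: whole days between the first and the last edit.
    lifespan_days = (times[-1] - times[0]).days

    # Count sessions by looking for gaps longer than SESSION_GAP.
    sessions = 1 + sum(1 for a, b in zip(times, times[1:]) if b - a > SESSION_GAP)
    # Monthly volume, keyed by (year, month).
    monthly = Counter((t.year, t.month) for t in times)

    # Assumption: diversity as Shannon entropy (in bits) over edit types.
    type_counts = Counter(kind for _, kind in edits)
    diversity = -sum((c / volume) * log2(c / volume) for c in type_counts.values())

    return {
        "volume": volume,
        "lifespan_days": lifespan_days,
        "sessions": sessions,
        "monthly": dict(monthly),
        "diversity": diversity,
    }


# Hypothetical edit log for a single editor.
example = [
    (datetime(2015, 1, 1, 10, 0), "label"),
    (datetime(2015, 1, 1, 10, 30), "claim"),
    (datetime(2015, 1, 1, 15, 0), "label"),
    (datetime(2015, 3, 2, 9, 0), "claim"),
]
result = engagement_indicators(example)
```

Per-editor summaries of this kind could then serve as input features for a prediction model; which features and models the paper actually uses is not stated in the abstract.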


Wikidata, Knowledge, Power editors, Standard editors, Evolution



We would like to thank Michele Catasta for his feedback at an early stage of this research, as well as the other participants of our Dagstuhl Research Meeting “Crowdsourcing Research - Transcending Disciplinary Boundaries”. We would also like to thank Michael Luggen for his help in setting up one of the machines used for the experiments of this project. This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 732328, as well as from the COST Action IC1302 - Keystone.

During the manuscript reviewing process, several authors changed their affiliation. Part of the work presented in this paper was carried out while Cristina Sarasua was affiliated with the University of Koblenz-Landau (Germany) and visited the University of Sheffield (United Kingdom), Gianluca Demartini was affiliated with the University of Sheffield (United Kingdom) and Djellel Difallah was affiliated with the University of Fribourg (Switzerland).



Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  • Cristina Sarasua (1), corresponding author
  • Alessandro Checco (2)
  • Gianluca Demartini (3)
  • Djellel Difallah (4)
  • Michael Feldman (1)
  • Lydia Pintscher (5)

  1. University of Zurich, Zurich, Switzerland
  2. University of Sheffield, Sheffield, UK
  3. University of Queensland, Queensland, Australia
  4. New York University, New York, USA
  5. Wikimedia Deutschland, Germany
