The Evolution of Power and Standard Wikidata Editors: Comparing Editing Behavior over Time to Predict Lifespan and Volume of Edits

Abstract

Knowledge bases are becoming a key asset for many types of Web applications, from search engines that present ‘entity cards’ as the result of a query to virtual personal assistants powered by structured data. Wikidata is an open, general-interest knowledge base that is collaboratively developed and maintained by a community of thousands of volunteers. One of the major challenges in such a crowdsourcing project is attaining a high level of editor engagement. In order to intervene and encourage editors to be more committed to editing Wikidata, it is important to be able to predict at an early stage whether or not an editor will become an engaged editor. In this paper, we investigate this problem and study how the editing behaviour of editors with different levels of engagement evolves over time. We measure an editor’s engagement in terms of (i) the volume of edits provided by the editor and (ii) their lifespan (i.e. the length of time for which an editor is present on Wikidata). The large-scale longitudinal data analysis that we perform covers Wikidata edits over almost 4 years. We monitor evolution on a session-by-session and a monthly basis, observing how the participation, the volume and the diversity of edits made by Wikidata editors change. Using the findings of our exploratory analysis, we define and implement prediction models that use these multiple evolution indicators.
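The engagement measures named in the abstract (volume of edits, lifespan, sessions and diversity of edits) can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hour inactivity gap used to split sessions, the use of Shannon entropy over action types as the diversity measure, the function names and the example edits are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from math import log2

# Assumption: edits separated by more than this gap start a new session.
SESSION_GAP = timedelta(hours=1)


def split_sessions(timestamps, gap=SESSION_GAP):
    """Group edit timestamps into sessions separated by inactivity gaps."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # gap exceeded: open a new session
    return sessions


def engagement(edits):
    """Summarise one editor; edits is a list of (timestamp, action) tuples."""
    timestamps = [ts for ts, _ in edits]
    actions = Counter(action for _, action in edits)
    total = len(edits)
    # Assumption: diversity is the Shannon entropy (in bits) of action types.
    diversity = -sum((n / total) * log2(n / total) for n in actions.values())
    return {
        "volume": total,
        "lifespan_days": (max(timestamps) - min(timestamps)).days,
        "sessions": len(split_sessions(timestamps)),
        "diversity_bits": diversity,
    }


# Hypothetical edit history for a single editor.
example_edits = [
    (datetime(2015, 3, 1, 10, 0), "wbsetlabel"),
    (datetime(2015, 3, 1, 10, 20), "wbsetclaim"),
    (datetime(2015, 3, 1, 15, 0), "wbsetclaim"),
    (datetime(2015, 4, 10, 9, 0), "wbsetdescription"),
]
summary = engagement(example_edits)
print(summary)
```

Computing the same summary over successive sessions or calendar months, rather than over the whole history, would give the kind of per-period evolution indicators the abstract describes.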

Notes

  1. Wikidata’s Phabricator Ticketing System https://phabricator.wikimedia.org/tag/wikidata/

  2. Wikidata Game https://tools.wmflabs.org/wikidata-game/distributed/

  3. Wikidata Revolution presentation at Wikimania 2017, in August 2017 https://wikimania2017.wikimedia.org/wiki/Submissions/The_(Wiki)Data_(R)Evolution

  4. According to the Wikimedia Foundation, an active user is “A user with 5+ edits in the main namespace of a given project over the last 30 days” https://meta.wikimedia.org/wiki/Research:Metrics

  5. Retention Science https://go.retentionscience.com/hubfs/Documents/Retention_Science_Predicting_Customer_Churn_Guide.pdf?t=1507690636756

  6. In 2015 Google announced the port of the Freebase knowledge base to Wikidata.

  7. Wikidata Wiki dump https://dumps.wikimedia.org/other/incr/wikidatawiki/

  8. Wikibase actions https://www.mediawiki.org/wiki/Wikibase/API/de

  9. Preprocessed Wikidata History https://github.com/criscod/wikidata_editors_evolution_jcscw2018

  10. Research Metrics https://meta.wikimedia.org/wiki/Research:Metrics

  11. Wikibase actions https://www.mediawiki.org/wiki/Wikibase/API

  12. The Random Forest parameters chosen are: 100 estimators and bootstrap technique with subsample class balancing.

  13. Causes of dropping out of Wikipedia, as described by the community https://www.wikizero.com/en/Wikipedia:WikiProject_Editor_Retention

  14. Wikipedia Mentorship https://en.wikipedia.org/wiki/Wikipedia:Mentorship

  15. This idea, together with the major findings of this research, was presented in a talk at WikidataCon https://goo.gl/vKH1kj. The Wikidata community appreciated the findings and welcomed this proposal to reduce editor attrition.

  16. https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Data_quality_framework_for_Wikidata
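The Random Forest note above mentions 100 estimators and a bootstrap with subsample class balancing. The sketch below shows one plausible reading of that setting: each tree draws its own bootstrap sample, and class weights are computed on that sample as n_samples / (n_classes * class_count). In scikit-learn this would likely correspond to `RandomForestClassifier(n_estimators=100, bootstrap=True, class_weight="balanced_subsample")`, but the mapping to that library, the label names and the 90/10 class split are assumptions and are not stated in the source.

```python
import random
from collections import Counter

N_ESTIMATORS = 100  # number of trees, as stated in the note


def bootstrap_indices(n, rng):
    """Draw n row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def balanced_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


rng = random.Random(0)
# Hypothetical, imbalanced training labels: few power editors, many standard.
labels = ["standard"] * 90 + ["power"] * 10

# Weights on the full data, for comparison with the per-tree weights.
full_weights = balanced_weights(labels)

# One weight table per tree, each computed on that tree's bootstrap sample.
per_tree_weights = []
for _ in range(N_ESTIMATORS):
    sample = [labels[i] for i in bootstrap_indices(len(labels), rng)]
    per_tree_weights.append(balanced_weights(sample))

print(full_weights)
print(per_tree_weights[0])
```

Because the weights are recomputed per bootstrap sample, the minority class is up-weighted in every tree even when its share of a particular sample differs from its share of the full data.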

Acknowledgments

We would like to thank Michele Catasta for his feedback at an early stage of this research, as well as the other participants of our Dagstuhl Research Meeting “Crowdsourcing Research - Transcending Disciplinary Boundaries”. We would also like to thank Michael Luggen for his help in setting up one of the machines used for the experiments of this project. This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 732328, as well as from the COST Action IC1302 - Keystone.

During the manuscript reviewing process, several authors changed their affiliation. Part of the work presented in this paper was carried out while Cristina Sarasua was affiliated with the University of Koblenz-Landau (Germany) and visited the University of Sheffield (United Kingdom), Gianluca Demartini was affiliated with the University of Sheffield (United Kingdom) and Djellel Difallah was affiliated with the University of Fribourg (Switzerland).

Author information

Corresponding author

Correspondence to Cristina Sarasua.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sarasua, C., Checco, A., Demartini, G. et al. The Evolution of Power and Standard Wikidata Editors: Comparing Editing Behavior over Time to Predict Lifespan and Volume of Edits. Comput Supported Coop Work 28, 843–882 (2019). https://doi.org/10.1007/s10606-018-9344-y

Keywords

  • Wikidata
  • Knowledge
  • Power editors
  • Standard editors
  • Evolution