Externalities in knowledge production: evidence from a randomized field experiment

Abstract

Are there positive or negative externalities in knowledge production? We analyze whether current contributions to knowledge production increase or decrease the future growth of knowledge. To assess this, we use a randomized field experiment that added content to some pages in Wikipedia while leaving similar pages unchanged. We compare subsequent content growth over the next 4 years between the treatment and control groups. Our estimates allow us to rule out effects on 4-year growth of content length larger than twelve percent. We can also rule out effects on 4-year growth of content quality larger than four points, which is less than one-fifth of the size of the treatment itself. The treatment increased editing activity in the first 2 years, but most of these edits only modified the text added by the treatment. Our results have implications for information seeding and incentivizing contributions. They imply that additional content may inspire future contributions in the short and medium term but does not generate large externalities in the long term.

Notes

  1.

    Traditional channels of personal knowledge transmission require a double coincidence of demand for and supply of knowledge: the “knowledge-seeker” and the “knowledge-holder” have to meet in person or at least at the same time. The elimination of such double coincidences has been modeled to understand the advantage of monetary over barter economies (Kiyotaki & Wright, 1989). These features give such systems a drastic competitive advantage that may affect the education sector and other traditional channels of knowledge transmission. The sector of encyclopedic knowledge is one of the most salient examples of the new technology’s potential.

  2.

    Nagaraj (2019) describes how such policies have been used by Wikipedia (seeding articles on more than 30,000 US cities from US Census Bureau data), OpenStreetMap (US Census maps), and Reddit (fake user accounts).

  3.

    A comprehensive description of the experiment is provided in Hinnosaar et al. (2021b), who studied the impact of this treatment on real-world outcomes.

  4.

    Other studies on Wikipedia have analyzed biases in Wikipedia’s content (Greenstein & Zhu, 2012, 2018; Hinnosaar, 2019) and the impact of Wikipedia on market outcomes (Hinnosaar et al., 2021b; Xu & Zhang, 2013) and science (Thompson & Hanley, 2018).

  5.

    More generally, the literature suggests strong effects of social influence on individual choices related to savings (Duflo & Saez, 2002, 2003), education (De Giorgi et al., 2010; Hanushek et al., 2003), entertainment (Salganik et al., 2006), etc.

  6.

    For further details of the randomization, see Hinnosaar et al. (2021b).

  7.

    Kane and Ransbotham (2016) provide some evidence that the effect could be larger for less developed content. They find that, for less developed articles, a 1% increase in length implies 0.03–0.04 more monthly contributors. In our case, the treatment was on average 23% of the page length, which would imply 0.7–0.9 more users.
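
As a rough check of this back-of-the-envelope calculation, the following Python sketch multiplies the Kane and Ransbotham (2016) range by the average treatment size. It assumes that their estimate scales linearly with the percentage increase in length; the function name `implied_extra_contributors` is hypothetical.

```python
def implied_extra_contributors(pct_increase, low_per_pct=0.03, high_per_pct=0.04):
    """Linearly extrapolate extra monthly contributors from a % increase in length.

    Assumption: the per-percent estimates (0.03 and 0.04) scale linearly.
    """
    return pct_increase * low_per_pct, pct_increase * high_per_pct


low, high = implied_extra_contributors(23)  # treatment was about 23% of page length
print(f"{low:.2f} to {high:.2f} additional monthly contributors")
```

Rounded to one decimal, the two bounds correspond to the 0.7–0.9 range stated in the note.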

  8.

    The minimum detectable effect size is calculated at the 5% significance level and 80% power.
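
A common textbook way to compute a minimum detectable effect (MDE) under a normal approximation is to multiply the standard error of the estimate by the sum of two critical values. The sketch below illustrates that formula for a two-sided test. It is an assumption that this matches the calculation behind the note, and the standard error used in the example is a hypothetical placeholder, not a value from the paper.

```python
from statistics import NormalDist


def minimum_detectable_effect(standard_error, alpha=0.05, power=0.80):
    """MDE under a normal approximation for a two-sided test.

    MDE = (z_{1 - alpha/2} + z_{power}) * standard_error
    """
    z = NormalDist()  # standard normal distribution
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * standard_error


# Hypothetical standard error of 1.0, so the result equals the multiplier itself.
mde = minimum_detectable_effect(1.0)
print(f"MDE is about {mde:.2f} standard errors")
```

With these defaults the multiplier is roughly 2.8, so an effect needs to be about 2.8 standard errors in size to be detected with 80% power.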

  9.

    A revision (or an edit) is a version of a Wikipedia article saved at a specific moment by a particular user. All revisions with the corresponding metadata, including full text, user, and timestamp, are preserved by Wikipedia and publicly available.

  10.

    The drop in both the treatment and control groups in early 2013 comes from technical changes in Wikipedia: Addbot removed about 2000 characters from each page with an explanation similar to “Migrating 77 interwiki links, now provided by Wikidata”.

  11.

    By August 2016, the page of Cordoba in French Wikipedia was relatively long, with 19,426 characters (at the time 93% of the pages in our sample were shorter than that). During August 2016, this user increased the page length to 100,702 characters, which is almost twice the length of the longest page at the time (57,076 characters). Our conclusions do not change if we exclude this page.

  12.

    As we show below, the articles about Spanish cities in English-language Wikipedia are sometimes quite incomplete, so ideally we would have preferred to use Spanish Wikipedia as a comparison. Because the necessary combination of language skills is not common, doing so would have been prohibitively costly.

  13.

    Similar correlations between quantity and quality of content have been found previously, for example by Chen et al. (2019).

  14.

    Many Wikipedia editors save many revisions to the same page in a short period of time, for example, generating a new revision after each sentence they write. This is partly motivated by the fact that someone else might edit the page at the same time.

  15.

    Edit distance is widely used in computational linguistics and computer science to measure the similarity of strings. It is a generic term that allows any weights for the insert, delete, and substitution operations. Common variants assign weight 1 to insertions and deletions and weight substitutions by either 1 (called Levenshtein distance) or 2 (the measure we use). For each edit, we calculate the edit distance using the PHP FineDiff class at the granularity level of a character.
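
To make the weighting concrete, here is a minimal dynamic-programming sketch of a character-level edit distance with insertion and deletion cost 1 and a configurable substitution cost. It is not the PHP FineDiff implementation, whose output may differ; the function name `edit_distance` and the example strings are illustrative assumptions.

```python
def edit_distance(a, b, sub_cost=2):
    """Character-level edit distance.

    Insertions and deletions cost 1; substitutions cost `sub_cost`
    (1 gives the Levenshtein distance, 2 gives the variant described above).
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string: i deletions
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,       # delete char_a
                curr[j - 1] + 1,   # insert char_b
                prev[j - 1] + (0 if char_a == char_b else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting", sub_cost=1))  # Levenshtein variant
print(edit_distance("kitten", "sitting", sub_cost=2))  # substitution weighted by 2
```

With substitution cost 2, a substitution is never cheaper than a deletion followed by an insertion, so the distance reduces to counting inserted and deleted characters.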

  16.

    Both indexes are used to measure the similarity between two documents. The Jaccard index is also known as Intersection over Union and is sometimes called the Tanimoto similarity. Another similarity measure used in earlier work in economics (Chen et al., 2019; Thompson & Hanley, 2018) is cosine similarity. It would be preferable if we wanted to capture the similarity of two pages not only in terms of content covered but also in language and tone. As we aim to capture the completeness of a page compared to a benchmark, we found the Tversky index most appropriate for the task.
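
The difference between the two indexes can be illustrated on sets of terms. The sketch below uses the standard definitions: the Jaccard index divides the number of shared terms by the size of the union, while the Tversky index weights the terms unique to each set separately. The note does not state which weights were used, so the defaults here (`alpha=1`, `beta=0`, which measure the share of benchmark terms covered by the page) are an assumption, as are the function names and the example term sets.

```python
def jaccard_index(x, y):
    """Shared terms divided by all terms appearing in either set."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 1.0


def tversky_index(x, y, alpha=1.0, beta=0.0):
    """Tversky (1977) index: |X & Y| / (|X & Y| + alpha*|X - Y| + beta*|Y - X|).

    Assumed weights: alpha=1, beta=0, so terms that appear only in y
    (the page) do not lower the score relative to x (the benchmark).
    """
    x, y = set(x), set(y)
    common = len(x & y)
    denominator = common + alpha * len(x - y) + beta * len(y - x)
    return common / denominator if denominator else 1.0


benchmark_terms = {"america", "archipelago", "area", "arona"}  # hypothetical benchmark
page_terms = {"america", "area", "beach"}                      # hypothetical page

print(jaccard_index(benchmark_terms, page_terms))
print(tversky_index(benchmark_terms, page_terms))
```

Setting `alpha=1` and `beta=1` makes the Tversky index coincide with the Jaccard index, which is why an asymmetric choice of weights is what lets the measure express completeness relative to a benchmark.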

  17.

    For more information, see https://tech.yandex.com/translate/.

  18.

    For example, pronouns (“it”, “their”) and prepositions (“on”, “before”). The full list is in B.7.

  19.

    Terms such as “america”, “archipelago”, “area”, “arona”; about 250–500 terms for each article.

  20.

    The three lowest grades in the Wikipedia content assessment are: Stub—a very basic description of the topic; Start—developing but still quite incomplete; Class C—substantial but is still missing important content or contains much irrelevant material. Source: https://en.wikipedia.org/wiki/Wikipedia:Content_assessment.

  21.

    Note that these articles, on average, are also not very important according to the English Wikipedia article importance scheme. The scheme uses the ratings Low, Mid, High, and Top. Only 7 of the pages are rated as High importance, 15 as Mid, and 9 as Low importance. In addition, 29 of the articles have not been assigned an importance rating, which probably also implies that those articles are not highly important.

  22.

    To make the pages comparable, we subtract the length of text added by the treatment from the length of pages in the treatment group after treatment (both in 2014 and in 2018). We do so because the outcome variable measures the percentage change. Without subtracting the length of the treatment text, the treatment group would have a higher base when calculating the percentage, so the same increase in characters would give a smaller percentage for the treatment group.
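
A small numerical illustration of why the adjustment matters is sketched below. All page lengths are hypothetical, and the function name `growth_pct` is an assumption introduced for this example.

```python
def growth_pct(len_2014, len_2018, treatment_chars=0):
    """Percentage growth in length after removing the treatment text from both years."""
    base = len_2014 - treatment_chars
    end = len_2018 - treatment_chars
    return 100.0 * (end - base) / base


# Hypothetical treated page: 10,000 original characters plus 2,300 treatment
# characters in 2014, with 1,000 characters added by the community by 2018.
unadjusted = growth_pct(12_300, 13_300)
adjusted = growth_pct(12_300, 13_300, treatment_chars=2_300)

print(f"unadjusted: {unadjusted:.2f}%, adjusted: {adjusted:.2f}%")
```

In this example a control page of 10,000 characters that also gained 1,000 characters would show 10% growth, which matches the adjusted figure for the treated page but not the unadjusted one.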

  23.

    Table B.2 in the online appendix shows that the results are robust to alternative control variables. Table B.3 shows that the results are also robust when the Dutch pages are included, either in the control group or in an intention-to-treat estimation.

  24.

    For expositional clarity, we interpret the coefficient in column 1 as measuring a percentage change in length. This is a logarithmic approximation that performs well when changes are small, which is the case in our sample.
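
The quality of this approximation can be checked directly: a difference of b in log length corresponds to an exact change of 100·(exp(b) − 1) percent, which is close to 100·b when b is small. The sketch below compares the two; the log differences in the loop are illustrative values, not estimates from the paper, and the function name `exact_pct_change` is hypothetical.

```python
import math


def exact_pct_change(log_diff):
    """Exact percentage change implied by a difference in logs."""
    return 100.0 * (math.exp(log_diff) - 1.0)


for log_diff in (0.02, 0.05, 0.12):  # illustrative values only
    approx = 100.0 * log_diff
    exact = exact_pct_change(log_diff)
    print(f"log difference {log_diff:.2f}: approx {approx:.1f}%, exact {exact:.2f}%")
```

The gap between the approximate and exact figures widens as the log difference grows, which is why the interpretation is reserved for small changes.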

  25.

    To make the pages comparable, we subtract the length of text added by the treatment from the length of pages in the treatment group after treatment. Hence, the estimates should be interpreted as the effect of treatment on page length after removing the mechanical increase created by the treatment.

  26.

    For the long-term (4-year) effect, the ex-post minimum detectable effect size is 0.11 users.

  27.

    Source: http://www.ravi.io/language-word-lengths.

  28.

    Without externalities, our model is similar to the model in Chen et al. (2019), with two differences: the payoffs in their model depend on social impact (i.e., number of viewers) and participation is endogenous. A crucial difference in our model is the inclusion of externalities. The model could be generalized and solved using tools introduced in Hinnosaar (2018).

  29.

    Note that there are other, more subtle differences in the research environments that may affect the outcomes. For example, in our setting, the added content comes from an outside source. In contrast, in the settings of Kane and Ransbotham (2016), Aaltonen and Seiler (2016), and Zhu et al. (2020), the added content is created by the community.

References

  1. Aaltonen, A., & Seiler, S. (2016). Cumulative growth in user-generated content production: Evidence from Wikipedia. Management Science, 62, 2054–2069.

  2. Algan, Y., Benkler, Y., Morell, M. F., & Hergueux, J. (2013). Cooperation in a peer production economy: Experimental evidence from Wikipedia, manuscript.

  3. Ayres, I., Raseman, S., & Shih, A. (2013). Evidence from two large field experiments that peer comparison feedback can reduce residential energy usage. Journal of Law, Economics, and Organization, 29, 992–1022.

  4. Chen, Y., Farzan, R., Kraut, R. E., YeckehZaare, I., & Zhang, A. F. (2019). Motivating contributions to public information goods: A personalized field experiment on Wikipedia, manuscript.

  5. Chen, Y., Harper, F. M., Konstan, J., & Li, S. X. (2010). Social comparisons and contributions to online communities: A field experiment on MovieLens. American Economic Review, 100, 1358–1398.

  6. De Giorgi, G., Pellizzari, M., & Redaelli, S. (2010). Identification of social interactions through partially overlapping peer groups. American Economic Journal: Applied Economics, 2, 241–275.

  7. Duflo, E., & Saez, E. (2002). Participation and investment decisions in a retirement plan: The influence of colleagues’ choices. Journal of Public Economics, 85, 121–148.

  8. Duflo, E., & Saez, E. (2003). The role of information and social interactions in retirement plan decisions: Evidence from a randomized experiment. Quarterly Journal of Economics, 118, 815–842.

  9. Egebark, J., & Ekström, M. (2017). Liking what others “like”: Using Facebook to identify determinants of conformity. Experimental Economics, 21, 1–22.

  10. Eurobarometer. (2012). Europeans and their languages, special report 386. European Commission.

  11. Fershtman, C., & Gandal, N. (2011). Direct and indirect knowledge spillovers: The “social network” of open-source projects. RAND Journal of Economics, 42, 70–91.

  12. Gallus, J. (2017). Fostering public good contributions with symbolic awards: A large-scale natural field experiment at Wikipedia. Management Science, 63, 3999–4015.

  13. Goldstein, N. J., Cialdini, R. B., & Griskevicius, V. (2008). A room with a viewpoint: Using social norms to motivate environmental conservation in hotels. Journal of Consumer Research, 35, 472–482.

  14. Greenstein, S., & Zhu, F. (2012). Is Wikipedia biased? In American Economic Review: Papers and Proceedings (pp. 343–348).

  15. Greenstein, S., & Zhu, F. (2018). Do experts or crowd-based models produce more bias? Evidence from Encyclopedia Britannica and Wikipedia. MIS Quarterly, 42, 945–959.

  16. Grossman, G. M., & Helpman, E. (1993). Innovation and growth in the global economy. MIT Press.

  17. Hanushek, E. A., Kain, J. F., Markman, J. M., & Rivkin, S. G. (2003). Does peer ability affect student achievement? Journal of Applied Econometrics, 18, 527–544.

  18. Hinnosaar, M. (2019). Gender inequality in new media: Evidence from Wikipedia. Journal of Economic Behavior & Organization, 163, 262–276.

  19. Hinnosaar, M., Hinnosaar, T., Kummer, M., & Slivko, O. (2021a). Replication data for: “Externalities in knowledge production: Evidence from a randomized field experiment”. Harvard Dataverse. https://doi.org/10.7910/DVN/T4VFCX.

  20. Hinnosaar, M., Hinnosaar, T., Kummer, M., & Slivko, O. (2021b). Wikipedia matters. Journal of Economics & Management Strategy, 163, 1–13.

  21. Hinnosaar, T. (2018). Optimal sequential contests, manuscript.

  22. Huang, N., Burtch, G., Gu, B., Hong, Y., Liang, C., Wang, K., et al. (2018). Motivating user generated content with performance feedback: Evidence from randomized field experiments. Management Science, 65, 327–345.

  23. Jones, C. I. (1995). R&D-based models of economic growth. Journal of Political Economy, 103, 759–784.

  24. Jones, D., Molitor, D., & Reif, J. (2019). What do workplace wellness programs do? Evidence from the Illinois workplace wellness study. Quarterly Journal of Economics, 134, 1747–1791.

  25. Kane, G. C., & Ransbotham, S. (2016). Content as community regulator: The recursive relationship between consumption and contribution in open collaboration communities. Organization Science, 27, 1258–1274.

  26. Kiyotaki, N., & Wright, R. (1989). On money as a medium of exchange. Journal of Political Economy, 97, 927–954.

  27. Kummer, M. (2020). Attention in the peer production of user generated content: Evidence from 93 pseudo-experiments on Wikipedia, manuscript.

  28. Kummer, M., Slivko, O., & Zhang, X. (2019). Unemployment and digital public goods contribution. Information Systems Research, 31, 801–819.

  29. Lacetera, N., & Macis, M. (2010). Social image concerns and prosocial behavior: Field evidence from a nonlinear incentive scheme. Journal of Economic Behavior & Organization, 76, 225–237.

  30. Lerner, J., & Tirole, J. (2003). Some simple economics of open source. Journal of Industrial Economics, 50, 197–234.

  31. Manski, C. F. (1993). Identification of endogenous social effects: The reflection problem. Review of Economic Studies, 60, 531–542.

  32. Nagaraj, A. (2019). Information seeding and knowledge production in online communities: Evidence from OpenStreetMap, manuscript.

  33. Nov, O. (2007). What motivates Wikipedians? Communications of the ACM, 50, 60–64.

  34. Olivera, F., Goodman, P. S., & Tan, S.S.-L. (2008). Contribution behaviors in distributed environments. MIS Quarterly, 32, 23–42.

  35. Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14, 130–137.

  36. Ransbotham, S., Kane, G. C., & Lurie, N. H. (2012). Network characteristics and the value of collaborative user-generated content. Marketing Science, 31, 387–405.

  37. Ren, Y., Chen, J., & Riedl, J. (2015). The impact and evolution of group diversity in online open collaboration. Management Science, 62, 1668–1686.

  38. Romer, P. M. (1990). Endogenous technological change. Journal of Political Economy, 98, S71–S102.

  39. Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311, 854–856.

  40. Shah, S. K. (2006). Motivation, governance, and the viability of hybrid forms in open source software development. Management Science, 52, 1000–1014.

  41. Slivko, O. (2018). Online “brain gain”: Do immigrants return knowledge home? manuscript.

  42. Sun, Y., Dong, X., & McIntyre, S. (2017). Motivation of user-generated content: Social connectedness moderates the effects of monetary rewards. Marketing Science, 36, 329–337.

  43. Thompson, N., & Hanley, D. (2018). Science is shaped by Wikipedia: Evidence from a randomized control trial, manuscript.

  44. Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.

  45. Westfall, P. H., & Young, S. S. (1993). Resampling-based multiple testing: Examples and methods for p-value adjustment (Vol. 279). Wiley.

  46. Xu, S. X., & Zhang, X. (2013). Impact of Wikipedia on market information environment: Evidence on management disclosure and investor reaction. MIS Quarterly, 37, 1043–1068.

  47. Zhang, X., & Zhu, F. (2011). Group size and incentives to contribute: A natural experiment at Chinese Wikipedia. American Economic Review, 101, 1601–1615.

  48. Zhu, K., Walker, D., & Muchnik, L. (2020). Content growth and attention contagion in information networks: A natural experiment on Wikipedia. Information Systems Research, 31, 491–509.

Acknowledgements

We are grateful to John Duffy, two anonymous referees, Yan Chen, Chris Forman, Willa Friedman, Shane Greenstein, David Hugh-Jones, Tobias Kretschmer, Giovanni Mastrobuoni, Ignacio Monzón, Juan Morales, Abhishek Nagaraj, Stefan Penczynski, Imke Reimers, Stephan Seiler, Ananya Sen, Matthias Sutter, and Michael Zhang for valuable comments. We would also like to thank the seminar participants at the University of Strasbourg, the Collegio Carlo Alberto, George Mason University, ParisTech 2019, Digital Economy Workshop 2019 (Católica Lisbon), ZEW Conference on the Economics of ICT, the Advances with Field Experiments 2019 Conference (University of Chicago), and SED 2021 for valuable input.

Author information

Corresponding author

Correspondence to Toomas Hinnosaar.

Additional information

Data and code for this article are available in Hinnosaar et al. (2021a), Harvard Dataverse, https://doi.org/10.7910/DVN/T4VFCX.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 968 KB)

About this article

Cite this article

Hinnosaar, M., Hinnosaar, T., Kummer, M.E. et al. Externalities in knowledge production: evidence from a randomized field experiment. Exp Econ (2021). https://doi.org/10.1007/s10683-021-09730-x

Keywords

  • Knowledge accumulation
  • User-generated content
  • Wikipedia
  • Public goods provision
  • Field experiment

JEL Classifications

  • L17
  • L86
  • C93