Abstract
Are there positive or negative externalities in knowledge production? We analyze whether current contributions to knowledge production increase or decrease the future growth of knowledge. To assess this, we use a randomized field experiment that added content to some pages in Wikipedia while leaving similar pages unchanged. We compare subsequent content growth over the next 4 years between the treatment and control groups. Our estimates allow us to rule out effects on 4-year growth of content length larger than twelve percent. We can also rule out effects on 4-year growth of content quality larger than four points, which is less than one-fifth of the size of the treatment itself. The treatment increased editing activity in the first 2 years, but most of these edits only modified the text added by the treatment. Our results have implications for information seeding and for incentivizing contributions. They imply that additional content may inspire future contributions in the short and medium term but does not generate large externalities in the long term.
Notes
Traditional channels of personal knowledge transmission require a double coincidence of demand and supply of knowledge: the “knowledge-seeker” and the “knowledge-holder” have to meet in person, or at least at the same time. The elimination of such double coincidences has been modeled to explain the advantage of monetary economies over barter economies (Kiyotaki & Wright, 1989). These features give such systems a drastic competitive advantage that may affect the education sector and other traditional channels of knowledge transmission. The sector of encyclopedic knowledge is one of the most salient examples of the new technology’s potential.
Nagaraj (2019) describes how such policies have been used by Wikipedia (seeding articles on more than 30,000 US cities from US Census Bureau data), OpenStreetMap (US Census maps), and Reddit (fake user accounts).
A comprehensive description of the experiment is provided in Hinnosaar et al. (2021b), who studied the impact of this treatment on real-world outcomes.
For further details of the randomization, see Hinnosaar et al. (2021b).
Kane and Ransbotham (2016) provide some evidence that the effect could be larger for less developed content. They find that, for less developed articles, a 1% increase in length implies 0.03–0.04 more monthly contributors. In our case, the treatment was on average 23% of the page length, which would imply 0.7–0.9 more users.
The minimum detectable effect size is calculated at the 5% significance level with 80% power.
A revision (or an edit) is a version of a Wikipedia article saved at a specific moment by a particular user. All revisions with the corresponding metadata, including full text, user, and timestamp, are preserved by Wikipedia and publicly available.
The drop in both the treatment and control groups in early 2013 comes from technical changes in Wikipedia: Addbot removed about 2000 characters from each page with an explanation similar to “Migrating 77 interwiki links, now provided by Wikidata”.
By August 2016, the page of Cordoba in French Wikipedia was relatively long, with 19,426 characters (at the time 93% of the pages in our sample were shorter than that). During August 2016, this user increased the page length to 100,702 characters, which is almost twice the length of the longest page at the time (57,076 characters). Our conclusions do not change if we exclude this page.
As we show below, the articles about Spanish cities in the English-language Wikipedia are sometimes quite incomplete, so ideally we would have preferred to use the Spanish Wikipedia as a comparison. However, because the necessary combination of language skills is uncommon, this would have been prohibitively costly.
Similar correlations between quantity and quality of content have been found previously, for example by Chen et al. (2019).
Wikipedia editors often save many revisions to the same page in a short period of time, for example generating a new revision after each sentence they write. This is partly motivated by the fact that someone else might edit the page at the same time.
Edit distance is widely used in computational linguistics and computer science to measure the similarity of strings. It is a generic term that allows arbitrary weights on insert, delete, and substitution operations. Common variants assign weight 1 to insertions and deletions, and weight substitutions either by 1 (the Levenshtein distance) or by 2 (the measure we use). For each edit, we calculate the edit distance using the PHP FineDiff class at character granularity.
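The variant described here can be sketched as follows. This is a minimal illustrative implementation (the paper itself uses the PHP FineDiff class), with insertions and deletions costing 1 and substitutions costing 2; the function name is ours:

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level edit distance with insert/delete cost 1 and
    substitution cost 2 (so a substitution never costs less than a
    deletion plus an insertion)."""
    m, n = len(a), len(b)
    # dp[j] holds the distance between the current prefix of a and b[:j]
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev_diag, dp[0] = dp[0], i
        for j in range(1, n + 1):
            sub = prev_diag + (0 if a[i - 1] == b[j - 1] else 2)
            prev_diag = dp[j]
            dp[j] = min(dp[j] + 1,      # delete a[i-1]
                        dp[j - 1] + 1,  # insert b[j-1]
                        sub)            # match or substitute
    return dp[n]
```

With these weights the distance equals the number of characters that must be inserted plus the number deleted, e.g. `edit_distance("kitten", "sitting")` is 5 rather than the Levenshtein value of 3.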
Both indexes measure the similarity between two documents. The Jaccard index is also known as the Intersection over Union and is sometimes called the Tanimoto similarity. Another similarity measure used in earlier work in economics (Chen et al., 2019; Thompson & Hanley, 2018) is cosine similarity, which would be preferable if we wanted to capture the similarity of two pages not only in terms of the content covered but also in language and tone. As we aim to capture the completeness of a page compared to a benchmark, we found the Tversky index most appropriate for the task.
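Both indexes operate on sets of terms and can be sketched as below. The function names and the example parameter values are illustrative, not the paper's code; note that the Tversky index with both asymmetry parameters equal to 1 reduces to the Jaccard index:

```python
def jaccard(x: set, y: set) -> float:
    """Jaccard index: intersection over union of two term sets."""
    union = x | y
    return len(x & y) / len(union) if union else 1.0

def tversky(x: set, y: set, alpha: float, beta: float) -> float:
    """Tversky (1977) index S(x, y) = |x∩y| / (|x∩y| + alpha|x−y| + beta|y−x|).
    alpha weights terms only in x, beta weights terms only in y;
    alpha = beta = 1 recovers the Jaccard index."""
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    return common / denom if denom else 1.0
```

Setting `alpha = 0` and `beta = 1`, for instance, measures how completely `x` covers the benchmark set `y` while ignoring extra material in `x`; whether this matches the paper's exact parameterization is an assumption.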
For more information, see https://tech.yandex.com/translate/.
For example, pronouns (“it”, “their”) and prepositions (“on”, “before”). The full list is in Appendix B.7.
Terms such as “america”, “archipelago”, “area”, “arona”; about 250–500 terms for each article.
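The construction of these term sets can be sketched as follows. The stopword list here is a small illustrative subset (the full list is in the appendix), and the paper additionally stems terms with Porter's (1980) algorithm, which is omitted from this sketch:

```python
import re

# Illustrative subset of the stopword list (pronouns, prepositions, articles)
STOPWORDS = {"it", "their", "on", "before", "the", "a", "an", "of", "and", "in"}

def term_set(text: str) -> set:
    """Lowercase the text, keep alphabetic tokens, and drop stopwords.
    (Stemming with the Porter algorithm is omitted here.)"""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}
```

For example, `term_set("The area of the archipelago")` yields `{"area", "archipelago"}`, and the resulting sets feed directly into the similarity indexes above.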
The three lowest grades in the Wikipedia content assessment are: Stub—a very basic description of the topic; Start—developing but still quite incomplete; C-Class—substantial, but still missing important content or containing much irrelevant material. Source: https://en.wikipedia.org/wiki/Wikipedia:Content_assessment.
Note that these articles, on average, are also not very important according to the English Wikipedia article importance scheme, which rates articles from Low through Mid and High to Top. Only 7 of the pages are rated as High importance, 15 as Mid, and 9 as Low importance; 29 of the articles have not been assigned an importance rating, which probably also implies that those articles are not highly important.
To make the pages comparable, we subtract the length of the text added by the treatment from the length of pages in the treatment group after treatment (both in 2014 and in 2018). We do this because the outcome variable measures a percentage change: without subtracting the length of the treatment text, the treatment group would have a higher base when calculating the percentage, so the same increase in characters would yield a smaller percentage change for the treatment group.
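The adjustment can be illustrated with a stylized computation; the function name and the numbers in the usage note are hypothetical, chosen only to show the mechanics:

```python
def adjusted_growth(len_before: int, len_after: int, treatment_len: int) -> float:
    """Growth rate with the mechanically added treatment text removed
    from the post-treatment length, so that treated and control pages
    are compared on the same base."""
    return (len_after - treatment_len - len_before) / len_before
```

For instance, a hypothetical 10,000-character page that receives 2,300 characters of treatment text and later gains 1,000 organic characters has `adjusted_growth(10000, 13300, 2300)` of 0.10, the same 10% growth a control page would show after an identical organic gain.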
Table B.2 in the online appendix shows that the results are robust to alternative control variables. Table B.3 shows that the results are also robust to including the Dutch pages in the control group and to estimating intention-to-treat effects.
For expositional clarity, we interpret the coefficient in column 1 as measuring a percentage change in length. This is a logarithmic approximation that performs well when changes are small, which is the case in our sample.
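The approximation is the standard first-order reading of a log difference as a percentage change; the page lengths below are hypothetical, chosen only to show its accuracy at small changes:

```python
import math

# Log-difference vs. exact percentage change for a small increase
y0, y1 = 20_000, 21_000                 # a 5% increase in length
log_diff = math.log(y1) - math.log(y0)  # log-point change, log(1.05)
pct_change = (y1 - y0) / y0             # exact percentage change: 0.05
```

Here `log_diff` is about 0.0488 against an exact change of 0.05, an error of roughly a tenth of a percentage point, which shrinks further as the change gets smaller.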
To make the pages comparable, we subtract the length of text added by the treatment from the length of pages in the treatment group after treatment. Hence, the estimates should be interpreted as the effect of treatment on page length after removing the mechanical increase created by the treatment.
For the long-term (4-year) effect, the ex-post minimum detectable effect size is 0.11 users.
Without externalities, our model is similar to that of Chen et al. (2019), with two differences: in their model, payoffs depend on social impact (i.e., the number of viewers) and participation is endogenous. The crucial additional feature of our model is the inclusion of externalities. The model could be generalized and solved using tools introduced in Hinnosaar (2018).
Note that there are other, more subtle differences in the research environments that may affect the outcomes. For example, in our setting the added content comes from an outside source, whereas in the settings of Kane and Ransbotham (2016), Aaltonen and Seiler (2016), and Zhu et al. (2020) the added content is created by the community.
References
Aaltonen, A., & Seiler, S. (2016). Cumulative growth in user-generated content production: Evidence from Wikipedia. Management Science, 62, 2054–2069.
Algan, Y., Benkler, Y., Morell, M. F., & Hergueux, J. (2013). Cooperation in a peer production economy: Experimental evidence from Wikipedia. Manuscript.
Ayres, I., Raseman, S., & Shih, A. (2013). Evidence from two large field experiments that peer comparison feedback can reduce residential energy usage. Journal of Law, Economics, and Organization, 29, 992–1022.
Chen, Y., Farzan, R., Kraut, R. E., YeckehZaare, I., & Zhang, A. F. (2019). Motivating contributions to public information goods: A personalized field experiment on Wikipedia. Manuscript.
Chen, Y., Harper, F. M., Konstan, J., & Li, S. X. (2010). Social comparisons and contributions to online communities: A field experiment on MovieLens. American Economic Review, 100, 1358–1398.
De Giorgi, G., Pellizzari, M., & Redaelli, S. (2010). Identification of social interactions through partially overlapping peer groups. American Economic Journal: Applied Economics, 2, 241–275.
Duflo, E., & Saez, E. (2002). Participation and investment decisions in a retirement plan: The influence of colleagues’ choices. Journal of Public Economics, 85, 121–148.
Duflo, E., & Saez, E. (2003). The role of information and social interactions in retirement plan decisions: Evidence from a randomized experiment. Quarterly Journal of Economics, 118, 815–842.
Egebark, J., & Ekström, M. (2017). Liking what others “like”: Using Facebook to identify determinants of conformity. Experimental Economics, 21, 1–22.
Eurobarometer. (2012). Europeans and their languages, special report 386. European Commission.
Fershtman, C., & Gandal, N. (2011). Direct and indirect knowledge spillovers: The “social network” of open-source projects. RAND Journal of Economics, 42, 70–91.
Gallus, J. (2017). Fostering public good contributions with symbolic awards: A large-scale natural field experiment at Wikipedia. Management Science, 63, 3999–4015.
Goldstein, N. J., Cialdini, R. B., & Griskevicius, V. (2008). A room with a viewpoint: Using social norms to motivate environmental conservation in hotels. Journal of Consumer Research, 35, 472–482.
Greenstein, S., & Zhu, F. (2012). Is Wikipedia biased? American Economic Review: Papers and Proceedings, 102, 343–348.
Greenstein, S., & Zhu, F. (2018). Do experts or crowd-based models produce more bias? Evidence from Encyclopedia Britannica and Wikipedia. MIS Quarterly, 42, 945–959.
Grossman, G. M., & Helpman, E. (1993). Innovation and growth in the global economy. MIT Press.
Hanushek, E. A., Kain, J. F., Markman, J. M., & Rivkin, S. G. (2003). Does peer ability affect student achievement? Journal of Applied Econometrics, 18, 527–544.
Hinnosaar, M. (2019). Gender inequality in new media: Evidence from Wikipedia. Journal of Economic Behavior & Organization, 163, 262–276.
Hinnosaar, M., Hinnosaar, T., Kummer, M., & Slivko, O. (2021a). Replication data for: “Externalities in knowledge production: Evidence from a randomized field experiment”. Harvard Dataverse. https://doi.org/10.7910/DVN/T4VFCX.
Hinnosaar, M., Hinnosaar, T., Kummer, M., & Slivko, O. (2021b). Wikipedia matters. Journal of Economics & Management Strategy, 163, 1–13.
Hinnosaar, T. (2018). Optimal sequential contests. Manuscript.
Huang, N., Burtch, G., Gu, B., Hong, Y., Liang, C., Wang, K., et al. (2018). Motivating user generated content with performance feedback: Evidence from randomized field experiments. Management Science, 65, 327–345.
Jones, C. I. (1995). R&D-based models of economic growth. Journal of Political Economy, 103, 759–784.
Jones, D., Molitor, D., & Reif, J. (2019). What do workplace wellness programs do? Evidence from the Illinois workplace wellness study. Quarterly Journal of Economics, 134, 1747–1791.
Kane, G. C., & Ransbotham, S. (2016). Content as community regulator: The recursive relationship between consumption and contribution in open collaboration communities. Organization Science, 27, 1258–1274.
Kiyotaki, N., & Wright, R. (1989). On money as a medium of exchange. Journal of Political Economy, 97, 927–954.
Kummer, M. (2020). Attention in the peer production of user generated content: Evidence from 93 pseudo-experiments on Wikipedia. Manuscript.
Kummer, M., Slivko, O., & Zhang, X. (2019). Unemployment and digital public goods contribution. Information Systems Research, 31, 801–819.
Lacetera, N., & Macis, M. (2010). Social image concerns and prosocial behavior: Field evidence from a nonlinear incentive scheme. Journal of Economic Behavior & Organization, 76, 225–237.
Lerner, J., & Tirole, J. (2003). Some simple economics of open source. Journal of Industrial Economics, 50, 197–234.
Manski, C. F. (1993). Identification of endogenous social effects: The reflection problem. Review of Economic Studies, 60, 531–542.
Nagaraj, A. (2019). Information seeding and knowledge production in online communities: Evidence from OpenStreetMap. Manuscript.
Nov, O. (2007). What motivates Wikipedians? Communications of the ACM, 50, 60–64.
Olivera, F., Goodman, P. S., & Tan, S.S.-L. (2008). Contribution behaviors in distributed environments. MIS Quarterly, 32, 23–42.
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14, 130–137.
Ransbotham, S., Kane, G. C., & Lurie, N. H. (2012). Network characteristics and the value of collaborative user-generated content. Marketing Science, 31, 387–405.
Ren, Y., Chen, J., & Riedl, J. (2015). The impact and evolution of group diversity in online open collaboration. Management Science, 62, 1668–1686.
Romer, P. M. (1990). Endogenous technological change. Journal of Political Economy, 98, S71–S102.
Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311, 854–856.
Shah, S. K. (2006). Motivation, governance, and the viability of hybrid forms in open source software development. Management Science, 52, 1000–1014.
Slivko, O. (2018). Online “brain gain”: Do immigrants return knowledge home? Manuscript.
Sun, Y., Dong, X., & McIntyre, S. (2017). Motivation of user-generated content: Social connectedness moderates the effects of monetary rewards. Marketing Science, 36, 329–337.
Thompson, N., & Hanley, D. (2018). Science is shaped by Wikipedia: Evidence from a randomized control trial. Manuscript.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
Westfall, P. H., & Young, S. S. (1993). Resampling-based multiple testing: Examples and methods for p-value adjustment (Vol. 279). Wiley.
Xu, S. X., & Zhang, X. (2013). Impact of Wikipedia on market information environment: Evidence on management disclosure and investor reaction. MIS Quarterly, 37, 1043–1068.
Zhang, X., & Zhu, F. (2011). Group size and incentives to contribute: A natural experiment at Chinese Wikipedia. American Economic Review, 101, 1601–1615.
Zhu, K., Walker, D., & Muchnik, L. (2020). Content growth and attention contagion in information networks: A natural experiment on Wikipedia. Information Systems Research, 31, 491–509.
Acknowledgements
We are grateful to John Duffy, two anonymous referees, Yan Chen, Chris Forman, Willa Friedman, Shane Greenstein, David Hugh-Jones, Tobias Kretschmer, Giovanni Mastrobuoni, Ignacio Monzón, Juan Morales, Abhishek Nagaraj, Stefan Penczynski, Imke Reimers, Stephan Seiler, Ananya Sen, Matthias Sutter, and Michael Zhang for valuable comments. We would also like to thank the seminar participants at the University of Strasbourg, the Collegio Carlo Alberto, George Mason University, ParisTech 2019, Digital Economy Workshop 2019 (Católica Lisbon), ZEW Conference on the Economics of ICT, the Advances with Field Experiments 2019 Conference (University of Chicago), SED 2021 for valuable input.
Data and code for this article are available in Hinnosaar et al. (2021a), Harvard Dataverse, https://doi.org/10.7910/DVN/T4VFCX.
Supplementary Information
Cite this article
Hinnosaar, M., Hinnosaar, T., Kummer, M.E. et al. Externalities in knowledge production: evidence from a randomized field experiment. Exp Econ 25, 706–733 (2022). https://doi.org/10.1007/s10683-021-09730-x