An Exploration of Wikipedia Data as a Measure of Regional Knowledge Distribution
In today’s economies, knowledge is the key ingredient for prosperity. However, this intangible asset is hard to measure appropriately. Standard economic models mostly rely on common measures such as enrollment rates and international test scores. These proxies focus on the quality of pupils’ education rather than on the distribution of knowledge among the whole population, which is increasingly shaped by alternative sources of education such as online learning platforms. As a consequence, the economically relevant stock of knowledge in a region is only roughly approximated. Furthermore, these measures are abstract in content and both costly and time-consuming to collect. This paper proposes exploring Wikipedia data as an alternative means of capturing the knowledge distribution on a narrow geographical scale. Wikipedia is by far the largest digital encyclopedia worldwide and makes data on its usage and editing publicly available. We compare Wikipedia usage worldwide and edits in the U.S. to existing measures of the acquisition and stock of knowledge. The results indicate a significant correlation between Wikipedia interactions and knowledge approximations on different geographical scales. Considering these results, it seems promising to further explore Wikipedia data to develop a reliable, inexpensive, and real-time proxy of knowledge distribution around the world.
Keywords: Mining of big social data, Wikipedia, Knowledge geographies
JEL Classification: C55, C82, I21
The authors are thankful for the feedback this work received at the brown-bag seminar at the Oxford Internet Institute and at the SIS Statistics and Data Science conference in Florence, both of which took place in June 2017. Particularly helpful comments were made by Scott Hale, Otto Kässi, and Taha Yasseri.