An Exploration of Wikipedia Data as a Measure of Regional Knowledge Distribution

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10540)

Abstract

In today’s economies, knowledge is the key ingredient for prosperity. However, this intangible asset is hard to measure appropriately. Standard economic models mostly rely on common measures such as enrollment rates and international test scores. These proxies capture the quality of pupils’ education rather than the distribution of knowledge across the whole population, which is increasingly shaped by alternative sources of education such as online learning platforms. As a consequence, the economically relevant stock of knowledge in a region is only roughly approximated. Furthermore, these measures are abstract in content and both costly and time-consuming to collect. This paper proposes to explore Wikipedia data as an alternative source for capturing the knowledge distribution on a narrow geographical scale. Wikipedia is by far the largest digital encyclopedia worldwide and makes its usage and editing data publicly available. We compare Wikipedia usage worldwide and edits in the U.S. to existing measures of the acquisition and stock of knowledge. The results indicate that there is a significant correlation between Wikipedia interactions and knowledge approximations on different geographical scales. Considering these results, it seems promising to further explore Wikipedia data to develop a reliable, inexpensive, and real-time proxy of knowledge distribution around the world.

F. Stephany and F. Braesemann—Both authors contributed equally to this work.

Notes

  1. Wikipedia provides over 40 million articles in 250 languages worldwide and is ranked among the top-ten most popular websites, see [3].

  2. According to [3], the English language version alone has more than 30 million registered editors and, in addition, a large number of unregistered editors.

  3. http://bit.ly/PageViewsPerCountry.

  4. http://bit.ly/WikiSQL.

  5. Visualizations of these information geographies can be found in [18, 19].

  6. http://data.worldbank.org/.

  7. http://bit.ly/PageViewsPerCountry.

  8. Thus we include all articles that link to the category “Computer Science” itself (level 1), to the subcategories that link to “Computer Science” (level 2), and to their subcategories (level 3). This number of layers has been chosen to avoid collecting edit data that are only very weakly related to computer science.

  9. http://bit.ly/WikiSQL.

  10. An interactive version of the map can be accessed via the online dashboard that provides supplementary information to this article: http://bit.ly/Wiki_Dashboard.

  11. The data stem from the U.S. census: https://www.census.gov/.

  12. Data on the number of students stem from http://www.stateuniversity.com/. City-level covariates are collected from http://www.city-data.com// and a list of academic computer science departments is available on Wikipedia: http://bit.ly/CS_Departments.
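
The three-level article collection described in note 8 can be read as a depth-limited breadth-first traversal of the category graph. The following is only a minimal sketch of that idea, not the authors' actual pipeline: the function name `collect_articles`, the in-memory dictionaries, and the toy category and article names are all hypothetical, and no real Wikipedia data or API is queried.

```python
from collections import deque


def collect_articles(subcats, articles, root, max_level=3):
    """Collect articles within `max_level` category levels of `root`.

    Level 1 is the root category itself, level 2 its subcategories,
    level 3 their subcategories. All inputs are hypothetical in-memory
    mappings (category -> list of names), not live Wikipedia data.
    """
    seen = {root}                 # categories already queued (guards against cycles)
    queue = deque([(root, 1)])    # (category, level) pairs, breadth-first
    found = set()
    while queue:
        category, level = queue.popleft()
        found.update(articles.get(category, ()))
        if level < max_level:     # do not descend below the deepest level
            for child in subcats.get(category, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, level + 1))
    return found


# Hypothetical toy category graph, for illustration only.
SUBCATS = {
    "Computer science": ["Algorithms", "Programming languages"],
    "Algorithms": ["Sorting algorithms"],
    "Sorting algorithms": ["Comparison sorts"],
}
ARTICLES = {
    "Computer science": ["Computer science"],
    "Algorithms": ["Algorithm"],
    "Programming languages": ["Python (programming language)"],
    "Sorting algorithms": ["Quicksort"],
    "Comparison sorts": ["Merge sort"],
}

if __name__ == "__main__":
    for title in sorted(collect_articles(SUBCATS, ARTICLES, "Computer science")):
        print(title)
```

In this toy graph, "Comparison sorts" sits at level 4, so its article is left out when `max_level=3`. This mirrors the note's rationale that deeper layers are only weakly related to the root category.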

References

  1. Benos, N., Zotou, S.: Education and economic growth: a meta-regression analysis. World Dev. 64, 669–689 (2014). doi:10.1016/j.worlddev.2014.06.034

  2. Mayer-Schönberger, V., Cukier, K.: Learning with Big Data: The Future of Education. Houghton Mifflin Harcourt, New York (2014)

  3. Wikimedia, Wikipedia (2017). https://en.wikipedia.org/wiki/Wikipedia

  4. Moy, C.L., Locke, J.R., Coppola, B.P., McNeil, A.J.: Improving science education and understanding through editing Wikipedia. J. Chem. Educ. 87(11), 1159–1162 (2010). doi:10.1021/ed100367v

  5. Ebner, M., Kickmeier-Rust, M., Holzinger, A.: Utilizing Wiki-Systems in higher education classes: a chance for universal access? Univ. Access Inf. Soc. 7(4), 199 (2008). doi:10.1007/s10209-008-0115-2

  6. Cain, J., Fox, B.I.: Web 2.0 and pharmacy education. Am. J. Pharm. Educ. 73(7), 120 (2009). doi:10.5688/aj7307120

  7. Collier, B., Bear, J.: Conflict, criticism, or confidence: an empirical examination of the gender gap in Wikipedia contributions. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, pp. 383–392. ACM (2012)

  8. Gloor, P., De Boer, P., Lo, W., Wagner, S., Nemoto, K., Fuehres, H.: Cultural Anthropology Through the Lens of Wikipedia-A Comparison of Historical Leadership Networks in the English, Chinese, Japanese and German Wikipedia, arXiv preprint arXiv:1502.05256

  9. Eom, Y.-H., Aragón, P., Laniado, D., Kaltenbrunner, A., Vigna, S., Shepelyansky, D.L.: Interactions of cultures and top people of Wikipedia from ranking of 24 language editions. PloS one 10(3), e0114825 (2015)

  10. Laufer, P., Wagner, C., Flöck, F., Strohmaier, M.: Mining cross-cultural relations from Wikipedia: a study of 31 European food cultures. In: Proceedings of the ACM Web Science Conference, p. 3. ACM (2015)

  11. Ronen, S., Gonçalves, B., Hu, K.Z., Vespignani, A., Pinker, S., Hidalgo, C.A.: Links that speak: the global language network and its association with global fame. Proc. Nat. Acad. Sci. 111(52), E5616–E5622 (2014)

  12. Yasseri, T., Spoerri, A., Graham, M., Kertész, J.: The most controversial topics in Wikipedia: a multilingual and geographical analysis. arXiv:1305.5566 [physics]

  13. Borra, E., Weltevrede, E., Ciuccarelli, P., Kaltenbrunner, A., Laniado, D., Magni, G., Mauri, M., Rogers, R., Venturini, T.: Societal controversies in Wikipedia articles. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pp. 193–196. ACM (2015)

  14. Graham, M., Straumann, R.K., Hogan, B.: Digital divisions of labor and informational magnetism: mapping participation in Wikipedia. Ann. Assoc. Am. Geogr. 105(6), 1158–1178 (2015). doi:10.1080/00045608.2015.1072791

  15. Graham, M., Hogan, B., Straumann, R.K., Medhat, A.: Uneven geographies of user-generated information: patterns of increasing informational poverty. Ann. Assoc. Am. Geogr. 104(4), 746–764 (2014)

  16. Graham, M., De Sabbata, S., Zook, M.A.: Towards a study of information geographies: (im) mutable augmentations and a mapping of the geographies of information. Geo: Geogr. Environ. 2(1), 88–105 (2015)

  17. Hardy, D., Frew, J., Goodchild, M.F.: Volunteered geographic information production as a spatial process. Int. J. Geogr. Inf. Sci. 26(7), 1191–1212 (2012)

  18. Graham, M., De Sabbata, S.: Information Geographies at the Oxford Internet Institute (2014). http://geography.oii.ox.ac.uk/

  19. Liao, H.-T., Hogan, B., Graham, M., Hale, S.A., Ford, H.: Wikipedia’s Networks and Geographies: Representation and Power in Peer-Produced Content (2010). https://www.oii.ox.ac.uk/research/projects/wikipedias-networks-and-geographies/

Acknowledgements

The authors are thankful for the feedback this work received at the brown-bag seminar at the Oxford Internet Institute and at the SIS Statistics and Data Science conference in Florence, both of which took place in June 2017. Particularly helpful comments were made by Scott Hale, Otto Kässi, and Taha Yasseri.

Author information

Corresponding author

Correspondence to Fabian Stephany.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Stephany, F., Braesemann, F. (2017). An Exploration of Wikipedia Data as a Measure of Regional Knowledge Distribution. In: Ciampaglia, G., Mashhadi, A., Yasseri, T. (eds) Social Informatics. SocInfo 2017. Lecture Notes in Computer Science, vol 10540. Springer, Cham. https://doi.org/10.1007/978-3-319-67256-4_4

  • DOI: https://doi.org/10.1007/978-3-319-67256-4_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67255-7

  • Online ISBN: 978-3-319-67256-4

  • eBook Packages: Computer Science, Computer Science (R0)
