DataGorri: a tool for automated data collection of tabular web content

  • Julian Hackinger
Article

Abstract

The internet era has been a boon for empirical and evidence-based research. By providing ever-increasing amounts of data, the internet offers numerous opportunities for new empirical studies. Some research questions rely on data that used to be far more time-consuming to collect, while other data was simply not available before the internet existed. However, publicly available information is still often unstructured, and collecting it can be highly resource-intensive. In this paper we present DataGorri, a software tool that enables user-friendly, automated collection of repetitive and non-repetitive tabular data that is freely available on websites. The paper explains the motivation behind the software’s creation, describes its usage, and discusses its advantages and limitations.

Keywords

Software, DataGorri, Web scraper, Data scraper, Crawler, Data collection

Notes

Acknowledgements

We would like to thank everyone who has contributed to current or previous versions of DataGorri: Ivaylo Dimitrov, Matthias Franze, Stefan Hentschel, Lukas Holzner, Florian Kreitmair, Daniel Krieger, Michael Legenc, and Marc Müller. A list of DataGorri’s developers and contributors can also be found at https://www.julianhackinger.com/software/datagorri/. Furthermore, we thank Christian Feilcke, Miriam Leidinger, and two anonymous reviewers for comments, and Alexander Schlimm for research assistance.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Chair of Economics, Technical University of Munich, Munich, Germany
