Cliometrica, Volume 13, Issue 1, pp 83–125

Economic history goes digital: topic modeling the Journal of Economic History

  • Lino Wehrheim
Original Paper

Abstract

Digitization and computer science have established a completely new set of methods with which to analyze large collections of texts. One of these methods is particularly promising for economic historians: topic models, i.e., statistical algorithms that automatically infer the content from large collections of texts. In this article, I present an introduction to topic modeling and give an initial review of the research using topic models. I illustrate their capacity by applying them to 2675 articles published in the Journal of Economic History between 1941 and 2016. By comparing the results to traditional research on the JEH and to recent studies on the cliometric revolution, I aim to demonstrate how topic models can enrich economic historians’ methodological toolboxes.

Keywords

Economic history; Topic models; Latent Dirichlet allocation; Cliometrics; Digitization; Methodology

JEL Classification

A12; C18; N01

Notes

Acknowledgements

I am grateful to Claude Diebolt and Michael Haupert for generously sharing their data, Robert Whaples and Ann Carlos for insights concerning the JEH, and two anonymous referees for invaluable comments and suggestions on the manuscript. I thank Manuel Burghardt for patiently answering my technical questions on topic modeling, and Mark Spoerer, Tobias Jopp, and Katrin Kandlbinder for their continued support. Finally, I am very grateful to the participants in the research seminar in economic history as well as the lecture series on Digital Humanities at Universität Regensburg.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of History, Economic and Social History, Department of Economics, University of Regensburg, Regensburg, Germany
