Digital Methods in Economic History: The Case of Computational Text Analysis

  • Reference work entry
Handbook of Cliometrics

Abstract

In the last two decades, the supply of digital resources available to economic historians has increased considerably. At the same time, scholars have started to use innovative methods and technologies to study these digital sources. In this chapter, I focus on one of these approaches – computational text analysis (CTA), also known as text mining – which has great potential for economic historians. First, I provide an overview of examples of CTA that are relevant to economic historians, illustrating trends that have emerged so far. Second, to give a hands-on example of this kind of approach, I conduct a case study in which I apply one type of CTA, topic modelling, to a corpus of more than 17,000 research articles published in ten international economics and economic history journals since 1949. Covering flagship journals that represent the wide range of both fields, such as The American Economic Review, The Economic History Review, The Journal of Economic History, and The Journal of Economic Literature, I quantitatively compare the similarity of economics and economic history in terms of their research topics. Finally, I give a brief outlook on digital methods beyond CTA as well as some general reflections on the use of digital methods in our field.

Notes

  1. This might be the most fundamental difference from the “traditional” humanities with their expertise in dealing with ambiguities. See also the final section.

  2. An exception to this observation can be found in the series “Current Research in Digital History” published by the Roy Rosenzweig Center for History and New Media. See https://crdh.rrchnm.org (last access July 12, 2022).

  3. Ash and Hansen (2023) provide a similar, though more technical, review article on methods for text analysis in economics. Other helpful introductions to textual analysis are provided by Gentzkow et al. (2019), Grimmer and Stewart (2013), and Grimmer et al. (2022).

  4. An example of a study that quantifies textual sources without the help of digital tools can be found in Whaples (1991). In contrast to the social sciences, quantitative text analysis by means of codebooks, which confusingly is sometimes called “Qualitative Text Analysis,” does not seem to have been particularly common in economic history.

  5. The most popular model, Latent Dirichlet Allocation (LDA), was introduced by Blei et al. (2003). Readers interested in the details of topic modelling are referred to the introductions by Blei (2012) and Wehrheim (2019a).

  6. The difference between a pre-defined categorization scheme and categorization by means of a topic model will be further addressed in the section “Outlook and Conclusion.”

  7. It goes without saying that topic modelling also requires human intervention, for example in data selection, model specification, and evaluation. The central feature of topic models is that the two crucial steps of category building and classification are performed solely by the algorithm.

  8. Topic models can also be useful for data collection. For example, the model created for this chapter contains a topic related to Germany (topic 12). I used this topic to identify relevant articles for the chapter “Cliometrics and the Study of German History” by Tobias Jopp and Mark Spoerer in this volume.

  9. In the following section, I provide an overview of studies that apply some sort of computational text analysis and that are of potential interest to economic historians. Some of these studies do not address an EH question in the strict sense but still might be useful to future economic historians.

  10. See Nicholson for an account of how the digital turn has affected the use of newspapers as a historiographic source.

  11. This particularly concerns newspapers published after World War II, for which access can be quite expensive if one is interested in full texts, which are necessary for applying methods such as topic modelling.

  12. The preprocessing procedure and the model specifications are documented in the Online Appendix.

  13. Deviating from this rule, topic 22 was classified as neutral due to its high topic share and its idiosyncratic development (see Online Appendix). This is of course a static picture, as topics are defined as EH/Econ only once for the whole period. As we will see below, the affiliation of a topic can change over time.

  14. Cf. https://www.aeaweb.org/articles?id=10.1257/000282803321455250 (last access July 15, 2022). It could be revealing to measure the overlap between the classification based on the topic model and JEL-codes. Due to a lack of data, this evaluation could not be carried out yet.

  15. As a side note, the JEL-codes of this paper do not contain an N-code. One could argue that papers published in EEH all have an N-code by default, so assigning one explicitly is unnecessary. On the other hand, there are papers published in EEH that come with an explicit N-code. Others, again, come without any JEL-code.

  16. As topic modelling is a probabilistic process, one should be very cautious in using the data produced by a topic model as input for further quantitative analysis. Unless the seed value is fixed, repeatedly running the same model with the same parameters yields different results. The results also depend heavily on the parameters, that is, the number of topics, the number of iterations, the hyperparameters alpha and beta, the seed value, and, finally, the preprocessing.

  17. That is, the sum of the shares of the top five topics per year. Note that which five topics these are varies between years.

  18. These thoughts are based on my own experience in an interdisciplinary DH project (https://media-sentiment.uni-leipzig.de) as well as on conversations with colleagues with similar experiences.

  19. See Ash and Hansen (2023) for a discussion of the use of textual data as input for econometric models.

  20. Grimmer et al. (2022) see text analysis as an augmentation for human readers.
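
The dependence on settings described in note 16 can be made concrete with a small sketch. The Python code below is a toy collapsed Gibbs sampler for LDA. It is not the model used in this chapter, whose specification is documented in the Online Appendix. The function name `lda_gibbs`, the three-document corpus, and all parameter values are assumptions chosen purely for illustration. The arguments mirror the settings listed in note 16: the number of topics, the number of iterations, the hyperparameters alpha and beta, and the seed value.

```python
import random


def lda_gibbs(docs, num_topics, alpha, beta, iterations, seed):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the chapter's model).

    docs is a list of token lists; returns one list of topic shares per document.
    """
    rng = random.Random(seed)  # every random draw below depends on this seed
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]  # tokens of document d in topic k
    topic_word = [{} for _ in range(num_topics)]  # count of each word in topic k
    topic_total = [0] * num_topics                # all tokens assigned to topic k
    assignments = []

    # Random initialisation: assign every token to an arbitrary topic.
    for d, doc in enumerate(docs):
        current = []
        for word in doc:
            k = rng.randrange(num_topics)
            current.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] = topic_word[k].get(word, 0) + 1
            topic_total[k] += 1
        assignments.append(current)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Conditional weight of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(word, 0) + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Draw a new topic and add the token back to the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] = topic_word[k].get(word, 0) + 1
                topic_total[k] += 1

    # Smoothed topic shares per document (each row sums to one).
    return [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]


# Hypothetical toy corpus and settings, invented for this sketch.
toy_docs = [
    "wage labour union strike wage".split(),
    "bank credit crisis bank interest".split(),
    "wage strike bank crisis".split(),
]
settings = dict(num_topics=2, alpha=0.1, beta=0.01, iterations=50)

run_a = lda_gibbs(toy_docs, seed=1, **settings)
run_b = lda_gibbs(toy_docs, seed=1, **settings)  # same seed: identical output
run_c = lda_gibbs(toy_docs, seed=2, **settings)  # new seed: output may differ

print("same seed reproduces the run:", run_a == run_b)
print("different seed gives the same shares:", run_a == run_c)
```

Fixing the seed makes a run reproducible, but it does not make the result any less dependent on that particular seed. Comparing several seeds, as in the last line, is one simple way to gauge how stable the topic shares are before using them in further quantitative analysis.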

References

  • Abramitzky R (2015) Economics and the modern economic historian. J Econ Hist 75:1240–1251

  • Abramitzky R, Boustan L, Eriksson K et al (2021) Automated linking of historical data. J Econ Lit 59:865–918

  • Ambrosino A, Cedrini M, Davis JB et al (2018) What topic modeling could reveal about the evolution of economics. J Econ Methodol 25:329–348

  • Annaert J, Mensah L (2014) Cross-sectional predictability of stock returns, evidence from the 19th century Brussels Stock Exchange (1873–1914). Explor Econ Hist 52:22–43

  • Ash E, Hansen S (2023) Text algorithms in economics. Annu Rev Econ 15

  • Ballandonne M, Cersosimo I (2023) Toward a “text as data” approach in the history and methodology of economics: an application to Adam Smith’s classics. J Hist Econ Thought 45

  • Bellstam G, Bhagat S, Cookson JA (2021) A text-based analysis of corporate innovation. Manag Sci 67:4004–4031

  • Blaydes L, Grimmer J, McQueen A (2018) Mirrors for princes and sultans: advice on the art of governance in the medieval Christian and Islamic worlds. J Polit 80:1150–1167

  • Blei DM (2012) Probabilistic topic models. Commun ACM 55:77–84

  • Blei D, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  • Blomqvist C, Enflo K, Jakobsson A, Åström K (2023) Reading the Ransom: methodological advancements in extracting the Swedish Wealth Tax of 1571. Explor Econ Hist 87

  • Cherrier B (2017) Classifying economics: a history of the JEL codes. J Econ Lit 55:545–579

  • Cioni M, Federico G, Vasta M (2020) The long-term evolution of economic history: evidence from the top five field journals (1927–2017). Cliometrica 14:1–39

  • Cioni M, Federico G, Vasta M (2022) Is economic history changing its nature? Evidence from top journals. Cliometrica 17:23–48. (Online First)

  • Combes P-P, Gobillon L, Zylberberg Y (2022) Urban economics in a historical perspective: recovering data with machine learning. Reg Sci Urban Econ 94:103711

  • Daniel V, ter Steege L (2020) Inflation expectations and the recovery from the Great Depression in Germany. Explor Econ Hist 75:101305

  • Daniel V, Neubert M, Orban A (2018) Fictional expectations and the global media in the Greek debt crisis: a topic modeling approach. Jahrbuch für Wirtschaftsgeschichte 59:525–566

  • Diaf S, Döpke J, Fritsche U, Rockenbach I (2022) Sharks and minnows in a shoal of words: measuring latent ideological positions based on text mining techniques. Eur J Polit Econ 75:102179

  • Diebolt C (2016) Cliometrica after 10 years: definition and principles of cliometric research. Cliometrica 10:1–4

  • Diebolt C, Haupert M (2019) We are Ninjas: how economic history has infiltrated economics. Sartoniana 32:197–221

  • Diebolt C, Haupert M (2022) Cliometrics and the future of economic history. Essays Econ Bus Hist 40:1–20

  • Ellingsen J, Larsen VH, Thorsrud LA (2022) News media versus FRED-MD for macroeconomic forecasting. J Appl Econ 37:63–81

  • Esteves R, Geisler Mesevage G (2019) Social networks in economic history: opportunities and challenges. Explor Econ Hist 74:101299

  • Ferguson-Cradler G (2021) Narrative and computational text analysis in business and economic history. Scand Econ Hist Rev 71:1–25

  • Fernández-de-Pinedo N, La Parra-Perez A, Muñoz F-F (2022) Recent trends in publications of economic historians in Europe and North America (1980–2019): an empirical analysis. Cliometrica 17:1–22

  • Fickers A, van der Heijden T (2020) Inside the trading zone: thinkering in a digital history lab. Digit Hum Q 14

  • Fligstein N, Brundage JS, Schultz M (2017) Seeing like the fed: culture, cognition, and framing in the failure to anticipate the financial crisis of 2008. Am Sociol Rev 82:879–909

  • Frydman R, Mangee N, Stillwagon J (2021) How market sentiment drives forecasts of stock returns. J Behav Financ 22:351–367

  • Gentzkow M, Kelly B, Taddy M (2019) Text as data. J Econ Lit 57:535–574

  • Grajzl P, Murrell P (2021) Characterizing a legal–intellectual culture: Bacon, Coke, and seventeenth-century England. Cliometrica 15:43–88

  • Griffiths TL, Steyvers M (2004) Finding scientific topics. Proc Natl Acad Sci U S A 101:5228–5235

  • Grimmer J, Stewart BM (2013) Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit Anal 21:267–297

  • Grimmer J, Roberts ME, Stewart BM (2022) Text as data: a new framework for machine learning and the social sciences. Princeton University Press, Princeton

  • Guldi J (2019) Parliament’s debates about infrastructure: an exercise in using dynamic topic models to synthesize historical change. Technol Cult 60:1–33

  • Håkansson PG, Karlsson T, La Mela M (2022) Running out of time: using job ads to analyse the demand for messengers in the twentieth century. Scand Econ Hist Rev:1–20. (Online First)

  • Hanna AJ, Turner JD, Walker CB (2020) News media and investor sentiment during bull and bear markets. Eur J Financ 26:1377–1395

  • Hansen S, McMahon M, Prat A (2018) Transparency and deliberation within the FOMC: a computational linguistics approach. Q J Econ 133:801–870

  • Harris C, Myers A, Briol C, Carlen S (2022) The binding force of economics. In: D’Amico DJ, Martin AG (eds) Contemporary methods and Austrian economics. pp 69–103

  • Hayo B, Henseler K, Steffen Rapp M, Zahner J (2022) Complexity of ECB communication and financial market trading. J Int Money Financ 128:102709

  • Heyer G (2009) Introduction to TMS 2009. In: Heyer G (ed) Text mining services. Leipzig, pp 1–14

  • Jacobi C, van Atteveldt W, Welbers K (2015) Quantitative analysis of large amounts of journalistic texts using topic modelling. Digit Journal 4:89–106

  • Jaremski M (2020) Today’s economic history and tomorrow’s scholars. Cliometrica 14:169–180

  • Kabiri A, James H, Landon-Lane J, Nyman R (2022) The role of sentiment in the economy of the 1920s. Econ Hist Rev 76:3–30. (Online First)

  • Komlos J (2003) Access to food and the biological standard of living: perspectives on the nutritional status of Native Americans. Am Econ Rev 93:252–255

  • Kronenberg C (2021) A new measure of 19th century US suicides. Soc Indic Res 157:803–815

  • Küsters A (2022) Applying lessons from the past? Exploring historical analogies in ECB speeches through text mining, 1997–2019. Int J Cent Bank 18:277–329

  • La Mela M (2020) Tracing the emergence of Nordic allemansrätten through digitised parliamentary sources. In: Fridlund M, Oiva M, Paju P (eds) Digital histories: emergent approaches within the new digital history. Helsinki University Press, Helsinki, pp 181–197

  • La Parra-Perez A, Muñoz F-F, Fernandez-de-Pinedo N (2022) EconHist: a relational database for analyzing the evolution of economic history (1980–2019). Hist Methods J Quant Interdiscip Hist 55:45–60

  • Lack P (2021) Using word analysis to track the evolution of emotional well-being in nineteenth-century industrializing Britain. Hist Methods J Quant Interdiscip Hist 54:228–247

  • Lässig S (2021) Digital history: challenges and opportunities for the profession. Gesch Ges 47:5–34

  • Lehenmeier C, Burghardt M, Mischka B (2020) Layout detection and table recognition – recent challenges in digitizing historical documents and handwritten tabular data. In: Hall M, Merčun T, Risse T, Duchateau F (eds) Digital libraries for open knowledge. Springer Cham, Cham, pp 229–242

  • Lennard J (2020) Uncertainty and the great slump. Econ Hist Rev 73:844–867

  • Liebl B, Burghardt M (2020) From historical newspapers to machine-readable data: the origami OCR pipeline. In: Proceedings of the 1st workshop on computational humanities research (CHR)

  • Lüdering J, Winker P (2016) Forward or backward looking? The economic discourse and the observed reality. J Econ Stat 236:483–515

  • Marjanen J (2021) National sentiment: nation building and emotional language in nineteenth-century Finland. In: Kivimäki V, Suodenjoki S, Vahtikari T (eds) Lived nation as the history of experiences and emotions in Finland, 1800–2000. Palgrave Macmillan Cham, Cham, pp 61–83

  • Merchant Klancher E, Alexander CS (2022) U.S. demography in transition. Hist Methods J Quant Interdiscip Hist 55:1–21

  • Miller IM (2013) Rebellion, crime and violence in Qing China, 1722–1911: a topic modeling approach. Poetics 41:626–649

  • Mitchener KJ (2015) The 4D future of economic history: digitally-driven data design. J Econ Hist 75:1234–1239

  • Moretti F (2013) Distant reading. Verso Books, London/New York

  • Pablo-Martí F, Alañón-Pardo Á, Sánchez A (2021) Complex networks to understand the past: the case of roads in Bourbon Spain. Cliometrica 15:477–534

  • Price J, Buckles K, Van Leeuwen J, Riley I (2021) Combining family history and machine learning to link historical records: the Census Tree data set. Explor Econ Hist 80:101391

  • Ros R, van Erp M, Rijpma H, Zijdeman R (2020) Mining wages in nineteenth-century job advertisements. The application of language resources and language technology to study economic and social inequality. Proceedings of LR4SSHOC: workshop about language resources for the SSH Cloud, pp 27–32

  • Rosenzweig R (2003) Scarcity or abundance? Preserving the past in a digital era. Am Hist Rev 108:735–762

  • Salmi H (2021) What is digital history? Wiley & Sons, Medford

  • Seefeldt D, Thomas WG (2009) What is digital history? Perspect Hist 47

  • Shen Z, Zhang K, Dell M (2020) A large dataset of historical Japanese documents with complex layouts. IEEE/CVF conference on computer vision and pattern recognition workshops, pp 548–559

  • Shiller RJ (2017) Narrative economics. Am Econ Rev 107:967–1004

  • Steyvers M, Griffiths T (2007) Probabilistic topic models. In: Landauer TK, McNamara DS, Dennis S, Kintsch W (eds) Handbook of latent semantic analysis. Psychology Press, Hoboken, pp 427–448

  • Thorsrud LA (2020) Words are the new numbers: a newsy coincident index of business cycles. J Bus Econ Stat 38:393–409

  • Turner JD, Ye Q, Walker CB (2017) Media coverage and stock returns on the London Stock Exchange, 1825–70. Rev Financ 22:1605–1629

  • Verdickt G (2020) The effect of war risk on managerial and investor behavior: evidence from the Brussels Stock Exchange in the pre-1914 era. J Econ Hist 80:629–669

  • Viola L, Verheul J (2020) Mining ethnicity: discourse-driven topic modelling of immigrant discourses in the USA, 1898–1920. Digit Scholarsh Humanit 35:921–943

  • Wehrheim L (2019a) Economic history goes digital: topic modeling the journal of economic history. Cliometrica 13:83–125

  • Wehrheim L (2019b) Von Wirtschaftsweisen und Topic Models: 50 Jahre ökonomische Expertise aus einer Text Mining Perspektive. In: Sahle P (ed) DHd 2019 Digital Humanities: multimedial & multimodal. Konferenzabstracts. Frankfurt, pp 240–245

  • Wehrheim L (2021) Im Olymp der Ökonomen. Zur öffentlichen Resonanz wirtschaftspolitischer Experten von 1965 bis 2015. Mohr Siebeck, Tübingen

  • Wehrheim L (2022) The sound of silence. On the (in-)visibility of economic experts in German Print Media since the 1960s. Vierteljahrschrift für Sozial- und Wirtschaftsgeschichte 109:29–71

  • Wehrheim L, Jopp TA, Spoerer M (2023) Turn, turn, turn. A digital history of German Historiography, 1950–2019. J Interdiscip Hist 53:471–507

  • Wevers M, Smits T (2019) The visual digital turn: using neural networks to study historical images. Digit Scholarsh Humanit 35:194–207

  • Whaples R (1991) A quantitative history of the journal of economic history and the Cliometric revolution. J Econ Hist 51:289–301

  • Wiedemann G (2016) Text mining for qualitative data analysis in the social sciences: a study on democratic discourse in Germany. Springer VS, Wiesbaden

Author information

Corresponding author

Correspondence to Lino Wehrheim.

Copyright information

© 2024 Springer Nature Switzerland AG

About this entry

Cite this entry

Wehrheim, L. (2024). Digital Methods in Economic History: The Case of Computational Text Analysis. In: Diebolt, C., Haupert, M. (eds) Handbook of Cliometrics. Springer, Cham. https://doi.org/10.1007/978-3-031-35583-7_118
