Abstract
In the last two decades, the supply of digital resources available to economic historians has increased considerably. At the same time, scholars have started to use innovative methods and technologies to study these digital sources. In this chapter, I focus on one of these approaches, computational text analysis (CTA), also known as text mining, which has great potential for economic historians. First, I provide an overview of CTA applications that are relevant to economic historians and illustrate the trends that have emerged so far. Second, to give a hands-on example of this kind of approach, I conduct a case study in which I apply one type of CTA, topic modelling, to a corpus of more than 17,000 research articles published in ten international economics and economic history journals since 1949. Covering flagship journals that represent the wide range of both fields, such as The American Economic Review, The Economic History Review, The Journal of Economic History, and The Journal of Economic Literature, I quantitatively compare economics and economic history in terms of the similarity of their research topics. Finally, I give a brief outlook on digital methods beyond CTA and offer some general reflections on the use of digital methods in our field.
Notes
- 1.
This might be the most fundamental difference from the “traditional” humanities with their expertise in dealing with ambiguities. See also the final section.
- 2.
An exception to this observation can be found in the series “Current Research in Digital History” published by the Roy Rosenzweig Center for History and New Media. See https://crdh.rrchnm.org (last access July 12, 2022).
- 3.
- 4.
An example of a study that quantifies textual sources without the help of digital tools can be found in Whaples (1991). In contrast to the social sciences, quantitative text analysis by means of codebooks, which confusingly is sometimes called “Qualitative Text Analysis,” does not seem to have been particularly common in economic history.
- 5.
- 6.
The difference between a pre-defined categorization scheme and categorization by means of a topic model will be further addressed in the section “Outlook and Conclusion.”
- 7.
It goes without saying that topic modelling also requires human intervention, regarding, e.g., data selection, model specification, and evaluation. The central feature of topic models is that the two crucial steps of category building and classification are performed solely by the algorithm.
- 8.
Topic models can also be useful for data collection. For example, the model created for this chapter contains a topic related to Germany (topic 12). I used this topic to identify relevant articles for the chapter “Cliometrics and the Study of German History” by Tobias Jopp and Mark Spoerer in this volume.
- 9.
In the following section, I provide an overview of studies that apply some sort of computational text analysis and that are of potential interest to economic historians. Some of these studies do not address an EH question in the strict sense but still might be useful to future economic historians.
- 10.
See Nicholson for an account of how the digital turn has affected the use of newspapers as a historiographic source.
- 11.
This particularly concerns newspapers published after World War II for which access can be quite expensive if one is interested in full texts, which are necessary for applying methods such as topic modelling.
- 12.
The preprocessing procedure and the model specifications are documented in the Online Appendix.
- 13.
Deviating from this rule, topic 22 was classified as neutral due to its high topic share and its idiosyncratic development (see Online Appendix). This is of course a static picture, as topics are defined as EH/Econ only once for the whole period. As we will see below, the affiliation of a topic can change over time.
- 14.
Cf. https://www.aeaweb.org/articles?id=10.1257/000282803321455250 (last access July 15, 2022). It could be revealing to measure the overlap between the classification based on the topic model and JEL-codes. Due to a lack of data, this evaluation could not be carried out yet.
- 15.
As a side note, the JEL-codes of this paper do not contain an N-code. One could argue that papers published in EEH all have an N-code by default, so assigning one explicitly is unnecessary. On the other hand, there are papers published in EEH that come with an explicit N-code. Others, again, come without any JEL-code.
- 16.
As topic modelling is a probabilistic process, one should be particularly cautious in using the data produced by a topic model as input for further quantitative analysis. Repeatedly running the same model with the same parameters yields different results. Also, the results are highly dependent on the parameters, that is, the number of topics, the number of iterations, the hyperparameters alpha and beta, the seed value, and finally, the preprocessing.
- 17.
That is, the sum of the shares of the top five topics per year. Note that which five topics these are varies from year to year.
- 18.
These thoughts result from my own experience in an interdisciplinary DH project (https://media-sentiment.uni-leipzig.de) as well as from conversations with colleagues with similar experiences.
- 19.
See Ash and Hansen (2023) for a discussion of the use of textual data as input for econometric models.
- 20.
Grimmer et al. (2022) see text analysis as an augmentation for human readers.
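The caution expressed in note 16 can be illustrated with a minimal sketch. It is not the model used in this chapter (whose preprocessing and specification are documented in the Online Appendix) but a toy collapsed Gibbs sampler for latent Dirichlet allocation in the spirit of Griffiths and Steyvers (2004). The function name `fit_lda`, the four-document corpus, and all parameter values are hypothetical and chosen for illustration only. The sketch shows that every random draw depends on the seed value, so a run is reproducible only if the seed is fixed along with the other parameters.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations, alpha, beta, seed):
    """Toy collapsed Gibbs sampler for LDA; returns per-document topic shares."""
    rng = random.Random(seed)  # every random draw below depends on this seed
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # tokens of word w in topic k
    topic_total = [0] * num_topics                       # all tokens in topic k
    assignments = []

    def update(d, word, k, step):
        doc_topic[d][k] += step
        topic_word[k][word] += step
        topic_total[k] += step

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(num_topics) for _ in doc])
        for word, k in zip(doc, assignments[d]):
            update(d, word, k, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                update(d, word, assignments[d][i], -1)  # take the token out
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                new_k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = new_k
                update(d, word, new_k, +1)  # put it back under the sampled topic

    # Smoothed topic shares per document; each row sums to one.
    return [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


# Hypothetical toy corpus, already tokenized.
docs = [
    "wage labour market wage union".split(),
    "bank credit crisis bank money".split(),
    "labour union strike wage".split(),
    "money credit inflation bank".split(),
]
params = dict(num_topics=2, iterations=50, alpha=0.1, beta=0.01)

run_a = fit_lda(docs, seed=1, **params)
run_b = fit_lda(docs, seed=1, **params)
run_c = fit_lda(docs, seed=2, **params)

print(run_a == run_b)  # same seed and parameters: the run is reproducible
print(run_a == run_c)  # another seed may change the shares or swap topic labels
```

Even when two runs recover similar topics, the topic numbering itself is arbitrary, which is a further reason to document the seed value and all other parameters before using topic shares as input for further quantitative analysis.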
References
Abramitzky R (2015) Economics and the modern economic historian. J Econ Hist 75:1240–1251
Abramitzky R, Boustan L, Eriksson K et al (2021) Automated linking of historical data. J Econ Lit 59:865–918
Ambrosino A, Cedrini M, Davis JB et al (2018) What topic modeling could reveal about the evolution of economics. J Econ Methodol 25:329–348
Annaert J, Mensah L (2014) Cross-sectional predictability of stock returns, evidence from the 19th century Brussels Stock Exchange (1873–1914). Explor Econ Hist 52:22–43
Ash E, Hansen S (2023) Text algorithms in economics. Annu Rev Econ 15
Ballandonne M, Cersosimo I (2023) Toward a “text as data” approach in the history and methodology of economics: an application to Adam Smith’s classics. J Hist Econ Thought 45
Bellstam G, Bhagat S, Cookson JA (2021) A text-based analysis of corporate innovation. Manag Sci 67:4004–4031
Blaydes L, Grimmer J, McQueen A (2018) Mirrors for princes and sultans: advice on the art of governance in the medieval Christian and Islamic worlds. J Polit 80:1150–1167
Blei DM (2012) Probabilistic topic models. Commun ACM 55:77–84
Blei D, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Blomqvist C, Enflo K, Jakobsson A, Åström K (2023) Reading the Ransom: methodological advancements in extracting the Swedish Wealth Tax of 1571. Explor Econ Hist 87
Cherrier B (2017) Classifying economics: a history of the JEL codes. J Econ Lit 55:545–579
Cioni M, Federico G, Vasta M (2020) The long-term evolution of economic history: evidence from the top five field journals (1927–2017). Cliometrica 14:1–39
Cioni M, Federico G, Vasta M (2022) Is economic history changing its nature? Evidence from top journals. Cliometrica 17:23–48. (Online First)
Combes P-P, Gobillon L, Zylberberg Y (2022) Urban economics in a historical perspective: recovering data with machine learning. Reg Sci Urban Econ 94:103711
Daniel V, ter Steege L (2020) Inflation expectations and the recovery from the Great Depression in Germany. Explor Econ Hist 75:101305
Daniel V, Neubert M, Orban A (2018) Fictional expectations and the global media in the Greek debt crisis: a topic modeling approach. Jahrbuch für Wirtschaftsgeschichte 59:525–566
Diaf S, Döpke J, Fritsche U, Rockenbach I (2022) Sharks and minnows in a shoal of words: measuring latent ideological positions based on text mining techniques. Eur J Polit Econ 75:102179
Diebolt C (2016) Cliometrica after 10 years: definition and principles of cliometric research. Cliometrica 10:1–4
Diebolt C, Haupert M (2019) We are Ninjas: how economic history has infiltrated economics. Sartoniana 32:197–221
Diebolt C, Haupert M (2022) Cliometrics and the future of economic history. Essays Econ Bus Hist 40:1–20
Ellingsen J, Larsen VH, Thorsrud LA (2022) News media versus FRED-MD for macroeconomic forecasting. J Appl Econ 37:63–81
Esteves R, Geisler Mesevage G (2019) Social networks in economic history: opportunities and challenges. Explor Econ Hist 74:101299
Ferguson-Cradler G (2021) Narrative and computational text analysis in business and economic history. Scand Econ Hist Rev 71:1–25
Fernández-de-Pinedo N, La Parra-Perez A, Muñoz F-F (2022) Recent trends in publications of economic historians in Europe and North America (1980–2019): an empirical analysis. Cliometrica 17:1–22
Fickers A, van der Heijden T (2020) Inside the trading zone: thinkering in a digital history lab. Digit Hum Q 14
Fligstein N, Brundage JS, Schultz M (2017) Seeing like the fed: culture, cognition, and framing in the failure to anticipate the financial crisis of 2008. Am Sociol Rev 82:879–909
Frydman R, Mangee N, Stillwagon J (2021) How market sentiment drives forecasts of stock returns. J Behav Financ 22:351–367
Gentzkow M, Kelly B, Taddy M (2019) Text as data. J Econ Lit 57:535–574
Grajzl P, Murrell P (2021) Characterizing a legal–intellectual culture: Bacon, Coke, and seventeenth-century England. Cliometrica 15:43–88
Griffiths TL, Steyvers M (2004) Finding scientific topics. Proc Natl Acad Sci U S A 101:5228–5235
Grimmer J, Stewart BM (2013) Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit Anal 21:267–297
Grimmer J, Roberts ME, Stewart BM (2022) Text as data: a new framework for machine learning and the social sciences. Princeton University Press, Princeton
Guldi J (2019) Parliament’s debates about infrastructure: an exercise in using dynamic topic models to synthesize historical change. Technol Cult 60:1–33
Håkansson PG, Karlsson T, La Mela M (2022) Running out of time: using job ads to analyse the demand for messengers in the twentieth century. Scand Econ Hist Rev:1–20. (Online First)
Hanna AJ, Turner JD, Walker CB (2020) News media and investor sentiment during bull and bear markets. Eur J Financ 26:1377–1395
Hansen S, McMahon M, Prat A (2018) Transparency and deliberation within the FOMC: a computational linguistics approach. Q J Econ 133:801–870
Harris C, Myers A, Briol C, Carlen S (2022) The binding force of economics. In: D’Amico DJ, Martin AG (eds) Contemporary methods and Austrian economics. pp 69–103
Hayo B, Henseler K, Steffen Rapp M, Zahner J (2022) Complexity of ECB communication and financial market trading. J Int Money Financ 128:102709
Heyer G (2009) Introduction to TMS 2009. In: Heyer G (ed) Text mining services. Leipzig, pp 1–14
Jacobi C, van Atteveldt W, Welbers K (2015) Quantitative analysis of large amounts of journalistic texts using topic modelling. Digit Journal 4:89–106
Jaremski M (2020) Today’s economic history and tomorrow’s scholars. Cliometrica 14:169–180
Kabiri A, James H, Landon-Lane J, Nyman R (2022) The role of sentiment in the economy of the 1920s. Econ Hist Rev 76:3–30. (Online First)
Komlos J (2003) Access to food and the biological standard of living: perspectives on the nutritional status of Native Americans. Am Econ Rev 93:252–255
Kronenberg C (2021) A new measure of 19th century US suicides. Soc Indic Res 157:803–815
Küsters A (2022) Applying lessons from the past? Exploring historical analogies in ECB speeches through text mining, 1997–2019. Int J Cent Bank 18:277–329
La Mela M (2020) Tracing the emergence of Nordic allemansrätten through digitised parliamentary sources. In: Fridlund M, Oiva M, Paju P (eds) Digital histories: emergent approaches within the new digital history. Helsinki University Press, Helsinki, pp 181–197
La Parra-Perez A, Muñoz F-F, Fernandez-de-Pinedo N (2022) EconHist: a relational database for analyzing the evolution of economic history (1980–2019). Hist Methods J Quant Interdiscip Hist 55:45–60
Lack P (2021) Using word analysis to track the evolution of emotional well-being in nineteenth-century industrializing Britain. Hist Methods J Quant Interdiscip Hist 54:228–247
Lässig S (2021) Digital history: challenges and opportunities for the profession. Gesch Ges 47:5–34
Lehenmeier C, Burghardt M, Mischka B (2020) Layout detection and table recognition – recent challenges in digitizing historical documents and handwritten tabular data. In: Hall M, Merčun T, Risse T, Duchateau F (eds) Digital libraries for open knowledge. Springer Cham, Cham, pp 229–242
Lennard J (2020) Uncertainty and the great slump. Econ Hist Rev 73:844–867
Liebl B, Burghardt M (2020) From historical newspapers to machine-readable data: the origami OCR pipeline. In: Proceedings of the 1st workshop on computational humanities research (CHR)
Lüdering J, Winker P (2016) Forward or backward looking? The economic discourse and the observed reality. J Econ Stat 236:483–515
Marjanen J (2021) National sentiment: nation building and emotional language in nineteenth-century Finland. In: Kivimäki V, Suodenjoki S, Vahtikari T (eds) Lived nation as the history of experiences and emotions in Finland, 1800–2000. Palgrave Macmillan Cham, Cham, pp 61–83
Merchant Klancher E, Alexander CS (2022) U.S. demography in transition. Hist Methods J Quant Interdiscip Hist 55:1–21
Miller IM (2013) Rebellion, crime and violence in Qing China, 1722–1911: a topic modeling approach. Poetics 41:626–649
Mitchener KJ (2015) The 4D future of economic history: digitally-driven data design. J Econ Hist 75:1234–1239
Moretti F (2013) Distant reading. Verso Books, London/New York
Pablo-Martí F, Alañón-Pardo Á, Sánchez A (2021) Complex networks to understand the past: the case of roads in Bourbon Spain. Cliometrica 15:477–534
Price J, Buckles K, Van Leeuwen J, Riley I (2021) Combining family history and machine learning to link historical records: the Census Tree data set. Explor Econ Hist 80:101391
Ros R, van Erp M, Rijpma H, Zijdeman R (2020) Mining wages in nineteenth-century job advertisements. The application of language resources and language technology to study economic and social inequality. Proceedings of LR4SSHOC: workshop about language resources for the SSH Cloud, pp 27–32
Rosenzweig R (2003) Scarcity or abundance? preserving the past in a digital era. Am Hist Rev 108:735–762
Salmi H (2021) What is digital history? Wiley & Sons, Medford
Seefeldt D, Thomas WG (2009) What is digital history? Perspect Hist 47
Shen Z, Zhang K, Dell M (2020) A large dataset of historical Japanese documents with complex layouts. IEEE/CVF conference on computer vision and pattern recognition workshops, pp 548–559
Shiller RJ (2017) Narrative economics. Am Econ Rev 107:967–1004
Steyvers M, Griffiths T (2007) Probabilistic topic models. In: Landauer TK, McNamara DS, Dennis S, Kintsch W (eds) Handbook of latent semantic analysis. Psychology Press, Hoboken, pp 427–448
Thorsrud LA (2020) Words are the new numbers: a newsy coincident index of business cycles. J Bus Econ Stat 38:393–409
Turner JD, Ye Q, Walker CB (2017) Media coverage and stock returns on the London Stock Exchange, 1825–70. Rev Financ 22:1605–1629
Verdickt G (2020) The effect of war risk on managerial and investor behavior: evidence from the Brussels Stock Exchange in the pre-1914 era. J Econ Hist 80:629–669
Viola L, Verheul J (2020) Mining ethnicity: discourse-driven topic modelling of immigrant discourses in the USA, 1898–1920. Digit Scholarsh Humanit 35:921–943
Wehrheim L (2019a) Economic history goes digital: topic modeling the journal of economic history. Cliometrica 13:83–125
Wehrheim L (2019b) Von Wirtschaftsweisen und Topic Models: 50 Jahre ökonomische Expertise aus einer Text Mining Perspektive. In: Sahle P (ed) DHd 2019 Digital Humanities: multimedial & multimodal. Konferenzabstracts. Frankfurt, pp 240–245
Wehrheim L (2021) Im Olymp der Ökonomen. Zur öffentlichen Resonanz wirtschaftspolitischer Experten von 1965 bis 2015. Mohr Siebeck, Tübingen
Wehrheim L (2022) The sound of silence. On the (in-)visibility of economic experts in German Print Media since the 1960s. Vierteljahrschrift für Sozial- und Wirtschaftsgeschichte 109:29–71
Wehrheim L, Jopp TA, Spoerer M (2023) Turn, turn, turn. A digital history of German historiography, 1950–2019. J Interdiscip Hist 53:471–507
Wevers M, Smits T (2019) The visual digital turn: using neural networks to study historical images. Digit Scholarsh Humanit 35:194–207
Whaples R (1991) A quantitative history of the journal of economic history and the Cliometric revolution. J Econ Hist 51:289–301
Wiedemann G (2016) Text mining for qualitative data analysis in the social sciences: a study on democratic discourse in Germany. Springer VS, Wiesbaden
© 2024 Springer Nature Switzerland AG
About this entry
Cite this entry
Wehrheim, L. (2024). Digital Methods in Economic History: The Case of Computational Text Analysis. In: Diebolt, C., Haupert, M. (eds) Handbook of Cliometrics. Springer, Cham. https://doi.org/10.1007/978-3-031-35583-7_118
DOI: https://doi.org/10.1007/978-3-031-35583-7_118
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35582-0
Online ISBN: 978-3-031-35583-7
eBook Packages: Economics and Finance; Reference Module Humanities and Social Sciences; Reference Module Business, Economics and Social Sciences