Using Newspapers for Textual Indicators: Guidance Based on Spanish- and Portuguese-Speaking Countries

Andres-Escayola, Erik; Ghirelli, Corinna; Molina, Luis; Perez, Javier J.; Vidal, Elena

doi:10.1007/s10614-023-10433-z

Computational Economics (2023)

Published: 31 August 2023

Erik Andres-Escayola¹, Corinna Ghirelli² (ORCID: orcid.org/0000-0001-7859-0811), Luis Molina², Javier J. Perez² & Elena Vidal³


Abstract

This paper investigates the role that two key methodological choices play in the construction of dictionary-based indicators: the selection of local versus foreign newspapers, and the breadth of press coverage (i.e. the number of newspapers considered). The large literature in this field is almost silent about the robustness of research results to these two choices. These questions matter because the production of newspaper-based economic indicators is growing fast. As a case study we use the well-known economic policy uncertainty (EPU) index, taking as examples the six largest Latin American economies (Argentina, Brazil, Chile, Colombia, Mexico and Peru) and Spain. First, we develop EPU measures based on press with different levels of proximity, i.e. local versus foreign, and corroborate that they deliver broadly similar narratives. Second, we examine the macroeconomic effects of EPU shocks computed from these different sources within a structural Bayesian vector autoregression framework and find responses that are statistically similar. These two applications should reassure researchers that they can rely on foreign sources to construct EPU indexes. This option may foster the comparability of results across countries and lay the groundwork for cross-country studies of uncertainty. Finally, we show that constructing EPU indexes based on only one newspaper, an option followed by many studies, may yield biased responses; increasing the number of sources reduces the chances of obtaining biased responses. This suggests that it is important to maximize the breadth of press coverage when building text-based indicators, since doing so would improve the robustness and credibility of results.
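As an illustration of what a dictionary-based EPU indicator involves, the sketch below follows the normalization popularized by Baker et al. (2016): flag articles that contain at least one term from each of the economy (E), policy (P) and uncertainty (U) categories, scale each newspaper's monthly count of flagged articles by its total number of articles, standardize each newspaper's series to unit standard deviation, average across newspapers and rescale to a mean of 100. It also enumerates every non-empty subset of newspapers, the kind of exercise used to study the breadth of press coverage. This is a minimal sketch, not the authors' code: the keyword sets are hypothetical stand-ins for the paper's dictionaries, plain substring matching is a simplification, and standardizing over the full sample (rather than a fixed base period) is an assumption.

```python
from itertools import combinations

import numpy as np

# Hypothetical keyword sets (lower case); the paper's dictionaries are in its appendix.
E_TERMS = ("economy", "economic")
P_TERMS = ("tax", "regulation", "central bank", "deficit")
U_TERMS = ("uncertain",)  # as a substring, this also matches "uncertainty"


def is_epu_article(text):
    """True if the text contains at least one E, one P and one U term.

    Plain substring matching is a simplification (e.g. "tax" also matches "taxi").
    """
    t = text.lower()
    return all(any(term in t for term in terms) for terms in (E_TERMS, P_TERMS, U_TERMS))


def epu_index(epu_counts, total_counts):
    """Combine per-newspaper monthly counts into one index with mean 100.

    Both arguments have shape (n_newspapers, n_months): epu_counts holds the
    number of flagged articles, total_counts the number of all articles.
    """
    share = np.asarray(epu_counts, dtype=float) / np.asarray(total_counts, dtype=float)
    # Unit standard deviation per newspaper, so no single paper dominates the average.
    standardized = share / share.std(axis=1, keepdims=True)
    average = standardized.mean(axis=0)  # equal weight for each newspaper
    return 100.0 * average / average.mean()


def all_source_subsets(n_sources):
    """Every non-empty subset of newspaper indices: 2**n_sources - 1 tuples."""
    return [combo
            for size in range(1, n_sources + 1)
            for combo in combinations(range(n_sources), size)]
```

For instance, calling `epu_index(epu[list(s)], tot[list(s)])` for each `s` in `all_source_subsets(7)` produces 127 alternative indexes from seven newspapers; feeding each into the same model and comparing the resulting impulse responses is one way to gauge how sensitive results are to the choice of sources.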


Availability of data and materials

The data that support the findings of this study are available from the corresponding author upon request.

Notes

It was first calculated for the US and then constructed for other countries. The Policy Uncertainty website centralizes most EPU indexes based on the procedure of Baker et al. (2016).
Examples include the political violence index of Mueller and Rauh (2018), the geopolitical risk index (Caldara & Iacoviello, 2022), the World Uncertainty Index (Ahir et al., 2019) and the Reported Social Unrest Index (Barrett et al., 2022).
This list is not exhaustive. Another strand of the literature relies on very large sources such as the entire Dow Jones news archive. We do not discuss this literature since it is beyond the scope of this paper.
A dedicated literature related to journalism examines differences between local and foreign press coverage and the consequences for textual analysis, giving examples of how different news sources may convey different messages (Papacharissi & de Fatima Oliveira, 2008; Pollak et al., 2011).
Separate results based on the Anglophone or Spanish press are available upon request.
Results for other relevant variables included in our empirical exercises are also similar, most notably as regards financial variables such as the exchange rate and a measure of financial risk.
For all countries except Brazil, we translate keywords from Spanish to English (for Brazil, we translate from Portuguese to English). Table 11 in Sect. B of the Appendix provides the list of keywords in Spanish and English.
The only exception is Infobae for Argentina. We included it because it is very popular in Argentina and we wanted to have at least 3 newspapers per country. Limiting the articles to those in the printed versions of the newspapers ensures the quality and relevance of the stories because editors select articles to be published in print given space limitations.
The time coverage of each newspaper is provided in Table 10 in Sect. B of the Appendix.
To define the Anglophone press, we rely on the newspapers considered by Barrett et al. (2022) to construct the Reported Social Unrest Index (RSUI), which consists of newspapers from the UK, the US and Canada.
For brevity, the list of keywords is reported in Table 12 in Sect. B of the Appendix.
Compared to the EPU indexes in Ghirelli et al. (2021), the indicators we present in this paper show two small technical differences: (i) we add a few new keywords that allow us to better capture the currency crisis in Argentina, and (ii) we select articles about the country of interest based on Factiva indexation rather than considering articles in which the name of the country appears in the text. This last choice is motivated by the fact that, especially in the Anglophone press, a news article may mention several Latin American countries even though it actually tells a story about one specific country. We checked the robustness of our indicators to these technical changes and found very similar results (available upon request).
For example, this avoids counting articles in which the Argentinian press refers to the reactions of Chileans living in Argentina to the events in Chile in October 2019.
November 2002–December 2020 for the EPU index based on the local press; January 1997–December 2020 for the EPU index based on the foreign press, according to the availability of press data.
Frequency distributions for individual Latin American countries are not shown to avoid clutter; they are available upon request.
It is widely known that different newspapers may take opposite stances on economic issues (Lott & Hassett, 2014). This is due to the political or editorial bias of the newspaper and may also exist within local sources or within foreign sources.
To save space, the narratives of the Latin American countries are relegated to Tables 13, 14, 15, 16, 17 and 18 in Sect. C of the Appendix.
For example, these could be events that are fully discounted by local newspapers, or whose effects are going to be less relevant or less controversial for the country examined than for the countries of origin of the foreign press.
To construct the macroeconomic variables at the level of the Latin American region, we take the simple mean of the variables across the six Latin American countries, in order not to over-represent Mexico and Brazil in the aggregate.
Results are robust to the inclusion of the Covid-19 period, but the impulse response functions are more unstable. Results are available upon request. In the context of vector autoregression (VAR) analysis, there is still no consensus on how to deal with the Covid-19 outlier encountered in most macroeconomic variables. Hence, shortening the sample is the most plausible approach for the purposes of this paper.
In Sect. A of the Appendix, we provide additional details (Table 8) and the main descriptive statistics (Tables 4, 5, 6 and 7) for the data.
The VIX represents the market’s expectations regarding the relative strength of near-term price changes in the S&P 500 index. Because it is derived from the prices of S&P 500 index options with near-term expiration dates, it generates a 30-day forward projection of volatility.
In particular, it is a measure of the sentiment of global financial markets towards the country (a more reliable country attracts more flows).
For the computational implementation of the models, we use the developer version of the BEAR toolbox. For further details, see Dieppe et al. (2016).
For clarity, Fig. 13 in Sect. F of the Appendix depicts the recursive identification à la Cholesky as a flow diagram.
The coefficients of the impulse responses are also reported in Table 21 of the Appendix.
Results for single countries are available upon request.
The EMBI is calculated as the spread of emerging-market bonds over US government bonds. It is developed by JP Morgan Chase and is considered the main indicator of country risk for emerging markets.
Note that we focus on local newspapers since we have already shown that, from a quantitative point of view, the EPU index based on the local press is equivalent to the EPU index based on all available press. This choice is made for simplicity: if we considered the full set of available press, the number of possible combinations would explode.
Considering 7 (6) sources, the number of possible non-empty combinations without repetition is 2^7 − 1 = 127 (2^6 − 1 = 63).
Because some sources are available only for shorter time spans, for Mexico and Brazil we drop 7 combinations whose resulting EPU indexes are too short given text data availability.
The videos are available at https://github.com/eiae/EPU-press-analysis.
Note that the variation within each block relates to the specific selection of newspapers and may be due to editorial bias. This is beyond the scope of this paper.
This restriction implies that local variables are not Granger-causing global variables. Accordingly, we assume that Brazil, Mexico and Spain are small open economies when compared to the global variables considered in the models.
The key difference between Equations F.1 and F.3 is that the reduced-form VAR is a purely statistical model that can be estimated. Conversely, the structural representation makes the model economically interpretable but cannot be estimated directly.
If the diagonal elements of D are left unrestricted, the shocks can be normalized without loss of generality so that they are mutually uncorrelated, have unit variance, and are as many as the variables in the model. Under this representation, a unit shock corresponds by construction to a one-standard-deviation shock.
A useful result is that when D is lower triangular so is $D_0$.
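The recursive (Cholesky) identification referred to in these notes can be illustrated with a small sketch. It assumes a VAR(1) with reduced-form coefficient matrix A and residual covariance Σ_u; this is a simplification, since the paper estimates Bayesian VARs with the BEAR toolbox, and the symbols A, P and Ψ_h used here are ours rather than the paper's D and D_0. The lower-triangular Cholesky factor P of Σ_u gives the contemporaneous impact of each structural shock, and the impulse responses are Ψ_h = A^h P. Because the structural shocks have unit variance, each column of P is the impact of a one-standard-deviation shock, and the variable ordering determines which variables can react on impact.

```python
import numpy as np


def cholesky_irf(A, sigma_u, horizon):
    """Impulse responses of a VAR(1) y_t = A y_{t-1} + u_t, recursive identification.

    A:        (n, n) reduced-form coefficient matrix (assumed already estimated).
    sigma_u:  (n, n) covariance matrix of the reduced-form residuals u_t.
    Returns an array of shape (horizon + 1, n, n) where irf[h, i, j] is the
    response of variable i, h periods after a one-standard-deviation shock to j.
    """
    A = np.asarray(A, dtype=float)
    # Lower-triangular P with P @ P.T == sigma_u: variable i reacts on impact
    # only to shocks to variables ordered before it (and to its own shock).
    P = np.linalg.cholesky(np.asarray(sigma_u, dtype=float))
    n = A.shape[0]
    irf = np.empty((horizon + 1, n, n))
    power = np.eye(n)  # A^0
    for h in range(horizon + 1):
        irf[h] = power @ P  # Psi_h = A^h P
        power = A @ power
    return irf
```

If the global variables are ordered first, they do not respond on impact to domestic shocks. The block-exogeneity restriction described in these notes concerns the lagged coefficients in A instead, which this sketch leaves unrestricted.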

References

Aguilar, P., Ghirelli, C., Pacce, M., & Urtasun, A. (2021). Can news help measure economic sentiment? An application in COVID-19 times. Economics Letters, 199, 109730.
Ahir, H., Bloom, N., & Furceri, D. (2019). The World Uncertainty Index. Working Papers 19–027, Stanford Institute for Economic Policy Research.
Alexopoulos, M., & Cohen, J. (2015). The power of print: Uncertainty shocks, markets, and the economy. International Review of Economics & Finance, 40(C), 8–28.
Aprigliano, V., Emiliozzi, S., Guaitoli, G., Luciani, A., Marcucci, J., & Monteforte, L. (2022). The power of text-based indicators in forecasting Italian economic activity. International Journal of Forecasting.
Ardia, D., Bluteau, K., & Boudt, K. (2019). Questioning the news about economic growth: Sparse forecasting using thousands of news-based sentiment values. International Journal of Forecasting, 35(4), 1370–1386.
Armesto, M. T., Hernandez-Murillo, R., Owyang, M. T., & Piger, J. (2009). Measuring the information content of the beige book: A mixed data sampling approach. Journal of Money, Credit and Banking, 41(1), 35–55.
Azzimonti, M. (2018). Partisan conflict and private investment. Journal of Monetary Economics, 93, 114–131. Carnegie-Rochester-NYU Conference on Public Policy held at the Stern School of Business at New York University.
Bachmann, R., Elstner, S., & Sims, E. R. (2013). Uncertainty and economic activity: Evidence from business survey data. American Economic Journal: Macroeconomics, 5(2), 217–249.
Baker, S. R., Bloom, N., & Davis, S. J. (2016). Measuring economic policy uncertainty. The Quarterly Journal of Economics, 131(4), 1593–1636.
Barrett, P., Appendino, M., Nguyen, K., & de Leon Miranda, J. (2022). Measuring social unrest using media reports. Journal of Development Economics, 158, 102924.
Bloom, N. (2009). The impact of uncertainty shocks. Econometrica, 77(3), 623–685.
Caldara, D., Fuentes-Albero, C., Gilchrist, S., & Zakrajšek, E. (2016). The macroeconomic impact of financial and uncertainty shocks. European Economic Review, 88, 185–207.
Caldara, D., & Iacoviello, M. (2022). Measuring geopolitical risk. American Economic Review, 112(4), 1194–1225.
Cerda, R., Silva, A., & Valente, J. T. (2018). Economic uncertainty impact in a small open economy: The case of Chile. Applied Economics, 50(26), 2894–2908.
Consoli, S., Barbaglia, L., & Manzan, S. (2022). Fine-grained, aspect-based sentiment analysis on economic and financial lexicon. Knowledge-Based Systems, 247, 108781.
Dieppe, A., Legrand, R., & Van Roye, B. (2016). The BEAR toolbox. ECB working paper, European Central Bank.
Fraiberger, S. P., Lee, D., Puy, D., & Ranciere, R. (2021). Media sentiment and international asset prices. Journal of International Economics, 133, 103526.
Garcia, D. (2013). Sentiment during recessions. The Journal of Finance, 68(3), 1267–1300.
Ghirelli, C., Pérez, J. J., & Urtasun, A. (2019). A new economic policy uncertainty index for Spain. Economics Letters, 182, 64–67.
Ghirelli, C., Pérez, J. J., & Urtasun, A. (2021). The spillover effects of economic policy uncertainty in Latin America on the Spanish economy. Latin American Journal of Central Banking, 2(2), 100029.
Gil-León, J., & Silva-Pinzón, D. (2019). Índice de incertidumbre de política económica (EPU) para Colombia, 2000–2017. Ensayos de Economía, 29(55), 37–56.
Huang, Y., & Luk, P. (2020). Measuring economic policy uncertainty in China. China Economic Review, 59, 101367.
Jirasavetakul, L.-B., & Spilimbergo, A. (2018). Economic policy uncertainty in Turkey. IMF Working Papers 18/272, International Monetary Fund.
Jurado, K., Ludvigson, S. C., & Ng, S. (2015). Measuring uncertainty. American Economic Review, 105(3), 1177–1216.
Kalamara, E., Turrell, A., Redl, C., Kapetanios, G., & Kapadia, S. (2022). Making text count: Economic forecasting using newspaper text. Journal of Applied Econometrics, 37(5), 896–919.
Lott, J. R., & Hassett, K. A. (2014). Is newspaper coverage of economic events politically biased? Public Choice, 160(1/2), 65–108.
Mueller, H., & Rauh, C. (2018). Reading between the lines: Prediction of political violence using newspaper text. American Political Science Review, 112(2), 358–375.
Nyman, R., Kapadia, S., & Tuckett, D. (2021). News and narratives in financial systems: Exploiting big data for systemic risk assessment. Journal of Economic Dynamics and Control, 127, 104119.
Papacharissi, Z., & de Fatima Oliveira, M. (2008). News frames terrorism: A comparative analysis of frames employed in terrorism coverage in U.S. and U.K. newspapers. The International Journal of Press/Politics, 13(1), 52–74.
Pollak, S., Coesemans, R., Daelemans, W., & Lavrac, N. (2011). Detecting contrast patterns in newspaper articles by combining discourse analysis and text mining. Pragmatics, 21(4), 647–683.
Rambaccussing, D., & Kwiatkowski, A. (2020). Forecasting with news sentiment: Evidence with UK newspapers. International Journal of Forecasting, 36(4), 1501–1516.
Scotti, C. (2016). Surprise and uncertainty indexes: Real-time aggregation of real-activity macro-surprises. Journal of Monetary Economics, 82, 1–19.
Shapiro, A. H., Sudhof, M., & Wilson, D. J. (2022). Measuring news sentiment. Journal of Econometrics, 228(2), 221–243.
Thorsrud, L. A. (2020). Words are the new numbers: A newsy coincident index of the business cycle. Journal of Business & Economic Statistics, 38(2), 393–409.


Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Author information

Authors and Affiliations

European Central Bank, Frankfurt, Germany
Erik Andres-Escayola
Banco de España, Madrid, Spain
Corinna Ghirelli, Luis Molina & Javier J. Perez
OECD, Paris, France
Elena Vidal


Contributions

All authors contributed equally to the study conception and design, material preparation, data collection, analysis, writing, reviewing and editing.

Corresponding author

Correspondence to Corinna Ghirelli.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank Concha Artola, Rodolfo Campos, Marina Diakonova, Angel Gavilán, Danilo Leiva, José M. González-Minguez, Margarita Machelett, Evi Pappa, Gabriel Pérez Quirós, Jacopo Timini, and seminar participants at Banco de España for comments. The views expressed in this paper are those of the authors and do not necessarily represent the views of the Banco de España or the Eurosystem. Elena Vidal wrote part of this paper while affiliated with the Banco de España.

Appendix

A Data description

The data used to estimate the impact of policy uncertainty on macro and financial variables are sourced via Refinitiv. A complete description of the data is provided in Table 8.

In the case of quarterly data, real GDP series are taken from the Brazilian Institute of Geography and Statistics (IBGE in Portuguese), the National Institute of Statistics and Geography of Mexico (INEGI in Spanish) and the Oxford Economics database for Argentina, Chile, Colombia and Peru. Inflation is calculated from the domestic consumer price indexes constructed and published by the respective national statistics offices. In both cases, we take the quarterly rate of change. Macro series are seasonally adjusted; any series published without seasonal adjustment is adjusted using Refinitiv. The quarterly rates of each country are averaged to obtain the Latin American inflation and growth rates, thus avoiding the overweighting of Brazil and Mexico.
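The unweighted regional aggregation described above can be sketched as follows: compute each country's quarterly rate of change and take the simple cross-country mean, so that every country counts equally regardless of its economic size. The function names, country labels and level values are hypothetical and only serve as an illustration; a GDP-weighted mean would instead tilt the aggregate towards Brazil and Mexico.

```python
import numpy as np


def quarterly_rate_of_change(levels):
    """Percentage change between consecutive quarters of a level series."""
    levels = np.asarray(levels, dtype=float)
    return 100.0 * (levels[1:] / levels[:-1] - 1.0)


def regional_average(country_levels):
    """Unweighted mean of quarterly rates of change across countries.

    country_levels maps a country label to its quarterly level series;
    all series are assumed to cover the same quarters.
    """
    rates = np.vstack([quarterly_rate_of_change(s) for s in country_levels.values()])
    return rates.mean(axis=0)  # each country gets weight 1 / number of countries


# Hypothetical level series for two countries over three quarters.
example = {"Country A": [100.0, 101.0, 102.01], "Country B": [200.0, 198.0, 198.0]}
print(regional_average(example))  # approximately [0.0, 0.5]
```

The same averaging of quarterly changes applies to the exchange rate and equity series described below.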

Net portfolio capital inflows from non-residents are extracted from the balance of payments publications of the respective national statistics offices and complemented with data from the IMF’s databases to extend some series back to 2003. Capital flow series are scaled by nominal GDP, and we do not take moving averages, so as to reflect the direct impact of increases in EPU on the evolution of capital inflows and outflows. Latin American series are obtained by adding up the countries' portfolio capital flows.

The bilateral exchange rates vis-à-vis the USD are taken from Reuters. As there is no Latin American exchange rate against the US dollar, we take the quarterly changes in each exchange rate and average them across the six countries analysed. We use the benchmark MSCI equity indices in USD to estimate the change in stock prices. Although MSCI has a Latin American index, we prefer to take the six individual indexes and average their quarterly changes, since the aggregate index is biased towards the evolution of the major firms of Brazil and Mexico, which have the highest market capitalization.

Finally, we include the VIX index, which represents the market’s expectations regarding price changes in the S&P 500 index. Because it is derived from the prices of S&P 500 index options with near-term expiration dates, it generates a 30-day forward projection of volatility. The VIX is included in levels and aims to capture global uncertainty, which helps disentangle the effects of EPU shocks from the effects of international events that could, in turn, affect EPU.

The robustness exercise with monthly data uses the same variables as above except for the activity index and the portfolio capital flows of non-residents. For the former, we use the monthly GDP proxies published by the national statistics offices and central banks of the region (Estimador Mensual de Actividad Económica (EMAE) for Argentina; Índice de Atividade Econômica do Banco Central (IBC-Br) for Brazil; Índice Mensual de Actividad Económica (Imacec) for Chile; Indicador de Seguimiento a la Economía (ISE) for Colombia; Indicador Global de la Actividad Económica (IGAE) for Mexico; and the INEI monthly GDP for Peru). For portfolio flows by non-residents, the central banks of all countries except Peru publish monthly data.

Table 4 Descriptive statistics for Latin America


Appendix

A Data description

B Constructing the EPU Indexes: Local Press Coverage by Country

C Narrative Across Alternative Press Coverage by Country

D Our EPU Indexes by Country

E Our EPU Indexes for the Latin American Region

F Structural VAR Model with Recursive Identification à la Cholesky

G Robustness Results: BVAR Specifications

H Robustness Results: Breadth of Press Coverage

I Additional Benchmark Results: BVAR Exercise for the Latin American Region
