Benchmarking econometric and machine learning methodologies in nowcasting GDP

Empirical Economics

Abstract

Nowcasting can play a key role in giving policymakers timelier insight into data published with a significant time lag, such as final GDP figures. Practitioners currently have a plethora of methodologies and approaches to choose from. However, a comprehensive comparison of these disparate approaches in terms of predictive performance and characteristics is lacking. This paper addresses that deficiency by examining the performance of 17 different methodologies in nowcasting US quarterly GDP growth, including all the methods most commonly employed in nowcasting, as well as some of the most popular traditional machine learning approaches. Performance was assessed over a 20-year period, from 2002 to 2022. This span encompassed two crises, the 2008 financial crisis and the COVID crisis, as well as extended tranquil periods. The two best-performing methodologies in the analysis were long short-term memory artificial neural networks (LSTM) and Bayesian vector autoregression (Bayesian VAR). To facilitate further application and testing of each of the examined methodologies, an open-source repository containing boilerplate code that can be applied to different datasets is published alongside the paper, available at: github.com/dhopp1/nowcasting_benchmark

Acknowledgements

The author would like to thank Anu Peltola for her valuable comments and feedback.

Funding

No funds, grants, or other support were received.

Author information

Corresponding author

Correspondence to Daniel Hopp.

Ethics declarations

Conflict of interest

The author has no relevant financial or non-financial interests to disclose.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Additional tabular results

See Tables 4, 5, 6 and 7.

Table 4 RMSE as a proportion of ARMA model, 2002–2007
Table 5 RMSE as a proportion of ARMA model, 2008–2009
Table 6 RMSE as a proportion of ARMA model, 2010–2019
Table 7 RMSE as a proportion of ARMA model, 2020–2022
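
As a rough illustration of the metric named in Tables 4–7, the following Python sketch computes a model's RMSE as a proportion of a benchmark's RMSE. The function names and all numbers are hypothetical; they are not taken from the paper or its repository. A value below 1 means the model's errors are smaller than the benchmark's over the chosen period.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_rmse(actual, model_pred, benchmark_pred):
    """RMSE of a model expressed as a proportion of the benchmark's RMSE."""
    return rmse(actual, model_pred) / rmse(actual, benchmark_pred)


# Made-up quarterly growth rates and predictions, for illustration only.
actual = [0.5, 0.7, -1.2, 0.9]
arma_pred = [0.6, 0.6, 0.4, 0.5]     # hypothetical ARMA benchmark nowcasts
model_pred = [0.5, 0.8, -0.6, 0.8]   # hypothetical competing model nowcasts

ratio = relative_rmse(actual, model_pred, arma_pred)
print(f"RMSE as a proportion of the benchmark: {ratio:.3f}")
```

Restricting `actual` and the prediction lists to the quarters of one sub-period before calling `relative_rmse` would give a per-period ratio in the spirit of the four tables.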

1.2 Additional graphical results

See Figs. 4–25.

Fig. 4

Nowcasts (part 1 of 17). Note: This plot shows the predictions of the methodology for each quarter in the test period. The black line denotes the actual quarterly growth rate, while the colored dotted lines denote the predictions at the various data vintages. Dotted lines that cluster together and track the black line closely indicate a well-performing model. Dotted lines that are spread far apart from one another and lie far from the black line indicate a poorly performing model. (Color figure online)

Fig. 5

Nowcasts (part 2 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 6

Nowcasts (part 3 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 7

Nowcasts (part 4 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 8

Nowcasts (part 5 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 9

Nowcasts (part 6 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 10

Nowcasts (part 7 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 11

Nowcasts (part 8 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 12

Nowcasts (part 9 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 13

Nowcasts (part 10 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 14

Nowcasts (part 11 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 15

Nowcasts (part 12 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 16

Nowcasts (part 13 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 17

Nowcasts (part 14 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 18

Nowcasts (part 15 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 19

Nowcasts (part 16 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 20

Nowcasts (part 17 of 17). Note: See the note to Fig. 4 for how to read this plot. (Color figure online)

Fig. 21

2 months before vintage, pairwise Diebold–Mariano test statistics. Alternative hypothesis: the column methodology has superior predictions to the row methodology. Note: This plot shows the p values of the pairwise Diebold–Mariano tests. The alternative hypothesis of each test is that the methodology in the column has superior predictions to the methodology in the row. Significant values are marked both by color and with one or more asterisks. A completely shaded column (good) means that the methodology achieved significance in each of its pairwise tests, while a completely white column (bad) means that it did not achieve significance in any of them. The inverse is true for rows: a completely shaded row means every other methodology achieved significance compared to it (bad), while a completely white row means no other methodology achieved significance compared to it (good). ***\(p < 0.01\), **\(p < 0.05\), *\(p < 0.1\). (Color figure online)

Fig. 22

1 month before vintage, pairwise Diebold–Mariano test statistics. Alternative hypothesis: the column methodology has superior predictions to the row methodology. Note: See the note to Fig. 21 for how to read this plot. ***\(p < 0.01\), **\(p < 0.05\), *\(p < 0.1\). (Color figure online)

Fig. 23

Month of vintage, pairwise Diebold–Mariano test statistics. Alternative hypothesis: the column methodology has superior predictions to the row methodology. Note: See the note to Fig. 21 for how to read this plot. ***\(p < 0.01\), **\(p < 0.05\), *\(p < 0.1\). (Color figure online)

Fig. 24

1 month after vintage, pairwise Diebold–Mariano test statistics. Alternative hypothesis: the column methodology has superior predictions to the row methodology. Note: See the note to Fig. 21 for how to read this plot. ***\(p < 0.01\), **\(p < 0.05\), *\(p < 0.1\). (Color figure online)

Fig. 25

2 months after vintage, pairwise Diebold–Mariano test statistics. Alternative hypothesis: the column methodology has superior predictions to the row methodology. Note: See the note to Fig. 21 for how to read this plot. ***\(p < 0.01\), **\(p < 0.05\), *\(p < 0.1\). (Color figure online)
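
The pairwise tests summarized in Figs. 21–25 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes squared-error loss, one-step-ahead forecasts (so only the lag-0 variance of the loss differential is used), and a standard normal approximation with no small-sample correction. All function names and data are hypothetical.

```python
import math


def dm_test(actual, pred_col, pred_row):
    """One-sided Diebold-Mariano test (assumed: squared-error loss, horizon 1).

    Alternative hypothesis: pred_col has superior predictions to pred_row.
    Returns (statistic, p_value).
    """
    n = len(actual)
    if n < 2 or len(pred_col) != n or len(pred_row) != n:
        raise ValueError("inputs must have equal length of at least 2")
    # Loss differential: positive when the column methodology has lower loss.
    d = [(a - r) ** 2 - (a - c) ** 2
         for a, c, r in zip(actual, pred_col, pred_row)]
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / n  # lag-0 autocovariance only
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    stat = d_bar / math.sqrt(var_d / n)
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))  # upper tail of N(0, 1)
    return stat, p_value


def pairwise_dm_pvalues(actual, predictions):
    """p values keyed by (row, column) for every ordered pair of methodologies."""
    names = list(predictions)
    return {(row, col): dm_test(actual, predictions[col], predictions[row])[1]
            for row in names for col in names if row != col}


def stars(p):
    """Significance markers as defined in the figure captions."""
    return "***" if p < 0.01 else "**" if p < 0.05 else "*" if p < 0.1 else ""


# Made-up growth rates and nowcasts, for illustration only.
actual = [0.5, 0.7, -1.2, 0.9, 0.3, 0.6, -0.4, 0.8]
predictions = {
    "good": [0.4, 0.8, -1.0, 0.8, 0.4, 0.5, -0.3, 0.9],
    "poor": [0.9, 0.2, 0.1, 0.3, 0.9, 0.0, 0.5, 0.1],
}
p_values = pairwise_dm_pvalues(actual, predictions)
for (row, col), p in p_values.items():
    print(f"row={row}, column={col}: p={p:.4f} {stars(p)}")
```

In this toy example the cell with row "poor" and column "good" is significant, which corresponds to a shaded cell in the "good" column of such a plot. For multi-step horizons or short samples, a variant with autocovariance terms or a small-sample correction may be more appropriate.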

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hopp, D. Benchmarking econometric and machine learning methodologies in nowcasting GDP. Empir Econ 66, 2191–2247 (2024). https://doi.org/10.1007/s00181-023-02515-6

  • DOI: https://doi.org/10.1007/s00181-023-02515-6
