Forecasting oil prices with random forests

Kohlscheen, Emanuel

doi:10.1007/s00181-023-02480-0

Published: 07 August 2023

Empirical Economics, Volume 66, pages 927–943 (2024)

Abstract

This study analyzes oil price movements through the lens of an agnostic random forest model, which is based on 1000 regression trees. It shows that this highly disciplined, yet flexible computational model reduces in-sample root-mean-square errors (RMSEs) by 65% relative to a standard linear least squares model that uses the same comprehensive set of 11 high-frequency explanatory factors. In price forecasting exercises 1–3 months ahead, the RMSE reduction relative to OLS exceeds 50%, highlighting the relevance of non-linearities in oil markets. The results underscore the importance of incorporating financial factors into oil models: US interest rates, the dollar and the VIX together account for 39% of the models’ RMSE reduction in the post-2010 sample, rising to 48% in the post-2020 sample.
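
The in-sample comparison described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn (the article preview does not state which software was used) and relies on purely synthetic data: the 11 factors, the target function and all variable names are hypothetical stand-ins, so the printed reduction says nothing about oil prices. Only the forest size of 1000 trees and the RMSE comparison against OLS follow the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally sized arrays."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic stand-in for 11 explanatory factors (NOT the paper's data).
rng = np.random.default_rng(0)
n_obs, n_factors = 400, 11
X = rng.normal(size=(n_obs, n_factors))

# Hypothetical target with a nonlinearity and an interaction term,
# which a linear model cannot represent.
y = (np.sin(2 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.5 * X[:, 3]
     + 0.1 * rng.normal(size=n_obs))

# Benchmark: ordinary least squares on the same factors.
ols = LinearRegression().fit(X, y)

# Random forest with 1000 regression trees, as in the abstract;
# all other hyperparameters are scikit-learn defaults (an assumption).
forest = RandomForestRegressor(n_estimators=1000, random_state=0).fit(X, y)

# In-sample (fitted-value) errors and the relative RMSE reduction.
rmse_ols = rmse(y, ols.predict(X))
rmse_rf = rmse(y, forest.predict(X))
reduction = 1.0 - rmse_rf / rmse_ols

print(f"OLS RMSE: {rmse_ols:.3f}")
print(f"Random forest RMSE: {rmse_rf:.3f}")
print(f"Relative reduction: {reduction:.1%}")
```

Because the errors here are in-sample, part of the gap reflects the forest's flexibility in fitting the training data. The forecasting exercises mentioned in the abstract are the out-of-sample counterpart of this comparison.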

Notes

The dollar index varied very little on that day.
The full description of all the time series that were used can be found in the Appendix.
Observed added.
Fernandez-Delgado et al (2014) compared the performance of 179 classifier models across 121 datasets, and found (the relatively simple) random forests to be the top performers.
Within the asset pricing literature, Gu et al (2020) report large gains for investors that use machine learning. They attribute the predictive gains to “allowing nonlinear predictor interactions missed by other methods”.
Guliyev and Mustafayev (2022) have also analyzed oil prices using machine learning techniques. However, their study uses a less complete set of explanatory variables (for instance for activity and financial data), and it neither reports average absolute Shapley values nor derives partial effects.
Note that predicted values here, as in econometrics, refer to fitted values. These are based on contemporaneous explanatory variables. In Sect. 5, prediction refers to pure forecasting.
Figure 1 plotted using p = 10 (see later discussion).
Mentch and Zhou (2020) find that “the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal to noise (SNR) settings.”
At least as long as p < 4 is excluded.
Predictor importance factors are based on how much each factor contributed to reducing RMSEs after the splits in each tree. In other words, they measure how many of the splits were based on that feature.
This follows the Chakraborty and Joseph (2017) recommendation to limit complexity from the outset.
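
The note on predictor importance factors above describes the metric in two ways: by the error reduction achieved through splits on a factor, and by how many splits use that factor. These two readings need not coincide. The sketch below computes both on synthetic data, assuming scikit-learn; the factor names, the target function and the hyperparameters are hypothetical, and nothing here is claimed to reproduce the article's own calculation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with hypothetical factor names (NOT the paper's series).
rng = np.random.default_rng(1)
names = ["factor_a", "factor_b", "factor_c", "noise"]
X = rng.normal(size=(500, len(names)))
y = (3.0 * np.tanh(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.2 * X[:, 2]
     + 0.1 * rng.normal(size=500))

# Shallow trees keep the example small; the depth is an arbitrary choice.
forest = RandomForestRegressor(
    n_estimators=200, max_depth=6, random_state=1
).fit(X, y)

# Reading 1: share of the total squared-error reduction attributable to
# splits on each factor, averaged over trees and normalized to sum to 1.
gain_share = forest.feature_importances_

# Reading 2: share of all split nodes that use each factor.
split_counts = np.zeros(len(names))
for tree in forest.estimators_:
    used = tree.tree_.feature      # feature index per node
    used = used[used >= 0]         # leaf nodes carry a negative marker
    split_counts += np.bincount(used, minlength=len(names))
split_share = split_counts / split_counts.sum()

for name, gain, share in zip(names, gain_share, split_share):
    print(f"{name:>9}: gain share {gain:.3f}, split share {share:.3f}")
```

A factor that is irrelevant for the target can still be chosen for many low-value splits deep in the trees, so its split share tends to overstate its contribution relative to the gain-weighted measure.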

References

Aastveit KA, Bjørnland HC, Thorsrud LA (2014) What drives oil prices? Emerging versus developed economies. J Appl Econom 30:1013–1028
Akram QF (2009) Commodity prices, interest rates and the dollar. Energy Econ 31:838–851
Athey S, Imbens GW (2019) Machine learning methods that economists should know about. Annu Rev Econ 11:685–725
Avalos F (2014) Do oil prices drive food prices? The tale of a structural break. J Int Money Financ 42:253–271
Baumeister C, Kilian L (2014) What central bankers need to know about forecasting oil prices. Int Econ Rev 55(3):869–889
Baumeister C, Kilian L (2016) Forty years of oil price fluctuations: why the price of oil may still surprise us. J Econ Perspect 30(1):139–160
Baumeister C, Korobilis D, Lee TK (2022) Energy markets and global economic conditions. Rev Econ Stat 104(4):828–844
Breiman L (2001) Random forests. Mach Learn 45:5–32
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. CRC Press, Boca Raton, FL
Chakraborty C, Joseph A (2017) Machine learning at central banks. Bank of England Working Paper no. 674
de Schryder S, Peersman G (2015) The US dollar exchange rate and the demand for oil. Energy J 36(3):263–285
Elder J, Serletis A (2010) Oil price uncertainty. J Money Credit Bank 42(6):1137–1159
Fernandez-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181
Gu S, Kelly B, Xiu D (2020) Empirical asset pricing via machine learning. Rev Financ Stud 33(5)
Guliyev H, Mustafayev E (2022) Predicting the changes in the WTI oil price dynamics using machine learning models. Resour Policy 77:102664
Hamilton JD (1983) Oil and the macroeconomy since World War II. J Political Econ 91(2)
Hamilton JD (1996) This is what happened to the oil price-macroeconomy relationship. J Monet Econ 38:215–220
Hamilton JD (2003) What is an oil shock? J Econom 113:363–398
Hamilton JD (2009a) Causes and consequences of the oil shock of 2007–08. Brook Pap Econ Act 2009(1):215–261
Hamilton JD (2009b) Understanding crude oil prices. Energy J 30(2):179–206
Herrera AM, Karaki MB, Rangaraju SK (2019) Oil price shocks and U.S. economic activity. Energy Policy 129:89–99
Jo S (2014) The effects of oil price uncertainty on global real economic activity. J Money Credit Bank 46(6):1113–1135
Jordà O (2005) Estimation and inference of impulse responses by local projections. Am Econ Rev 95(1):161–182
Kilian L (2009) Not all oil price shocks are alike: disentangling demand and supply shocks in the crude oil market. Am Econ Rev 99:1053–1069
Kilian L, Vigfusson RJ (2017a) The role of oil price shocks in causing U.S. recessions. J Money Credit Bank 49(8):1747–1775
Kilian L, Vigfusson RJ (2017b) Do oil prices help forecast U.S. real GDP? The role of nonlinearities and asymmetries. J Bus Econ Stat 31(1):78–93
Kilian L, Zhou X (2018) Modelling fluctuations in the global demand for commodities. J Int Money Financ 88:54–78
Kleinberg J, Ludwig J, Mullainathan S, Obermeyer Z (2015) Prediction policy problems. Am Econ Rev 105(5):491–495
Medeiros MC, Vasconcelos GFR, Veiga A, Zilberman E (2021) Forecasting inflation in a data-rich environment: the benefits of machine learning methods. J Bus Econ Stat 39(1):98–119
Mentch L, Zhou S (2020) Randomization as regularization: a degrees of freedom explanation for random forest success. J Mach Learn Res 21:1–36
Mullainathan S, Spiess J (2017) Machine learning: an applied econometric approach. J Econ Perspect 31(2):87–106
Varian H (2014) Big data: new tricks for econometrics. J Econ Perspect 28(2):3–28

Acknowledgements

I am grateful to Christiane Baumeister, Deniz Igan and Paulo Santos Monteiro for helpful comments and to Emese Kuruc for excellent research assistance. The views expressed in this paper are those of the author and do not necessarily reflect those of the Bank for International Settlements.

Author information

Authors and Affiliations

Bank for International Settlements, Centralbahnplatz 2, 4002, Basel, Switzerland
Emanuel Kohlscheen

Corresponding author

Correspondence to Emanuel Kohlscheen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

See Fig. 8.

See Tables 4, 5 and 6.

Table 4 Varying max tree depth: normalized factor importance

Table 5 WTI price: normalized factor importance

Table 6 Summary statistics

Data Appendix

The database on which this study is based comprises 3144 observations. All series are daily, with the exception of the core PCE index and PMIs, which are linearly interpolated to daily frequency.
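
The linear interpolation of lower-frequency series to daily frequency can be sketched as follows. This is a dependency-free illustration under stated assumptions: the monthly values are invented, each observation is assumed to be dated at month-end, and every calendar day is filled (the article does not say whether calendar or trading days were used). The function name `interpolate_daily` is hypothetical.

```python
from datetime import date, timedelta


def interpolate_daily(observations):
    """Linearly interpolate dated observations to daily frequency.

    `observations` is a list of (date, value) pairs sorted by date.
    Returns a dict mapping every calendar day between the first and
    last observation (inclusive) to an interpolated value.
    """
    daily = {}
    for (d0, v0), (d1, v1) in zip(observations, observations[1:]):
        span = (d1 - d0).days
        for k in range(span):
            # Move a fraction k/span of the way from v0 to v1.
            daily[d0 + timedelta(days=k)] = v0 + (v1 - v0) * k / span
    last_date, last_value = observations[-1]
    daily[last_date] = last_value
    return daily


# Invented month-end PMI readings, for illustration only.
pmi_monthly = [
    (date(2021, 1, 31), 52.0),
    (date(2021, 2, 28), 54.8),
    (date(2021, 3, 31), 53.25),
]
pmi_daily = interpolate_daily(pmi_monthly)

# 14 February lies halfway between the first two observations.
print(round(pmi_daily[date(2021, 2, 14)], 6))
```

One caveat with this construction is that an interpolated value on a given day depends on the next release, which is not yet known on that day. Whether and how the article handles this in its forecasting exercises is not visible in this preview.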

List of variables by sources:

1. Bloomberg
• Brent oil price in US $/barrel.
• West Texas Intermediate (WTI) oil price in US $/barrel.
• VIX: VIX volatility index traded on the CBOE.
• Interest rate: 2 year T-Bill yield, United States.
• CESI AE: Citibank economic surprise index, advanced economies.
• CESI EME: Citibank economic surprise index, emerging economies.
2. IHS Markit
• PMI AE: GDP weighted average of the manufacturing PMIs of the United States, euro area, Japan, United Kingdom, Canada, Australia, Switzerland, Denmark and New Zealand. Interpolated to daily frequency.
• PMI EME: GDP weighted average of the manufacturing PMIs of China, India, Brazil, Russia, Mexico, Turkey, Indonesia, Korea, Malaysia, the Philippines, Thailand, Colombia, Czechia, Poland and Taiwan. Interpolated to daily frequency.
3. Our World in Data
• Covid: daily number of global Covid-19 casualties, 7-day moving average.
4. Bank for International Settlements
• Dollar index: nominal effective exchange rate, United States.
5. International Monetary Fund
• pnfc: price index of non-fuel commodities.
6. Federal Reserve Bank of St. Louis (FRED)
• Core PCE index: Core PCE index, interpolated to daily frequency.
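
Two of the constructed series in the list above, the GDP-weighted PMI aggregates and the 7-day moving average of the Covid series, involve simple arithmetic that a short sketch can make concrete. Everything specific here is an assumption for illustration: the country labels, PMI readings and GDP weights are invented, the function names are hypothetical, and the moving average is taken as trailing (the article does not state whether it is trailing or centered).

```python
def gdp_weighted_average(pmi_by_country, gdp_by_country):
    """Average PMI readings using each country's GDP as its weight."""
    total_gdp = sum(gdp_by_country[c] for c in pmi_by_country)
    weighted = sum(pmi_by_country[c] * gdp_by_country[c]
                   for c in pmi_by_country)
    return weighted / total_gdp


def trailing_moving_average(values, window=7):
    """Trailing moving average; None until a full window is available."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        smoothed.append(running / window if i >= window - 1 else None)
    return smoothed


# Invented PMI readings and GDP weights for three placeholder countries.
pmi = {"A": 50.0, "B": 54.0, "C": 48.0}
gdp = {"A": 2.0, "B": 1.0, "C": 1.0}
pmi_aggregate = gdp_weighted_average(pmi, gdp)
print(pmi_aggregate)

# Invented daily counts, smoothed over a 7-day window.
daily_counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
smoothed_counts = trailing_moving_average(daily_counts, window=7)
print(smoothed_counts)
```

Returning `None` for the first six days makes the incomplete windows explicit, rather than averaging over fewer observations.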

The time period of the sample is 2 January 2010–20 January 2022.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kohlscheen, E. Forecasting oil prices with random forests. Empir Econ 66, 927–943 (2024). https://doi.org/10.1007/s00181-023-02480-0

Received: 10 November 2022
Accepted: 20 July 2023
Published: 07 August 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s00181-023-02480-0
