Forecasting oil prices with random forests

Empirical Economics

Abstract

This study analyzes oil price movements through the lens of an agnostic random forest model, which is based on 1000 regression trees. It shows that this highly disciplined, yet flexible computational model reduces in-sample root-mean-square errors (RMSEs) by 65% relative to a standard linear least squares model that uses the same comprehensive set of 11 high-frequency explanatory factors. In 1–3 months ahead price forecasting exercises, the RMSE reduction relative to OLS exceeds 50%, highlighting the relevance of non-linearities in oil markets. The results underscore the importance of incorporating financial factors into oil models: US interest rates, the dollar and the VIX together account for 39% of the models’ RMSE reduction in the post-2010 sample, rising to 48% in the post-2020 sample.
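The RMSE comparisons quoted above rest on simple arithmetic, which can be sketched as follows. This is only an illustration: the function names `rmse` and `rmse_reduction` are hypothetical, and the price and fitted-value series are made up rather than taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_reduction(rmse_benchmark, rmse_model):
    """Percentage by which rmse_model undercuts rmse_benchmark."""
    return 100.0 * (1.0 - rmse_model / rmse_benchmark)


# Made-up numbers for illustration only; not data from the study.
prices = [70.0, 72.0, 75.0, 71.0]
ols_fit = [66.0, 76.0, 71.0, 75.0]     # every fitted value is off by 4
forest_fit = [69.0, 73.0, 74.0, 72.0]  # every fitted value is off by 1

ols_rmse = rmse(prices, ols_fit)
forest_rmse = rmse(prices, forest_fit)
reduction = rmse_reduction(ols_rmse, forest_rmse)
print(f"RMSE reduction relative to the benchmark: {reduction:.0f}%")
```

In this toy case the benchmark RMSE is 4 and the alternative model's RMSE is 1, so the reduction is 75%. The 65% and 50% figures in the abstract are computed in the same way on the study's own fitted and forecast values.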

Notes

  1. The dollar index varied very little on that day.

  2. The full description of all the time series that were used can be found in the Appendix.

  3. Observed added.

  4. Fernandez-Delgado et al (2014) compared the performance of 179 classifier models across 121 datasets, and found (the relatively simple) random forests to be the top performers.

  5. Within the asset pricing literature, Gu et al (2020) report large gains for investors that use machine learning. They attribute the predictive gains to “allowing nonlinear predictor interactions missed by other methods”.

  6. Guliyev and Mustafayev (2022) have also analyzed oil prices using machine learning techniques. However, their study uses a less complete set of explanatory variables (for instance for activity and financial data), and neither reports average absolute Shapley values nor derives partial effects.

  7. Note that predicted values here, as in econometrics, refer to fitted values. These are based on contemporaneous explanatory variables. In Sect. 5, prediction refers to pure forecasting.

  8. Figure 1 is plotted using p = 10 (see later discussion).

  9. Mentch and Zhou (2020) find that “the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal to noise (SNR) settings.”

  10. At least as long as p < 4 is excluded.

  11. Predictor importance factors are based on how much each factor contributed to reducing RMSEs after the splits in each tree. In other words, the metric reflects how many of the splits were based on that feature.

  12. This follows the Chakraborty and Joseph (2017) recommendation to limit complexity from the onset.
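The split-based importance measure described in note 11 can be sketched for a single regression tree as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it uses the reduction in the sum of squared errors (SSE) as the split criterion, grows one tree rather than a forest, and the helper names (`sse`, `best_split`, `accumulate_importance`) and the data are hypothetical.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, ys):
    """Return (gain, feature, threshold) for the split that cuts SSE the most."""
    parent = sse(ys)
    best = (0.0, None, None)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            gain = parent - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, j, t)
    return best


def accumulate_importance(rows, ys, depth, totals):
    """Grow a tree greedily, crediting each split's SSE gain to its feature."""
    if depth == 0 or len(ys) < 2:
        return
    gain, j, t = best_split(rows, ys)
    if j is None:  # no split improves the fit
        return
    totals[j] += gain
    left = [(r, y) for r, y in zip(rows, ys) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[j] > t]
    for part in (left, right):
        accumulate_importance(
            [r for r, _ in part], [y for _, y in part], depth - 1, totals
        )


# Made-up data: feature 0 shifts the target by 10, feature 1 by 2.
rows = [(1, 0), (1, 1), (2, 0), (2, 1)]
ys = [10.0, 12.0, 20.0, 22.0]

totals = [0.0, 0.0]
accumulate_importance(rows, ys, depth=2, totals=totals)
importance = [t / sum(totals) for t in totals]
print(importance)
```

Here the root split on feature 0 removes 100 of the 104 units of SSE, and the two second-level splits on feature 1 remove the remaining 4, so the normalized importances are roughly 0.96 and 0.04. In a random forest, the same accounting would be averaged over all trees.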

References

  • Aastveit KA, Bjørnland HC, Thorsrud LA (2014) What drives oil prices? Emerging versus developed economies. J Appl Econom 30:1013–1028

  • Akram QF (2009) Commodity prices, interest rates and the dollar. Energy Econ 31:838–851

  • Athey S, Imbens GW (2019) Machine learning methods that economists should know about. Annu Rev Econ 11:685–725

  • Avalos F (2014) Do oil prices drive food prices? The tale of a structural break. J Int Money Financ 42:253–271

  • Baumeister C, Kilian L (2014) What central bankers need to know about forecasting oil prices. Int Econ Rev 55(3):869–889

  • Baumeister C, Kilian L (2016) Forty years of oil price fluctuations: why the price of oil may still surprise us. J Econ Perspect 30(1):139–160

  • Baumeister C, Korobilis D, Lee TK (2022) Energy markets and global economic conditions. Rev Econ Stat 104(4):828–844

  • Breiman L (2001) Random forests. Mach Learn 45:5–32

  • Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. CRC Press, Boca Raton, FL

  • Chakraborty C, Joseph A (2017) Machine learning at central banks. Bank of England Working Paper no. 674

  • de Schryder S, Peersman G (2015) The US dollar exchange rate and the demand for oil. Energy J 36(3):263–285

  • Elder J, Serletis A (2010) Oil price uncertainty. J Money Credit Bank 42(6):1137–1159

  • Fernandez-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181

  • Gu S, Kelly B, Xiu D (2020) Empirical asset pricing via machine learning. Rev Financ Stud 33:5

  • Guliyev H, Mustafayev E (2022) Predicting the changes in the WTI oil price dynamics using machine learning models. Resour Policy 77:102664

  • Hamilton JD (1983) Oil and the macroeconomy since World War II. J Political Econ 91:2

  • Hamilton JD (1996) This is what happened to the oil price-macroeconomy relationship. J Monet Econ 38:215–220

  • Hamilton JD (2003) What is an oil shock? J Econom 113:363–398

  • Hamilton JD (2009a) Causes and consequences of the oil shock of 2007–08. Brook Pap Econ Act 2009(1):215–261

  • Hamilton JD (2009b) Understanding crude oil prices. Energy J 30(2):179–206

  • Herrera AM, Karadi MB, Rangaraju SK (2019) Oil price shocks and U.S. economic activity. Energy Policy 129:89–99

  • Jo S (2014) The effects of oil price uncertainty on global real economic activity. J Money Credit Bank 46(6):1113–1135

  • Jordà O (2005) Estimation and inference of impulse responses by local projections. Am Econ Rev 95(1):161–182

  • Kilian L (2009) Not all oil price shocks are alike: disentangling demand and supply shock in the crude oil market. Am Econ Rev 99:1053–1069

  • Kilian L, Vigfusson RJ (2017a) The role of oil price shocks in causing U.S. recessions. J Money Credit Bank 49(8):1747–1775

  • Kilian L, Vigfusson RJ (2017b) Do oil prices help forecast U.S. real GDP? The role of nonlinearities and asymmetries. J Bus Econ Stat 31(1):78–93

  • Kilian L, Zhou X (2018) Modelling fluctuations in the global demand for commodities. J Int Money Financ 88:54–78

  • Kleinberg J, Ludwig J, Mullainathan S, Obermeyer Z (2015) Prediction policy problems. Am Econ Rev 105(5):491–495

  • Medeiros MC, Vasconcelos GFR, Veiga A, Zilberman E (2021) Forecasting inflation in a data-rich environment: the benefits of machine learning methods. J Bus Econ Stat 39(1):98–119

  • Mentch L, Zhou S (2020) Randomization as regularization: a degrees of freedom explanation for random forest success. J Mach Learn Res 21:1–36

  • Mullainathan S, Spiess J (2017) Machine learning: an applied econometric approach. J Econ Perspect 31(2):87–106

  • Varian H (2014) Big data: new tricks for econometrics. J Econ Perspect 28(2):3–28

Acknowledgements

I am grateful to Christiane Baumeister, Deniz Igan and Paulo Santos Monteiro for helpful comments and to Emese Kuruc for excellent research assistance. The views expressed in this paper are those of the author and do not necessarily reflect those of the Bank for International Settlements.

Author information

Corresponding author

Correspondence to Emanuel Kohlscheen.

Appendices

Appendix

See Fig. 8.

See Tables 4, 5 and 6.

Fig. 8 Example: regression tree

Table 4 Varying max tree depth: normalized factor importance
Table 5 WTI price: normalized factor importance
Table 6 Summary statistics

Data Appendix

The database on which this study is based comprises 3144 observations. All series are daily, with the exception of the core PCE index and PMIs, which are linearly interpolated to daily frequency.
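Linear interpolation of a lower-frequency series to daily frequency can be sketched as follows. The source does not specify which date each monthly observation is assigned to, or whether calendar or trading days are used, so this sketch assumes calendar days and first-of-month dating. The function name `interpolate_daily` and the PMI values are hypothetical.

```python
from datetime import date, timedelta


def interpolate_daily(observations):
    """Linearly interpolate dated observations to one value per calendar day.

    observations: non-empty list of (date, value) pairs sorted by date.
    Returns (date, value) pairs covering the first through the last date.
    """
    if not observations:
        raise ValueError("need at least one observation")
    daily = []
    for (d0, v0), (d1, v1) in zip(observations, observations[1:]):
        span = (d1 - d0).days
        for k in range(span):
            # Move a fraction k/span of the way from v0 to v1.
            daily.append((d0 + timedelta(days=k), v0 + (v1 - v0) * k / span))
    daily.append(observations[-1])
    return daily


# Made-up monthly PMI readings, dated on the first of each month (assumption).
monthly_pmi = [(date(2021, 1, 1), 50.0), (date(2021, 2, 1), 53.1)]
daily_pmi = interpolate_daily(monthly_pmi)
print(len(daily_pmi), daily_pmi[10])
```

With these inputs the series has 32 daily points, and the value for 11 January lies 10/31 of the way between the two monthly readings, that is, approximately 51.0.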

List of variables by sources:

  1. Bloomberg

    • Brent oil price in US $/barrel.
    • West Texas Intermediate (WTI) oil price in US $/barrel.
    • VIX: VIX volatility index traded on the CBOE.
    • Interest rate: 2-year T-Bill yield, United States.
    • CESI AE: Citibank economic surprise index, advanced economies.
    • CESI EME: Citibank economic surprise index, emerging economies.

  2. IHS Markit

    • PMI AE: GDP-weighted average of the manufacturing PMIs of the United States, euro area, Japan, United Kingdom, Canada, Australia, Switzerland, Denmark and New Zealand. Interpolated to daily frequency.
    • PMI EME: GDP-weighted average of the manufacturing PMIs of China, India, Brazil, Russia, Mexico, Turkey, Indonesia, Korea, Malaysia, the Philippines, Thailand, Colombia, Czechia, Poland and Taiwan. Interpolated to daily frequency.

  3. Our World in Data

    • Covid: daily number of global Covid-19 casualties, 7-day moving average.

  4. Bank for International Settlements

    • Dollar index: nominal effective exchange rate, United States.

  5. International Monetary Fund

    • pnfc: price index of non-fuel commodities.

  6. Federal Reserve Bank of St. Louis (FRED)

    • Core PCE index: Core PCE index, interpolated to daily frequency.
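The GDP-weighted PMI aggregates listed above (PMI AE and PMI EME) are weighted means. A minimal sketch of that calculation follows; the function name `gdp_weighted_pmi`, the country labels and all numbers are hypothetical, and the source does not state which GDP measure or vintage supplies the weights.

```python
def gdp_weighted_pmi(pmi_by_country, gdp_by_country):
    """GDP-weighted mean of PMIs over countries present in both mappings."""
    common = [c for c in pmi_by_country if c in gdp_by_country]
    total_gdp = sum(gdp_by_country[c] for c in common)
    if total_gdp <= 0:
        raise ValueError("need a positive total GDP weight")
    weighted_sum = sum(pmi_by_country[c] * gdp_by_country[c] for c in common)
    return weighted_sum / total_gdp


# Made-up PMI readings and GDP weights for three placeholder economies.
pmi = {"A": 52.0, "B": 48.0, "C": 50.0}
gdp = {"A": 3.0, "B": 1.0, "C": 1.0}

aggregate = gdp_weighted_pmi(pmi, gdp)
print(round(aggregate, 1))
```

Economy A carries three fifths of the weight here, so the aggregate of about 50.8 sits above the unweighted mean of 50.0.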

The time period of the sample is 2 January 2010–20 January 2022.

About this article

Cite this article

Kohlscheen, E. Forecasting oil prices with random forests. Empir Econ 66, 927–943 (2024). https://doi.org/10.1007/s00181-023-02480-0

  • DOI: https://doi.org/10.1007/s00181-023-02480-0
