Abstract
This study analyzes oil price movements through the lens of an agnostic random forest model, which is based on 1000 regression trees. It shows that this highly disciplined, yet flexible computational model reduces in-sample root-mean-square errors (RMSEs) by 65% relative to a standard linear least squares model that uses the same comprehensive set of 11 high-frequency explanatory factors. In 1–3 months ahead price forecasting exercises, the RMSE reduction relative to OLS exceeds 50%, highlighting the relevance of non-linearities in oil markets. The results underscore the importance of incorporating financial factors into oil models: US interest rates, the dollar and the VIX together account for 39% of the models’ RMSE reduction in the post-2010 sample, rising to 48% in the post-2020 sample.
Notes
The dollar index varied very little on that day.
The full description of all the time series that were used can be found in the Appendix.
Observed added.
Fernandez-Delgado et al (2014) compared the performance of 179 classifier models across 121 datasets, and found (the relatively simple) random forests to be the top performers.
Within the asset pricing literature, Gu et al (2020) report large gains for investors that use machine learning. They attribute the predictive gains to “allowing nonlinear predictor interactions missed by other methods”.
Guliyev and Mustafayev (2022) have also analyzed oil prices using machine learning techniques. However, their study uses a less complete set of explanatory variables (for instance for activity and financial data), and it neither reports average absolute Shapley values nor derives partial effects.
Note that predicted values here, as in econometrics, refer to fitted values. These are based on contemporaneous explanatory variables. In Sect. 5, prediction refers to pure forecasting.
Figure 1 is plotted using p = 10 (see later discussion).
Mentch and Zhou (2020) find that “the additional randomness injected into individual trees serves as a form of implicit regularization, making random forests an ideal model in low signal to noise (SNR) settings.”
At least as long as p < 4 is excluded.
Predictor importance factors are based on how much each factor contributed to reducing RMSEs after the splits in each tree. In other words, they are a metric of how many of the splits were based on that feature.
This follows the Chakraborty and Joseph (2017) recommendation to limit complexity from the onset.
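The notes above on fitted values and predictor importance can be illustrated with a minimal sketch. It is not the paper's code: the data are synthetic, the target function is invented for illustration, and scikit-learn's `RandomForestRegressor` with its impurity-based `feature_importances_` is assumed as a stand-in for the importance measure described in the notes. Only the number of trees (1000) and the number of explanatory factors (11) are taken from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally sized arrays."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic data (assumption): 11 factors, a nonlinear target plus noise.
rng = np.random.default_rng(0)
n_obs, n_features = 500, 11
X = rng.normal(size=(n_obs, n_features))
y = (np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2]
     + rng.normal(scale=0.1, size=n_obs))

# Linear benchmark and a forest of 1000 regression trees on the same factors.
ols = LinearRegression().fit(X, y)
forest = RandomForestRegressor(n_estimators=1000, random_state=0).fit(X, y)

# "Predicted" values here are fitted values: same sample, contemporaneous X.
rmse_ols = rmse(y, ols.predict(X))
rmse_rf = rmse(y, forest.predict(X))
reduction = 1.0 - rmse_rf / rmse_ols  # share of the OLS RMSE that is removed

# Impurity-based importances: each feature's share of the total reduction in
# squared error achieved by the splits made on it, averaged over the trees.
importances = forest.feature_importances_

print(f"in-sample RMSE, OLS:    {rmse_ols:.3f}")
print(f"in-sample RMSE, forest: {rmse_rf:.3f}")
print(f"RMSE reduction vs OLS:  {reduction:.1%}")
for j in np.argsort(importances)[::-1][:3]:
    print(f"factor {j}: importance {importances[j]:.3f}")
```

The reduction printed here is a property of the synthetic data only and says nothing about the magnitudes reported in the paper. For pure forecasting, as opposed to fitted values, the same comparison would be run on observations that were not used for estimation.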
References
Aastveit KA, Bjørnland HC, Thorsrud LA (2014) What drives oil prices? Emerging versus developed economies. J Appl Econom 30:1013–1028
Akram QF (2009) Commodity prices, interest rates and the dollar. Energy Econ 31:838–851
Athey S, Imbens GW (2019) Machine learning methods that economists should know about. Annu Rev Econ 11:685–725
Avalos F (2014) Do oil prices drive food prices? The tale of a structural break. J Int Money Financ 42:253–271
Baumeister C, Kilian L (2014) What central bankers need to know about forecasting oil prices. Int Econ Rev 55(3):869–889
Baumeister C, Kilian L (2016) Forty years of oil price fluctuations: why the price of oil may still surprise us. J Econ Perspect 30(1):139–160
Baumeister C, Korobilis D, Lee TK (2022) Energy markets and global economic conditions. Rev Econ Stat 104(4):828–844
Breiman L (2001) Random forests. Mach Learn 45:5–32
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. CRC Press, Boca Raton, FL
Chakraborty C, Joseph A (2017) Machine learning at central banks. Bank of England Working Paper no. 674
de Schryder S, Peersman G (2015) The US dollar exchange rate and the demand for oil. Energy J 36(3):263–285
Elder J, Serletis A (2010) Oil price uncertainty. J Money Credit Bank 42(6):1137–1159
Fernandez-Delgado M, Cernadas E, Barro S, Amorim D (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15(1):3133–3181
Gu S, Kelly B, Xiu D (2020) Empirical asset pricing via machine learning. Rev Financ Stud 33(5)
Guliyev H, Mustafayev E (2022) Predicting the changes in the WTI oil price dynamics using machine learning models. Resour Policy 77:102664
Hamilton JD (1983) Oil and the macroeconomy since World War II. J Political Econ 91(2)
Hamilton JD (1996) This is what happened to the oil price-macroeconomy relationship. J Monet Econ 38:215–220
Hamilton JD (2003) What is an oil shock? J Econom 113:363–398
Hamilton JD (2009a) Causes and consequences of the oil shock of 2007–08. Brook Pap Econ Act 2009(1):215–261
Hamilton JD (2009b) Understanding crude oil prices. Energy J 30(2):179–206
Herrera AM, Karaki MB, Rangaraju SK (2019) Oil price shocks and U.S. economic activity. Energy Policy 129:89–99
Jo S (2014) The effects of oil price uncertainty on global real economic activity. J Money Credit Bank 46(6):1113–1135
Jordà O (2005) Estimation and inference of impulse responses by local projections. Am Econ Rev 95(1):161–182
Kilian L (2009) Not all oil price shocks are alike: disentangling demand and supply shocks in the crude oil market. Am Econ Rev 99:1053–1069
Kilian L, Vigfusson RJ (2017a) The role of oil price shocks in causing U.S. recessions. J Money Credit Bank 49(8):1747–1775
Kilian L, Vigfusson RJ (2017b) Do oil prices help forecast U.S. real GDP? The role of nonlinearities and asymmetries. J Bus Econ Stat 31(1):78–93
Kilian L, Zhou X (2018) Modelling fluctuations in the global demand for commodities. J Int Money Financ 88:54–78
Kleinberg J, Ludwig J, Mullainathan S, Obermeyer Z (2015) Prediction policy problems. Am Econ Rev 105(5):491–495
Medeiros MC, Vasconcelos GFR, Veiga A, Zilberman E (2021) Forecasting inflation in a data-rich environment: the benefits of machine learning methods. J Bus Econ Stat 39(1):98–119
Mentch L, Zhou S (2020) Randomization as regularization: a degrees of freedom explanation for random forest success. J Mach Learn Res 21:1–36
Mullainathan S, Spiess J (2017) Machine learning: an applied econometric approach. J Econ Perspect 31(2):87–106
Varian H (2014) Big data: new tricks for econometrics. J Econ Perspect 28(2):3–28
Acknowledgements
I am grateful to Christiane Baumeister, Deniz Igan and Paulo Santos Monteiro for helpful comments and to Emese Kuruc for excellent research assistance. The views expressed in this paper are those of the author and do not necessarily reflect those of the Bank for International Settlements.
Appendices
Appendix
See Fig. 8.
Data Appendix
The database on which this study is based comprises 3144 observations. All series are daily, with the exception of the core PCE index and PMIs, which are linearly interpolated to daily frequency.
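As a rough illustration of the interpolation step mentioned above, the sketch below converts a monthly series to daily frequency by linear interpolation using pandas. The dates and index values are invented, and the paper does not state which day of the month each monthly reading is assigned to or whether the daily grid covers calendar or trading days; month-end anchoring and calendar days are assumptions made here.

```python
import pandas as pd

# Hypothetical monthly readings, anchored at month end (assumption).
monthly = pd.Series(
    [100.0, 101.0, 103.0],
    index=pd.to_datetime(["2021-01-31", "2021-02-28", "2021-03-31"]),
)

# Daily calendar grid spanning the monthly observations (assumption:
# calendar days; a trading-day calendar could be substituted).
daily_index = pd.date_range(monthly.index[0], monthly.index[-1], freq="D")

# Reindexing leaves gaps (NaN) between month ends; method="time" fills them
# linearly in proportion to the elapsed time between observations.
daily = monthly.reindex(daily_index).interpolate(method="time")

print(daily.loc[pd.Timestamp("2021-02-14")])
```

Note that a value interpolated between two month-end readings uses the later reading, which would not yet be known in real time. This matters for forecasting exercises more than for in-sample fitting.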
List of variables by source:
1. Bloomberg
   • Brent oil price in US $/barrel.
   • West Texas Intermediate (WTI) oil price in US $/barrel.
   • VIX: VIX volatility index traded on the CBOE.
   • Interest rate: 2 year T-Bill yield, United States.
   • CESI AE: Citibank economic surprise index, advanced economies.
   • CESI EME: Citibank economic surprise index, emerging economies.
2. IHS Markit
   • PMI AE: GDP weighted average of the manufacturing PMIs of the United States, euro area, Japan, United Kingdom, Canada, Australia, Switzerland, Denmark and New Zealand. Interpolated to daily frequency.
   • PMI EME: GDP weighted average of the manufacturing PMIs of China, India, Brazil, Russia, Mexico, Turkey, Indonesia, Korea, Malaysia, the Philippines, Thailand, Colombia, Czechia, Poland and Taiwan. Interpolated to daily frequency.
3. Our World in Data
   • Covid: daily number of global Covid-19 casualties, 7-day moving average.
4. Bank for International Settlements
   • Dollar index: nominal effective exchange rate, United States.
5. International Monetary Fund
   • pnfc: price index of non-fuel commodities.
6. Federal Reserve Bank of St. Louis (FRED)
   • Core PCE index: Core PCE index, interpolated to daily frequency.
The time period of the sample is 2 January 2010–20 January 2022.
About this article
Cite this article
Kohlscheen, E. Forecasting oil prices with random forests. Empir Econ 66, 927–943 (2024). https://doi.org/10.1007/s00181-023-02480-0
DOI: https://doi.org/10.1007/s00181-023-02480-0