Information Systems Frontiers, Volume 21, Issue 1, pp 109–123

Extracting Knowledge from Technical Reports for the Valuation of West Texas Intermediate Crude Oil Futures

  • Joseph D. Prusa (corresponding author)
  • Ryan T. Sagul
  • Taghi M. Khoshgoftaar


This paper proposes and demonstrates an approach to the often-attempted problem of market prediction, framed as a classification task. We restrict our study to a widely purchased and well-recognized commodity, West Texas Intermediate crude oil, which experiences significant volatility. For this purpose, we use nine learners with features extracted from monthly International Energy Agency (IEA) reports to predict whether the oil futures were undervalued, overvalued, or accurately valued between 2003 and 2015. The often-touted “Efficient Market Hypothesis” (EMH) suggests that it is impossible for individual investors to “beat the market”, since market and external forces, such as geopolitical crises and natural disasters, are nearly impossible to predict. However, four algorithms performed statistically better, at the 95% confidence level, than the “Zero-Rule” and “Random-Guess” strategies, which are expected to pseudo-reflect the EMH. Furthermore, adding text features can significantly improve performance compared with using only the price history of the oil futures, challenging the validity of the semi-strong form of the EMH in the crude oil market.
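The two baselines named in the abstract, “Zero-Rule” and “Random-Guess”, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three label strings, the use of plain accuracy as the metric, and the toy data are hypothetical, since the abstract does not say how valuation labels were assigned or which performance metric was reported. Zero-Rule always predicts the most frequent training label; Random-Guess draws each prediction uniformly from the label set.

```python
import random
from collections import Counter

# Hypothetical label names for the three valuation classes (assumption).
LABELS = ("undervalued", "accurate", "overvalued")


def zero_rule(train_labels, n_test):
    """Zero-Rule (ZeroR): predict the majority training label for every test instance."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    return [majority] * n_test


def random_guess(n_test, labels=LABELS, seed=0):
    """Random-Guess: draw each prediction uniformly at random from the label set."""
    rng = random.Random(seed)  # seeded so the baseline is reproducible
    return [rng.choice(labels) for _ in range(n_test)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (assumed metric)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy, made-up data purely for illustration.
    train = ["accurate"] * 5 + ["overvalued"] * 3 + ["undervalued"] * 2
    test = ["accurate", "overvalued", "accurate", "undervalued"]
    print(accuracy(test, zero_rule(train, len(test))))  # 0.5
    print(accuracy(test, random_guess(len(test), seed=1)))
```

A learner only “beats” these baselines in a meaningful sense if the difference holds up under a significance test across repeated runs; the sketch above computes the baseline scores only and does not reproduce the paper's statistical comparison.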


Machine learning · Text mining · Crude oil market



We acknowledge partial support by the NSF (CNS-1427536). Opinions, findings, conclusions, or recommendations in this material are the authors’ and do not reflect the views of the NSF.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Florida Atlantic University, Boca Raton, USA
