Natural language based financial forecasting: a survey

Abstract

Natural language processing (NLP), or the pragmatic research perspective of computational linguistics, has become increasingly powerful owing to greater data availability and the many techniques developed over the past decade. This growing capability makes it possible to capture sentiment more accurately and semantics with more nuance. Naturally, many applications are beginning to seek improvements by adopting cutting-edge NLP techniques, and financial forecasting is no exception. As a result, articles that leverage NLP techniques to predict financial markets are accumulating fast, gradually establishing the research field of natural language based financial forecasting (NLFF) or, from the application perspective, stock market prediction. This review article clarifies the scope of NLFF research by ordering and structuring the techniques and applications found in related work. The survey also aims to improve understanding of the progress and hotspots in NLFF and to stimulate discussion across many different disciplines.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Author information

Correspondence to Erik Cambria.

About this article

Cite this article

Xing, F.Z., Cambria, E. & Welsch, R.E. Natural language based financial forecasting: a survey. Artif Intell Rev 50, 49–73 (2018). https://doi.org/10.1007/s10462-017-9588-9

Keywords

  • Financial forecasting
  • Natural language processing
  • Text mining
  • Predictive analytics
  • Knowledge engineering
  • Computational finance