Artificial Intelligence Review, Volume 50, Issue 1, pp 49–73

Natural language based financial forecasting: a survey

  • Frank Z. Xing
  • Erik Cambria
  • Roy E. Welsch


Natural language processing (NLP), or the pragmatic research perspective of computational linguistics, has become increasingly powerful due to data availability and various techniques developed in the past decade. This increasing capability makes it possible to capture sentiments more accurately and semantics in a more nuanced way. Naturally, many applications are starting to seek improvements by adopting cutting-edge NLP techniques. Financial forecasting is no exception. As a result, articles that leverage NLP techniques to predict financial markets are fast accumulating, gradually establishing the research field of natural language based financial forecasting (NLFF), or from the application perspective, stock market prediction. This review article clarifies the scope of NLFF research by ordering and structuring techniques and applications from related work. The survey also aims to increase the understanding of progress and hotspots in NLFF, and bring about discussions across many different disciplines.


Keywords: Financial forecasting; Natural language processing; Text mining; Predictive analytics; Knowledge engineering; Computational finance



Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  1. School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
  2. MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge, USA
