Automatic Detection of Uncertain Statements in the Financial Domain

  • Christoph Kilian Theil
  • Sanja Štajner
  • Heiner Stuckenschmidt
  • Simone Paolo Ponzetto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10762)


The automatic detection of uncertain statements can benefit NLP tasks such as deception detection and information extraction. Furthermore, it can enable new analyses in social sciences such as business, where the quantification of uncertainty or risk plays a significant role. Thus, for the first time, we approached the automatic detection of uncertain statements as a binary sentence classification task on transcripts of spoken language in the financial domain. We created a new dataset and, besides using bag-of-words, part-of-speech tags, and dictionaries, developed rule-based features tailored to our task. Finally, we systematically analyzed which features perform best in the financial domain as opposed to the previously researched encyclopedic domain.
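To make the task setup concrete, the following is a minimal sketch of binary sentence classification that combines bag-of-words counts with a single dictionary-based feature. It is an illustration under stated assumptions, not the system evaluated in the paper: the word list `UNCERTAINTY_TERMS`, the function names, and the threshold rule are all hypothetical and stand in for the dictionaries, rule-based features, and trained classifiers mentioned above.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary of uncertainty cues (an assumption for this
# sketch; it is NOT the word list used in the paper).
UNCERTAINTY_TERMS = {
    "may", "might", "could", "possibly", "approximately",
    "uncertain", "believe", "expect",
}


def tokenize(sentence):
    """Lowercase the sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def extract_features(sentence):
    """Return bag-of-words counts plus one dictionary-based count feature."""
    tokens = tokenize(sentence)
    # Bag-of-words: one count per distinct token, prefixed to avoid name clashes.
    features = Counter(f"bow={token}" for token in tokens)
    # Dictionary feature: number of tokens found in the cue-word list.
    features["dict_uncertainty_count"] = sum(
        token in UNCERTAINTY_TERMS for token in tokens
    )
    return features


def is_uncertain(sentence, threshold=1):
    """Baseline rule: label a sentence uncertain if it contains at least
    `threshold` cue words. A trained classifier would replace this rule."""
    return extract_features(sentence)["dict_uncertainty_count"] >= threshold


if __name__ == "__main__":
    examples = [
        "We might see margins improve next quarter.",
        "Revenue was 4.2 billion dollars in 2016.",
    ]
    for sentence in examples:
        print(is_uncertain(sentence), sentence)
```

In practice, the feature dictionaries returned by `extract_features` would be fed to a supervised learner rather than to a fixed threshold; the threshold rule only shows how a sentence-level binary label can be derived from such features.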


Automatic uncertainty detection · Binary sentence classification · Financial domain



We thank Alexander Diete for his help with the data acquisition and technical advice as well as Clemens Müller for his help with the annotation. This work was supported by the SFB 884 on the Political Economy of Reforms at the University of Mannheim (project C4), funded by the German Research Foundation (DFG).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Christoph Kilian Theil (1)
  • Sanja Štajner (1)
  • Heiner Stuckenschmidt (1)
  • Simone Paolo Ponzetto (1)
  1. Data and Web Science Group, University of Mannheim, Mannheim, Germany
