Segment Information Extraction from Financial Annual Reports Using Neural Network

  • Tomoki Ito
  • Hiroki Sakaji
  • Kiyoshi Izumi
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1128)


This paper extends a selected paper from JSAI2019. Automatically extracting business content from financial reports is an important problem in finance. Segment names and their explanations are particularly important items to extract. However, no established method exists for extracting this information from financial reports. In this study, we aim to develop a practical solution. To this end, we built a manually annotated dataset for the task of extracting each company's segment names and their explanations from financial reports, and then developed a recurrent neural network model for the task. Trained on this dataset, our method outperformed the baseline methods at extracting segment names and their explanations from annual financial reports. We also demonstrated experimentally that the method remains applicable even when only a small training dataset is available. This is the first work to apply a machine learning method to the task of extracting segment names and their explanations. The insights from this work should be valuable to industry.


Text mining, Financial documents, Neural network model



This work was supported in part by JSPS KAKENHI Grant Number JP17J04768.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Graduate School of Engineering, The University of Tokyo, Bunkyō, Japan
