Applied Data Science in Financial Industry

Spruit, Marco; Ferati, Drilon

doi:10.1007/978-3-030-30809-4_32

Part of the book series: Springer Proceedings in Complexity (SPCOM)

Included in the following conference series:

The International Research & Innovation Forum

Abstract

At a time when Natural Language Processing techniques are flourishing in domains such as biomedicine, national security, finance and law, this study takes a close look at their application to policy documents. Besides providing an overview of the current literature on these topics, the study applies a set of Natural Language Processing techniques, not previously used in this setting, to internal bank policies. Together with the experimental results and an expert evaluation, the implementation of these techniques forms the basis of a Meta-Algorithmic Modelling framework for processing internal business policies. This framework relies on three Natural Language Processing techniques, namely information extraction, automatic summarization and automatic keyword extraction. For the reference extraction and keyword extraction tasks we calculated precision, recall and F-scores: for the former we obtained 0.99, 0.84 and 0.89, and for the latter 0.79, 0.87 and 0.83, respectively. Finally, our summary extraction approach was evaluated positively in a qualitative assessment.
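The abstract reports precision, recall and F-scores for the reference and keyword extraction tasks. As a minimal sketch of the standard definitions (not the authors' evaluation code), the metrics can be written out in Python; the function names and the toy keyword sets are hypothetical illustrations, and the F-score is assumed to be the usual balanced F1.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Score a set of extracted items (e.g. keywords) against a gold-standard set."""
    true_positives = len(predicted & gold)  # items both extracted and expected
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Keyword-extraction precision and recall as reported in the abstract.
print(round(f_score(0.79, 0.87), 2))  # 0.83

# Hypothetical toy example: 2 of 3 extracted keywords are correct, 2 of 4 expected are found.
extracted = {"credit risk", "collateral", "audit"}
expected = {"credit risk", "collateral", "liquidity", "compliance"}
print(precision_recall_f1(extracted, expected))

# Averaging over documents: the mean of per-document F1 scores generally differs
# from the F1 of the mean precision and mean recall (hypothetical per-document scores).
per_doc = [(1.0, 1.0), (1.0, 0.5)]
mean_p = sum(p for p, _ in per_doc) / len(per_doc)
mean_r = sum(r for _, r in per_doc) / len(per_doc)
f_of_means = f_score(mean_p, mean_r)
mean_of_fs = sum(f_score(p, r) for p, r in per_doc) / len(per_doc)
print(round(f_of_means, 3), round(mean_of_fs, 3))
```

The last part of the sketch shows why reported figures can depend on how scores are aggregated: if precision, recall and F are each averaged over documents, the averaged F-score need not equal the harmonic mean of the averaged precision and recall.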

Author information

Authors and Affiliations

Department of Information and Computing Sciences, Utrecht University, Princetonplein 5, 3584 CC, Utrecht, The Netherlands
Marco Spruit & Drilon Ferati

Corresponding author

Correspondence to Marco Spruit.

Editor information

Editors and Affiliations

Research & Innovation Institute (Rii), Warsaw, Poland
Anna Visvizi
Deree College, The American College of Greece, Athens, Greece
Miltiadis D. Lytras

About this paper

Cite this paper

Spruit, M., Ferati, D. (2019). Applied Data Science in Financial Industry. In: Visvizi, A., Lytras, M. (eds) Research & Innovation Forum 2019. RIIFORUM 2019. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-030-30809-4_32

DOI: https://doi.org/10.1007/978-3-030-30809-4_32
Published: 29 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30808-7
Online ISBN: 978-3-030-30809-4
eBook Packages: Education, Education (R0)
