The Analysis of Big Financial Data Through Artificial Intelligence Methods

Ozhan, Erkan; Uzun, Erdinç

doi:10.1007/978-981-33-6811-8_4


Part of the book series: Accounting, Finance, Sustainability, Governance & Fraud: Theory and Application ((AFSGFTA))


Abstract

The evolution of technology has given rise to a new data world in which data never degrade, can be accessed from anywhere, and continuously stream in and multiply. The data created by business firms, scientific research centers, and automation systems, in particular, have reached enormous volumes. Extracting meaningful, unexplored, and valuable information or inferences from these piles of data has become the main goal of many data analysts. In this chapter, the techniques of artificial intelligence and their capabilities were first discussed. Next, the techniques most widely used in the finance sector, their advantages and weaknesses, and the methods that can be used to process the data generated by the finance sector, one of the leading producers of big data, were compared. The current versions of the artificial intelligence methods most widely used in finance were surveyed, and the new capabilities and contributions they bring to the sector were examined. The scope of classification, clustering, association rule, and time series analysis methods, and the problems they can solve, were examined to introduce these techniques to the reader. Sample studies were used to illustrate credit scoring and customer segmentation, the areas in which classification and clustering methods are especially employed. The chapter aimed to present the principles underlying current methods, together with their theoretical and practical applications, in a meaningful way. In addition, practical and useful software for data analysis in the finance sector was introduced along with its capabilities. Because financial data qualify as big data, the ways in which big data processing techniques can be applied were also examined through examples.
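Customer segmentation with clustering, mentioned above, can be illustrated with k-means. The following is a minimal plain-Python sketch of Lloyd's k-means, assuming each customer is described by two hypothetical, already-scaled numeric features (for example, a recency score and a spend score). The data points are made up for illustration and do not come from the chapter's datasets, and k-means is shown as one common clustering choice, not necessarily the chapter's exact procedure.

```python
import random
from math import dist


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct customers chosen at random.
    centroids = list(rng.sample(points, k))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each customer joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical customers: (recency score, spend score), both already scaled.
customers = [(1.0, 1.2), (1.1, 0.9), (0.8, 1.0), (8.0, 8.5), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

On this toy data the first three customers end up in one segment and the last three in the other; which segment receives label 0 depends on the random initialisation. With real features on different scales, standardising them first matters, because k-means relies on Euclidean distance.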
The difficulties encountered when analyzing the big data that this sector naturally produces, and solutions to them, were presented. Current big data processing solutions such as Hadoop, Spark, MapReduce, distributed computing, and GPU (Graphics Processing Unit) computing were explained comparatively. The main principles on which big data processing techniques are based were simplified so that readers could follow them, and were supported by examples from the sector. Spark, Hadoop, and MapReduce, the leading approaches to processing big data, were examined in particular. Finally, the contributions that artificial intelligence and big data processing techniques make to the sector were summarized and the results presented.
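The map, shuffle, and reduce phases that MapReduce is built on can be simulated in a single Python process. The sketch below is illustrative only and is not Hadoop's or Spark's API. It assumes hypothetical transaction records of the form `customer_id,amount`, and the helper names (`map_phase`, `shuffle_phase`, `reduce_phase`, `map_reduce`) are invented for this example. In a real cluster, map tasks would run in parallel on separate data blocks and the framework would perform the shuffle across the network.

```python
from collections import defaultdict


def map_phase(record):
    """Map: turn one raw 'customer_id,amount' line into a (key, value) pair."""
    customer_id, amount = record.split(",")
    return customer_id.strip(), float(amount)


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate the grouped values (here, total spend per customer)."""
    return key, sum(values)


def map_reduce(partitions):
    # Each partition stands in for a data block handled by its own map task.
    mapped = [map_phase(record) for part in partitions for record in part]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical transactions split across two "nodes".
partitions = [
    ["c1,10.5", "c2,3.0", "c1,4.5"],
    ["c3,7.25", "c2,1.0"],
]
totals = map_reduce(partitions)
print(totals)  # {'c1': 15.0, 'c2': 4.0, 'c3': 7.25}
```

The design point is that map and reduce are pure functions of their inputs. Because of this, a framework can distribute them across machines and rerun failed tasks without changing the result.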



Notes

1.
Identifying a vehicle's license plate essentially means determining the class of each character on the plate, drawn from an alphabet of about 40 classes.
2.
These companies provide businesses with the required infrastructure through monthly or annual subscriptions. Although such services are widely used today, some businesses are establishing their own data analysis departments to carry out these analyses.
3.
ENIAC (Electronic Numerical Integrator And Computer): “Built between 1943 and 1945 the first large-scale computer to run at electronic speed without being slowed by any mechanical parts” (CHM).
4.
https://github.com/erdincuzun/ml_intro.
5.
https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients.
6.
House Price Prediction with Numeric-only Dataset: https://www.kaggle.com/youngseoklee/house-price-prediction-with-numeric-only-dataset/data.
7.
https://github.com/sowmyacr/kmeans_cluster/blob/master/CLV.csv.
8.
https://archive.ics.uci.edu/ml/datasets/online+retail.
9.
For details: https://www.cs.waikato.ac.nz/ml/weka/.
10.
For details: https://rapidminer.com/.
11.
For details: https://orange.biolab.si/.
12.
For details: https://www.knime.com/.
13.
For details: https://www.python.org/.
14.
For details: https://www.r-project.org/.
15.
Visit https://kudu.apache.org/ for more detailed information.
16.
https://kafka.apache.org/.

References

Anderson TE, Dahlin MD, Neefe JM, et al (1995) Serverless network file systems. SIGOPS Oper Syst Rev 29:109–126. https://doi.org/10.1145/224057.224066.
Apache Software Foundation (2019) Apache Hadoop 2.10.0—HDFS Architecture. https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Introduction. Accessed 13 Dec 2019.
Apache Software Foundation (n.d.) Apache Hadoop 3.2.1—Apache Hadoop YARN. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html. Accessed 19 Dec 2019.
Apache Software Foundation (n.d.) Impala. https://impala.apache.org/overview.html. Accessed 23 Dec 2019.
Apache Software Foundation (n.d.) Apache Hadoop. https://hadoop.apache.org/. Accessed 17 Dec 2019.
Apache Software Foundation (n.d.) Apache Kudu—Fast Analytics on Fast Data. https://kudu.apache.org/. Accessed 19 Dec 2019.
Artis M, Ayuso M, Guillén M (2002) Detection of automobile insurance fraud with discrete choice models and misclassified claims. J Risk Insur 69:325–340.
Atlassian (2016) Sentry Tutorial—Apache Sentry—Apache Software Foundation. https://cwiki.apache.org/confluence/display/SENTRY/Sentry+Tutorial. Accessed 23 Dec 2019.
Attiya H, Welch J (2004) Distributed Computing: Fundamentals, Simulations, and Advanced Topics. Wiley.
Bahrammirzaee A (2010) A comparative survey of artificial intelligence applications in finance: artificial neural networks, expert system and hybrid intelligent systems. Neural Comput Appl 19:1165–1195. https://doi.org/10.1007/s00521-010-0362-z.
Bhattacharyya S, Jha S, Tharakunnel K, Westland JC (2011) Data mining for credit card fraud: a comparative study. Decis Support Syst 50:602–613. https://doi.org/10.1016/j.dss.2010.08.008.
Blazejewski A, Coggins R (2004) Application of self-organizing maps to clustering of high-frequency financial data. In: Proceedings of the Second Workshop on Australasian Information Security, Data Mining and Web Intelligence, and Software Internationalisation - Volume 32. Australian Computer Society, Inc., Darlinghurst, Australia, pp 85–90.
Boeing G, Waddell P (2017) New insights into rental housing markets across the United States: web scraping and analyzing Craigslist rental listings. J Plan Educ Res 37:457–476. https://doi.org/10.1177/0739456X16664789.
Brause R, Langsdorf T, Hepp M (1999) Neural data mining for credit card fraud detection. In: Proceedings 11th International Conference on Tools with Artificial Intelligence, pp 103–106.
Brockwell PJ, Davis RA (2016) Introduction to Time Series and Forecasting. Springer International Publishing.
Cai L, Zhu Y (2015) The challenges of data quality and data quality assessment in the big data era. Data Sci J 14:2. https://doi.org/10.5334/dsj-2015-002.
Castillo O, Melin P (1995) An intelligent system for financial time series prediction combining dynamical systems theory, fractal theory, and statistical methods. In: Proceedings of 1995 Conference on Computational Intelligence for Financial Engineering (CIFEr). IEEE, pp 151–155.
Chapman B, Jost G, van der Pas R, Kuck DJ (2008) Using OpenMP: Portable Shared Memory Parallel Programming. https://apps2.mdp.ac.id/perpustakaan/ebook/Karya%20Umum/Portable_Shared_Memory_Parallel_Programming.pdf. Accessed 3 May 2020.
Chawla NV (2009) Data mining for imbalanced datasets: an overview. In: Data Mining and Knowledge Discovery Handbook. Springer US, Boston, MA, pp 875–886.
Chen D, Sain SL, Guo K (2012) Data mining for the online retail industry: a case study of RFM model-based customer segmentation using data mining. J Database Mark Cust Strateg Manag 19:197–208. https://doi.org/10.1057/dbm.2012.17.
Chen S (2016) Detection of fraudulent financial statements using the hybrid data mining approach. Springerplus 5:89. https://doi.org/10.1186/s40064-016-1707-6.
CHM (n.d.) ENIAC—CHM Revolution. https://www.computerhistory.org/revolution/birth-of-the-computer/4/78. Accessed 9 Dec 2019.
Chou C-H, Hsieh S-C, Qiu C-J (2017) Hybrid genetic algorithm and fuzzy clustering for bankruptcy prediction. Appl Soft Comput 56:298–316. https://doi.org/10.1016/j.asoc.2017.03.014.
Ciszak L (2008) Application of clustering and association methods in data cleaning. In: 2008 International Multiconference on Computer Science and Information Technology, pp 97–103.
Cryer JD, Chan KS (2008) Time Series Analysis: With Applications in R. Springer, New York.
Cumby C, Fano A, Ghani R, Krema M (2004) Predicting customer shopping lists from point-of-sale purchase data. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, pp 402–409.
de Sá AGC, Pereira ACM, Pappa GL (2018) A customized classification algorithm for credit card fraud detection. Eng Appl Artif Intell 72:21–29. https://doi.org/10.1016/j.engappai.2018.03.011.
Doumpos M, Zopounidis C (2002) Multi-criteria classification methods in financial and banking decisions. Int Trans Oper Res 9:567–581. https://doi.org/10.1111/1475-3995.00374.
Erl T, Khattak W, Buhler P (2015) Big Data Fundamentals: Concepts, Drivers & Techniques. Prentice Hall.
Farajian MA, Mohammadi S (2010) Mining the banking customer behavior using clustering and association rules methods. Int J Indust Eng Prod Res 21:239–245.
Ghemawat S, Gobioff H, Leung S-T (2003) The Google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. ACM, New York, NY, USA, pp 29–43.
Gottfredson LS (1997) Mainstream science on intelligence: an editorial with 52 signatories, history, and bibliography. Intelligence 24:13–23. https://doi.org/10.1016/S0160-2896(97)90011-8.
Guida T (2018) Big Data and Machine Learning in Quantitative Investment. Wiley.
Gupta R, Pathak C (2014) A machine learning framework for predicting purchase by online customers based on dynamic pricing. Procedia Comput Sci 36:599–605. https://doi.org/10.1016/j.procs.2014.09.060.
Hamuro Y, Katoh N, Edward IH, et al (2003) Combining information fusion with string pattern analysis: a new method for predicting future purchase behavior. In: Torra V (ed) Information Fusion in Data Mining. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 161–187.
Holland JH (1992) Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press.
Holmes A (2012) Hadoop in Practice (MEAP). p 525.
Hsu CF, Hung HF (2009) Classification methods of credit rating—a comparative analysis on SVM, MDA and RST. In: 2009 International Conference on Computational Intelligence and Software Engineering, pp 1–4.
Hurwitz J, Nugent A, Halper F, Kaufman M (2013) Big Data for Dummies, 1st edn. For Dummies.
Ishwarappa, Anuradha J (2015) A brief introduction on Big Data 5Vs characteristics and Hadoop Technology. Procedia Comput Sci 48:319–324. https://doi.org/10.1016/j.procs.2015.04.188.
Joudaki H, Rashidian A, Minaei-Bidgoli B, et al (2015) Using data mining to detect health care fraud and abuse: a review of literature. Glob J Health Sci 7:194.
Kaggle (n.d.) House Price Prediction with Numeric-only Dataset. https://www.kaggle.com/youngseoklee/house-price-prediction-with-numeric-only-dataset/data. Accessed 5 Apr 2020.
Khan MA, Uddin MF, Gupta N (2014) Seven V’s of Big Data: understanding Big Data to extract value. In: Proceedings of the 2014 Zone 1 Conference of the American Society for Engineering Education. IEEE, pp 1–5.
Kim E, Kim W, Lee Y (2003) Combination of multiple classifiers for the customer’s purchase behavior prediction. Decis Support Syst 34:167–175. https://doi.org/10.1016/S0167-9236(02)00079-9.
Kirkos E, Spathis C, Manolopoulos Y (2007) Data mining techniques for the detection of fraudulent financial statements. Expert Syst Appl 32:995–1003.
Kirlidog M, Asuk C (2012) A fraud detection approach with data mining in health insurance. Procedia Soc Behav Sci 62:989–994.
Kshemkalyani AD, Singhal M (2011) Distributed Computing: Principles, Algorithms, and Systems. Cambridge University Press.
Kumar BS, Ravi V (2016) A survey of the applications of text mining in financial domain. Knowledge-Based Syst 114:128–147.
Kunigk J, Buss I, Wilkinson P, George L (2018) Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale. O’Reilly Media.
Labrinidis A, Jagadish HV (2012) Challenges and opportunities with Big Data. Proc VLDB Endow 5:2032–2033. https://doi.org/10.14778/2367502.2367572.
Lin W-Y, Hu Y-H, Tsai C-F (2012) Machine learning in financial crisis prediction: a survey. IEEE Trans Syst Man Cybern Part C (Appl Rev) 42:421–436. https://doi.org/10.1109/TSMCC.2011.2170420.
McCarthy J, Minsky ML, Rochester N, Shannon CE (2006) A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Magazine 27(4):12. https://doi.org/10.1609/aimag.v27i4.1904.
Meng X, Bradley J, Yavuz B, et al (2016) MLlib: Machine Learning in Apache Spark. J Mach Learn Res 17:1–7.
Mitchell TM (1999) Machine learning and data mining. Commun ACM 42:30–36. https://doi.org/10.1145/319382.319388.
Mukid MA, Widiharih T, Rusgiyono A, Prahutama A (2018) Credit scoring analysis using weighted k nearest neighbor. In: Warsito B, Putro SP, Khumaeni A (eds) 7th International Seminar on New Paradigm and Innovation on Natural Science and Its Application. IOP Publishing, Bristol, England.
Ngai EWT, Hu Y, Wong YH, et al (2011) The application of data mining techniques in financial fraud detection: a classification framework and an academic review of literature. Decis Support Syst 50:559–569. https://doi.org/10.1016/j.dss.2010.08.006.
Owens JD, Houston M, Luebke D, et al (2008) GPU computing. Proc IEEE 96:879–899. https://doi.org/10.1109/JPROC.2008.917757.
Pavlidis NG, Plagianakos VP, Tasoulis DK, Vrahatis MN (2006) Financial forecasting through unsupervised clustering and neural networks. Oper Res 6:103–127. https://doi.org/10.1007/BF02941227.
Qiu J, Lin Z, Li Y (2015) Predicting customer purchase behavior in the e-commerce context. Electron Commer Res 15:427–452. https://doi.org/10.1007/s10660-015-9191-6.
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65. https://doi.org/10.1016/0377-0427(87)90125-7.
Sato M (2002) OpenMP: parallel programming API for shared memory multiprocessors and on-chip multiprocessors. In: 15th International Symposium on System Synthesis, 2002, pp 109–111.
Schmuck F, Haskin R (2002) GPFS: A shared-disk file system for large computing clusters. In: Proceedings of the 1st USENIX Conference on File and Storage Technologies. USENIX Association, Berkeley, CA, USA.
Spaggiari JM, Kovacevic M, Noland B, Bosshart R (2018) Getting Started with Kudu: Perform Fast Analytics on Fast Data. O’Reilly Media.
Trobec R, Slivnik B, Bulić P, Robič B (2018) Introduction to Parallel Computing: From Algorithms to Programming on State-of-the-Art Platforms. Springer International Publishing.
Turkington G, Deshpande T, Karanth S (2016) Hadoop: Data Processing and Modelling. Packt Publishing.
Uzun E, Özhan E (2018) Examining the impact of feature selection on classification of user reviews in web pages. In: International Conference on Artificial Intelligence and Data Processing (IDAP 2018). Malatya, Turkey, pp 430–437.
Vohra D (2016) Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools. Apress.
Wagner W, Otto J, Chung Q (2002) Knowledge acquisition for expert systems in accounting and financial problem domains. Knowledge-Based Syst 15:439–447. https://doi.org/10.1016/S0950-7051(02)00026-6.
Wang Y, Xu W (2018) Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decis Support Syst 105:87–95.
Witten IH, Frank E, Hall MA, Pal CJ (2016) Data Mining: Practical Machine Learning Tools and Techniques. Elsevier Science.
Woodward WA, Gray HL, Elliott AC (2017) Applied Time Series Analysis with R. CRC Press.
Wu X, Zhu X, Wu G-Q, Ding W (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26:97–107. https://doi.org/10.1109/TKDE.2013.109.
Xu Z-M, Zhang R (2009) Financial revenue analysis based on association rules mining. In: 2009 Asia-Pacific Conference on Computational Intelligence and Industrial Applications (PACIIA), pp 220–223.
Yao M, Zhou A, Jia M (2018) Applied Artificial Intelligence: A Handbook for Business Leaders. Topbots.
Yeh I-C, Lien C (2009) The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst Appl 36:2473–2480. https://doi.org/10.1016/j.eswa.2007.12.020.
Zikopoulos P, Eaton C (2011) Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, 1st edn. McGraw-Hill Osborne Media.


Author information

Authors and Affiliations

Department of Computer Engineering, Namik Kemal University, Tekirdag, Turkey
Erkan Ozhan & Erdinç Uzun


Editor information

Editors and Affiliations

Izmir Bakircay University, Izmir, Turkey
Sezer Bozkuş Kahyaoğlu


About this chapter

Cite this chapter

Ozhan, E., Uzun, E. (2021). The Analysis of Big Financial Data Through Artificial Intelligence Methods. In: Bozkuş Kahyaoğlu, S. (eds) The Impact of Artificial Intelligence on Governance, Economics and Finance, Volume I. Accounting, Finance, Sustainability, Governance & Fraud: Theory and Application. Springer, Singapore. https://doi.org/10.1007/978-981-33-6811-8_4


DOI: https://doi.org/10.1007/978-981-33-6811-8_4
Published: 27 April 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-6810-1
Online ISBN: 978-981-33-6811-8
eBook Packages: Business and Management, Business and Management (R0)
