Abstract
With the evolution of technology, a new data world has emerged in which data never degrade, can be accessed from anywhere, and continuously stream and multiply. In particular, the data created by business firms, scientific research centers, and automation systems have reached enormous volumes. Extracting meaningful, unexplored, and valuable information or inferences from these piles of data has become the main goal of many data analysts. In this chapter, the techniques of artificial intelligence and their capabilities were first discussed. Next, the techniques most widely used in the finance sector, one of the leading sources of big data, were presented comparatively, together with their advantages and weaknesses and the methods that can be used to process the data this sector creates. The current state of the most widely used artificial intelligence methods in the finance sector was surveyed, and the new capabilities and contributions they bring to the sector were examined. In particular, the scope of classification, clustering, association rules, and time series analysis methods, and the problems they can solve, were examined to inform readers about these techniques. The chapter also aimed to explain credit scoring and customer segmentation, where classification and clustering methods are especially employed, through sample studies, and to present the principles underlying up-to-date methods, together with their theoretical and practical applications, in a meaningful way. In addition, practical and useful software for data analysis in the finance sector was introduced and its capabilities were described. Finally, because financial data are classified as big data, the use of big data processing techniques was examined through examples.
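Clustering for customer segmentation, mentioned above, can be illustrated with a short sketch. It assumes plain k-means (Lloyd's algorithm) written with only the Python standard library; k-means, the two features, and the six customers are illustrative assumptions, not the chapter's data or necessarily its chosen algorithm. In practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (labels, centroids): labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each customer joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical customers: (scaled days since last purchase, scaled purchase frequency).
customers = [
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),  # recent, frequent buyers
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),  # lapsed, infrequent buyers
]
labels, centroids = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> segment", label)
```

With well-separated groups like these, the two segments come out cleanly; on real data, the number of clusters k is usually chosen with a validity measure such as the silhouette coefficient (Rousseeuw 1987).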
The difficulties encountered in analyzing big data, a natural product of this sector, and their solutions were presented. Current big data processing solutions such as Hadoop, Spark, MapReduce, distributed computing, and GPU (Graphics Processing Unit) computing were explained comparatively. The main principles on which big data processing techniques are based were simplified so that readers can follow them, and were supported by examples from the sector. In particular, Spark, Hadoop, and MapReduce, the leading approaches to processing big data, were examined. Finally, the contributions of artificial intelligence and big data processing techniques to the sector were summarized and the results were presented.
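MapReduce splits a job into a map step that emits key-value pairs, a shuffle that groups the pairs by key, and a reduce step that combines each group. The single-process sketch below imitates those three phases to total transaction amounts per customer. The `customer_id, amount` record format and the sample records are hypothetical, and the code does not use Hadoop or Spark APIs; in a real cluster the framework runs many map and reduce tasks in parallel on different nodes and performs the shuffle over the network.

```python
from collections import defaultdict


def map_phase(record):
    """Map: parse one 'customer_id, amount' line into a (key, value) pair."""
    customer_id, amount = record.split(",")
    yield customer_id.strip(), float(amount)


def shuffle_phase(mapped_pairs):
    """Shuffle: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key (here, a sum)."""
    return key, sum(values)


def run_mapreduce(records):
    # In Hadoop these phases would run as distributed tasks; here they run in sequence.
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical transaction log lines.
transactions = [
    "C1, 120.0",
    "C2, 75.5",
    "C1, 30.0",
    "C3, 200.0",
    "C2, 24.5",
]
print(run_mapreduce(transactions))  # {'C1': 150.0, 'C2': 100.0, 'C3': 200.0}
```

Because each reduce call sees only the values for one key, the reduce work can be split across machines by key, which is what lets the model scale to large financial datasets.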
Notes
- 1.
Identifying a vehicle's license plate amounts to determining the class of each character on the plate, drawn from an alphabet of about 40 classes on average.
- 2.
These companies provide businesses with the required infrastructure through monthly or annual subscriptions. Although such services are widely used today, some businesses are establishing their own data analysis departments to carry out these analyses.
- 3.
ENIAC (Electronic Numerical Integrator And Computer): “Built between 1943 and 1945 the first large-scale computer to run at electronic speed without being slowed by any mechanical parts” (CHM).
- 4.
- 5.
- 6.
House Price Prediction with Numeric-only Dataset: https://www.kaggle.com/youngseoklee/house-price-prediction-with-numeric-only-dataset/data.
- 7.
- 8.
- 9.
For detail: https://www.cs.waikato.ac.nz/ml/weka/.
- 10.
For detail: https://rapidminer.com/.
- 11.
For detail: https://orange.biolab.si/.
- 12.
For detail: https://www.knime.com/.
- 13.
For detail: https://www.python.org/.
- 14.
For detail: https://www.r-project.org/.
- 15.
For detail: https://kudu.apache.org/.
- 16.
References
Anderson TE, Dahlin MD, Neefe JM, et al (1995) Serverless network file systems. SIGOPS Oper Syst Rev 29:109–126. https://doi.org/10.1145/224057.224066.
Apache Software Foundation (2019) Apache Hadoop 2.10.0—HDFS Architecture. https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Introduction. Accessed 13 Dec 2019.
Apache Software Foundation Apache Hadoop 3.2.1—Apache Hadoop YARN. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html. Accessed 19 Dec 2019.
Apache Software Foundation Impala. https://impala.apache.org/overview.html. Accessed 23 Dec 2019.
The Apache Software Foundation Apache Hadoop. https://hadoop.apache.org/. Accessed 17 Dec 2019.
The Apache Software Foundation Apache Kudu—Fast Analytics on Fast Data. https://kudu.apache.org/. Accessed 19 Dec 2019.
Artis M, Ayuso M, Guillén M (2002) Detection of automobile insurance fraud with discrete choice models and misclassified claims. J Risk Insur 69:325–340.
Atlassian (2016) Sentry Tutorial—Apache Sentry—Apache Software Foundation. https://cwiki.apache.org/confluence/display/SENTRY/Sentry+Tutorial. Accessed 23 Dec 2019.
Attiya H, Welch J (2004) Distributed Computing: Fundamentals, Simulations, and Advanced Topics. Wiley.
Bahrammirzaee A (2010) A comparative survey of artificial intelligence applications in finance: artificial neural networks, expert system and hybrid intelligent systems. Neural Comput Appl 19:1165–1195. https://doi.org/10.1007/s00521-010-0362-z.
Bhattacharyya S, Jha S, Tharakunnel K, Westland JC (2011) Data mining for credit card fraud: a comparative study. Decis Support Syst 50:602–613. https://doi.org/10.1016/j.dss.2010.08.008.
Blazejewski A, Coggins R (2004) Application of self-organizing maps to clustering of high-frequency financial data. In: Proceedings of the Second Workshop on Australasian Information Security, Data Mining and Web Intelligence, and Software Internationalisation - Volume 32. Australian Computer Society, Inc., Darlinghurst, Australia, pp 85–90.
Boeing G, Waddell P (2017) New insights into rental housing markets across the United States: web scraping and analyzing craigslist rental listings. J Plan Educ Res 37:457–476. https://doi.org/10.1177/0739456X16664789.
Brause R, Langsdorf T, Hepp M (1999) Neural data mining for credit card fraud detection. In: Proceedings 11th International Conference on Tools with Artificial Intelligence, pp 103–106.
Brockwell PJ, Davis RA (2016) Introduction to Time Series and Forecasting. Springer International Publishing.
Cai L, Zhu Y (2015) The challenges of data quality and data quality assessment in the big data era. Data Sci J 14:2. https://doi.org/10.5334/dsj-2015-002.
Castillo O, Melin P (1995) An intelligent system for financial time series prediction combining dynamical systems theory, fractal theory, and statistical methods. In: Proceedings of 1995 Conference on Computational Intelligence for Financial Engineering (CIFEr). IEEE, pp 151–155.
Chapman B, Jost G, van der Pas R, Kuck DJ (2008) Using OpenMP: Portable Shared Memory Parallel Programming. https://apps2.mdp.ac.id/perpustakaan/ebook/Karya%20Umum/Portable_Shared_Memory_Parallel_Programming.pdf. Accessed 3 May 2020.
Chawla NV (2009) Data mining for imbalanced datasets: an overview. In: Data Mining and Knowledge Discovery Handbook. Springer US, Boston, MA, pp 875–886.
Chen D, Sain SL, Guo K (2012) Data mining for the online retail industry: a case study of RFM model-based customer segmentation using data mining. J Database Mark Cust Strateg Manag 19:197–208. https://doi.org/10.1057/dbm.2012.17.
Chen S (2016) Detection of fraudulent financial statements using the hybrid data mining approach. Springerplus 5:89. https://doi.org/10.1186/s40064-016-1707-6.
CHM ENIAC—CHM Revolution. https://www.computerhistory.org/revolution/birth-of-the-computer/4/78. Accessed 9 Dec 2019.
Chou C-H, Hsieh S-C, Qiu C-J (2017) Hybrid genetic algorithm and fuzzy clustering for bankruptcy prediction. Appl Soft Comput 56:298–316. https://doi.org/10.1016/j.asoc.2017.03.014.
Ciszak L (2008) Application of clustering and association methods in data cleaning. In: 2008 International Multiconference on Computer Science and Information Technology, pp 97–103.
Cryer JD, Chan KS (2008) Time Series Analysis: With Applications in R. Springer, New York.
Cumby C, Fano A, Ghani R, Krema M (2004) Predicting customer shopping lists from point-of-sale purchase data. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, pp 402–409.
de Sá AGC, Pereira ACM, Pappa GL (2018) A customized classification algorithm for credit card fraud detection. Eng Appl Artif Intell 72:21–29. https://doi.org/10.1016/j.engappai.2018.03.011.
Doumpos M, Zopounidis C (2002) Multi–criteria classification methods in financial and banking decisions. Int Trans Oper Res 9:567–581. https://doi.org/10.1111/1475-3995.00374.
Erl T, Khattak W, Buhler P (2015) Big Data Fundamentals: Concepts, Drivers & Techniques. Prentice Hall.
Farajian MA, Mohammadi S (2010) Mining the banking customer behavior using clustering and association rules methods. Int J Indust Eng Prod Res 21:239–245.
Ghemawat S, Gobioff H, Leung S-T (2003) The Google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. ACM, New York, NY, USA, pp 29–43.
Gottfredson LS (1997) Mainstream science on intelligence: an editorial with 52 signatories, history, and bibliography. Intelligence 24:13–23. https://doi.org/10.1016/S0160-2896(97)90011-8.
Guida T (2018) Big Data and Machine Learning in Quantitative Investment. Wiley.
Gupta R, Pathak C (2014) A machine learning framework for predicting purchase by online customers based on dynamic pricing. Procedia Comput Sci 36:599–605. https://doi.org/10.1016/j.procs.2014.09.060.
Hamuro Y, Katoh N, Edward IH, et al (2003) Combining information fusion with string pattern analysis: a new method for predicting future purchase behavior. In: Torra V (ed) Information Fusion in Data Mining. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 161–187.
Holland JH (1992) Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press.
Holmes A (2012) Hadoop in Practice. Manning Publications.
Hsu CF, Hung HF (2009) Classification methods of credit rating—a comparative analysis on SVM, MDA and RST. In: 2009 International Conference on Computational Intelligence and Software Engineering. pp 1–4.
Hurwitz J, Nugent A, Halper F, Kaufman M (2013) Big Data for Dummies, 1st edn. For Dummies.
Ishwarappa, Anuradha J (2015) A brief introduction on Big Data 5Vs characteristics and Hadoop Technology. Procedia Comput Sci 48:319–324. https://doi.org/10.1016/j.procs.2015.04.188.
Joudaki H, Rashidian A, Minaei-Bidgoli B, et al (2015) Using data mining to detect health care fraud and abuse: a review of literature. Glob J Health Sci 7:194.
Kaggle House Price Prediction with Numeric-only Dataset. https://www.kaggle.com/youngseoklee/house-price-prediction-with-numeric-only-dataset/data. Accessed 5 Apr 2020.
Khan MA, Uddin MF, Gupta N (2014) Seven V’s of Big Data understanding Big Data to extract value. In: Proceedings of the 2014 Zone 1 Conference of the American Society for Engineering Education. IEEE, pp 1–5.
Kim E, Kim W, Lee Y (2003) Combination of multiple classifiers for the customer’s purchase behavior prediction. Decis Support Syst 34:167–175. https://doi.org/10.1016/S0167-9236(02)00079-9.
Kirkos E, Spathis C, Manolopoulos Y (2007) Data mining techniques for the detection of fraudulent financial statements. Expert Syst Appl 32:995–1003.
Kirlidog M, Asuk C (2012) A fraud detection approach with data mining in health insurance. Procedia-Social Behav Sci 62:989–994.
Kshemkalyani AD, Singhal M (2011) Distributed Computing: Principles, Algorithms, and Systems. Cambridge University Press.
Kumar BS, Ravi V (2016) A survey of the applications of text mining in financial domain. Knowledge-Based Syst 114:128–147.
Kunigk J, Buss I, Wilkinson P, George L (2018) Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale. O’Reilly Media.
Labrinidis A, Jagadish HV (2012) Challenges and opportunities with Big Data. Proc VLDB Endow 5:2032–2033. https://doi.org/10.14778/2367502.2367572.
Lin W-Y, Hu Y-H, Tsai C-F (2012) Machine learning in financial crisis prediction: a survey. IEEE Trans Syst Man, Cybern Part C (Applications Rev) 42:421–436. https://doi.org/10.1109/TSMCC.2011.2170420.
McCarthy J, Minsky ML, Rochester N, Shannon CE (2006) A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Mag 27:12. https://doi.org/10.1609/aimag.v27i4.1904.
Meng X, Bradley J, Yavuz B, et al (2016) MLlib: Machine Learning in Apache Spark. J Mach Learn Res 17:1–7.
Mitchell TM (1999) Machine learning and data mining. Commun ACM 42:30–36. https://doi.org/10.1145/319382.319388.
Mukid MA, Widiharih T, Rusgiyono A, Prahutama A (2018) Credit scoring analysis using weighted k nearest neighbor. In: Warsito B, Putro SP, Khumaeni A (eds) 7th International Seminar on New Paradigm and Innovation on Natural Science and Its Application. IOP Publishing, Bristol.
Ngai EWT, Hu Y, Wong YH, et al (2011) The application of data mining techniques in financial fraud detection: a classification framework and an academic review of literature. Decis Support Syst 50:559–569. https://doi.org/10.1016/j.dss.2010.08.006.
Owens JD, Houston M, Luebke D, et al (2008) GPU computing. Proc IEEE 96:879–899. https://doi.org/10.1109/JPROC.2008.917757.
Pavlidis NG, Plagianakos VP, Tasoulis DK, Vrahatis MN (2006) Financial forecasting through unsupervised clustering and neural networks. Oper Res 6:103–127. https://doi.org/10.1007/BF02941227.
Qiu J, Lin Z, Li Y (2015) Predicting customer purchase behavior in the e-commerce context. Electron Commer Res 15:427–452. https://doi.org/10.1007/s10660-015-9191-6.
Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65. https://doi.org/10.1016/0377-0427(87)90125-7.
Sato M (2002) OpenMP: parallel programming API for shared memory multiprocessors and on-chip multiprocessors. In: 15th International Symposium on System Synthesis, 2002. pp 109–111.
Schmuck F, Haskin R (2002) GPFS: A shared-disk file system for large computing clusters. In: Proceedings of the 1st USENIX Conference on File and Storage Technologies. USENIX Association, Berkeley, CA, USA.
Spaggiari JM, Kovacevic M, Noland B, Bosshart R (2018) Getting Started with Kudu: Perform Fast Analytics on Fast Data. O’Reilly Media.
Trobec R, Slivnik B, Bulić P, Robič B (2018) Introduction to Parallel Computing: From Algorithms to Programming on State-of-the-Art Platforms. Springer International Publishing.
Turkington G, Deshpande T, Karanth S (2016) Hadoop: Data Processing and Modelling. Packt Publishing.
Uzun E, Özhan E (2018) Examining the impact of feature selection on classification of user reviews in web pages. In: International Conference on Artificial Intelligence and Data Processing (IDAP 2018). Malatya, Turkey, pp 430–437.
Vohra D (2016) Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools. Apress.
Wagner W, Otto J, Chung Q (2002) Knowledge acquisition for expert systems in accounting and financial problem domains. Knowledge-Based Syst 15:439–447. https://doi.org/10.1016/S0950-7051(02)00026-6.
Wang Y, Xu W (2018) Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decis Support Syst 105:87–95.
Witten IH, Frank E, Hall MA, Pal CJ (2016) Data Mining: Practical Machine Learning Tools and Techniques. Elsevier Science.
Woodward WA, Gray HL, Elliott AC (2017) Applied Time Series Analysis with R. CRC Press.
Wu X, Zhu X, Wu G-Q, Ding W (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26:97–107. https://doi.org/10.1109/TKDE.2013.109.
Xu Z, Zhang R (2009) Financial revenue analysis based on association rules mining. In: 2009 Asia-Pacific Conference on Computational Intelligence and Industrial Applications (PACIIA), pp 220–223.
Yao M, Zhou A, Jia M (2018) Applied Artificial Intelligence: A Handbook for Business Leaders. Topbots.
Yeh I-C, Lien C (2009) The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst Appl 36:2473–2480. https://doi.org/10.1016/j.eswa.2007.12.020.
Zikopoulos P, Eaton C (2011) Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, 1st edn. McGraw-Hill Osborne Media.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Ozhan, E., Uzun, E. (2021). The Analysis of Big Financial Data Through Artificial Intelligence Methods. In: Bozkuş Kahyaoğlu, S. (eds) The Impact of Artificial Intelligence on Governance, Economics and Finance, Volume I. Accounting, Finance, Sustainability, Governance & Fraud: Theory and Application. Springer, Singapore. https://doi.org/10.1007/978-981-33-6811-8_4
DOI: https://doi.org/10.1007/978-981-33-6811-8_4
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-6810-1
Online ISBN: 978-981-33-6811-8
eBook Packages: Business and Management, Business and Management (R0)