The Analysis of Big Financial Data Through Artificial Intelligence Methods

The Impact of Artificial Intelligence on Governance, Economics and Finance, Volume I

Abstract

With the evolution of technology, a new world of data has emerged: data that never degrade, can be accessed from anywhere, and continuously stream and multiply. The data created by business firms, scientific research centers, and automation systems, in particular, have reached enormous volumes. Extracting meaningful, unexplored, and valuable information or inferences from these piles of data has become the main goal of many data analysts. In this chapter, the techniques of artificial intelligence and their capabilities are discussed first. Next, the techniques most widely used in the finance sector, their advantages and weaknesses, and the methods that can be used to process the data created by this sector, which is one of the leading sources of big data, are compared. The current state of the most widely used artificial intelligence methods in the finance sector is surveyed, and the new capabilities and contributions they bring to the sector are examined. In particular, the chapter explains what classification, clustering, association rules, and time series analysis methods cover and which problems they can solve. Sample studies illustrate credit scoring and customer segmentation, where classification and clustering methods are especially employed. The aim is to present the principles on which up-to-date methods are based, together with their theoretical and practical applications, in a meaningful way. In addition, practical and useful software for data analysis in the finance sector is introduced and its capabilities are described. Finally, because financial data are classified as big data, the use of big data processing techniques is examined through examples.
The difficulties encountered in analyzing big data, a natural product of this sector, and solutions to them are presented. Current big data processing solutions, in particular Hadoop, Spark, MapReduce, distributed computing, and GPU (Graphics Processing Unit) computing, are explained comparatively. The main principles underlying big data processing techniques are simplified so that readers can follow them and are supported by examples from the sector. Spark, Hadoop, and MapReduce, the leading methods for processing big data, are examined in particular. Finally, the contributions of artificial intelligence and big data processing techniques to the sector are summarized and the results are presented.
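
As a rough illustration of the MapReduce idea named in the abstract, the following single-machine sketch totals spending per customer in three phases (map, shuffle, reduce). It is not taken from the chapter: the transaction records, the function names, and the choice of a per-customer sum are all assumptions made for illustration, and a real Hadoop or Spark job would distribute these phases across a cluster rather than run them in one process.

```python
from collections import defaultdict

# Hypothetical transaction records: (customer_id, amount).
TRANSACTIONS = [
    ("c1", 120.0),
    ("c2", 35.5),
    ("c1", 80.0),
    ("c3", 10.0),
    ("c2", 64.5),
]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    customer_id, amount = record
    return customer_id, amount


def shuffle_phase(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine the values of one key into a single result."""
    return key, sum(values)


def total_spend_per_customer(records):
    """Run the three phases in sequence on an in-memory list."""
    mapped = map(map_phase, records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(total_spend_per_customer(TRANSACTIONS))
```

Because each key is reduced independently of the others, the reduce step can in principle be spread over many workers, which is the property that distributed frameworks exploit.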

Notes

  1. Identifying the license plate of a vehicle amounts to determining the class of each character on the plate, in an alphabet of about 40 classes on average.

  2. These companies offer businesses the required infrastructure through monthly or annual subscriptions. Even though they are widely used today, some businesses are establishing their own data analysis departments to carry out these analyses.

  3. ENIAC (Electronic Numerical Integrator And Computer): “Built between 1943 and 1945 the first large-scale computer to run at electronic speed without being slowed by any mechanical parts” (CHM).

  4. https://github.com/erdincuzun/ml_intro.

  5. https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients.

  6. House Price Prediction with Numeric-only Dataset: https://www.kaggle.com/youngseoklee/house-price-prediction-with-numeric-only-dataset/data.

  7. https://github.com/sowmyacr/kmeans_cluster/blob/master/CLV.csv.

  8. https://archive.ics.uci.edu/ml/datasets/online+retail.

  9. For details: https://www.cs.waikato.ac.nz/ml/weka/.

  10. For details: https://rapidminer.com/.

  11. For details: https://orange.biolab.si/.

  12. For details: https://www.knime.com/.

  13. For details: https://www.python.org/.

  14. For details: https://www.r-project.org/.

  15. Visit https://kudu.apache.org/ for more detailed information.

  16. https://kafka.apache.org/.

References

  • Anderson TE, Dahlin MD, Neefe JM, et al (1995) Serverless network file systems. SIGOPS Oper Syst Rev 29:109–126. https://doi.org/10.1145/224057.224066.

  • Apache Software Foundation (2019) Apache Hadoop 2.10.0—HDFS Architecture. https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Introduction. Accessed 13 Dec 2019.

  • Apache Software Foundation Apache Hadoop 3.2.1—Apache Hadoop YARN. https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html. Accessed 19 Dec 2019.

  • Apache Software Foundation Impala. https://impala.apache.org/overview.html. Accessed 23 Dec 2019.

  • The Apache Software Foundation Apache Hadoop. https://hadoop.apache.org/. Accessed 17 Dec 2019.

  • The Apache Software Foundation Apache Kudu—Fast Analytics on Fast Data. https://kudu.apache.org/. Accessed 19 Dec 2019.

  • Artis M, Ayuso M, Guillén M (2002) Detection of automobile insurance fraud with discrete choice models and misclassified claims. J Risk Insur 69:325–340.

  • Atlassian (2016) Sentry Tutorial—Apache Sentry—Apache Software Foundation. https://cwiki.apache.org/confluence/display/SENTRY/Sentry+Tutorial. Accessed 23 Dec 2019.

  • Attiya H, Welch J (2004) Distributed Computing: Fundamentals, Simulations, and Advanced Topics. Wiley.

  • Bahrammirzaee A (2010) A comparative survey of artificial intelligence applications in finance: artificial neural networks, expert system and hybrid intelligent systems. Neural Comput Appl 19:1165–1195. https://doi.org/10.1007/s00521-010-0362-z.

  • Bhattacharyya S, Jha S, Tharakunnel K, Westland JC (2011) Data mining for credit card fraud: a comparative study. Decis Support Syst 50:602–613. https://doi.org/10.1016/j.dss.2010.08.008.

  • Blazejewski A, Coggins R (2004) Application of self-organizing maps to clustering of high-frequency financial data. In: Proceedings of the Second Workshop on Australasian Information Security, Data Mining and Web Intelligence, and Software Internationalisation - Volume 32. Australian Computer Society, Inc., Darlinghurst, Australia, pp 85–90.

  • Boeing G, Waddell P (2017) New insights into rental housing markets across the United States: web scraping and analyzing Craigslist rental listings. J Plan Educ Res 37:457–476. https://doi.org/10.1177/0739456X16664789.

  • Brause R, Langsdorf T, Hepp M (1999) Neural data mining for credit card fraud detection. In: Proceedings 11th International Conference on Tools with Artificial Intelligence, pp 103–106.

  • Brockwell PJ, Davis RA (2016) Introduction to Time Series and Forecasting. Springer International Publishing.

  • Cai L, Zhu Y (2015) The challenges of data quality and data quality assessment in the big data era. Data Sci J 14:2. https://doi.org/10.5334/dsj-2015-002.

  • Castillo O, Melin P (1995) An intelligent system for financial time series prediction combining dynamical systems theory, fractal theory, and statistical methods. In: Proceedings of 1995 Conference on Computational Intelligence for Financial Engineering (CIFEr). IEEE, pp 151–155.

  • Chapman B, Jost G, van der Pas R, Kuck DJ (2008) Using OpenMP: Portable Shared Memory Parallel Programming. https://apps2.mdp.ac.id/perpustakaan/ebook/Karya%20Umum/Portable_Shared_Memory_Parallel_Programming.pdf. Accessed 3 May 2020.

  • Chawla NV (2009) Data mining for imbalanced datasets: an overview. In: Data Mining and Knowledge Discovery Handbook. Springer US, Boston, MA, pp 875–886.

  • Chen D, Sain SL, Guo K (2012) Data mining for the online retail industry: a case study of RFM model-based customer segmentation using data mining. J Database Mark Cust Strateg Manag 19:197–208. https://doi.org/10.1057/dbm.2012.17.

  • Chen S (2016) Detection of fraudulent financial statements using the hybrid data mining approach. Springerplus 5:89. https://doi.org/10.1186/s40064-016-1707-6.

  • CHM ENIAC—CHM Revolution. https://www.computerhistory.org/revolution/birth-of-the-computer/4/78. Accessed 9 Dec 2019.

  • Chou C-H, Hsieh S-C, Qiu C-J (2017) Hybrid genetic algorithm and fuzzy clustering for bankruptcy prediction. Appl Soft Comput 56:298–316. https://doi.org/10.1016/j.asoc.2017.03.014.

  • Ciszak L (2008) Application of clustering and association methods in data cleaning. In: 2008 International Multiconference on Computer Science and Information Technology, pp 97–103.

  • Cryer JD, Chan KS (2008) Time Series Analysis: With Applications in R. Springer, New York.

  • Cumby C, Fano A, Ghani R, Krema M (2004) Predicting customer shopping lists from point-of-sale purchase data. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, NY, USA, pp 402–409.

  • de Sá AGC, Pereira ACM, Pappa GL (2018) A customized classification algorithm for credit card fraud detection. Eng Appl Artif Intell 72:21–29. https://doi.org/10.1016/j.engappai.2018.03.011.

  • Doumpos M, Zopounidis C (2002) Multi–criteria classification methods in financial and banking decisions. Int Trans Oper Res 9:567–581. https://doi.org/10.1111/1475-3995.00374.

  • Erl T, Khattak W, Buhler P (2015) Big Data Fundamentals: Concepts, Drivers & Techniques. Prentice Hall.

  • Farajian MA, Mohammadi S (2010) Mining the banking customer behavior using clustering and association rules methods. Int J Indust Eng Prod Res 21:239–245.

  • Ghemawat S, Gobioff H, Leung S-T (2003) The Google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. ACM, New York, NY, USA, pp 29–43.

  • Gottfredson LS (1997) Mainstream science on intelligence: an editorial with 52 signatories, history, and bibliography. Intelligence 24:13–23. https://doi.org/10.1016/S0160-2896(97)90011-8.

  • Guida T (2018) Big Data and Machine Learning in Quantitative Investment. Wiley.

  • Gupta R, Pathak C (2014) A machine learning framework for predicting purchase by online customers based on dynamic pricing. Procedia Comput Sci 36:599–605. https://doi.org/10.1016/j.procs.2014.09.060.

  • Hamuro Y, Katoh N, Edward IH, et al (2003) Combining information fusion with string pattern analysis: a new method for predicting future purchase behavior. In: Torra V (ed) Information Fusion in Data Mining. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 161–187.

  • Holland JH (1992) Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press.

  • Holmes A (2012) Hadoop in practice—MEAP. In: Hadoop in Practice. p 525.

  • Hsu CF, Hung HF (2009) Classification methods of credit rating—a comparative analysis on SVM, MDA and RST. In: 2009 International Conference on Computational Intelligence and Software Engineering. pp 1–4.

  • Hurwitz J, Nugent A, Halper F, Kaufman M (2013) Big Data for Dummies, 1st edn. For Dummies.

  • Ishwarappa, Anuradha J (2015) A brief introduction on Big Data 5Vs characteristics and Hadoop Technology. Procedia Comput Sci 48:319–324. https://doi.org/10.1016/j.procs.2015.04.188.

  • Joudaki H, Rashidian A, Minaei-Bidgoli B, et al (2015) Using data mining to detect health care fraud and abuse: a review of literature. Glob J Health Sci 7:194.

  • Kaggle, House Price Prediction with Numeric-only, https://www.kaggle.com/youngseoklee/house-price-prediction-with-numeric-only-dataset/data. Accessed 5 Apr 2020.

  • Khan MA, Uddin MF, Gupta N (2014) Seven V’s of Big Data understanding Big Data to extract value. In: Proceedings of the 2014 Zone 1 Conference of the American Society for Engineering Education. IEEE, pp 1–5.

  • Kim E, Kim W, Lee Y (2003) Combination of multiple classifiers for the customer’s purchase behavior prediction. Decis Support Syst 34:167–175. https://doi.org/10.1016/S0167-9236(02)00079-9.

  • Kirkos E, Spathis C, Manolopoulos Y (2007) Data mining techniques for the detection of fraudulent financial statements. Expert Syst Appl 32:995–1003.

  • Kirlidog M, Asuk C (2012) A fraud detection approach with data mining in health insurance. Procedia-Social Behav Sci 62:989–994.

  • Kshemkalyani AD, Singhal M (2011) Distributed Computing: Principles, Algorithms, and Systems. Cambridge University Press.

  • Kumar BS, Ravi V (2016) A survey of the applications of text mining in financial domain. Knowledge-Based Syst 114:128–147.

  • Kunigk J, Buss I, Wilkinson P, George L (2018) Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale. O’Reilly Media.

  • Labrinidis A, Jagadish H V (2012) Challenges and opportunities with Big Data. Proc VLDB Endow 5:2032–2033. https://doi.org/10.14778/2367502.2367572.

  • McCarthy J, Minsky ML, Rochester N, Shannon CE (2006) A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Magazine 27(4):12. https://doi.org/10.1609/aimag.v27i4.1904.

  • Meng X, Bradley J, Yavuz B, et al (2016) MLlib: Machine Learning in Apache Spark. J Mach Learn Res 17:1–7.

  • Mitchell TM (1999) Machine learning and data mining. Commun ACM 42:30–36. https://doi.org/10.1145/319382.319388.

  • Mukid MA, Widiharih T, Rusgiyono A, Prahutama A (2018) Credit scoring analysis using weighted k nearest neighbor. In: Warsito B, Putro SP, Khumaeni A (eds) 7th International Seminar on New Paradigm and Innovation on Natural Science and Its Application. IOP Publishing Ltd, Bristol, England.

  • Ngai EWT, Hu Y, Wong YH, et al (2011) The application of data mining techniques in financial fraud detection: a classification framework and an academic review of literature. Decis Support Syst 50:559–569. https://doi.org/10.1016/j.dss.2010.08.006.

  • Owens JD, Houston M, Luebke D, et al (2008) GPU computing. Proc IEEE 96:879–899. https://doi.org/10.1109/JPROC.2008.917757.

  • Pavlidis NG, Plagianakos VP, Tasoulis DK, Vrahatis MN (2006) Financial forecasting through unsupervised clustering and neural networks. Oper Res 6:103–127. https://doi.org/10.1007/BF02941227.

  • Qiu J, Lin Z, Li Y (2015) Predicting customer purchase behavior in the e-commerce context. Electron Commer Res 15:427–452. https://doi.org/10.1007/s10660-015-9191-6.

  • Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65. https://doi.org/10.1016/0377-0427(87)90125-7.

  • Sato M (2002) OpenMP: parallel programming API for shared memory multiprocessors and on-chip multiprocessors. In: 15th International Symposium on System Synthesis, 2002. pp 109–111.

  • Schmuck F, Haskin R (2002) GPFS: A shared-disk file system for large computing clusters. In: Proceedings of the 1st USENIX Conference on File and Storage Technologies. USENIX Association, Berkeley, CA, USA.

  • Spaggiari JM, Kovacevic M, Noland B, Bosshart R (2018) Getting Started with Kudu: Perform Fast Analytics on Fast Data. O’Reilly Media.

  • Trobec R, Slivnik B, Bulić P, Robič B (2018) Introduction to Parallel Computing: From Algorithms to Programming on State-of-the-Art Platforms. Springer International Publishing.

  • Turkington G, Deshpande T, Karanth S (2016) Hadoop: Data Processing and Modelling. Packt Publishing.

  • Uzun E, Özhan E (2018) Examining the impact of feature selection on classification of user reviews in web pages. In: International Conference on Artificial Intelligence and Data Processing (IDAP 2018). Malatya, Turkey, pp 430–437.

  • Vohra D (2016) Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools. Apress.

  • Wagner W, Otto J, Chung Q (2002) Knowledge acquisition for expert systems in accounting and financial problem domains. Knowledge-Based Syst 15:439–447. https://doi.org/10.1016/S0950-7051(02)00026-6.

  • Wang Y, Xu W (2018) Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud. Decis Support Syst 105:87–95.

  • Wei-Yang Lin, Ya-Han Hu, Chih-Fong Tsai (2012) Machine learning in financial crisis prediction: a survey. IEEE Trans Syst Man, Cybern Part C (Applications Rev) 42:421–436. https://doi.org/10.1109/TSMCC.2011.2170420.

  • Witten IH, Frank E, Hall MA, Pal CJ (2016) Data Mining: Practical Machine Learning Tools and Techniques. Elsevier Science.

  • Woodward WA, Gray HL, Elliott AC (2017) Applied Time Series Analysis with R. CRC Press.

  • Xindong Wu, Xingquan Zhu, Gong-Qing Wu, Wei Ding (2014) Data mining with big data. IEEE Trans Knowl Data Eng 26:97–107. https://doi.org/10.1109/TKDE.2013.109.

  • Yao M, Zhou A, Jia M (2018) Applied Artificial Intelligence: A Handbook for Business Leaders. Topbots.

  • Yeh I-C, Lien C (2009) The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst Appl 36:2473–2480. https://doi.org/10.1016/j.eswa.2007.12.020.

  • Zhi-min Xu, Rui Zhang (2009) Financial revenue analysis based on association rules mining. In: 2009 Asia-Pacific Conference on Computational Intelligence and Industrial Applications (PACIIA), pp 220–223.

  • Zikopoulos P, Eaton C (2011) Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, 1st edn. McGraw-Hill Osborne Media.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Ozhan, E., Uzun, E. (2021). The Analysis of Big Financial Data Through Artificial Intelligence Methods. In: Bozkuş Kahyaoğlu, S. (eds) The Impact of Artificial Intelligence on Governance, Economics and Finance, Volume I. Accounting, Finance, Sustainability, Governance & Fraud: Theory and Application. Springer, Singapore. https://doi.org/10.1007/978-981-33-6811-8_4
