Predictive Analytics Techniques: Theory and Applications in Finance

Financial Data Analytics

Part of the book series: Contributions to Finance and Accounting (CFA)

Abstract

This chapter presents several models associated with predictive analysis across disciplines. While the models are presented to appeal to finance professionals and learners, they were chosen because of their wide use across disciplines. The chapter covers five models: logistic regression, time series analysis, decision trees, multiple linear regression, and RFM (Recency, Frequency, Monetary) segmentation with k-means. The models are presented with moderate mathematical depth and with an emphasis on building a working software implementation in the R programming language. The theoretical justifications are interwoven with the software creation, under the premise that seeing the model work provides the necessary encouragement to learn the theory (in a different book perhaps). Some datasets are accessed from public repositories, while others are synthetic and have been created specifically for this project. Each model is built to be used in finance as well as in any discipline in which similar questions are asked of the data.

Acknowledgments

I would like to thank all the collaborators on this book project, starting with Dr. Sinem Derindere. Within a few years, the trying times we are currently experiencing due to the COVID-19 pandemic will have faded away and been replaced by the pressing events of that day. But in the winter of 2020–2021, working on this project offers a glimpse of meaning and purpose to the work we are all carrying out in our respective parts of the world.

Author information

Corresponding author

Correspondence to Isac Artzi.

1 Electronic Supplementary Material

  • Data 1 (RMD 2 kb)
  • Data 2 (CSV 33 kb)
  • Data 3 (RMD 4 kb)
  • Data 4 (RMD 2 kb)
  • Data 5 (XLSX 10 kb)
  • Data 6 (CSV 4231 kb)
  • Data 7 (XLSX 1693 kb)
  • Data 8 (CSV 16160 kb)
  • Data 9 (CSV 807 kb)
  • Data 10 (RMD 4 kb)
  • Data 11 (CSV 1269 kb)
  • Data 12 (CSV 503 kb)
  • Data 13 (CSV 145 kb)
  • Data 14 (RMD 5 kb)
  • Data 15 (RMD 2 kb)
  • Data 16 (RMD 5 kb)
  • Data 17 (CSV 9 kb)
  • Data 18 (RMD 3 kb)
  • Data 19 (RMD 8 kb)
  • Data 20 (RMD 423 bytes)
  • Data 21 (CSV 987 bytes)
  • Data 22 (RMD 3 kb)
  • Data 23 (CSV 262 kb)

Key Terms and Definitions

ARIMA :

Autoregressive Integrated Moving Average. One of the two most widely used approaches to time series forecasting. It describes the autocorrelations in the data and uses them as predictors of future values.
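
For reference, the general ARIMA(p, d, q) model is commonly written as follows. The notation is the standard textbook one and is an assumption here, since this page does not show the chapter's own symbols: B is the backshift operator (B y_t = y_{t-1}), d is the number of differences, c is a constant, and ε_t is white noise.

```latex
\phi(B)\,(1 - B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^{p},
\qquad
\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^{q}
```

The autoregressive polynomial φ(B) carries the dependence on the p previous values, and the moving-average polynomial θ(B) carries the dependence on the q previous errors.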

AUC curve :

Area Under the ROC Curve.

Exponential smoothing :

One of the two most widely used approaches to time series forecasting. It is based on a description of the trend and seasonality in the data.
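
As a minimal sketch of the idea, the block below implements simple exponential smoothing, the most basic member of the family, which tracks only the level of the series and has no trend or seasonal component. The chapter itself works in R; Python is used here purely for illustration, and the function name, the smoothing weight `alpha`, and the data are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Return smoothed levels l_t = alpha * y_t + (1 - alpha) * l_{t-1}.

    The first level is initialised to the first observation (one common choice).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    levels = [series[0]]
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels


# Synthetic series, for illustration only.
prices = [10.0, 12.0, 11.0, 13.0]
levels = simple_exponential_smoothing(prices, alpha=0.5)

# With no trend or seasonality, the forecast for every future step is the last level.
forecast = levels[-1]
print(levels, forecast)
```

A larger `alpha` makes the level react faster to recent observations; a smaller one smooths more heavily.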

Gain and Lift chart :

A chart used to evaluate the performance of a (classification) model. It shows the difference between making predictions using a model and without the model.
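
One point on such a chart can be computed as sketched below, assuming binary labels and model scores: rank the cases by score, take the top fraction, and compare the share of positives captured with what random selection would capture. The chapter itself works in R; this Python function and its data are hypothetical.

```python
def cumulative_gain_and_lift(labels, scores, fraction):
    """Gain and lift when targeting the top `fraction` of cases ranked by score."""
    # Sort labels by descending model score.
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda pair: pair[0], reverse=True)]
    n_top = max(1, round(fraction * len(ranked)))
    captured = sum(ranked[:n_top])
    gain = captured / sum(ranked)          # share of all positives captured
    lift = gain / (n_top / len(ranked))    # improvement over random selection
    return gain, lift


# Synthetic example: 10 cases, 4 positives.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]

gain, lift = cumulative_gain_and_lift(labels, scores, fraction=0.3)
print(gain, lift)
```

A lift above 1 means the model finds positives faster than selecting cases at random; repeating the calculation over a grid of fractions gives the full chart.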

k-means :

An unsupervised clustering algorithm, where k is the number of clusters, set by the user.
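
The standard iteration behind k-means (Lloyd's algorithm) alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The sketch below is a plain-Python illustration under simplifying assumptions: the initial centroids are supplied by the caller rather than chosen at random, and the data are synthetic. The chapter itself works in R, where this page does not show which function is used.

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on tuples of coordinates; k is len(centroids)."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


# Synthetic 2-D points forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids)
```

Because the result depends on the starting centroids, practical implementations usually run several random starts and keep the best solution.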

Logistic regression :

A statistical model that uses the logistic function (sigmoid shaped) to model a binary dependent variable (outcome).

Non-Constant Variance (NCV) test :

Computes a score test of the hypothesis of constant error variance against the alternative that the error variance changes with the level of the response (fitted values), or with a linear combination of predictors.

Odds ratio :

A measure of association between an independent variable and an outcome. It represents the odds that an outcome will occur given a particular event, compared to the odds of the outcome occurring in the absence of that event.
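
In a logistic regression with a binary predictor, the odds ratio for that predictor equals the exponential of its coefficient. The sketch below checks this numerically; the coefficients `b0` and `b1` are hypothetical values chosen for illustration, and Python is used only for the demonstration (the chapter itself works in R).

```python
import math


def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)


# Hypothetical fitted coefficients: intercept and one binary predictor.
b0, b1 = -2.0, 0.7

p_without = sigmoid(b0)        # probability of the outcome when the event is absent
p_with = sigmoid(b0 + b1)      # probability of the outcome when the event is present

odds_ratio = odds(p_with) / odds(p_without)
print(odds_ratio, math.exp(b1))  # the two values agree up to rounding
```

An odds ratio above 1 indicates that the event raises the odds of the outcome, and a value below 1 that it lowers them.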

RFM :

Recency, Frequency, Monetary value. A marketing analysis tool that scores customers on how recently they purchased, how often they purchase, and how much they spend, in order to identify a business's best customers.
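
The three measures can be derived directly from a transaction log, as in the sketch below. The transactions, the snapshot date, and the function name are hypothetical, and Python is used only for illustration (the chapter itself works in R and pairs RFM with k-means segmentation).

```python
from datetime import date

# Synthetic transactions: (customer_id, purchase_date, amount).
transactions = [
    ("A", date(2021, 1, 5), 120.0),
    ("A", date(2021, 2, 20), 80.0),
    ("B", date(2020, 11, 2), 300.0),
    ("C", date(2021, 2, 27), 15.0),
    ("C", date(2021, 1, 15), 25.0),
    ("C", date(2020, 12, 1), 10.0),
]
snapshot = date(2021, 3, 1)  # reference date for measuring recency


def rfm_table(transactions, snapshot):
    """Return {customer: (recency_days, frequency, monetary)}."""
    table = {}
    for customer, day, amount in transactions:
        last, freq, money = table.get(customer, (day, 0, 0.0))
        table[customer] = (max(last, day), freq + 1, money + amount)
    return {c: ((snapshot - last).days, f, m) for c, (last, f, m) in table.items()}


rfm = rfm_table(transactions, snapshot)
for customer, (recency, frequency, monetary) in sorted(rfm.items()):
    print(customer, recency, frequency, monetary)
```

The resulting three columns are what a clustering step such as k-means would take as input, usually after rescaling so that no single measure dominates the distance.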

ROC curve :

Receiver Operating Characteristic curve. It plots the TPR (True Positive Rate) against the FPR (False Positive Rate) as the classification threshold varies.
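
A ROC curve is traced by sweeping the decision threshold over the model's scores and recording one (FPR, TPR) point per threshold; the AUC is the area under the resulting curve. The sketch below does this in plain Python with the trapezoidal rule. The labels and scores are synthetic, the function names are hypothetical, and the chapter itself works in R.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points obtained by lowering the threshold through the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Synthetic example: 3 positives and 3 negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
area = auc(curve)
print(curve)
print(area)
```

The area equals the fraction of positive-negative pairs in which the positive case receives the higher score, here 8 of 9.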

Seasonal Variation :

A component of a time series defined as the repetitive and predictable movement around the trend line within one year or less.

Time Series :

A sequence of data points collected at successive points in time.

Tree pruning :

A process that reduces the size of a decision tree by removing noncritical subtrees, that is, subtrees that contribute little to predictive accuracy.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Artzi, I. (2022). Predictive Analytics Techniques: Theory and Applications in Finance. In: Derindere Köseoğlu, S. (eds) Financial Data Analytics. Contributions to Finance and Accounting. Springer, Cham. https://doi.org/10.1007/978-3-030-83799-0_3
