Quantitative Data Analysis in Finance

  • Xiang Shi
  • Peng Zhang
  • Samee U. Khan


Quantitative tools have been widely adopted to extract information from the massive volume and variety of financial data. Mathematics, statistics and computer algorithms have never been so important to financial practitioners. Investment banks develop equilibrium models to evaluate financial instruments; mutual funds apply time series models to identify the risks in their portfolios; and hedge funds hope to extract market signals and statistical arbitrage opportunities from noisy market data. The rise of quantitative finance in the last decade relies on the development of computing techniques that make processing large datasets possible. As more data becomes available at higher frequencies, more research in quantitative finance has shifted to the microstructure of financial markets. High frequency data is a typical example of big data, which is characterized by the 3V’s: velocity, variety and volume. In addition, the signal-to-noise ratio in financial time series is usually very small. High frequency datasets are more likely to be exposed to extreme values, jumps and errors than low frequency ones. Specific data processing techniques and quantitative models are designed to extract information from financial data efficiently. In this chapter, we present quantitative data analysis approaches in finance. First, we review the development of quantitative finance in the past decade. Then we discuss the characteristics of high frequency data and the challenges it brings. Quantitative data analysis consists of two basic steps: (i) data cleaning and aggregating; (ii) data modeling. We review the mathematical tools and computing technologies behind the two steps. The valuable information extracted from raw data is represented by a group of statistics. The most widely used statistics in finance are expected return and volatility, which are the fundamentals of modern portfolio theory. We further introduce some simple portfolio optimization strategies as an example of the application of financial data analysis. Big data has already changed the financial industry fundamentally, while quantitative tools for addressing massive financial data still have a long way to go. The adoption of advanced statistics, information theory, machine learning and faster computing algorithms is inevitable in order to predict complicated financial markets. These topics are briefly discussed in the later part of this chapter.
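The two-step workflow summarized above (cleaning and aggregating, then modeling) can be illustrated with a minimal sketch. It is not the chapter's own method: the function names, the median-absolute-deviation filter, the threshold `k`, and the choice of a fully invested minimum-variance portfolio are all assumptions made for illustration, and the sample ticks are invented.

```python
import numpy as np


def clean_ticks(prices, k=5.0):
    """Drop ticks farther than k robust standard deviations from the median.

    Hypothetical filter: the median absolute deviation (MAD), scaled by
    1.4826, stands in for a robust estimate of the standard deviation.
    """
    prices = np.asarray(prices, dtype=float)
    med = np.median(prices)
    mad = np.median(np.abs(prices - med))
    if mad == 0.0:
        return prices  # no dispersion to measure against, keep everything
    keep = np.abs(prices - med) <= k * 1.4826 * mad
    return prices[keep]


def aggregate(prices, step):
    """Keep every `step`-th tick to build a lower-frequency price series."""
    return np.asarray(prices, dtype=float)[::step]


def log_returns(prices):
    """Log returns of a price series."""
    return np.diff(np.log(prices))


def summary_stats(returns):
    """Sample mean (expected return) and sample standard deviation (volatility)."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean(axis=0), returns.std(axis=0, ddof=1)


def min_variance_weights(returns):
    """Fully invested minimum-variance weights: w = S^{-1} 1 / (1' S^{-1} 1).

    `returns` has one row per period and one column per asset.
    """
    cov = np.cov(returns, rowvar=False)
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / x.sum()


# Invented example: 150.0 plays the role of an erroneous tick.
ticks = [100.0, 100.1, 99.9, 100.2, 150.0, 100.0]
cleaned = clean_ticks(ticks)
mean, vol = summary_stats(log_returns(aggregate(cleaned, step=1)))
print(cleaned, mean, vol)
```

A global median filter like this one only suits short, roughly stationary windows; on real high frequency data a rolling window and exchange-specific rules would likely be needed.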


Keywords: Transaction Cost; Hedge Fund; GARCH Model; Portfolio Weight; High Frequency Data



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Stony Brook University, Stony Brook, USA
  2. North Dakota State University, Fargo, USA
