
Computational Management Science, Volume 14, Issue 3, pp 367–391

Regularised gradient boosting for financial time-series modelling

  • Alexandros Agapitos
  • Anthony Brabazon
  • Michael O’Neill
Original Paper

Abstract

Gradient Boosting (GB) learns an additive expansion of simple basis-models. This is accomplished by iteratively fitting an elementary model to the negative gradient of a loss function with respect to the expansion’s values at each training data-point. For the squared-error loss function, the negative gradient takes the form of the ordinary residual at a given training data-point. Studies have demonstrated that running GB for hundreds of iterations can lead to overfitting, and a number of authors have shown that adding noise to the training data impairs generalisation even with relatively few basis-models. Regularisation is realised by shrinking the contribution of each newly added basis-model to the expansion. This paper demonstrates that GB with shrinkage-based regularisation is still prone to overfitting on noisy datasets. We use a transformation based on a sigmoidal function to reduce the influence of extreme values in the residuals of a GB iteration without removing them from the training set. This extension is built on top of shrinkage-based regularisation. Simulations using synthetic, noisy data show that the proposed method slows down overfitting and reduces the generalisation error of regularised GB. The proposed method is then applied to the inherently noisy domain of financial time-series modelling. Results suggest that on the majority of datasets the method generalises better than standard regularised GB, as well as a range of other time-series modelling methods.
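The three ingredients described above can be combined in a short sketch. The Python code below is a minimal illustration rather than the authors' implementation: it fits each basis-model to the residuals (the negative gradient of the squared-error loss), applies shrinkage to every newly added model, and squashes extreme residuals with a sigmoidal function. The regression-tree basis-model, the tanh squashing, and the constant c are illustrative assumptions; the abstract does not specify the paper's exact transformation.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_gb(X, y, n_iters=200, shrinkage=0.1, c=1.0):
        """Fit F(x) = F0 + shrinkage * sum_m h_m(x) by gradient boosting."""
        F0 = y.mean()
        F = np.full(len(y), F0)                    # current expansion values
        models = []
        for _ in range(n_iters):
            residuals = y - F                      # negative gradient of squared-error loss
            # c * tanh(r / c) is one sigmoidal choice (an assumption, not the
            # paper's exact transform): small residuals pass almost unchanged,
            # extreme ones are bounded to (-c, c) but stay in the training set.
            squashed = c * np.tanh(residuals / c)
            h = DecisionTreeRegressor(max_depth=2).fit(X, squashed)
            F += shrinkage * h.predict(X)          # shrinkage-based regularisation
            models.append(h)
        return F0, models

    def predict_gb(F0, models, X, shrinkage=0.1):
        return F0 + shrinkage * sum(h.predict(X) for h in models)

Because the squashing leaves small residuals nearly unchanged while capping extreme ones at roughly plus or minus c, outlying data-points continue to contribute to every iteration but cannot dominate the fit, which is the behaviour the abstract attributes to the proposed extension.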

Keywords

Boosting algorithms · Gradient boosting · Stagewise additive modelling · Regularisation · Financial time-series modelling · Financial forecasting · Feedforward neural networks · Noisy data · Ensemble learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  • Alexandros Agapitos (1)
  • Anthony Brabazon (2)
  • Michael O’Neill (2)

  1. School of Computer Science, University College Dublin, Dublin, Ireland
  2. School of Business, University College Dublin, Dublin, Ireland
