Generalising Random Forest Parameter Optimisation to Include Stability and Cost

  • C. H. Bryan Liu
  • Benjamin Paul Chamberlain
  • Duncan A. Little
  • Ângelo Cardoso
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10536)

Abstract

Random forests are among the most popular classification and regression methods used in industrial applications. To be effective, the parameters of random forests must be carefully tuned. This is usually done by choosing values that minimise the prediction error on a held-out dataset. We argue that error reduction is only one of several metrics that must be considered when optimising random forest parameters for commercial applications. We propose a novel metric that captures the stability of random forest predictions, which we argue is key for scenarios that require successive predictions. We motivate the need for multi-criteria optimisation by showing that, in practical applications, simply choosing the parameters that lead to the lowest error can introduce unnecessary costs and produce predictions that are not stable across independent runs. To optimise this multi-criteria trade-off, we present a new framework that efficiently finds a principled balance between error, stability and cost using Bayesian optimisation. The pitfalls of optimising forest parameters purely for error reduction are demonstrated using two publicly available real-world datasets. We show that our framework leads to parameter settings that are markedly different from those discovered by optimising for error alone.
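To make the trade-off concrete, the following is a minimal sketch (not the authors' implementation) of how a composite objective over error, stability and cost could be tuned with Bayesian optimisation. It assumes scikit-learn's RandomForestClassifier and scikit-optimize's gp_minimize; the stability proxy (disagreement between two independently seeded forests), the cost proxy (wall-clock training time), and the weights w_err, w_stab and w_cost are illustrative assumptions, not values from the paper.

```python
# Sketch: composite objective (error + instability + cost) tuned with Bayesian
# optimisation. Library choices and weights are assumptions for illustration.
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from skopt import gp_minimize
from skopt.space import Integer

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

w_err, w_stab, w_cost = 1.0, 1.0, 0.01  # illustrative trade-off weights


def objective(params):
    n_estimators, max_depth = int(params[0]), int(params[1])

    start = time.time()
    preds = []
    for seed in (0, 1):  # two independent runs to estimate prediction stability
        rf = RandomForestClassifier(n_estimators=n_estimators,
                                    max_depth=max_depth,
                                    random_state=seed)
        rf.fit(X_train, y_train)
        preds.append(rf.predict(X_val))
    cost = time.time() - start  # wall-clock training time as a cost proxy

    error = np.mean(preds[0] != y_val)           # validation error
    instability = np.mean(preds[0] != preds[1])  # disagreement across runs

    return w_err * error + w_stab * instability + w_cost * cost


result = gp_minimize(objective,
                     dimensions=[Integer(10, 500, name="n_estimators"),
                                 Integer(2, 30, name="max_depth")],
                     n_calls=25, random_state=0)
print("best (n_estimators, max_depth):", result.x, "objective:", result.fun)
```

With a larger weight on instability or cost, the optimum typically shifts towards larger, shallower, or cheaper forests than the error-only optimum, which is the behaviour the abstract describes.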

Keywords

Bayesian optimisation · Parameter tuning · Random forest · Machine learning application · Model stability

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • C. H. Bryan Liu (1)
  • Benjamin Paul Chamberlain (2)
  • Duncan A. Little (1)
  • Ângelo Cardoso (1)
  1. ASOS.com, London, UK
  2. Department of Computing, Imperial College London, London, UK
