Accelerating difficulty estimation for conformal regression forests

  • Henrik Boström
  • Henrik Linusson
  • Tuve Löfström
  • Ulf Johansson
Open Access


The conformal prediction framework allows the probability of making incorrect predictions to be specified through a user-provided confidence level. In addition to a learning algorithm, the framework requires a real-valued function, called a nonconformity measure, to be specified. The nonconformity measure does not affect the error rate, but the resulting efficiency, i.e., the size of the output prediction regions, may vary substantially. A recent large-scale empirical evaluation of conformal regression approaches showed that using random forests as the learning algorithm, together with a nonconformity measure based on out-of-bag errors normalized using a nearest-neighbor-based difficulty estimate, resulted in state-of-the-art performance with respect to efficiency. However, the nearest-neighbor procedure incurs a significant computational cost. In this study, a more straightforward nonconformity measure is investigated, where the difficulty estimate employed for normalization is based on the variance of the predictions made by the trees in a forest. A large-scale empirical evaluation is presented, showing that both the nearest-neighbor-based and the variance-based measures significantly outperform a standard (non-normalized) nonconformity measure, while no significant difference in efficiency between the two normalized approaches is observed. The evaluation moreover shows that the computational cost of the variance-based measure is several orders of magnitude lower than that of the nearest-neighbor-based nonconformity measure. The use of out-of-bag instances for calibration does, however, result in nonconformity scores that are distributed differently from those obtained from test instances, which calls the validity of the approach into question. An adjustment of the variance-based measure is presented, which is shown to be valid and also to have a significant positive effect on the efficiency. For conformal regression forests, the variance-based nonconformity measure is hence a computationally efficient and theoretically well-founded alternative to the nearest-neighbor procedure.
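
To make the variance-based normalization concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a nonconformity score of the form |y − ŷ| / (σ² + β), where ŷ is the mean and σ² the variance of the individual tree predictions for an instance; the smoothing constant `beta` and the function names `difficulty`, `calibrate`, and `predict_interval` are hypothetical and do not appear in the abstract. For calibration instances, the per-tree predictions are assumed to come only from trees for which the instance is out-of-bag. The adjustment mentioned in the abstract is not reproduced here.

```python
import math
from statistics import mean, pvariance


def difficulty(tree_preds, beta=0.01):
    """Difficulty estimate: variance of the per-tree predictions plus beta.

    beta is an assumed smoothing constant that keeps the estimate positive
    when all trees agree.
    """
    return pvariance(tree_preds) + beta


def calibrate(cal_tree_preds, cal_targets, beta=0.01):
    """Return the sorted normalized nonconformity scores of the calibration set.

    cal_tree_preds[i] holds the predictions of the trees for which
    calibration instance i is out-of-bag (an assumption of this sketch).
    """
    scores = []
    for preds, y in zip(cal_tree_preds, cal_targets):
        y_hat = mean(preds)
        scores.append(abs(y - y_hat) / difficulty(preds, beta))
    return sorted(scores)


def predict_interval(test_tree_preds, cal_scores, confidence=0.9, beta=0.01):
    """Prediction interval for one test instance at the given confidence level."""
    n = len(cal_scores)
    # Rank of the calibration score used as the threshold.
    k = math.ceil((n + 1) * confidence)
    if k > n:
        # Too few calibration scores to support this confidence level.
        return (-math.inf, math.inf)
    alpha = cal_scores[k - 1]
    y_hat = mean(test_tree_preds)
    # The interval widens for instances on which the trees disagree.
    half_width = alpha * difficulty(test_tree_preds, beta)
    return (y_hat - half_width, y_hat + half_width)


if __name__ == "__main__":
    cal_preds = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0]]
    cal_y = [3.0, 2.5, 2.0]
    scores = calibrate(cal_preds, cal_y, beta=1.0)
    print(scores)
    print(predict_interval([4.0, 6.0], scores, confidence=0.5, beta=1.0))
```

The difficulty estimate in this sketch needs only the tree predictions that the forest already produces, with no neighbor search over the training set. This is consistent with the computational advantage reported for the variance-based measure.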


Conformal prediction · Nonconformity measures · Regression · Random forests

Mathematics Subject Classification (2010)

62G08 62G15 62J02 62M20 



Copyright information

© The Author(s) 2017

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
  2. Department of Information Technology, University of Borås, Borås, Sweden
  3. Department of Computer Science and Informatics, Jönköping University, Jönköping, Sweden
