
Confidence in predictions from random tree ensembles

  • Regular paper
  • Published in Knowledge and Information Systems

Abstract

Obtaining an indication of confidence in predictions is desirable for many data mining applications. Predictions accompanied by confidence levels indicate how much reliability can be placed in a given prediction, which is useful wherever model outputs form the basis for potentially costly decisions, and more generally across risk-sensitive applications. The conformal prediction framework presents a novel approach for obtaining valid confidence measures associated with predictions from machine learning algorithms. Confidence levels are obtained from the underlying algorithm using a non-conformity measure, which indicates how ‘atypical’ a given example set is; the choice of non-conformity measure is key to the usefulness and efficiency of the approach. This paper considers inductive conformal prediction in the context of random tree ensembles such as random forests, which have been noted to perform favorably across a wide range of problems. Focusing on classification tasks, and considering realistic data contexts including class imbalance, we develop non-conformity measures for assessing the confidence of class labels predicted by random forests. We examine the performance of these measures on multiple data sets. The results demonstrate the usefulness and validity of the measures, show their relative differences, and highlight the effectiveness of conformal prediction with random forests for obtaining predictions with associated confidence.
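The paper's specific non-conformity measures are developed in the full text, which is not reproduced here. As a hedged illustration of the general inductive conformal prediction recipe the abstract describes, the sketch below pairs scikit-learn's RandomForestClassifier with a simple probability-based non-conformity score (one minus the forest's estimated probability of a candidate label). The synthetic data, split proportions, and the score itself are illustrative assumptions, not the measures proposed in the paper.

```python
# Minimal sketch of inductive conformal prediction with a random forest.
# Assumptions: scikit-learn, a synthetic binary data set, and a simple
# non-conformity score of 1 - P(label); the paper's own measures differ.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
# Inductive CP splits training data into a proper training set and a
# calibration set; the forest never sees the calibration examples.
X_fit, X_rest, y_fit, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_fit, y_fit)

# Non-conformity on the calibration set: 1 - P(true class).
# (Assumes labels are 0..K-1 so they index predict_proba columns directly.)
cal_probs = forest.predict_proba(X_cal)
cal_scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

def p_values(x):
    """Conformal p-value for each candidate label of one test example."""
    probs = forest.predict_proba(x.reshape(1, -1))[0]
    return {
        label: (np.sum(cal_scores >= 1.0 - probs[k]) + 1) / (len(cal_scores) + 1)
        for k, label in enumerate(forest.classes_)
    }

# Prediction region at significance level 0.05: keep every label whose
# p-value exceeds it. Validity means the region misses the true label
# at most 5% of the time (under exchangeability).
eps = 0.05
region = [lab for lab, p in p_values(X_test[0]).items() if p > eps]
print("prediction region:", region, "| true label:", y_test[0])
```

Efficiency, in conformal prediction terms, is then judged by how often such regions contain a single label rather than several.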




Author information

Correspondence to Siddhartha Bhattacharyya.


Cite this article

Bhattacharyya, S. Confidence in predictions from random tree ensembles. Knowl Inf Syst 35, 391–410 (2013). https://doi.org/10.1007/s10115-012-0600-z
