A boosting method with asymmetric mislabeling probabilities which depend on covariates

Hayashi, Kenichi

doi:10.1007/s00180-011-0250-8

Original Paper
Published: 24 March 2011

Volume 27, pages 203–218, (2012)
Computational Statistics

Kenichi Hayashi¹

Abstract

A new boosting method is developed for a kind of noisy data in which the probability of mislabeling depends on the label of a case. The mechanism of the model rests on a simple idea and has a natural interpretation as a mislabel model. The boosting algorithm is derived from an extension of the exponential loss function that underlies the AdaBoost algorithm. A connection between the proposed method and an asymmetric mislabel model is shown. It is also shown that the proposed loss function yields a classifier that attains the minimum error rate for the true label. Numerical experiments compare the performance of the proposed method with that of existing methods.
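The setting described in the abstract can be illustrated with a minimal sketch. It is not the method proposed in the paper: it only simulates label noise whose flip probability depends on the true label, then fits plain discrete AdaBoost (stagewise minimization of the ordinary exponential loss) with decision stumps. All function names, parameter names, and noise rates below are hypothetical choices made for illustration.

```python
import numpy as np

def flip_labels(y, p_pos, p_neg, rng):
    """Asymmetric mislabeling (illustrative): flip true +1 labels with
    probability p_pos and true -1 labels with probability p_neg."""
    y = np.asarray(y)
    flip_prob = np.where(y == 1, p_pos, p_neg)
    flips = rng.random(y.shape[0]) < flip_prob
    return np.where(flips, -y, y)

def stump_predict(X, j, thr, sign):
    """Decision stump: sign * (-1 if X[:, j] <= thr else +1)."""
    return sign * np.where(X[:, j] <= thr, -1, 1)

def fit_stump(X, y, w):
    """Exhaustive search for the stump with the smallest weighted error.
    Assumes the weights w sum to 1."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            err = np.sum(w[stump_predict(X, j, thr, 1) != y])
            # The sign-flipped stump has weighted error 1 - err.
            for sign, e in ((1, err), (-1, 1.0 - err)):
                if e < best[3]:
                    best = (j, thr, sign, e)
    return best

def fit_adaboost(X, y, n_rounds=20):
    """Plain discrete AdaBoost with the standard exponential loss."""
    w = np.full(X.shape[0], 1.0 / X.shape[0])
    model = []
    for _ in range(n_rounds):
        j, thr, sign, err = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid log(0) on separable data
        alpha = 0.5 * np.log((1 - err) / err)
        # Exponential-loss reweighting: misclassified cases gain weight.
        w = w * np.exp(-alpha * y * stump_predict(X, j, thr, sign))
        w /= w.sum()
        model.append((alpha, j, thr, sign))
    return model

def predict(model, X):
    score = sum(a * stump_predict(X, j, thr, s) for a, j, thr, s in model)
    return np.where(score >= 0, 1, -1)

# Hypothetical demo: train on noisy labels, evaluate against the true labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_true = np.where(X_demo[:, 0] + X_demo[:, 1] > 0, 1, -1)
y_noisy = flip_labels(y_true, p_pos=0.3, p_neg=0.05, rng=rng)
demo_model = fit_adaboost(X_demo, y_noisy, n_rounds=20)
print("accuracy against true labels:", np.mean(predict(demo_model, X_demo) == y_true))
```

Because the exponential reweighting keeps increasing the weight of cases the ensemble gets wrong, mislabeled cases can come to dominate later rounds; this sensitivity is the kind of problem that noise-aware loss functions aim to address.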

Author information

Authors and Affiliations

Division of Mathematical Science, Department of System Innovation, Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama-cho, Toyonaka, Osaka, 560-8534, Japan
Kenichi Hayashi

Corresponding author

Correspondence to Kenichi Hayashi.

About this article

Cite this article

Hayashi, K. A boosting method with asymmetric mislabeling probabilities which depend on covariates. Comput Stat 27, 203–218 (2012). https://doi.org/10.1007/s00180-011-0250-8

Received: 21 December 2009
Accepted: 14 February 2011
Published: 24 March 2011
Issue Date: June 2012
DOI: https://doi.org/10.1007/s00180-011-0250-8
