Abstract
Minimum Classification Error (MCE) training, now a standard approach to discriminative classifier training, is characterized by a smooth, sigmoidal classification error count loss. The smoothness of this loss effectively increases training robustness to unseen samples, closely approximates the ultimate status of minimum classification error probability, and leads to accurate classification of unseen samples. However, few principled methods have been developed for controlling the smoothness, which is usually determined through many repetitions of experimental settings; this trial-and-error approach has hindered the wider adoption of MCE training. To alleviate this long-standing problem, we propose a new MCE training method that determines loss smoothness automatically. The proposed method builds on a Parzen-estimation-based reformulation of MCE, and the degree of loss smoothness is chosen so that the Parzen distribution accurately approximates the unknown true distribution, whose integral over the positive domain equals the classification error probability, in the one-dimensional misclassification measure space. Through systematic experiments, we show that the proposed method efficiently yields a classification accuracy that nearly matches the best accuracy obtained by conventional, trial-and-error repetitions of smoothness setting.
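The core idea described above can be illustrated with a minimal sketch: the sigmoidal MCE loss evaluated at a sample's misclassification measure is the tail integral of a logistic Parzen kernel, so averaging it over the training set gives a Parzen-smoothed estimate of the classification error probability. The sketch below is not the paper's algorithm; in particular, `silverman_bandwidth` is a standard rule-of-thumb stand-in (assumed here for illustration) for the paper's accuracy-driven smoothness determination.

```python
import numpy as np

def sigmoid_loss(d, h):
    """Smoothed 0-1 loss: a sigmoid of the misclassification measure d
    with smoothness (bandwidth) h. As h -> 0 this approaches the hard
    error count 1[d > 0]."""
    return 1.0 / (1.0 + np.exp(-d / h))

def silverman_bandwidth(d):
    """Rule-of-thumb bandwidth for a 1-D Parzen density estimate
    (assumed stand-in, not the paper's determination method)."""
    d = np.asarray(d, dtype=float)
    return 1.06 * d.std(ddof=1) * d.size ** (-1.0 / 5.0)

# Toy 1-D misclassification measures for a set of training samples:
# negative values mean correct classification, positive values errors.
rng = np.random.default_rng(0)
d = rng.normal(loc=-0.5, scale=1.0, size=200)

h = silverman_bandwidth(d)
smoothed_error = sigmoid_loss(d, h).mean()  # Parzen-smoothed error estimate
hard_error = (d > 0).mean()                 # empirical error count
```

With a well-chosen `h`, the smoothed estimate tracks the empirical error rate while remaining differentiable, which is what makes gradient-based MCE training possible.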
Notes
If multiple data points have the same value, they are all removed.
This work was supported in part by Grant-in-Aid for Scientific Research (B), No. 22300064.
Watanabe, H., Tokuno, J., Ohashi, T. et al. Minimum Classification Error Training Incorporating Automatic Loss Smoothness Determination. J Sign Process Syst 74, 311–322 (2014). https://doi.org/10.1007/s11265-013-0746-2