Pattern Analysis and Applications, Volume 7, Issue 1, pp 2–17

Confidence Transformation for Combining Classifiers

Original Article


This paper investigates a number of confidence transformation methods for measurement-level combination of classifiers. Each confidence transformation method combines a scaling function with an activation function. The activation functions correspond to different types of confidence: likelihood (exponential), log-likelihood, sigmoid, and the evidence combination of sigmoid measures. The sigmoid and evidence measures serve as approximations to class probabilities. The scaling functions are derived by Gaussian density modeling, logistic regression with variable inputs, and other methods. We test the confidence transformation methods in handwritten digit recognition by combining various sets of classifiers: neural classifiers only, distance classifiers only, strong classifiers, and mixed strong/weak classifiers. The results show that confidence transformation effectively improves the combination performance in all the settings. Normalizing the class probabilities so that they sum to one is shown to be detrimental to the combination performance. Comparing the scaling functions, the Gaussian method and logistic regression perform well in most cases. Regarding the confidence types, the sigmoid and evidence measures perform well in most cases, and the evidence measure generally outperforms the sigmoid measure. We also show that the confidence transformation methods are highly robust to the validation sample size used in parameter estimation.
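The two-stage structure described in the abstract (a scaling function followed by an activation function) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the standardizing `scale` function, the parameter values, the sample scores, and all function names are assumptions made for illustration, and the evidence measure is omitted because its formula is not given here. Only the three activation types named in the abstract (exponential, log-likelihood, sigmoid) are shown, combined with a simple sum rule and without normalizing the confidences to sum to one.

```python
import math


def scale(outputs, mean, std):
    """Hypothetical scaling function: standardize raw classifier outputs.

    Assumes larger outputs mean a more likely class; distances should be
    negated before being passed in. `mean` and `std` stand in for
    parameters that would be estimated on validation data.
    """
    return [(d - mean) / std for d in outputs]


def activate(scaled, kind):
    """Map scaled outputs to one of the confidence types named in the abstract."""
    if kind == "exponential":      # likelihood-type confidence
        return [math.exp(v) for v in scaled]
    if kind == "log-likelihood":   # scaled output used directly
        return list(scaled)
    if kind == "sigmoid":          # approximates a class probability
        return [1.0 / (1.0 + math.exp(-v)) for v in scaled]
    raise ValueError(f"unknown activation: {kind}")


def transform(outputs, mean, std, kind="sigmoid"):
    """Confidence transformation = scaling followed by activation."""
    return activate(scale(outputs, mean, std), kind)


def combine_sum(confidences):
    """Sum rule over classifiers; returns (winning class index, per-class totals)."""
    n_classes = len(confidences[0])
    totals = [sum(c[i] for c in confidences) for i in range(n_classes)]
    return totals.index(max(totals)), totals


# Illustrative (made-up) outputs of two classifiers on a 3-class problem.
neural_scores = [0.1, 0.8, 0.3]         # network outputs
distance_scores = [-5.0, -1.0, -4.0]    # negated distances

conf_a = transform(neural_scores, mean=0.4, std=0.3)
conf_b = transform(distance_scores, mean=-3.0, std=2.0)
label, totals = combine_sum([conf_a, conf_b])
print(label, totals)
```

The point of the sketch is that the two classifiers produce outputs on very different scales, and the per-classifier scaling parameters bring them to a comparable range before the activation and the sum rule are applied.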


Keywords: Classifier combination; Confidence transformation; Evidence combination; Gaussian modeling; Logistic regression; Pattern classification



Copyright information

© Springer-Verlag London Limited 2004

Authors and Affiliations

  1. Central Research Laboratory, Hitachi, Tokyo 185-8601, Japan
  2. Department of Computer Science, University of Science and Technology Beijing, Beijing 100083, P.R. China
