Automatic Selection of a Discrimination Rule Based upon Minimization of the Empirical Risk

  • Conference paper
Pattern Recognition Theory and Applications

Part of the book series: NATO ASI Series (NATO ASI F, volume 30)

Abstract

A discrimination rule is chosen from a possibly infinite collection of discrimination rules by minimizing the observed error on a test sample. For example, the collection could include all k-nearest neighbor rules (for all k), all linear discriminators, and all kernel-based rules (for all possible choices of the smoothing parameter). We do not put any restrictions on the collection.
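
The selection step described above can be sketched in a few lines. This is only a minimal illustration under assumptions that are not in the paper: the data are a toy one-dimensional problem, the collection is restricted to k-nearest neighbor rules with odd k, and every function name (`knn_predict`, `empirical_error`, `select_k`, `sample`) is hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (ties go to class 0)."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if 2 * votes > k else 0

def empirical_error(train, test, k):
    """Fraction of test points misclassified by the k-NN rule built on train."""
    mistakes = sum(knn_predict(train, x, k) != y for x, y in test)
    return mistakes / len(test)

def select_k(train, test, candidates):
    """Return (k, error) minimizing the observed test error; ties go to the smallest k."""
    errors = {k: empirical_error(train, test, k) for k in candidates}
    best_k = min(candidates, key=lambda k: (errors[k], k))
    return best_k, errors[best_k]

def sample(n, rng):
    """Toy data (an assumption): x uniform on [0, 1], label 1{x > 0.5} flipped w.p. 0.2."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
n = 100
train, test = sample(n, rng), sample(n, rng)  # both samples have n observations
best_k, best_err = select_k(train, test, range(1, n + 1, 2))
print("selected k:", best_k, "observed test error:", best_err)
```

The training sample fixes the candidate rules and the test sample is used only to pick among them. A richer collection (linear discriminators, kernel rules with varying smoothing parameter) would plug into the same `select_k` loop by replacing the candidate set.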

We study how close the probability of error of the selected rule is to the (unknown) minimal probability of error over the entire collection. If both the training sample and the test sample have n observations, the expected value of the difference is shown to be \(O\left(\sqrt{\log(n)/n}\right)\) for many reasonable collections, such as the one mentioned above. The general inequalities governing this error are combinatorial in nature, i.e., they are valid for all possible distributions of the data and for most practical collections of rules.
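
To see where a rate of this form can come from, consider the simplest case of a finite collection. The following is a standard textbook argument (Hoeffding's inequality plus a union bound), offered as an illustration only; it is not the general combinatorial inequality of the paper, and the symbols \(N\), \(m\), \(L_k\), \(\hat{L}_k\) and \(\hat{k}\) are notation introduced here. Suppose that, once the training sample is fixed, the collection contains \(N\) rules with probabilities of error \(L_1, \dots, L_N\), that \(\hat{L}_k\) is the error of rule \(k\) observed on \(m\) independent test points, and that \(\hat{k}\) minimizes \(\hat{L}_k\). Then

\[
L_{\hat{k}} - \min_{k} L_k \;\le\; 2 \max_{1 \le k \le N} \bigl| \hat{L}_k - L_k \bigr|,
\qquad
P\Bigl( \max_{1 \le k \le N} \bigl| \hat{L}_k - L_k \bigr| > \varepsilon \Bigr) \;\le\; 2N e^{-2m\varepsilon^2},
\]

and, taking expectations,

\[
E\Bigl[ L_{\hat{k}} - \min_{k} L_k \Bigr] \;\le\; \sqrt{\frac{2 \log(2N)}{m}} .
\]

With \(N = m = n\), as for the k-nearest neighbor rules with \(k = 1, \dots, n\), the right-hand side is \(O\left(\sqrt{\log(n)/n}\right)\).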

The theory is based in part on the work of Vapnik and Chervonenkis regarding minimization of the empirical risk. For all proofs, technical details, and additional examples, we refer to Devroye (1986).

As a by-product, we establish that for some nonparametric rules, the probability of error of the selected rule converges to the Bayes probability of error at the optimal rate (achievable within the given collection of nonparametric rules), and that this holds without the optimal rate of convergence having to be known.

Research of the author was sponsored by NSERC Grant A3456 and by FCAR Grant EQ-16T8

References

  1. O. Bashkirov, E.M. Braverman, and I.E. Muchnik, “Potential function algorithms for pattern recognition learning machines,” Automation and Remote Control, vol. 25, pp. 692–695, 1964.

  2. L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees, Wadsworth International, Belmont, CA., 1984.

  3. T. Cacoullos, “Estimation of a multivariate density,” Annals of the Institute of Statistical Mathematics, vol. 18, pp. 179–190, 1965.

  4. R.G. Casey and G. Nagy, “Decision tree design using a probabilistic model,” IEEE Transactions on Information Theory, vol. IT-30, pp. 93–99, 1984.

  5. T.M. Cover, “Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition,” IEEE Transactions on Electronic Computers, vol. EC-14, pp. 326–334, 1965.

  6. T.M. Cover, “Learning in pattern recognition,” in Methodologies of Pattern Recognition, ed. S. Watanabe, pp. 111–132, Academic Press, New York, N.Y., 1969.

  7. T.M. Cover and T.J. Wagner, “Topics in statistical pattern recognition,” Communication and Cybernetics, vol. 10, pp. 15–46, 1975.

  8. L. Devroye, “A universal k-nearest neighbor procedure in discrimination,” in Proceedings of the 1978 IEEE Computer Society Conference on Pattern Recognition and Image Processing, pp. 142–147, 1978.

  9. L. Devroye and T.J. Wagner, “Distribution-free performance bounds for potential function rules,” IEEE Transactions on Information Theory, vol. IT-25, pp. 601–604, 1979.

  10. L. Devroye and T.J. Wagner, “Distribution-free performance bounds with the resubstitution error estimate,” IEEE Transactions on Information Theory, vol. IT-25, pp. 208–210, 1979.

  11. L. Devroye and T.J. Wagner, “Distribution-free inequalities for the deleted and holdout error estimates,” IEEE Transactions on Information Theory, vol. IT-25, pp. 202–207, 1979.

  12. L. Devroye and T.J. Wagner, “Distribution-free consistency results in nonparametric discrimination and regression function estimation,” Annals of Statistics, vol. 8, pp. 231–239, 1980.

  13. L. Devroye, “Bounds for the uniform deviation of empirical measures,” Journal of Multivariate Analysis, vol. 12, pp. 72–79, 1982.

  14. L. Devroye and L. Györfi, Nonparametric Density Estimation: The L1 View, John Wiley, New York, 1985.

  15. L. Devroye, “Automatic pattern recognition: a study of the probability of error,” Technical Report, School of Computer Science, McGill University, 1986.

  16. R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley, New York, N.Y., 1973.

  17. B. Efron, “Bootstrap methods: another look at the jackknife,” Annals of Statistics, vol. 7, pp. 1–26, 1979.

  18. B. Efron, “Estimating the error rate of a prediction rule: improvement on cross validation,” Journal of the American Statistical Association, vol. 78, pp. 316–331, 1983.

  19. L. Feinholz, “Estimation of the performance of partitioning algorithms in pattern classification,” M.Sc.Thesis, Department of Mathematics, McGill University, Montreal, 1979.

  20. E. Fix and J.L. Hodges, “Discriminatory analysis, nonparametric discrimination, consistency properties,” Report 21-49-004, USAF School of Aviation Medicine, Randolph Field, Texas, 1951.

  21. E. Fix and J.L. Hodges, “Discriminatory analysis: small sample performance,” Report 21-49-004, USAF School of Aviation Medicine, Randolph Field, Texas, 1952.

  22. N. Glick, “Sample-based classification procedures derived from density estimators,” Journal of the American Statistical Association, vol. 67, pp. 116–122, 1972.

  23. N. Glick, “Sample-based classification procedures related to empiric distributions,” IEEE Transactions on Information Theory, vol. IT-22, pp. 454–461, 1976.

  24. N. Glick, “Additive estimators for probabilities of correct classification,” Pattern Recognition, vol. 10, pp. 211–222, 1978.

  25. W. Greblicki, A. Krzyzak, and M. Pawlak, “Distribution-free pointwise consistency of kernel regression estimate,” Annals of Statistics, vol. 12, pp. 1570–1575, 1984.

  26. L.N. Kanal, “Pattern in pattern recognition,” IEEE Transactions on Information Theory, vol. IT-20, pp. 697–722, 1974.

  27. Y.K. Lin and K.S. Fu, “Automatic classification of cervical cells using a binary tree classifier,” Pattern Recognition, vol. 16, pp. 69–80, 1983.

  28. A.L. Lunts and V.L. Brailovsky, “Evaluation of attributes obtained in statistical decision rules,” Engineering Cybernetics, vol. 5, pp. 98–109, 1967.

  29. P. Massart, “Vitesse de convergence dans le théorème de la limite centrale pour le processus empirique,” Ph.D. Dissertation, Université de Paris-Sud, Orsay, France, 1983.

  30. W. Meisel, “Potential functions in mathematical pattern recognition,” IEEE Transactions on Computers, vol. C-18, pp. 911–918, 1969.

  31. R.A. Olshen, “Comments on a paper by C.J. Stone,” Annals of Statistics, vol. 5, pp. 632–633, 1977.

  32. E. Parzen, “On the estimation of a probability density function and the mode,” Annals of Mathematical Statistics, vol. 33, pp. 1065–1076, 1962.

  33. H.J. Payne and W.S. Meisel, “An algorithm for constructing optimal binary decision trees,” IEEE Transactions on Computers, vol. C-26, pp. 905–916, 1977.

  34. M. Rosenblatt, “Remark on some nonparametric estimates of a density function,” Annals of Mathematical Statistics, vol. 27, pp. 832–837, 1956.

  35. G. Sebestyen, Decision Making Processes in Pattern Recognition, Macmillan, New York, N.Y., 1962.

  36. I.K. Sethi and B. Chatterjee, “Efficient decision tree design for discrete variable pattern recognition problems,” Pattern Recognition, vol. 9, pp. 197–206, 1977.

  37. I.K. Sethi and G.P.R. Sarvarayudu, “Hierarchical classifier design using mutual information,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-4, pp. 441–445, 1981.

  38. C. Spiegelman and J. Sacks, “Consistent window estimation in nonparametric regression,” Annals of Statistics, vol. 8, pp. 240–246, 1980.

  39. C.J. Stone, “Consistent nonparametric regression,” Annals of Statistics, vol. 8, pp. 1348–1360, 1977.

  40. M. Stone, “Cross-validatory choice and assessment of statistical predictions,” Journal of the Royal Statistical Society, vol. 36, pp. 111–147, 1974.

  41. G.T. Toussaint, “Bibliography on estimation of misclassification,” IEEE Transactions on Information Theory, vol. IT-20, pp. 474–479, 1974.

  42. J. VanRyzin, “Bayes risk consistency of classification procedures using density estimation,” Sankhya Series A, vol. 28, pp. 161–170, 1966.

  43. V.N. Vapnik, Estimation of Dependences Based on Empirical Data, Springer-Verlag, 1982.

  44. V.N. Vapnik and A. Ya. Chervonenkis, “Theory of uniform convergence of frequencies of events to their probabilities and problems of search for an optimal solution from empirical data,” Automation and Remote Control, vol. 32, pp. 207–217, 1971.

  45. V.N. Vapnik and A. Ya. Chervonenkis, “On the uniform convergence of relative frequencies of events to their probabilities,” Theory of Probability and its Applications, vol. 16, pp. 264–280, 1971.

  46. V.N. Vapnik and A. Ya. Chervonenkis, “Ordered risk minimization. I,” Automation and Remote Control, vol. 35, pp. 1226–1235, 1974.

  47. V.N. Vapnik and A. Ya. Chervonenkis, “Ordered risk minimization. II,” Automation and Remote Control, vol. 35, pp. 1403–1412, 1974.

  48. V.N. Vapnik and A. Ya. Chervonenkis, Theory of Pattern Recognition, Nauka, Moscow, 1974.

  49. V.N. Vapnik and A. Ya. Chervonenkis, “Necessary and sufficient conditions for the uniform convergence of means to their expectations,” Theory of Probability and its Applications, vol. 26, pp. 532–553, 1981.

Copyright information

© 1987 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Devroye, L. (1987). Automatic Selection of a Discrimination Rule Based upon Minimization of the Empirical Risk. In: Devijver, P.A., Kittler, J. (eds) Pattern Recognition Theory and Applications. NATO ASI Series, vol 30. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-83069-3_3

  • DOI: https://doi.org/10.1007/978-3-642-83069-3_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-83071-6

  • Online ISBN: 978-3-642-83069-3

  • eBook Packages: Springer Book Archive
