Abstract
This article presents a new statistical inference method for classification. Instead of minimizing a loss function that takes only the residuals into account, it uses Kolmogorov-Smirnov bounds for the cumulative distribution function of the residuals, thereby taking conservative bounds for the underlying probability distribution of the population of residuals into account. The loss functions considered are based on the theory of support vector machines. Parameters of the discriminant functions are computed using a minimax criterion, and for a wide range of popular loss functions the computations are shown to be feasible, based on new optimization results presented in this article. The method is illustrated with examples, both on small simulated data sets and on real-world data.
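The central ingredient described above, conservative bounds on the cumulative distribution function of the residuals, can be sketched in a few lines. The following is a minimal illustration only, not the paper's construction: the function name `ks_band` is hypothetical, and the band half-width is taken from the Dvoretzky-Kiefer-Wolfowitz form of the Kolmogorov-Smirnov confidence band, which may differ from the exact bounds used by the authors.

```python
import math

def ks_band(residuals, alpha=0.05):
    """Illustrative Kolmogorov-Smirnov (DKW-type) band for the empirical CDF.

    For each sorted residual, returns (residual, lower bound, upper bound),
    where the bounds are the empirical CDF shifted by the DKW half-width
    and clipped to [0, 1].
    """
    xs = sorted(residuals)
    n = len(xs)
    # DKW inequality: P(sup_x |F_n(x) - F(x)| > eps) <= 2 exp(-2 n eps^2),
    # solved for eps at confidence level 1 - alpha.
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    band = []
    for i, x in enumerate(xs, start=1):
        f = i / n  # empirical CDF evaluated at x
        band.append((x, max(f - eps, 0.0), min(f + eps, 1.0)))
    return band

# Example: a conservative envelope for the CDF of ten residuals
pts = ks_band([-1.2, -0.5, -0.1, 0.0, 0.2, 0.4, 0.7, 0.9, 1.1, 1.5])
```

A minimax fit in the spirit of the abstract would then evaluate the chosen loss against the worst-case distribution inside such an envelope, rather than against the empirical CDF alone.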
Utkin, L.V., Coolen, F.P.A. Classification With Support Vector Machines and Kolmogorov-Smirnov Bounds. J Stat Theory Pract 8, 297–318 (2014). https://doi.org/10.1080/15598608.2013.788985