Annals of Operations Research

Volume 263, Issue 1–2, pp 21–43

Robust relevance vector machine for classification with variational inference

Data Mining and Analytics


The relevance vector machine (RVM) is a widely used statistical method for classification that provides probabilistic outputs and a sparse solution. However, the RVM can be very sensitive to outliers lying far from the decision boundary that separates the two classes. In this paper, we propose a robust RVM based on a weighting scheme that is insensitive to outliers while retaining the advantages of the original RVM. Given a prior distribution over the weights, the weight values are determined probabilistically and computed automatically during training. Our theoretical result indicates that the influence of outliers is bounded through the probabilistic weights. We also discuss a guideline for choosing the hyperparameters that govern the prior. Experimental results on synthetic and real data sets show that the proposed method performs consistently better than the RVM when the training data set is contaminated by outliers.
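The abstract does not spell out the weighting scheme, so the following is only a minimal sketch of why per-sample weights blunt an outlier's pull on the decision boundary. It is not the authors' robust RVM: there is no kernel basis, no per-coefficient (ARD) prior, no variational update, and the weights are fixed by hand rather than inferred. It fits a one-feature logistic classifier by Newton-Raphson, scaling each sample's log-likelihood term by a weight and placing a single Gaussian prior of precision `alpha` on the coefficients. The function names `fit_weighted_logistic` and `boundary`, the toy data, and the weight values are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_logistic(xs, ys, weights, alpha=1.0, iters=50):
    """Hypothetical helper: MAP fit of p(y=1 | x) = sigmoid(b0 + b1*x).

    Sample i's log-likelihood is multiplied by weights[i]; (b0, b1) get a
    zero-mean Gaussian prior with precision alpha. Solved by Newton-Raphson.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Gradient of the weighted, penalized log-likelihood.
        g0, g1 = -alpha * b0, -alpha * b1
        # Negative Hessian (positive definite): [[h00, h01], [h01, h11]].
        h00, h01, h11 = alpha, 0.0, alpha
        for x, y, w in zip(xs, ys, weights):
            p = sigmoid(b0 + b1 * x)
            r = w * (y - p)          # weighted residual
            g0 += r
            g1 += r * x
            s = w * p * (1.0 - p)    # weighted curvature
            h00 += s
            h01 += s * x
            h11 += s * x * x
        # Newton step: solve the 2x2 system H * step = g in closed form.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1


def boundary(b0, b1):
    # Point where the predicted class-1 probability equals 0.5.
    return -b0 / b1


# Toy data (hypothetical): the last point is a class-1 outlier deep in class-0 territory.
xs = [-3.0, -2.0, -1.5, -1.0, 1.0, 1.5, 2.0, 3.0, -10.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1]
uniform = [1.0] * 9
downweighted = [1.0] * 8 + [0.01]

print("uniform weights:     ", boundary(*fit_weighted_logistic(xs, ys, uniform)))
print("outlier downweighted:", boundary(*fit_weighted_logistic(xs, ys, downweighted)))
```

With uniform weights, the single outlier at x = -10 drags the boundary well into the class-0 region; shrinking its weight to 0.01 moves the boundary back close to 0, where the uncontaminated points place it. In the proposed method, as the abstract describes, such weights would instead be computed automatically during training under their prior.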


Keywords: Relevance vector machine; Outlier; Robust classification; Sparsity



The authors thank the anonymous reviewers and editors for their helpful and constructive comments that greatly contributed to improving the paper.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. Department of Industrial and Systems Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Korea
  2. RUTCOR (Rutgers Center for Operations Research), Rutgers, The State University of New Jersey, Piscataway, USA
