Scalable and Interpretable One-Class SVMs with Deep Learning and Random Fourier Features

  • Minh-Nghia Nguyen
  • Ngo Anh Vien
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)


The one-class support vector machine (OC-SVM) has long been one of the most effective anomaly detection methods and is widely adopted in both research and industrial applications. Its biggest limitation, however, is its inability to handle large, high-dimensional datasets, owing to the complexity of its optimization problem. These problems can be mitigated by dimensionality reduction techniques such as manifold learning or autoencoders, but previous work typically treats representation learning and anomaly prediction as separate steps. In this paper, we propose the autoencoder-based one-class support vector machine (AE-1SVM), which brings the OC-SVM into a deep learning context: random Fourier features approximate the radial basis kernel, the OC-SVM is combined with a representation-learning architecture, and stochastic gradient descent trains the whole pipeline end-to-end. Interestingly, this also opens up the use of gradient-based attribution methods to explain anomaly-detection decisions, which has long been challenging because of the implicit mapping between the input space and the kernel space. To the best of our knowledge, this is the first work to study the interpretability of deep learning in anomaly detection. We evaluate our method on a wide range of unsupervised anomaly detection tasks, on which our end-to-end architecture performs significantly better than previous work that trains the two stages separately. Code related to this paper is available at:
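The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the autoencoder stage is omitted, full-batch gradient descent stands in for the stochastic training described, and all names, dimensions, and hyperparameters are illustrative. Random Fourier features (Rahimi and Recht) approximate the RBF kernel k(x, y) = exp(-γ‖x − y‖²), and a linear OC-SVM over those features is trained by gradient descent on the standard primal objective of Schölkopf et al.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_map(X, W, b):
    """Random Fourier feature map z(x), with E[z(x) . z(y)] = k(x, y)."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Random Fourier features for k(x, y) = exp(-gamma * ||x - y||^2):
# spectral samples W ~ N(0, 2*gamma*I) plus uniform random phases b.
d, D, gamma = 4, 4000, 0.5
W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

# Sanity check: the feature inner product approximates the exact kernel.
x, y = rng.normal(size=d), rng.normal(size=d)
approx = float(rff_map(x[None, :], W, b) @ rff_map(y[None, :], W, b).T)
exact = float(np.exp(-gamma * np.sum((x - y) ** 2)))

# Linear OC-SVM on the explicit feature space, primal objective:
#   min_{w, rho}  0.5*||w||^2 - rho + (1/(nu*n)) * sum_i max(0, rho - w . z_i)
def train_linear_ocsvm(Z, nu=0.1, lr=1e-2, epochs=300):
    n, Df = Z.shape
    w, rho = np.zeros(Df), 0.0
    for _ in range(epochs):
        margins = rho - Z @ w                 # hinge arguments per sample
        active = (margins > 0).astype(float)  # subgradient indicator
        g_w = w - (active @ Z) / (nu * n)
        g_rho = -1.0 + active.sum() / (nu * n)
        w -= lr * g_w
        rho -= lr * g_rho
    return w, rho

X_train = rng.normal(size=(200, d))           # "normal" training data
Z_train = rff_map(X_train, W, b)
w, rho = train_linear_ocsvm(Z_train)

# Decision function f(x) = w . z(x) - rho; negative values flag anomalies.
scores_in = Z_train @ w - rho
score_out = float(rff_map(10.0 * np.ones((1, d)), W, b) @ w - rho)
```

In the paper's full architecture, the feature map would be applied to an autoencoder's latent code rather than the raw input, and the reconstruction loss would be added to the OC-SVM objective so that both parts train jointly end-to-end, which is precisely what makes the composite decision function differentiable and thus amenable to gradient-based attribution.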



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast, UK