Machine Learning, Volume 107, Issue 8–10, pp 1363–1383

Deep Gaussian Process autoencoders for novelty detection

  • Rémi Domingues
  • Pietro Michiardi
  • Jihane Zouaoui
  • Maurizio Filippone
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2018 Journal Track


Novelty detection is one of the classic problems in machine learning and has applications across several domains. This paper proposes a novel autoencoder based on Deep Gaussian Processes for novelty detection tasks. Learning the proposed model is made tractable and scalable through the use of random feature approximations and stochastic variational inference. The result is a flexible model that is easy to implement and train, and can be applied to general novelty detection tasks, including large-scale problems and data with mixed-type features. The experiments indicate that the proposed model achieves results competitive with state-of-the-art novelty detection methods.
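The abstract names random feature approximations without giving details on this page. As a rough illustration of the general idea, the sketch below uses random Fourier features to approximate an RBF kernel with a finite feature map. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the RBF kernel choice, the lengthscale, and the number of features are all hypothetical values chosen for the example.

```python
import numpy as np


def random_fourier_features(X, n_features, lengthscale, rng):
    """Map X of shape (n, d) to (n, n_features) random features.

    Inner products of the returned features approximate an RBF kernel
    with the given lengthscale (illustrative helper, not from the paper).
    """
    d = X.shape[1]
    # The spectral density of the RBF kernel is Gaussian with std 1/lengthscale.
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    # Random phases drawn uniformly from [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def rbf_kernel(X, lengthscale):
    """Exact RBF kernel matrix, used here only to check the approximation."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))           # toy data, assumed for the example
Phi = random_fourier_features(X, 5000, 1.5, rng)

K_approx = Phi @ Phi.T                 # kernel implied by the random features
K_exact = rbf_kernel(X, 1.5)
max_abs_error = float(np.max(np.abs(K_approx - K_exact)))
print(f"largest absolute kernel error: {max_abs_error:.4f}")
```

Because the kernel is replaced by an explicit, finite-dimensional feature map, a model built on it can be trained with mini-batch gradient methods, which is the kind of scalability the abstract refers to. The approximation error shrinks as the number of random features grows.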


Keywords: Novelty detection, Deep Gaussian Processes, Autoencoder, Unsupervised learning, Stochastic variational inference



The authors wish to thank the Amadeus Middleware Fraud Detection team directed by Virginie Amar and Jeremie Barlet, led by the product owner Christophe Allexandre and composed of Jean-Blas Imbert, Jiang Wu, Damien Fontanes and Yang Pu for building and labeling the transactions, shared-access and payment-sub datasets. MF gratefully acknowledges support from the AXA Research Fund.



Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Department of Data Science, EURECOM, Sophia Antipolis, France
  2. Amadeus, Sophia Antipolis, France
