Deep Gaussian Process autoencoders for novelty detection
Novelty detection is a classic machine learning problem with applications across several domains. This paper proposes a novel autoencoder based on Deep Gaussian Processes for novelty detection tasks. Learning the proposed model is made tractable and scalable through the use of random feature approximations and stochastic variational inference. The result is a flexible model that is easy to implement and train, and that can be applied to general novelty detection tasks, including large-scale problems and data with mixed-type features. The experiments indicate that the proposed model achieves results competitive with state-of-the-art novelty detection methods.
Keywords: Novelty detection, Deep Gaussian Processes, Autoencoder, Unsupervised learning, Stochastic variational inference
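The abstract credits random feature approximations for making the model tractable. As a rough illustration of that general idea only, and not the authors' implementation, the sketch below approximates an RBF kernel with random Fourier features in NumPy. The function names, the lengthscale, and the number of features are all assumptions chosen for the example.

```python
import numpy as np

def random_fourier_features(X, n_features, lengthscale, rng):
    """Map X of shape (n, d) to 2 * n_features random features whose inner
    products approximate an RBF kernel (hypothetical helper for illustration)."""
    d = X.shape[1]
    # Frequencies drawn from the spectral density of the RBF kernel
    # k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)).
    omega = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    proj = X @ omega
    # Stacking cos and sin gives phi(x) . phi(y) = mean_j cos(omega_j . (x - y)),
    # which is an unbiased estimate of k(x, y).
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_features)

def rbf_kernel(X, Y, lengthscale):
    """Exact RBF kernel matrix, used here only as a reference."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dists / (2.0 * lengthscale**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # toy data, assumed for the example
Phi = random_fourier_features(X, n_features=5000, lengthscale=1.5, rng=rng)

K_approx = Phi @ Phi.T               # low-rank kernel approximation
K_exact = rbf_kernel(X, X, lengthscale=1.5)
max_abs_error = np.max(np.abs(K_approx - K_exact))
print(f"max absolute error: {max_abs_error:.4f}")
```

Working with the explicit feature matrix `Phi` instead of the full kernel matrix is what lets each layer be treated as a linear model in the random features, which in turn allows mini-batch training.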
The authors wish to thank the Amadeus Middleware Fraud Detection team directed by Virginie Amar and Jeremie Barlet, led by the product owner Christophe Allexandre and composed of Jean-Blas Imbert, Jiang Wu, Damien Fontanes and Yang Pu for building and labeling the transactions, shared-access and payment-sub datasets. MF gratefully acknowledges support from the AXA Research Fund.