
ODDITY: An Ensemble Framework Leverages Contrastive Representation Learning for Superior Anomaly Detection

Conference paper
Information and Communications Security (ICICS 2022)

Abstract

Ensemble approaches are promising for anomaly detection because network traffic is heterogeneous. However, existing ensemble approaches lack applicability and efficiency. We propose ODDITY, a new end-to-end, data-driven ensemble framework. ODDITY uses Diverse Autoencoders (DAs), each trained on a pre-clustered subset of the data with contrastive representation learning, to encourage the base learners to give distinct predictions. ODDITY then combines the extracted features with a supervised gradient-boosting meta-learner. Experiments on benchmark and real-world network traffic datasets demonstrate that ODDITY is superior in both efficiency and precision: it averages 0.8350 AUPRC on the benchmark datasets (10% better than traditional machine learning algorithms and 6% better than the state-of-the-art semi-supervised ensemble method) and outperforms the state of the art on real-world datasets in both detection accuracy and speed. Moreover, ODDITY is more resilient to evasion attacks and shows promising potential for unsupervised anomaly detection.
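The following is a minimal sketch of the pipeline the abstract describes, not the authors' implementation. KMeans pre-clustering, MLP autoencoders, per-autoencoder reconstruction error as the extracted features, and XGBoost as the gradient-boosting meta-learner are all stand-in assumptions; in particular, the paper's contrastive training objective is omitted here.

```python
# Sketch of an ODDITY-style pipeline (illustrative, not the authors' code).
# X_train: numpy array of shape (n_samples, n_features); y_train: 0/1 labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor
from xgboost import XGBClassifier

def extract_features(autoencoders, X):
    # Augment the original features with one reconstruction-error
    # feature per autoencoder.
    errors = [np.mean((ae.predict(X) - X) ** 2, axis=1) for ae in autoencoders]
    return np.hstack([X, np.column_stack(errors)])

def train_oddity(X_train, y_train, n_clusters=5):
    # 1. Pre-cluster the training data so each base learner sees a
    #    different subset, encouraging distinct predictions.
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_train)

    # 2. Train one small autoencoder per cluster (X -> X reconstruction);
    #    the contrastive objective from the paper is omitted in this sketch.
    autoencoders = []
    for c in range(n_clusters):
        ae = MLPRegressor(hidden_layer_sizes=(16, 8, 16), max_iter=500)
        ae.fit(X_train[clusters == c], X_train[clusters == c])
        autoencoders.append(ae)

    # 3. Fit the supervised gradient-boosting meta-learner on the
    #    augmented feature matrix.
    meta = XGBClassifier(n_estimators=200).fit(
        extract_features(autoencoders, X_train), y_train)
    return autoencoders, meta

def predict_oddity(autoencoders, meta, X):
    # Anomaly probability from the meta-learner.
    return meta.predict_proba(extract_features(autoencoders, X))[:, 1]
```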


References

  1. CSE-CIC-IDS2018 datasets. https://www.unb.ca/cic/datasets/ids-2018.html. Accessed 23 June 2021

  2. Aggarwal, C.C., Sathe, S.: Theoretical foundations and algorithms for outlier ensembles. ACM SIGKDD Explor. Newsl. 17(1), 24–47 (2015)

  3. Aggarwal, C.C., Sathe, S.: Outlier Ensembles, pp. 1–34. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54765-7

  4. Bandaragoda, T.R., Ting, K.M., Albrecht, D., Liu, F.T., Wells, J.R.: Efficient anomaly detection by isolation using nearest neighbour ensemble. In: Proceedings of the IEEE International Conference on Data Mining Workshop (2014)

  5. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013). https://doi.org/10.1109/tpami.2013.50

  6. Bow, S.T.: Multilayer perceptron. In: Pattern Recognition and Image Preprocessing, pp. 201–224 (2002)

  7. Breiman, L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996). https://doi.org/10.1007/BF00058655

  8. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. ACM SIGMOD Rec. 29(2), 93–104 (2000). https://doi.org/10.1145/335191.335388

  9. Carlini, N., Wagner, D.: Adversarial examples are not easily detected: bypassing ten detection methods. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14 (2017)

  10. Chen, J., Sathe, S., Aggarwal, C., Turaga, D.: Outlier detection with autoencoder ensembles. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 90–98 (2017). https://doi.org/10.1137/1.9781611974973.11

  11. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)

  12. Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 233–240 (2006)

  13. Dua, D., Graff, C.: UCI Machine Learning Repository (2017)

  14. Erfani, S.M., Rajasegarar, S., Karunasekera, S., Leckie, C.: High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recogn. 58, 121–134 (2016). https://doi.org/10.1016/j.patcog.2016.03.028

  15. Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1), 119–139 (1997). https://doi.org/10.1006/jcss.1997.1504

  16. Goldstein, M., Dengel, A.: Histogram-based outlier score (HBOS): a fast unsupervised anomaly detection algorithm. In: KI-2012 Poster and Demo Track, pp. 59–63 (2012)

  17. Guo, C., Gardner, J., You, Y., Wilson, A.G., Weinberger, K.: Simple black-box adversarial attacks. In: International Conference on Machine Learning, pp. 2484–2493 (2019)

  18. Hardin, J., Rocke, D.M.: Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. Comput. Stat. Data Anal. 44(4), 625–638 (2004)

  19. Ke, G., et al.: LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30 (2017)

  20. Krogh, A., Vedelsby, J.: Neural network ensembles, cross validation, and active learning. Adv. Neural Inf. Process. Syst. 7, 231–238 (1994)

  21. Liao, Y., Vemuri, V.: Use of k-nearest neighbor classifier for intrusion detection. Comput. Secur. 21(5), 439–448 (2002)

  22. Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: 2008 Eighth IEEE International Conference on Data Mining (2008). https://doi.org/10.1109/icdm.2008.17

  23. Liu, X., et al.: Self-supervised learning: generative or contrastive. IEEE Trans. Knowl. Data Eng. (2021)

  24. Micenková, B., McWilliams, B., Assent, I.: Learning outlier ensembles: the best of both worlds - supervised and unsupervised. In: Proceedings of the ACM SIGKDD 2014 Workshop on Outlier Detection and Description under Data Diversity (ODD2), New York, pp. 51–54 (2014)

  25. Micenková, B., McWilliams, B., Assent, I.: Learning representations for outlier detection on a budget. arXiv preprint arXiv:1507.08104 (2015)

  26. Mirsky, Y., Doitshman, T., Elovici, Y., Shabtai, A.: Kitsune: an ensemble of autoencoders for online network intrusion detection. arXiv preprint arXiv:1802.09089 (2018)

  27. Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 Military Communications and Information Systems Conference (MilCIS) (2015)

  28. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  29. Rayana, S., Akoglu, L.: Less is more: building selective anomaly ensembles with application to event detection in temporal graphs. In: Proceedings of the 2015 SIAM International Conference on Data Mining (2015)

  30. Sarvari, H., Domeniconi, C., Prenkaj, B., Stilo, G.: Unsupervised boosting-based autoencoder ensembles for outlier detection. In: PAKDD 2021. LNCS (LNAI), vol. 12712, pp. 91–103. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-75762-5_8

  31. Sathe, S., Aggarwal, C.: LODES: local density meets spectral outlier detection. In: Proceedings of the 2016 SIAM International Conference on Data Mining (2016)

  32. Shyu, M.L., Chen, S.C., Sarinnapakorn, K., Chang, L.: A novel anomaly detection scheme based on principal component classifier. Technical report, Department of Electrical and Computer Engineering, University of Miami (2003)

  33. Szegedy, C., et al.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)

  34. Wolpert, D.H.: Stacked generalization. Neural Netw. 5, 241–259 (1992)

  35. Zhao, Y., Hryniewicki, M.K.: XGBOD: improving supervised outlier detection with unsupervised representation learning. In: 2018 International Joint Conference on Neural Networks (2018)

  36. Zhao, Y., Nasrullah, Z., Li, Z.: PyOD: a Python toolbox for scalable outlier detection. J. Mach. Learn. Res. 20(96), 1–7 (2019)

  37. Zhou, Z.H.: Ensemble Methods: Foundations and Algorithms. CRC Press, Boca Raton (2012)


Author information

Correspondence to Hongyi Peng.

Appendices

A Experimental Datasets

Details of each dataset used in the experiments are provided in Table 5. Due to the size of the KDD99, UNSW-NB15, and IDS-2018 datasets, we randomly sample a portion of each.

Table 5. Summary of datasets
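A minimal sketch of the subsampling step mentioned above, assuming each large dataset is loaded as a pandas DataFrame with a binary `label` column; the 10% fraction and the stratification (to preserve the anomaly ratio) are our illustrative assumptions, not details from the paper.

```python
# Hypothetical subsampling helper; 'label' column name and fraction are
# illustrative assumptions. Stratifying keeps the anomaly ratio intact.
from sklearn.model_selection import train_test_split

def subsample(df, frac=0.1, seed=0):
    subset, _ = train_test_split(
        df, train_size=frac, stratify=df["label"], random_state=seed)
    return subset
```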

B Feature Importance Map

The feature importance map in Fig. 7 shows the importance of each feature for the Letter dataset, which initially has 32 features (columns 1–32); the DAs in ODDITY extract 20 additional features (columns 33–52). As shown in Fig. 7, the final LGBM classifier in ODDITY assigns high importance to several of these features (columns 51, 54, 48, etc.), which results in improved performance.
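A minimal sketch of how such a map can be produced, assuming `model` is a fitted LightGBM classifier (scikit-learn API) trained on the augmented feature matrix; the column counts follow the Letter setup above, and the color scheme is our own choice.

```python
# Sketch of a feature-importance map: original columns vs. DA-extracted ones.
# Assumes `model` is a fitted lgb.LGBMClassifier (scikit-learn API).
import matplotlib.pyplot as plt

def plot_importance_map(model, n_original=32):
    imp = model.feature_importances_        # one importance score per column
    colors = ["tab:blue"] * n_original + ["tab:orange"] * (len(imp) - n_original)
    plt.bar(range(1, len(imp) + 1), imp, color=colors)  # orange = DA features
    plt.xlabel("feature column")
    plt.ylabel("importance")
    plt.tight_layout()
    plt.show()
```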

C ODDITY in an Unsupervised Setting

To extend ODDITY to unsupervised learning, we replace the final supervised meta-learner with an unsupervised classifier. We combine ODDITY with three unsupervised classifiers, namely HBOS [16], Isolation Forest (IF) [22], and MCD [18], all implemented with PyOD [36]. With AUROC as the metric, the hyperparameters and architecture of ODDITY remain the same as in Sect. 5.2. Table 6 summarizes the experimental results averaged over ten trials. Using the diverse features extracted by the DAs, the AUROC of HBOS improves by 0.3%, that of IF by 1.3%, and that of MCD by 8%. Since ODDITY + MCD outperforms the others, we further compare it with other commonly used unsupervised anomaly detection techniques, including kNN, IF, PCA [32], and LOF [8]. ODDITY outperforms all of these methods, showing compelling potential for unsupervised anomaly detection (Table 7).
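A minimal sketch of this unsupervised variant, reusing the hypothetical `extract_features` helper from the earlier pipeline sketch; the detector hyperparameters are PyOD defaults rather than the paper's settings.

```python
# Unsupervised variant: a PyOD detector replaces the supervised meta-learner
# and is fitted on the DA-augmented features without labels.
from pyod.models.hbos import HBOS
from pyod.models.iforest import IForest
from pyod.models.mcd import MCD

def oddity_unsupervised(autoencoders, X, detector_cls=MCD):
    feats = extract_features(autoencoders, X)  # original + DA features
    det = detector_cls().fit(feats)            # unsupervised fit, no labels
    return det.decision_scores_                # higher score = more anomalous

# Example: score the data with each of the three detectors.
# scores = {name: oddity_unsupervised(aes, X, cls)
#           for name, cls in [("HBOS", HBOS), ("IF", IForest), ("MCD", MCD)]}
```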

Fig. 7. Feature importance of LGBM and ODDITY on the Letter dataset

Table 6. ODDITY in unsupervised learning
Table 7. Comparison of unsupervised anomaly detection methods


Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Peng, H. et al. (2022). ODDITY: An Ensemble Framework Leverages Contrastive Representation Learning for Superior Anomaly Detection. In: Alcaraz, C., Chen, L., Li, S., Samarati, P. (eds) Information and Communications Security. ICICS 2022. Lecture Notes in Computer Science, vol 13407. Springer, Cham. https://doi.org/10.1007/978-3-031-15777-6_23

  • DOI: https://doi.org/10.1007/978-3-031-15777-6_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15776-9

  • Online ISBN: 978-3-031-15777-6

  • eBook Packages: Computer Science, Computer Science (R0)
