Robust Representation for Domain Adaptation in Network Security

Bartos, Karel; Sofka, Michal

doi:10.1007/978-3-319-23461-8_8

Karel Bartos & Michal Sofka

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9286)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases


Abstract

The goal of domain adaptation is to handle cases where the joint distribution of observations and labels differs between the training and testing data sets. This situation arises often in practice, for example when a malware detector is trained on datasets labeled at a certain point in time, but the malware later evolves to evade detection. We address the problem by introducing a new representation which ensures that the conditional distribution of the observations given the labels is the same in the training and testing data. The representation is computed for bags of samples (network traffic logs) and is designed to be invariant under shifting and scaling of the feature values extracted from the logs, and under permutation and size changes of the bags. The invariance is achieved by relying on a self-similarity matrix computed for each bag. Our experiments show that the representation is effective for training a detector of malicious traffic in large corporate networks. Compared to the case without domain adaptation, the recall of the detector improves from 0.81 to 0.88 and the precision from 0.998 to 0.999.
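The invariances named in the abstract can be illustrated with a small sketch. The abstract does not give the exact construction, so everything below is an assumption made for illustration: the function names `self_similarity` and `bag_representation`, the use of absolute differences, the division by the largest entry, and the fixed-size histogram are hypothetical choices, not the authors' method. The sketch treats a single feature of a non-empty bag.

```python
def self_similarity(values):
    """Pairwise absolute differences of one feature within a bag."""
    return [[abs(a - b) for b in values] for a in values]


def bag_representation(values, bins=8):
    """Map one feature of a non-empty bag to a fixed-length histogram.

    Illustrative assumption only, not the construction from the chapter.
    """
    # Differences between samples cancel any constant added to all values.
    ssm = self_similarity(values)
    peak = max(max(row) for row in ssm)
    counts = [0] * bins
    for row in ssm:
        for d in row:
            # Dividing by the largest entry cancels a positive rescaling.
            scaled = d / peak if peak > 0 else 0.0
            # A scaled value of exactly 1.0 is placed in the last bin.
            index = min(int(scaled * bins), bins - 1)
            counts[index] += 1
    # Counting entries ignores sample order; dividing by the number of
    # entries gives a vector of the same length for any bag size.
    total = len(values) ** 2
    return [c / total for c in counts]


bag = [1.0, 2.0, 4.0, 8.0, 3.0]
# Same bag, reordered, rescaled, and shifted.
moved = [2.0 * v + 10.0 for v in reversed(bag)]
print(bag_representation(bag) == bag_representation(moved))
```

In this sketch the histogram step is what removes the dependence on sample order and bag size; a different summary of the self-similarity matrix could serve the same purpose.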



Author information

Authors and Affiliations

Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
Karel Bartos
Cisco Systems, Karlovo Namesti 10, 12000, Prague, Czech Republic
Karel Bartos & Michal Sofka

Authors

Karel Bartos
Michal Sofka

Corresponding author

Correspondence to Karel Bartos.

Editor information

Editors and Affiliations

Huawei Noah’s Ark Lab, Shatin, Hong Kong
Albert Bifet
Siemens AG Corporate Technology, München, Germany
Michael May
IBM Research Brazil, Rio de Janeiro, Brazil
Bianca Zadrozny
Universitat Politècnica de Catalunya, Barcelona, Spain
Ricard Gavalda
Università di Pisa, Pisa, Italy
Dino Pedreschi
Eurecat / Yahoo Labs, Barcelona, Spain
Francesco Bonchi
University of Porto - INESC TEC, Porto, Portugal
Jaime Cardoso
Otto-von-Guericke University, Magdeburg, Germany
Myra Spiliopoulou

About this paper

Cite this paper

Bartos, K., Sofka, M. (2015). Robust Representation for Domain Adaptation in Network Security. In: Bifet, A., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science, vol 9286. Springer, Cham. https://doi.org/10.1007/978-3-319-23461-8_8


DOI: https://doi.org/10.1007/978-3-319-23461-8_8
Published: 29 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23460-1
Online ISBN: 978-3-319-23461-8
eBook Packages: Computer Science, Computer Science (R0)
