Robust Representation for Domain Adaptation in Network Security

  • Karel Bartos
  • Michal Sofka
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9286)

Abstract

The goal of domain adaptation is to address differing joint distributions of observations and labels in the training and testing data sets. This problem arises in many practical situations, such as when a malware detector is trained on labeled datasets from a certain point in time, but the malware later evolves to evade detection. We address the problem by introducing a new representation which ensures that the conditional distribution of the observations given the labels is the same across domains. The representation is computed for bags of samples (network traffic logs) and is designed to be invariant under shifting and scaling of the feature values extracted from the logs, and under permutation and size changes of the bags. The invariance of the representation is achieved by relying on a self-similarity matrix computed for each bag. Our experiments show that the representation is effective for training a detector of malicious traffic in large corporate networks. Compared to the case without domain adaptation, the recall of the detector improves from 0.81 to 0.88 and the precision from 0.998 to 0.999.
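The abstract's invariance properties can be illustrated with a minimal sketch: per-feature pairwise absolute differences within a bag cancel any shift, normalizing by the maximum difference cancels any scaling, and summarizing the resulting self-similarity matrix as a normalized histogram makes the result independent of sample order and bag size. The function name, the range normalization, and the histogram binning below are our assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def bag_representation(X, bins=8):
    """Sketch of a shift-, scale-, permutation-, and size-invariant
    representation of one bag of samples (e.g., traffic-log features).

    X: (n_samples, n_features) feature matrix for one bag.
    Returns a fixed-length vector of length n_features * bins.
    """
    X = np.asarray(X, dtype=float)
    feats = []
    for k in range(X.shape[1]):
        col = X[:, k]
        # Pairwise absolute differences: subtraction removes any shift.
        D = np.abs(col[:, None] - col[None, :])
        # Dividing by the largest difference removes scaling
        # (guard against a constant feature, where the range is zero).
        rng = D.max()
        S = D / rng if rng > 0 else D
        # Histogram of the self-similarity values: invariant to sample
        # permutation; normalizing by the count removes bag-size effects.
        h, _ = np.histogram(S, bins=bins, range=(0.0, 1.0))
        feats.append(h / h.sum())
    return np.concatenate(feats)

# A shifted, scaled, and reordered copy of a bag maps to the same vector:
bag = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 9.0]])
r1 = bag_representation(bag)
r2 = bag_representation(3.5 * bag + 7.0)   # shift + scale
r3 = bag_representation(bag[[2, 0, 1]])    # permutation
assert np.allclose(r1, r2) and np.allclose(r1, r3)
```

Under this sketch, a classifier trained on such bag vectors in one domain can be applied in another domain whose features differ only by affine transformations of the values or by resampling of the bags.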

Keywords

Traffic classification · Machine learning · Malware detection · HTTP traffic



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
  2. Cisco Systems, Prague, Czech Republic
