Robust Representation for Domain Adaptation in Network Security
The goal of domain adaptation is to solve the problem of different joint distribution of observation and labels in the training and testing data sets. This problem happens in many practical situations such as when a malware detector is trained from labeled datasets at certain time point but later evolves to evade detection. We solve the problem by introducing a new representation which ensures that a conditional distribution of the observation given labels is the same. The representation is computed for bags of samples (network traffic logs) and is designed to be invariant under shifting and scaling of the feature values extracted from the logs and under permutation and size changes of the bags. The invariance of the representation is achieved by relying on a self-similarity matrix computed for each bag. In our experiments, we will show that the representation is effective for training detector of malicious traffic in large corporate networks. Compared to the case without domain adaptation, the recall of the detector improves from 0.81 to 0.88 and precision from 0.998 to 0.999.
KeywordsTraffic classification Machine learning Malware detection HTTP traffic
Unable to display preview. Download preview PDF.
- 1.Ben-David, S., Blitzer, J., Crammer, K., Pereira, F., et al.: Analysis of representations for domain adaptation. Advances in neural information processing systems 19, 137 (2007)Google Scholar
- 2.Blitzer, J., McDonald, R., Pereira, F.: Domain adaptation with structural correspondence learning. In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pp. 120–128. Association for Computational Linguistics (2006)Google Scholar
- 7.Dai, W., Yang, Q., Xue, G.-R., Yu, Y.: Boosting for transfer learning. In: Proceedings of the 24th International Conference on Machine learning, pp. 193–200. ACM (2007)Google Scholar
- 8.Duan, L., Tsang, I.W., Xu, D.: Domain transfer multiple kernel learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(3) (2012)Google Scholar
- 9.Farnham, G., Leune, K.: Tools and standards for cyber threat intelligence projects. Technical report, SANS Institute InfoSec Reading Room, p. 10 (2013)Google Scholar
- 10.Gretton, A., Smola, A., Huang, J., Schmittfull, M., Borgwardt, K., Schölkopf, B.: Covariate shift by kernel mean matching. Dataset shift in machine learning (2009)Google Scholar
- 11.Iyer, A., Nath, S., Sarawagi, S.: Maximum mean discrepancy for class ratio estimation: convergence bounds and kernel selection. In: Proceedings of the 31st International Conference on Machine Learning (ICML 2014), pp. 530–538 (2014)Google Scholar
- 14.Müller, M., Clausen, C.: Transposition-invariant self-similarity matrices. In: Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR), pp. 47–50 (2007)Google Scholar
- 15.Palatucci, M., Pomerleau, D., Hinton, G.E., Mitchell, T.M.: Zero-shot learning with semantic output codes. In: Advances in Neural Information Processing Systems (NIPS), pp. 1410–1418 (2009)Google Scholar
- 17.Socher, R., Ganjoo, M., Manning, C.D., Ng, A.: Zero-shot learning through cross-modal transfer. In: Advances in Neural Information Processing Systems, pp. 935–943 (2013)Google Scholar
- 18.Zhang, K., Schölkopf, B., Muandet, K., Wang, Z.: Domain adaptation under target and conditional shift. In: Dasgupta, S., Mcallester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning (ICML 2013), JMLR Workshop and Conference Proceedings, vol. 28, pp. 819–827 (2013)Google Scholar