Evaluating Host-Based Anomaly Detection Systems: Application of the Frequency-Based Algorithms to ADFA-LD
The ADFA Linux data set (ADFA-LD) was recently released to replace the existing benchmark data sets for host-based anomaly detection, which have lost most of their relevance to modern computer systems. ADFA-LD comprises thousands of system call traces collected from a contemporary Linux local server and involves six types of up-to-date cyber attack. In a previous preliminary analysis of ADFA-LD, we showed that frequency-based algorithms can be realised at a lower computational cost than short sequence-based algorithms while achieving acceptable performance. In this paper, we further exploit the potential of frequency-based algorithms by attempting to reduce the dimension of the frequency vectors and to identify the optimal distance functions. Two typical frequency-based algorithms, k-nearest neighbour (kNN) and k-means clustering (kMC), are applied to validate the effectiveness and efficiency of the approach.
Keywords: host-based intrusion detection system (HIDS), Unix system call
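As a rough illustration of the frequency-based idea described in the abstract, the sketch below turns each system call trace into a vector of relative call frequencies and scores a new trace by its mean distance to the k nearest normal training vectors. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy traces, the system call numbers, the choice of Euclidean distance, and the value of k are all hypothetical, and the dimension reduction step is omitted.

```python
from collections import Counter
import math


def frequency_vector(trace, vocabulary):
    """Map a trace (list of system call numbers) to relative frequencies.

    `vocabulary` fixes the order of the vector components; calls that do
    not appear in the trace get a frequency of 0.
    """
    counts = Counter(trace)
    total = len(trace)
    return [counts[call] / total if total else 0.0 for call in vocabulary]


def euclidean(u, v):
    """Euclidean distance (assumed here; other distance functions could be swapped in)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_score(vector, normal_vectors, k=2, distance=euclidean):
    """Anomaly score: mean distance to the k nearest normal training vectors."""
    nearest = sorted(distance(vector, n) for n in normal_vectors)[:k]
    return sum(nearest) / len(nearest)


# Hypothetical toy data: system call numbers are illustrative only.
vocabulary = [3, 4, 5, 11]
normal_traces = [
    [3, 4, 5, 3, 4, 5],
    [3, 3, 4, 5, 3, 5],
    [4, 3, 5, 5, 4, 3],
]
normal_vectors = [frequency_vector(t, vocabulary) for t in normal_traces]

benign_trace = [3, 4, 5, 5, 4, 3]
suspicious_trace = [11, 11, 11, 3, 11, 11]

benign_score = knn_score(frequency_vector(benign_trace, vocabulary), normal_vectors)
suspicious_score = knn_score(frequency_vector(suspicious_trace, vocabulary), normal_vectors)

print("benign score:", benign_score)
print("suspicious score:", suspicious_score)
```

A trace would be flagged as anomalous when its score exceeds a threshold chosen on held-out normal data; because the `distance` argument is a plain function, alternative distance functions can be compared without changing the scoring code.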