A Geometric Framework for Unsupervised Anomaly Detection

Eskin, Eleazar; Arnold, Andrew; Prerau, Michael; Portnoy, Leonid; Stolfo, Sal

doi:10.1007/978-1-4615-0953-0_4

Part of the book series: Advances in Information Security (ADIS, volume 6)

Abstract

Most current intrusion detection systems employ signature-based or data mining-based methods that rely on labeled training data, which is typically expensive to produce. We present a new geometric framework for unsupervised anomaly detection, a class of algorithms designed to process unlabeled data. In our framework, data elements are mapped to a feature space, typically a vector space ℝ^d. Anomalies are detected by determining which points lie in sparse regions of the feature space. We present two feature maps for mapping data elements to a feature space. The first is a data-dependent normalization feature map, which we apply to network connections. The second is a spectrum kernel, which we apply to system call traces. We present three algorithms for detecting which points lie in sparse regions of the feature space. We evaluate our methods by performing experiments over network records from the KDD CUP 1999 data set and system call traces from the 1999 Lincoln Labs DARPA evaluation.
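
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name and sample value in it is hypothetical. It assumes that the data-dependent normalization is per-feature standardization by the mean and standard deviation of the data set, and that the spectrum kernel counts shared contiguous subsequences of length k; the abstract gives the details of neither. The abstract also does not name the three detection algorithms, so a simple k-nearest-neighbour distance score stands in here as one plausible way to find points in sparse regions.

```python
import math
from collections import Counter


def normalize_features(rows):
    """Rescale each column using statistics computed from the data itself
    (assumed here to be the column mean and standard deviation)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(d)]
    # A constant column has zero spread; map it to 0.0 rather than divide by 0.
    return [[(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(d)] for r in rows]


def spectrum_features(trace, k):
    """Count every contiguous length-k subsequence of a trace."""
    return Counter(tuple(trace[i:i + k]) for i in range(len(trace) - k + 1))


def spectrum_kernel(a, b, k):
    """Inner product of the two traces' subsequence-count vectors."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    return sum(count * fb[sub] for sub, count in fa.items())


def knn_scores(points, k):
    """Sum of distances to the k nearest neighbours of each point.
    A large score means the point lies in a sparse region."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]))
    return scores


# Hypothetical network-connection records: [duration, bytes transferred].
connections = [[1.0, 200.0], [1.2, 210.0], [0.9, 190.0],
               [1.1, 205.0], [9.0, 5000.0]]
scores = knn_scores(normalize_features(connections), k=2)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)

# Hypothetical system call traces compared with subsequences of length 2.
trace_a = ["open", "read", "write", "close"]
trace_b = ["open", "read", "close"]
shared = spectrum_kernel(trace_a, trace_b, 2)
```

In this toy data the last connection is far from the other four after normalization, so it receives the highest score. The two traces share only the subsequence (open, read), so their kernel value is 1. Without the normalization step the byte count would dominate the Euclidean distance purely because of its scale.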

Author information

Authors and Affiliations

Department of Computer Science, Columbia University, Columbia
Eleazar Eskin, Andrew Arnold, Michael Prerau, Leonid Portnoy & Sal Stolfo

Editor information

Editors and Affiliations

George Mason University, USA
Daniel Barbará & Sushil Jajodia

About this chapter

Cite this chapter

Eskin, E., Arnold, A., Prerau, M., Portnoy, L., Stolfo, S. (2002). A Geometric Framework for Unsupervised Anomaly Detection. In: Barbará, D., Jajodia, S. (eds) Applications of Data Mining in Computer Security. Advances in Information Security, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0953-0_4

DOI: https://doi.org/10.1007/978-1-4615-0953-0_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5321-8
Online ISBN: 978-1-4615-0953-0
eBook Packages: Springer Book Archive
