
A Geometric Framework for Unsupervised Anomaly Detection

Detecting Intrusions in Unlabeled Data

Chapter in Applications of Data Mining in Computer Security

Part of the book series: Advances in Information Security (ADIS, volume 6)

Abstract

Most current intrusion detection systems employ signature-based methods or data mining-based methods that rely on labeled training data, which is typically expensive to produce. We present a new geometric framework for unsupervised anomaly detection, that is, for algorithms designed to process unlabeled data. In our framework, data elements are mapped to a feature space, which is typically a vector space ℝ^d. Anomalies are detected by determining which points lie in sparse regions of the feature space. We present two feature maps for mapping data elements to a feature space. Our first is a data-dependent normalization feature map, which we apply to network connections. Our second is a spectrum kernel, which we apply to system call traces. We present three algorithms for detecting which points lie in sparse regions of the feature space. We evaluate our methods by performing experiments over network records from the KDD Cup 1999 data set and system call traces from the 1999 Lincoln Labs DARPA evaluation.
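The abstract describes a two-stage pipeline. First, map each data element into a feature space. Second, flag the points that lie in sparse regions of that space. The Python sketch below illustrates the pipeline under stated assumptions. It is not the chapter's implementation, and every function name in it is hypothetical.

- **Normalization map (assumed form).** The data-dependent normalization is taken to be a per-attribute z-score over numeric attributes. The chapter's exact map, for example its handling of categorical fields, may differ.
- **Spectrum kernel.** This follows the standard definition of Leslie, Eskin and Noble (2002): the inner product of the counts of all length-k contiguous subsequences.
- **Sparsity measure (assumed choice).** The score is the sum of distances to the nearest neighbours, one simple way to measure how sparse a point's neighbourhood is. It is not asserted to match any of the chapter's three algorithms in detail.
- **Data.** The connection records and system call traces are made-up toy data.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# --- Feature map 1: data-dependent normalization (assumed: per-attribute z-score) ---

def fit_normalizer(records: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-attribute mean and standard deviation, estimated from the unlabeled data itself."""
    n, d = len(records), len(records[0])
    means = [sum(r[j] for r in records) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in records) / n
        stds.append(math.sqrt(var) if var > 0 else 1.0)  # constant attribute: avoid dividing by zero
    return means, stds


def normalize(record: Sequence[float], means: List[float], stds: List[float]) -> List[float]:
    """Map a record into R^d as 'standard deviations from the mean' for each attribute."""
    return [(x - m) / s for x, m, s in zip(record, means, stds)]


# --- Feature map 2: spectrum kernel over system call traces ---

def spectrum_features(trace: Sequence[str], k: int) -> Counter:
    """Count each contiguous length-k window (k-mer) of system calls in the trace."""
    return Counter(tuple(trace[i:i + k]) for i in range(len(trace) - k + 1))


def spectrum_kernel(a: Sequence[str], b: Sequence[str], k: int) -> int:
    """Inner product of the two k-mer count vectors; only shared k-mers contribute."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    return sum(count * fb[kmer] for kmer, count in fa.items())


def kernel_distance(a: Sequence[str], b: Sequence[str], k: int) -> float:
    """Euclidean distance in the feature space, computed from kernel values alone."""
    sq = spectrum_kernel(a, a, k) - 2 * spectrum_kernel(a, b, k) + spectrum_kernel(b, b, k)
    return math.sqrt(max(sq, 0))


# --- Sparse-region detection (assumed: k-nearest-neighbour score) ---

def knn_scores(items: Sequence, dist: Callable, n_neighbors: int) -> List[float]:
    """Sum of distances to each item's nearest neighbours; a larger score means a sparser region."""
    scores = []
    for i, p in enumerate(items):
        dists = sorted(dist(p, q) for j, q in enumerate(items) if j != i)
        scores.append(sum(dists[:n_neighbors]))
    return scores


# Toy network connections (hypothetical numeric attributes); the last one is unusually large.
connections = [[0.0, 200.0, 1.0], [0.0, 210.0, 1.0], [1.0, 190.0, 1.0],
               [0.0, 205.0, 1.0], [30.0, 5000.0, 1.0]]
means, stds = fit_normalizer(connections)
points = [normalize(c, means, stds) for c in connections]
net_scores = knn_scores(points, math.dist, n_neighbors=2)

# Toy system call traces; the last one shares no 2-mers with the others.
traces = [["open", "read", "read", "close"],
          ["open", "read", "close"],
          ["open", "read", "read", "read", "close"],
          ["fork", "execve", "setuid", "execve"]]
sys_scores = knn_scores(traces, lambda a, b: kernel_distance(a, b, 2), n_neighbors=2)

print(max(range(len(net_scores)), key=net_scores.__getitem__))  # 4
print(max(range(len(sys_scores)), key=sys_scores.__getitem__))  # 3
```

The detector only needs a distance function. The same `knn_scores` routine therefore serves both the explicit vectors produced by normalization and the kernel-induced distance on traces. This is the sense in which the framework is geometric.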


References

  • Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data. John Wiley and Sons.

  • Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. (2000). LOF: identifying density-based local outliers. In ACM SIGMOD Int. Conf. on Management of Data, pages 93–104.

  • Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge University Press, Cambridge, UK.

  • Denning, D. (1987). An intrusion detection model. IEEE Transactions on Software Engineering, SE-13:222–232.

  • Eskin, E. (2000). Anomaly detection over noisy data using learned probability distributions. In Proceedings of the International Conference on Machine Learning.

  • Eskin, E., Lee, W., and Stolfo, S. J. (2001). Modeling system calls for intrusion detection with dynamic window sizes. In Proceedings of DARPA Information Survivability Conference and Exposition II (DISCEX II), Anaheim, CA.

  • Fan, W. and Stolfo, S. (2002). Ensemble-based adaptive intrusion detection. In Proceedings of 2002 SIAM International Conference on Data Mining, Arlington, VA.

  • Forrest, S., Hofmeyr, S. A., Somayaji, A., and Longstaff, T. A. (1996). A sense of self for Unix processes. In 1996 IEEE Symposium on Security and Privacy, pages 120–128. IEEE Computer Society.

  • Ghosh, A. and Schwartzbard, A. (1999). A study in using neural networks for anomaly and misuse detection. In Proceedings of the 8th USENIX Security Symposium.

  • Haussler, D. (1999). Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, UC Santa Cruz.

  • Helman, P. and Bhangoo, J. (1997). A statistically based system for prioritizing information exploration under uncertainty. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 27(4):449–466.

  • Hofmeyr, S. A., Forrest, S., and Somayaji, A. (1998). Intrusion detection using sequences of system calls. Journal of Computer Security, 6:151–180.

  • Javitz, H. S. and Valdes, A. (1993). The NIDES statistical component: description and justification. Technical Report, Computer Science Laboratory, SRI International.

  • KDD99-Cup (1999). The third international knowledge discovery and data mining tools competition dataset. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

  • Knorr, E. M. and Ng, R. T. (1998). Algorithms for mining distance-based outliers in large datasets. In Proc. 24th Int. Conf. Very Large Data Bases, VLDB, pages 392–403.

  • Knorr, E. M. and Ng, R. T. (1999). Finding intensional knowledge of distance-based outliers. The VLDB Journal, pages 211–222.

  • Lane, T. and Brodley, C. E. (1997). Sequence matching and learning in anomaly detection for computer security. In AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, pages 43–49. AAAI Press.

  • Lee, W. and Stolfo, S. J. (1998). Data mining approaches for intrusion detection. In Proceedings of the 1998 USENIX Security Symposium.

  • Lee, W., Stolfo, S. J., and Chan, P. K. (1997). Learning patterns from Unix process execution traces for intrusion detection. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management, pages 50–56. AAAI Press.

  • Lee, W., Stolfo, S. J., and Mok, K. (1999). Data mining in work flow environments: Experiences in intrusion detection. In Proceedings of the 1999 Conference on Knowledge Discovery and Data Mining (KDD99).

  • Leslie, C., Eskin, E., and Noble, W. S. (2002). The spectrum kernel: A string kernel for SVM protein classification. In Proceedings of the Pacific Symposium on Biocomputing (PSB-2002), Kaua’i, Hawaii.

  • Lippmann, R. P., Cunningham, R. K., Fried, D. J., Graf, I., Kendall, K. R., Webster, S. W., and Zissman, M. (1999). Results of the 1999 DARPA off-line intrusion detection evaluation. In Second International Workshop on Recent Advances in Intrusion Detection (RAID 1999), West Lafayette, IN.

  • McCallum, A., Nigam, K., and Ungar, L. H. (2000). Efficient clustering of high-dimensional data sets with application to reference matching. In Knowledge Discovery and Data Mining, pages 169–178.

  • Paxson, V. (1998). Bro: A system for detecting network intruders in real-time. In Proceedings of the 7th USENIX Security Symposium, San Antonio, TX.

  • Platt, J. (1999). Fast training of support vector machines using sequential minimal optimization. In Schölkopf, B., Burges, C. J. C., and Smola, A. J., editors, Advances in Kernel Methods: Support Vector Learning, pages 185–208, Cambridge, MA. MIT Press.

  • Portnoy, L., Eskin, E., and Stolfo, S. J. (2001). Intrusion detection with unlabeled data using clustering. In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), Philadelphia, PA.

  • Provost, F., Fawcett, T., and Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. In Proceedings of the Fifteenth International Conference on Machine Learning.

  • Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A. J., and Williamson, R. C. (1999). Estimating the support of a high-dimensional distribution. Technical Report 99-87, Microsoft Research. To appear in Neural Computation, 2001.

  • Warrender, C., Forrest, S., and Pearlmutter, B. (1999). Detecting intrusions using system calls: alternative data models. In 1999 IEEE Symposium on Security and Privacy, pages 133–145. IEEE Computer Society.

  • Watkins, C. (2000). Dynamic alignment kernels. In Smola, A., Bartlett, P., Schölkopf, B., and Schuurmans, D., editors, Advances in Large Margin Classifiers, pages 39–50, Cambridge, MA. MIT Press.

  • Ye, N. (2000). A Markov chain model of temporal behavior for anomaly detection. In Proceedings of the 2000 IEEE Systems, Man, and Cybernetics Information Assurance and Security Workshop.

Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Eskin, E., Arnold, A., Prerau, M., Portnoy, L., Stolfo, S. (2002). A Geometric Framework for Unsupervised Anomaly Detection. In: Barbará, D., Jajodia, S. (eds) Applications of Data Mining in Computer Security. Advances in Information Security, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0953-0_4

  • DOI: https://doi.org/10.1007/978-1-4615-0953-0_4

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5321-8

  • Online ISBN: 978-1-4615-0953-0

  • eBook Packages: Springer Book Archive
