
FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection

Data Mining and Knowledge Discovery

Abstract

Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (the “normal” instances). Given a training set containing only normal data, the semi-supervised anomaly detection task is to identify anomalies among instances seen in the future. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: given unlabeled, mostly normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach for semi-supervised anomaly detection. FRaC uses normal instances to build an ensemble of feature models and then flags instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for both the unsupervised and the semi-supervised anomaly detection tasks when compared with two well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and with an existing feature-modeling approach.
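The feature-modeling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each feature model is a plain least-squares linear regressor predicting one feature from all the others, and that "disagreement" is measured as a sum of squared standardized residuals. The abstract specifies neither choice. The function names `fit_feature_models` and `anomaly_scores` and the synthetic data are hypothetical, introduced only for illustration.

```python
import numpy as np


def fit_feature_models(X_train):
    """Fit one linear model per feature, predicting it from the other features.

    Assumption (not from the paper): ordinary least squares with an intercept,
    plus the residual standard deviation on the training data.
    """
    n, d = X_train.shape
    models = []
    for j in range(d):
        others = np.delete(X_train, j, axis=1)
        A = np.hstack([others, np.ones((n, 1))])  # add intercept column
        w, *_ = np.linalg.lstsq(A, X_train[:, j], rcond=None)
        resid = X_train[:, j] - A @ w
        sigma = resid.std() + 1e-9  # guard against division by zero
        models.append((w, sigma))
    return models


def anomaly_scores(models, X):
    """Score each row by how strongly it disagrees with the feature models.

    Assumption (not from the paper): the score is the sum over features of the
    squared standardized prediction error. Higher means more anomalous.
    """
    n, _ = X.shape
    scores = np.zeros(n)
    for j, (w, sigma) in enumerate(models):
        others = np.delete(X, j, axis=1)
        A = np.hstack([others, np.ones((n, 1))])
        z = (X[:, j] - A @ w) / sigma
        scores += z ** 2
    return scores


# Synthetic "normal" data in which the third feature is roughly the sum of the first two.
rng = np.random.default_rng(0)


def make_normal(n):
    x0 = rng.normal(size=n)
    x1 = rng.normal(size=n)
    x2 = x0 + x1 + 0.1 * rng.normal(size=n)
    return np.column_stack([x0, x1, x2])


train = make_normal(200)          # semi-supervised setting: training data is all normal
test_normal = make_normal(50)
# Each value is unremarkable on its own, but together they break the learned relationship.
anomaly = np.array([[1.0, 1.0, -2.0]])

models = fit_feature_models(train)
normal_scores = anomaly_scores(models, test_normal)
anomaly_score = anomaly_scores(models, anomaly)[0]
print("largest normal score:", normal_scores.max())
print("anomaly score:", anomaly_score)
```

The example anomaly has feature values that are individually ordinary, so a per-feature range check would miss it; it stands out only because it contradicts what the other features predict. Under the same assumptions, the unsupervised setting could be approximated by fitting the models on the unlabeled, mostly normal data and scoring that same data.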



Author information

Corresponding author

Correspondence to Keith Noto.

Additional information

Responsible editor: Eamonn Keogh.

About this article

Cite this article

Noto, K., Brodley, C. & Slonim, D. FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection. Data Min Knowl Disc 25, 109–133 (2012). https://doi.org/10.1007/s10618-011-0234-x

  • DOI: https://doi.org/10.1007/s10618-011-0234-x
