Abstract
Anomaly detection involves identifying rare data instances (anomalies) that come from a different class or distribution than the majority (which are simply called "normal" instances). Given a training set containing only normal data, the semi-supervised anomaly detection task is to identify anomalies in future data. Good solutions to this task have applications in fraud and intrusion detection. The unsupervised anomaly detection task is different: given unlabeled, mostly-normal data, identify the anomalies among them. Many real-world machine learning tasks, including many fraud and intrusion detection tasks, are unsupervised because it is impractical (or impossible) to verify all of the training data. We recently presented FRaC, a new approach to semi-supervised anomaly detection. FRaC uses normal instances to build an ensemble of feature models, and then identifies instances that disagree with those models as anomalous. In this paper, we investigate the behavior of FRaC experimentally and explain why FRaC is so successful. We also show that FRaC is a superior approach for the unsupervised as well as the semi-supervised anomaly detection task, compared to well-known state-of-the-art anomaly detection methods, LOF and one-class support vector machines, and to an existing feature-modeling approach.
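The feature-modeling idea the abstract describes can be illustrated with a minimal sketch: for each feature, train a predictor that maps the remaining features to that feature using only normal data, then score a test instance by how much it disagrees with those predictions. This is a simplified, hypothetical illustration (using a 1-nearest-neighbour regressor per feature and a squared-error disagreement score), not the authors' implementation, which uses an ensemble of learners and an information-theoretic surprisal score.

```python
import math

def _dist(a, b, skip):
    """Euclidean distance over all features except index `skip`."""
    return math.sqrt(sum((x - y) ** 2
                         for k, (x, y) in enumerate(zip(a, b)) if k != skip))

def predict_feature(train, x, j):
    """Per-feature model (here, 1-NN): predict feature j of x by copying
    feature j of the normal training instance nearest in the other features."""
    nearest = min(train, key=lambda t: _dist(t, x, j))
    return nearest[j]

def anomaly_score(train, x):
    """Total squared disagreement between x and its feature models."""
    return sum((x[j] - predict_feature(train, x, j)) ** 2
               for j in range(len(x)))

# Normal data: feature 1 is roughly twice feature 0.
normal = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.0), (5.0, 9.9)]
print(anomaly_score(normal, (3.0, 6.0)))   # small: agrees with the models
print(anomaly_score(normal, (3.0, 0.0)))   # large: feature 1 disagrees
```

An instance that violates the learned cross-feature relationships receives a high score even if each of its individual feature values is within the normal range, which is the key property distinguishing this approach from simple per-feature outlier tests.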
References
Asuncion A, Newman D (2007) UCI machine learning repository. http://www.ics.uci.edu/~mlearn/MLRepository.html
Breunig M, Kriegel H, Ng R, Sander J (2000) LOF: identifying density-based local outliers. ACM SIGMOD Rec 29(2): 93–104
Byers S, Raftery AE (1998) Nearest-neighbor clutter removal for estimating features in spatial point processes. J Am Stat Assoc 93(442): 577–584
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3): 1–58
Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. http://www.csie.ntu.edu.tw/~cjlin/libsvm
Cohen WW (1995) Fast effective rule induction. In: Proceedings of the twelfth international conference on machine learning, pp 115–123
Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9: 1871–1874
Guttormsson SE, Marks RJ, El-Sharkawi MA, Kerszenbaum I (1999) Elliptical novelty grouping for on-line short-turn detection of excited running rotors. IEEE Trans Energy Convers 14(1): 16–22
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1): 10–18
Huang YA, Fan W, Lee W, Yu PS (2003) Cross-feature analysis for detecting ad-hoc routing anomalies. In: ICDCS ’03: proceedings of the 23rd international conference on distributed computing systems. IEEE Computer Society, Washington, DC, USA
John G, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence. Morgan Kaufmann, San Mateo, pp 338–345
Lazarevic A, Kumar V (2005) Feature bagging for outlier detection. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery and data mining, pp 157–166
Leon D, Podgurski A, Dickinson W (2005) Visualizing similarity between program executions. In: Proceedings of the 16th IEEE international symposium on software reliability engineering, pp 311–321
Mitchell T (1997) Machine learning. McGraw-Hill, New York
Noto K, Brodley C, Slonim D (2010) Anomaly detection using an ensemble of feature models. In: Proceedings of the 10th IEEE international conference on data mining (ICDM 2010)
Quinlan JR (1990) Probabilistic decision trees, vol 3, chap 5. Morgan Kaufmann, San Mateo, pp. 140–153
Quinlan J (1993) C4.5: programs for machine learning. Morgan Kaufmann, San Mateo
Schölkopf B, Smola AJ, Williamson RC, Bartlett PL (2000) New support vector algorithms. Neural Comput 12(5): 1207–1245
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27: 379–423 (Part I)
Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika 52(3-4): 591–611
Smith R, Bivens A, Embrechts M, Palagiri C, Szymanski B (2002) Clustering approaches for anomaly based intrusion detection. In: Proceedings of intelligent engineering systems through artificial neural networks, pp 579–584
Spackman KA (1989) Signal detection theory: valuable tools for evaluating inductive learning. In: Proceedings of the sixth international workshop on machine learning. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp 160–163
Tang J, Chen Z, Fu A, Cheung D (2002) Enhancing effectiveness of outlier detections for low density patterns. Lecture notes in computer science. Springer, New York, pp 535–548
Responsible editor: Eamonn Keogh.
Noto, K., Brodley, C. & Slonim, D. FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection. Data Min Knowl Disc 25, 109–133 (2012). https://doi.org/10.1007/s10618-011-0234-x