Abstract
Anomaly detection is an important problem in real-world applications. It is particularly challenging in the streaming setting, where storing the entire dataset in order to run an algorithm over it is infeasible. Many methods for identifying anomalies in data have been proposed. One of them detects anomalies using a low-rank approximation of the non-anomalous input data, maintained by matrix sketching, and has been shown to have low time and space requirements and good empirical performance. However, this method fails to capture non-linearities in the data. In this work, a kernel-based anomaly detection method is proposed that transforms the data to the kernel space using random Fourier features (RFF). Compared with previous methods, the proposed approach attains a significant improvement in empirical performance on datasets with a large number of examples.
Keywords
- Streaming data
- Anomaly detection
- Random Fourier features
- Matrix sketching
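The abstract does not spell out the algorithm, so the following is only a minimal sketch of how the two ingredients it names could fit together, not the authors' implementation. It assumes a Gaussian kernel for the random Fourier features, a Frequent Directions style matrix sketch, and the squared residual after projecting onto the top-k sketch directions as the anomaly score. All names (make_rff_map, FrequentDirections, residual_score, stream_scores), the parameter values (gamma, ell, k, threshold, warmup) and the toy ring data are hypothetical choices made for illustration.

```python
import numpy as np


def make_rff_map(dim, n_features, gamma, rng):
    """Return z(.) with z(x) @ z(y) approximating exp(-gamma * ||x - y||^2)."""
    # Frequencies follow the Gaussian kernel's spectral density, N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def z(X):
        return np.sqrt(2.0 / n_features) * np.cos(np.atleast_2d(X) @ W + b)

    return z


class FrequentDirections:
    """Deterministic row sketch with a 2*ell-row buffer (assumes width >= 2*ell)."""

    def __init__(self, width, ell):
        assert width >= 2 * ell
        self.ell = ell
        self.B = np.zeros((2 * ell, width))
        self.filled = 0  # number of buffer rows currently in use

    def update(self, row):
        if self.filled == 2 * self.ell:
            self._shrink()
        self.B[self.filled] = row
        self.filled += 1

    def _shrink(self):
        _, s, Vt = np.linalg.svd(self.B, full_matrices=False)
        # Subtract the (ell+1)-th squared singular value; only ell rows survive.
        shrunk = np.sqrt(np.maximum(s[: self.ell] ** 2 - s[self.ell] ** 2, 0.0))
        self.B[:] = 0.0
        self.B[: self.ell] = shrunk[:, None] * Vt[: self.ell]
        self.filled = self.ell

    def top_directions(self, k):
        _, _, Vt = np.linalg.svd(self.B[: self.filled], full_matrices=False)
        return Vt[:k]  # orthonormal rows


def residual_score(z_row, basis):
    """Squared distance from z_row to the span of the orthonormal rows of basis."""
    return float(z_row @ z_row - np.sum((basis @ z_row) ** 2))


def stream_scores(points, z, sketch, k, threshold, warmup):
    """Score points one at a time; only low-scoring points update the sketch."""
    scores = []
    for i, x in enumerate(points):
        zx = z(x)[0]
        if i < warmup:
            score = 0.0  # warm-up points are assumed normal and are not scored
        else:
            # Recomputing the basis per point is simple but wasteful.
            score = residual_score(zx, sketch.top_directions(k))
        if score <= threshold:
            sketch.update(zx)
        scores.append(score)
    return np.array(scores)


# Toy data: normal points lie near the unit circle, anomalies near its centre.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=1050)
radii = 1.0 + rng.normal(scale=0.02, size=(1050, 1))
ring = np.c_[np.cos(angles), np.sin(angles)] * radii

z = make_rff_map(dim=2, n_features=300, gamma=2.0, rng=rng)
sketch = FrequentDirections(width=300, ell=30)
stream_scores(ring[:1000], z, sketch, k=11, threshold=0.5, warmup=100)

basis = sketch.top_directions(11)
normal_scores = np.array([residual_score(v, basis) for v in z(ring[1000:])])
centre = rng.normal(scale=0.05, size=(5, 2))
anomaly_scores = np.array([residual_score(v, basis) for v in z(centre)])
print("mean normal score: ", normal_scores.mean())
print("mean anomaly score:", anomaly_scores.mean())
```

The ring data is chosen because a linear low-rank model in the original two dimensions cannot separate the centre from the circle, whereas in the random feature space the centre points should leave a much larger residual. Memory use in this sketch is bounded by the 2*ell by n_features buffer, independent of the stream length.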
Acknowledgements
The authors gratefully acknowledge the financial support of the Ministry of Electronics and Information Technology (MeitY), Govt. of India, under the Visvesvaraya Ph.D. Scheme for Electronics and Information Technology.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Francis, D.P., Raimond, K. (2018). A Random Fourier Features based Streaming Algorithm for Anomaly Detection in Large Datasets. In: Rajsingh, E., Veerasamy, J., Alavi, A., Peter, J. (eds) Advances in Big Data and Cloud Computing. Advances in Intelligent Systems and Computing, vol 645. Springer, Singapore. https://doi.org/10.1007/978-981-10-7200-0_18
DOI: https://doi.org/10.1007/978-981-10-7200-0_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7199-7
Online ISBN: 978-981-10-7200-0
eBook Packages: Engineering, Engineering (R0)