Abstract
We live in a world with an abundance of information but lack the ability to fully benefit from it; as John Naisbitt put it succinctly in his 1982 book, “we are drowning in information, but starved for knowledge”. The information, collected by various sensors and humans, is corrupted by noise, ambiguity and distortions, and suffers from the data deluge problem. Combining noisy, ambiguous and distorted information from a variety of sources scattered around the globe to synthesize accurate and actionable knowledge is a challenging problem. To make matters more complex, there are intrusive mechanisms intentionally developed to disrupt accurate information fusion and knowledge extraction; these include cyber attacks, cyber espionage and cyber crime, to name a few. Intrusion detection has become a major research focus over the past two decades, and several intrusion detection approaches, such as rule-based, signature-based and computational intelligence based approaches, have been developed. Of these, computational intelligence based anomaly detection mechanisms show the ability to handle hitherto unknown intrusions and attacks. However, these approaches suffer from two issues: (i) they are not designed to detect similar attacks on a large number of devices, and (ii) they are not designed for quickest detection. In this chapter, we describe an approach that helps to scale up existing computational intelligence approaches to implement quickest anomaly detection in millions of devices at the same time.
Notes
1. Malware is an umbrella term used to refer to a variety of software intrusions, including viruses, worms, Trojan horses, ransomware, spyware, scareware, adware and so on. These can take the form of executable code, scripts, active content and other software. The majority of recent active malware threats were worms or Trojans rather than viruses [17]. When malware is used in a deliberate and concerted manner, as in APTs, one needs sophisticated monitoring and mitigating strategies to address them.
References
H.-J. Liao, C.-H. R. Lin, Y.-C. Lin, and K.-Y. Tung, “Intrusion detection system: A comprehensive review,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.
M. Shetty and N. Shekokar, “Data mining techniques for real time intrusion detection systems,” International Journal of Scientific & Engineering Research, vol. 3, no. 4, 2012.
C. Kolias, G. Kambourakis, and M. Maragoudakis, “Swarm intelligence in intrusion detection: A survey,” Computers & Security, vol. 30, no. 8, pp. 625–642, 2011.
S. Shin, S. Lee, H. Kim, and S. Kim, “Advanced probabilistic approach for network intrusion forecasting and detection,” Expert Systems with Applications, vol. 40, no. 1, pp. 315–322, 2013.
S. X. Wu and W. Banzhaf, “The use of computational intelligence in intrusion detection systems: A review,” Applied Soft Computing, vol. 10, no. 1, pp. 1–35, 2010.
L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: a survey,” Data Mining and Knowledge Discovery, vol. 29, no. 3, pp. 626–688, 2015.
G. Stringhini, C. Kruegel, and G. Vigna, “Detecting spammers on social networks,” in Proceedings of the 26th Annual Computer Security Applications Conference, pp. 1–9, ACM, 2010.
D. Savage, X. Zhang, X. Yu, P. Chou, and Q. Wang, “Anomaly detection in online social networks,” Social Networks, vol. 39, pp. 62–70, 2014.
W. Xu, F. Zhang, and S. Zhu, “Toward worm detection in online social networks,” in Proceedings of the 26th Annual Computer Security Applications Conference, pp. 11–20, ACM, 2010.
P. Chen, L. Desmet, and C. Huygens, “A study on advanced persistent threats,” in IFIP International Conference on Communications and Multimedia Security, pp. 63–72, Springer, 2014.
D. Kushner, “The real story of Stuxnet,” IEEE Spectrum, vol. 50, no. 3, pp. 48–53, 2013.
Symantec, “Symantec internet security threat report,” tech. rep., Symantec, 2011.
Fox-IT, “Interim report, DigiNotar cert authority breach,” tech. rep., Fox-IT Business Unit Cybercrime, Delft, 2011.
U. Rivner, “Anatomy of an attack”.
N. Villeneuve, J. T. Bennett, N. Moran, T. Haq, M. Scott, and K. Geers, Operation “Ke3chang”: Targeted Attacks Against Ministries of Foreign Affairs. 2013.
D. Kindlund, X. Chen, M. Scott, and N. D. Moran, “Operation SnowMan: DeputyDog actor compromises US Veterans of Foreign Wars website,” 2014.
E. M. Hutchins, M. J. Cloppert, and R. M. Amin, “Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains,” Leading Issues in Information Warfare & Security Research, vol. 1, p. 80, 2011.
C. Tankard, “Advanced persistent threats and how to monitor and deter them,” Network Security, vol. 2011, no. 8, pp. 16–19, 2011.
L. Huang, X. Nguyen, M. Garofalakis, M. I. Jordan, A. Joseph, and N. Taft, “In-network PCA and anomaly detection,” in NIPS, vol. 19, 2006.
C. C. Aggarwal, “On abnormality detection in spuriously populated data streams,” in SDM, SIAM, 2005.
D.-S. Pham, S. Venkatesh, M. Lazarescu, and S. Budhaditya, “Anomaly detection in large-scale data stream networks,” Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 145–189, 2014.
X. Jiang and G. F. Cooper, “A real-time temporal Bayesian architecture for event surveillance and its application to patient-specific multiple disease outbreak detection,” Data Mining and Knowledge Discovery, vol. 20, no. 3, pp. 328–360, 2010.
V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Computing Surveys (CSUR), vol. 41, no. 3, p. 15, 2009.
V. Barnett and T. Lewis, Outliers in statistical data, vol. 3. Wiley New York, 1984.
A. Koufakou and M. Georgiopoulos, “A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes,” Data Mining and Knowledge Discovery, vol. 20, no. 2, pp. 259–289, 2010.
T. White, Hadoop: The Definitive Guide. O’Reilly Media, 2009.
D. J. Hand, Discrimination and Classification. Wiley Series in Probability and Mathematical Statistics, Chichester: Wiley, 1981.
K. V. Mardia, J. T. Kent, and J. M. Bibby, “Multivariate analysis (probability and mathematical statistics),” 1980.
T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning, vol. 2. Springer, 2009.
S. Singh, S. Ruan, K. Choi, K. Pattipati, P. Willett, S. M. Namburu, S. Chigusa, D. V. Prokhorov, and L. Qiao, “An optimization-based method for dynamic multiple fault diagnosis problem,” in Aerospace Conference, 2007 IEEE, pp. 1–13, IEEE, 2007.
M. A. Carreira-Perpinan, “A review of dimension reduction techniques,” Department of Computer Science. University of Sheffield. Tech. Rep. CS-96-09, pp. 1–69, 1997.
I. K. Fodor, “A survey of dimension reduction techniques,” 2002.
I. T. Jolliffe, Principal Component Analysis. New York: Springer, 2010.
R. Bro, “Multiway calibration. Multilinear PLS,” Journal of Chemometrics, vol. 10, pp. 47–61, 1996.
S. Roberts and R. Everson, Independent component analysis: principles and practice. Cambridge University Press, 2001.
T.-W. Lee, Independent component analysis. Springer, 2010.
S. Kaski, “Dimensionality reduction by random mapping: Fast similarity computation for clustering,” in Neural Networks Proceedings, 1998. IEEE World Congress on Computational Intelligence. The 1998 IEEE International Joint Conference on, vol. 1, pp. 413–418, IEEE, 1998.
J. B. Tenenbaum, V. De Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
T. Hastie and W. Stuetzle, “Principal curves,” Journal of the American Statistical Association, vol. 84, no. 406, pp. 502–516, 1989.
M. D. Ritchie, L. W. Hahn, N. Roodi, L. R. Bailey, W. D. Dupont, F. F. Parl, and J. H. Moore, “Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer,” The American Journal of Human Genetics, vol. 69, no. 1, pp. 138–147, 2001.
M. D. Ritchie, L. W. Hahn, and J. H. Moore, “Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity,” Genetic Epidemiology, vol. 24, no. 2, pp. 150–157, 2003.
M. Vlachos, C. Domeniconi, D. Gunopulos, G. Kollios, and N. Koudas, “Non-linear dimensionality reduction techniques for classification and visualization,” in Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 645–651, ACM, 2002.
H. Ritter and T. Kohonen, “Self-organizing semantic maps,” Biological Cybernetics, vol. 61, no. 4, pp. 241–254, 1989.
T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.
R. H. Shumway and D. S. Stoffer, Time series analysis and its applications: with R examples. Springer Science & Business Media, 2010.
K. Singh, S. C. Guntuku, A. Thakur, and C. Hota, “Big data analytics framework for peer-to-peer Botnet detection using random forests,” Information Sciences, vol. 278, pp. 488–497, 2014.
J. Camacho, G. Maciá-Fernández, J. Diaz-Verdejo, and P. Garcia-Teodoro, “Tackling the big data 4 Vs for anomaly detection,” in Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on, pp. 500–505, IEEE, 2014.
M. A. Hayes and M. A. Capretz, “Contextual anomaly detection in big sensor data,” in 2014 IEEE International Congress on Big Data, pp. 64–71, IEEE, 2014.
B. Balasingam, M. Sankavaram, K. Choi, D. F. M. Ayala, D. Sidoti, K. Pattipati, P. Willett, C. Lintz, G. Commeau, F. Dorigo, et al., “Online anomaly detection in big data,” in Information Fusion (FUSION), 2014 17th International Conference on, pp. 1–8, IEEE, 2014.
D. Pasupuleti, P. Mannaru, B. Balasingam, M. Baum, K. Pattipati, P. Willett, C. Lintz, G. Commeau, F. Dorigo, and J. Fahrny, “Online playtime prediction for cognitive video streaming,” in Information Fusion (Fusion), 2015 18th International Conference on, pp. 1886–1891, IEEE, 2015.
J. E. Jackson, A user’s guide to principal components, vol. 587. John Wiley & Sons, 2005.
D. Zumoffen and M. Basualdo, “From large chemical plant data to fault diagnosis integrated to decentralized fault-tolerant control: pulp mill process application,” Industrial & Engineering Chemistry Research, vol. 47, no. 4, pp. 1201–1220, 2008.
D. García-Alvarez, “Fault detection using principal component analysis (PCA) in a wastewater treatment plant (WWTP),” in Proceedings of the International Student’s Scientific Conference, 2009.
G. H. Golub and C. F. Van Loan, Matrix computations, vol. 3. JHU Press, 2012.
Z. Meng, A. Wiesel, and A. Hero, “Distributed principal component analysis on networks via directed graphical models,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pp. 2877–2880, IEEE, 2012.
M. Basseville, I. V. Nikiforov, et al., Detection of abrupt changes: theory and application, vol. 104. Prentice Hall Englewood Cliffs, 1993.
E. Page, “Continuous inspection schemes,” Biometrika, pp. 100–115, 1954.
A. N. Shiryaev, “The problem of the most rapid detection of a disturbance in a stationary process,” Soviet Math. Dokl., no. 2, pp. 795–799, 1961.
Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with applications to tracking and navigation: theory algorithms and software. John Wiley & Sons, 2004.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Balasingam, B., Mannaru, P., Sidoti, D., Pattipati, K., Willett, P. (2017). Online Anomaly Detection in Big Data: The First Line of Defense Against Intruders. In: Pedrycz, W., Chen, SM. (eds) Data Science and Big Data: An Environment of Computational Intelligence. Studies in Big Data, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-319-53474-9_4
DOI: https://doi.org/10.1007/978-3-319-53474-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53473-2
Online ISBN: 978-3-319-53474-9
eBook Packages: Engineering, Engineering (R0)