Online Anomaly Detection in Big Data: The First Line of Defense Against Intruders

Balasingam, Balakumar; Mannaru, Pujitha; Sidoti, David; Pattipati, Krishna; Willett, Peter

doi:10.1007/978-3-319-53474-9_4

Chapter
First Online: 22 March 2017

Part of the book series: Studies in Big Data (SBD, volume 24)

Abstract

We live in a world with an abundance of information but lack the ability to fully benefit from it; as John Naisbitt put it succinctly in his 1982 book, “we are drowning in information, but starved for knowledge”. The information, collected by various sensors and humans, is corrupted by noise, ambiguity and distortion, and its sheer volume creates a data deluge problem. Combining noisy, ambiguous and distorted information from sources scattered around the globe to synthesize accurate and actionable knowledge is a challenging problem. To make matters more complex, there are intentionally developed intrusive mechanisms that aim to disrupt accurate information fusion and knowledge extraction; these include cyber attacks, cyber espionage and cyber crime, to name a few. Intrusion detection has become a major research focus over the past two decades, and several approaches, such as rule-based, signature-based and computational intelligence based approaches, have been developed. Of these, computational intelligence based anomaly detection mechanisms show the ability to handle hitherto unknown intrusions and attacks. However, these approaches suffer from two issues: (i) they are not designed to detect similar attacks on a large number of devices, and (ii) they are not designed for quickest detection. In this chapter, we describe an approach that helps to scale up existing computational intelligence approaches to implement quickest anomaly detection in millions of devices at the same time.

Notes

1. Malware is an umbrella term for a variety of software intrusions, including viruses, worms, Trojan horses, ransomware, spyware, scareware and adware. These can take the form of executable code, scripts, active content and other software. The majority of recent active malware threats were worms or Trojans rather than viruses [17]. When malware is used in a deliberate and concerted manner, as in advanced persistent threats (APTs), sophisticated monitoring and mitigation strategies are needed to address it.

References

[1] H.-J. Liao, C.-H. R. Lin, Y.-C. Lin, and K.-Y. Tung, “Intrusion detection system: A comprehensive review,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.
[2] M. Shetty and N. Shekokar, “Data mining techniques for real time intrusion detection systems,” International Journal of Scientific & Engineering Research, vol. 3, no. 4, 2012.
[3] C. Kolias, G. Kambourakis, and M. Maragoudakis, “Swarm intelligence in intrusion detection: A survey,” Computers & Security, vol. 30, no. 8, pp. 625–642, 2011.
[4] S. Shin, S. Lee, H. Kim, and S. Kim, “Advanced probabilistic approach for network intrusion forecasting and detection,” Expert Systems with Applications, vol. 40, no. 1, pp. 315–322, 2013.
[5] S. X. Wu and W. Banzhaf, “The use of computational intelligence in intrusion detection systems: A review,” Applied Soft Computing, vol. 10, no. 1, pp. 1–35, 2010.
[6] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: A survey,” Data Mining and Knowledge Discovery, vol. 29, no. 3, pp. 626–688, 2015.
[7] G. Stringhini, C. Kruegel, and G. Vigna, “Detecting spammers on social networks,” in Proceedings of the 26th Annual Computer Security Applications Conference, pp. 1–9, ACM, 2010.
[8] D. Savage, X. Zhang, X. Yu, P. Chou, and Q. Wang, “Anomaly detection in online social networks,” Social Networks, vol. 39, pp. 62–70, 2014.
[9] W. Xu, F. Zhang, and S. Zhu, “Toward worm detection in online social networks,” in Proceedings of the 26th Annual Computer Security Applications Conference, pp. 11–20, ACM, 2010.
[10] P. Chen, L. Desmet, and C. Huygens, “A study on advanced persistent threats,” in IFIP International Conference on Communications and Multimedia Security, pp. 63–72, Springer, 2014.
[11] D. Kushner, “The real story of Stuxnet,” IEEE Spectrum, vol. 50, no. 3, pp. 48–53, 2013.
[12] Symantec, “Symantec internet security threat report,” tech. rep., Symantec, 2011.
[13] Fox-IT, “Interim report, DigiNotar cert authority breach,” tech. rep., Fox-IT Business Unit Cybercrime, Delft, 2011.
[14] U. Rivner, “Anatomy of an attack.”
[15] N. Villeneuve, J. T. Bennett, N. Moran, T. Haq, M. Scott, and K. Geers, Operation “Ke3chang”: Targeted Attacks Against Ministries of Foreign Affairs. 2013.
[16] D. Kindlund, X. Chen, M. Scott, and N. Moran, “Operation SnowMan: DeputyDog actor compromises US Veterans of Foreign Wars website,” 2014.
[17] “https://en.wikipedia.org/wiki/malware”.
[18] E. M. Hutchins, M. J. Cloppert, and R. M. Amin, “Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains,” Leading Issues in Information Warfare & Security Research, vol. 1, p. 80, 2011.
[19] C. Tankard, “Advanced persistent threats and how to monitor and deter them,” Network Security, vol. 2011, no. 8, pp. 16–19, 2011.
[20] L. Huang, X. Nguyen, M. Garofalakis, M. I. Jordan, A. Joseph, and N. Taft, “In-network PCA and anomaly detection,” in NIPS, vol. 19, 2006.
[21] C. C. Aggarwal, “On abnormality detection in spuriously populated data streams,” in SDM, SIAM, 2005.
[22] D.-S. Pham, S. Venkatesh, M. Lazarescu, and S. Budhaditya, “Anomaly detection in large-scale data stream networks,” Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 145–189, 2014.
[23] X. Jiang and G. F. Cooper, “A real-time temporal Bayesian architecture for event surveillance and its application to patient-specific multiple disease outbreak detection,” Data Mining and Knowledge Discovery, vol. 20, no. 3, pp. 328–360, 2010.
[24] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Computing Surveys (CSUR), vol. 41, no. 3, p. 15, 2009.
[25] V. Barnett and T. Lewis, Outliers in Statistical Data, vol. 3. Wiley New York, 1984.
[26] A. Koufakou and M. Georgiopoulos, “A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes,” Data Mining and Knowledge Discovery, vol. 20, no. 2, pp. 259–289, 2010.
[27] T. White, Hadoop: The Definitive Guide. O’Reilly Media, 2009.
[28] D. J. Hand, “Discrimination and classification,” Wiley Series in Probability and Mathematical Statistics, vol. 1, Chichester: Wiley, 1981.
[29] K. V. Mardia, J. T. Kent, and J. M. Bibby, “Multivariate analysis (probability and mathematical statistics),” 1980.
[30] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, vol. 2. Springer, 2009.
[31] S. Singh, S. Ruan, K. Choi, K. Pattipati, P. Willett, S. M. Namburu, S. Chigusa, D. V. Prokhorov, and L. Qiao, “An optimization-based method for dynamic multiple fault diagnosis problem,” in Aerospace Conference, 2007 IEEE, pp. 1–13, IEEE, 2007.
[32] M. A. Carreira-Perpinan, “A review of dimension reduction techniques,” Department of Computer Science, University of Sheffield, Tech. Rep. CS-96-09, pp. 1–69, 1997.
[33] I. K. Fodor, “A survey of dimension reduction techniques,” 2002.
[34] I. T. Jolliffe, Principal Component Analysis. New York: Springer, 2010.
[35] R. Bro, “Multiway calibration. Multilinear PLS,” Journal of Chemometrics, vol. 10, pp. 47–61, 1996.
[36] S. Roberts and R. Everson, Independent Component Analysis: Principles and Practice. Cambridge University Press, 2001.
[37] T.-W. Lee, Independent Component Analysis. Springer, 2010.
[38] S. Kaski, “Dimensionality reduction by random mapping: Fast similarity computation for clustering,” in Neural Networks Proceedings, 1998. IEEE World Congress on Computational Intelligence. The 1998 IEEE International Joint Conference on, vol. 1, pp. 413–418, IEEE, 1998.
[39] J. B. Tenenbaum, V. De Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[40] T. Hastie and W. Stuetzle, “Principal curves,” Journal of the American Statistical Association, vol. 84, no. 406, pp. 502–516, 1989.
[41] M. D. Ritchie, L. W. Hahn, N. Roodi, L. R. Bailey, W. D. Dupont, F. F. Parl, and J. H. Moore, “Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer,” The American Journal of Human Genetics, vol. 69, no. 1, pp. 138–147, 2001.
[42] M. D. Ritchie, L. W. Hahn, and J. H. Moore, “Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity,” Genetic Epidemiology, vol. 24, no. 2, pp. 150–157, 2003.
[43] M. Vlachos, C. Domeniconi, D. Gunopulos, G. Kollios, and N. Koudas, “Non-linear dimensionality reduction techniques for classification and visualization,” in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 645–651, ACM, 2002.
[44] H. Ritter and T. Kohonen, “Self-organizing semantic maps,” Biological Cybernetics, vol. 61, no. 4, pp. 241–254, 1989.
[45] T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.
[46] R. H. Shumway and D. S. Stoffer, Time Series Analysis and Its Applications: With R Examples. Springer Science & Business Media, 2010.
[47] K. Singh, S. C. Guntuku, A. Thakur, and C. Hota, “Big data analytics framework for peer-to-peer botnet detection using random forests,” Information Sciences, vol. 278, pp. 488–497, 2014.
[48] J. Camacho, G. Maciá-Fernández, J. Diaz-Verdejo, and P. Garcia-Teodoro, “Tackling the big data 4 Vs for anomaly detection,” in Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on, pp. 500–505, IEEE, 2014.
[49] M. A. Hayes and M. A. Capretz, “Contextual anomaly detection in big sensor data,” in 2014 IEEE International Congress on Big Data, pp. 64–71, IEEE, 2014.
[50] B. Balasingam, M. Sankavaram, K. Choi, D. F. M. Ayala, D. Sidoti, K. Pattipati, P. Willett, C. Lintz, G. Commeau, F. Dorigo, et al., “Online anomaly detection in big data,” in Information Fusion (FUSION), 2014 17th International Conference on, pp. 1–8, IEEE, 2014.
[51] D. Pasupuleti, P. Mannaru, B. Balasingam, M. Baum, K. Pattipati, P. Willett, C. Lintz, G. Commeau, F. Dorigo, and J. Fahrny, “Online playtime prediction for cognitive video streaming,” in Information Fusion (Fusion), 2015 18th International Conference on, pp. 1886–1891, IEEE, 2015.
[52] J. E. Jackson, A User’s Guide to Principal Components, vol. 587. John Wiley & Sons, 2005.
[53] D. Zumoffen and M. Basualdo, “From large chemical plant data to fault diagnosis integrated to decentralized fault-tolerant control: Pulp mill process application,” Industrial & Engineering Chemistry Research, vol. 47, no. 4, pp. 1201–1220, 2008.
[54] D. García-Alvarez, “Fault detection using principal component analysis (PCA) in a wastewater treatment plant (WWTP),” in Proceedings of the International Student’s Scientific Conference, 2009.
[55] G. H. Golub and C. F. Van Loan, Matrix Computations, vol. 3. JHU Press, 2012.
[56] Z. Meng, A. Wiesel, and A. Hero, “Distributed principal component analysis on networks via directed graphical models,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pp. 2877–2880, IEEE, 2012.
[57] M. Basseville, I. V. Nikiforov, et al., Detection of Abrupt Changes: Theory and Application, vol. 104. Prentice Hall Englewood Cliffs, 1993.
[58] E. Page, “Continuous inspection schemes,” Biometrika, pp. 100–115, 1954.
[59] A. N. Shiryaev, “The problem of the most rapid detection of a disturbance in a stationary process,” Soviet Math. Dokl., no. 2, pp. 795–799, 1961.
[60] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software. John Wiley & Sons, 2004.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Connecticut, 371 Fairfield Way, U-4157, Storrs, CT, 06269, USA
Balakumar Balasingam, Pujitha Mannaru, David Sidoti, Krishna Pattipati & Peter Willett

Corresponding author

Correspondence to Balakumar Balasingam.

Editor information

Editors and Affiliations

Electrical & Computer Engineering, University of Alberta, Edmonton, AB, Canada
Witold Pedrycz
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
Shyi-Ming Chen

About this chapter

Cite this chapter

Balasingam, B., Mannaru, P., Sidoti, D., Pattipati, K., Willett, P. (2017). Online Anomaly Detection in Big Data: The First Line of Defense Against Intruders. In: Pedrycz, W., Chen, SM. (eds) Data Science and Big Data: An Environment of Computational Intelligence. Studies in Big Data, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-319-53474-9_4

DOI: https://doi.org/10.1007/978-3-319-53474-9_4
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53473-2
Online ISBN: 978-3-319-53474-9
eBook Packages: Engineering, Engineering (R0)
