Online Anomaly Detection in Big Data: The First Line of Defense Against Intruders

  • Chapter

Part of the book series: Studies in Big Data (SBD, volume 24)

Abstract

We live in a world with an abundance of information but lack the ability to fully benefit from it, as John Naisbitt succinctly put it in his 1982 book: “we are drowning in information, but starved for knowledge”. The information, collected by various sensors and humans, is corrupted by noise, ambiguity and distortion, and its sheer volume creates a data deluge problem. Combining noisy, ambiguous and distorted information from a variety of sources scattered around the globe in order to synthesize accurate and actionable knowledge is a challenging problem. To make things even more complex, there are intentionally developed intrusive mechanisms that aim to disrupt accurate information fusion and knowledge extraction; these mechanisms include cyber attacks, cyber espionage and cyber crime, to name a few. Intrusion detection has become a major research focus over the past two decades, and several intrusion detection approaches, such as rule-based, signature-based and computational intelligence based approaches, have been developed. Of these, computational intelligence based anomaly detection mechanisms show the ability to handle hitherto unknown intrusions and attacks. However, these approaches suffer from two issues: (i) they are not designed to detect similar attacks on a large number of devices, and (ii) they are not designed for quickest detection. In this chapter, we describe an approach that helps to scale up existing computational intelligence approaches to implement quickest anomaly detection in millions of devices at the same time.

Notes

  1. Malware is an umbrella term for a variety of software intrusions, including viruses, worms, Trojan horses, ransomware, spyware, scareware, adware and so on. These can take the form of executable code, scripts, active content and other software. The majority of recent active malware threats were worms or Trojans rather than viruses [17]. When malware is used in a deliberate and concerted manner, as in advanced persistent threats (APTs), sophisticated monitoring and mitigation strategies are needed to address it.

References

  1. H.-J. Liao, C.-H. R. Lin, Y.-C. Lin, and K.-Y. Tung, “Intrusion detection system: A comprehensive review,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, 2013.

  2. M. Shetty and N. Shekokar, “Data mining techniques for real time intrusion detection systems,” International Journal of Scientific & Engineering Research, vol. 3, no. 4, 2012.

  3. C. Kolias, G. Kambourakis, and M. Maragoudakis, “Swarm intelligence in intrusion detection: A survey,” Computers & Security, vol. 30, no. 8, pp. 625–642, 2011.

  4. S. Shin, S. Lee, H. Kim, and S. Kim, “Advanced probabilistic approach for network intrusion forecasting and detection,” Expert Systems with Applications, vol. 40, no. 1, pp. 315–322, 2013.

  5. S. X. Wu and W. Banzhaf, “The use of computational intelligence in intrusion detection systems: A review,” Applied Soft Computing, vol. 10, no. 1, pp. 1–35, 2010.

  6. L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: a survey,” Data Mining and Knowledge Discovery, vol. 29, no. 3, pp. 626–688, 2015.

  7. G. Stringhini, C. Kruegel, and G. Vigna, “Detecting spammers on social networks,” in Proceedings of the 26th Annual Computer Security Applications Conference, pp. 1–9, ACM, 2010.

  8. D. Savage, X. Zhang, X. Yu, P. Chou, and Q. Wang, “Anomaly detection in online social networks,” Social Networks, vol. 39, pp. 62–70, 2014.

  9. W. Xu, F. Zhang, and S. Zhu, “Toward worm detection in online social networks,” in Proceedings of the 26th Annual Computer Security Applications Conference, pp. 11–20, ACM, 2010.

  10. P. Chen, L. Desmet, and C. Huygens, “A study on advanced persistent threats,” in IFIP International Conference on Communications and Multimedia Security, pp. 63–72, Springer, 2014.

  11. D. Kushner, “The real story of Stuxnet,” IEEE Spectrum, vol. 50, no. 3, pp. 48–53, 2013.

  12. Symantec, “Symantec internet security threat report,” tech. rep., Symantec, 2011.

  13. Fox-IT, “Interim report, DigiNotar cert authority breach,” tech. rep., Fox-IT Business Unit Cybercrime, Delft, 2011.

  14. U. Rivner, “Anatomy of an attack”.

  15. N. Villeneuve, J. T. Bennett, N. Moran, T. Haq, M. Scott, and K. Geers, Operation “Ke3chang”: Targeted Attacks Against Ministries of Foreign Affairs. 2013.

  16. D. Kindlund, X. Chen, M. Scott, and N. D. Moran, “Operation SnowMan: DeputyDog actor compromises US Veterans of Foreign Wars website,” 2014.

  17. https://en.wikipedia.org/wiki/malware.

  18. E. M. Hutchins, M. J. Cloppert, and R. M. Amin, “Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains,” Leading Issues in Information Warfare & Security Research, vol. 1, p. 80, 2011.

  19. C. Tankard, “Advanced persistent threats and how to monitor and deter them,” Network Security, vol. 2011, no. 8, pp. 16–19, 2011.

  20. L. Huang, X. Nguyen, M. Garofalakis, M. I. Jordan, A. Joseph, and N. Taft, “In-network PCA and anomaly detection,” in NIPS, vol. 19, 2006.

  21. C. C. Aggarwal, “On abnormality detection in spuriously populated data streams,” in SDM, SIAM, 2005.

  22. D.-S. Pham, S. Venkatesh, M. Lazarescu, and S. Budhaditya, “Anomaly detection in large-scale data stream networks,” Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 145–189, 2014.

  23. X. Jiang and G. F. Cooper, “A real-time temporal Bayesian architecture for event surveillance and its application to patient-specific multiple disease outbreak detection,” Data Mining and Knowledge Discovery, vol. 20, no. 3, pp. 328–360, 2010.

  24. V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Computing Surveys (CSUR), vol. 41, no. 3, p. 15, 2009.

  25. V. Barnett and T. Lewis, Outliers in statistical data, vol. 3. Wiley New York, 1984.

  26. A. Koufakou and M. Georgiopoulos, “A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes,” Data Mining and Knowledge Discovery, vol. 20, no. 2, pp. 259–289, 2010.

  27. T. White, Hadoop: The Definitive Guide. O’Reilly Media, 2009.

  28. D. J. Hand, Discrimination and Classification. Wiley Series in Probability and Mathematical Statistics, Chichester: Wiley, 1981.

  29. K. V. Mardia, J. T. Kent, and J. M. Bibby, “Multivariate analysis (probability and mathematical statistics),” 1980.

  30. T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning, vol. 2. Springer, 2009.

  31. S. Singh, S. Ruan, K. Choi, K. Pattipati, P. Willett, S. M. Namburu, S. Chigusa, D. V. Prokhorov, and L. Qiao, “An optimization-based method for dynamic multiple fault diagnosis problem,” in Aerospace Conference, 2007 IEEE, pp. 1–13, IEEE, 2007.

  32. M. A. Carreira-Perpinan, “A review of dimension reduction techniques,” Department of Computer Science. University of Sheffield. Tech. Rep. CS-96-09, pp. 1–69, 1997.

  33. I. K. Fodor, “A survey of dimension reduction techniques,” 2002.

  34. I. T. Jolliffe, Principal Component Analysis. New York: Springer, 2010.

  35. R. Bro, “Multiway calibration. Multilinear PLS,” Journal of Chemometrics, vol. 10, pp. 47–61, 1996.

  36. S. Roberts and R. Everson, Independent component analysis: principles and practice. Cambridge University Press, 2001.

  37. T.-W. Lee, Independent component analysis. Springer, 2010.

  38. S. Kaski, “Dimensionality reduction by random mapping: Fast similarity computation for clustering,” in Neural Networks Proceedings, 1998. IEEE World Congress on Computational Intelligence. The 1998 IEEE International Joint Conference on, vol. 1, pp. 413–418, IEEE, 1998.

  39. J. B. Tenenbaum, V. De Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.

  40. T. Hastie and W. Stuetzle, “Principal curves,” Journal of the American Statistical Association, vol. 84, no. 406, pp. 502–516, 1989.

  41. M. D. Ritchie, L. W. Hahn, N. Roodi, L. R. Bailey, W. D. Dupont, F. F. Parl, and J. H. Moore, “Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer,” The American Journal of Human Genetics, vol. 69, no. 1, pp. 138–147, 2001.

  42. M. D. Ritchie, L. W. Hahn, and J. H. Moore, “Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity,” Genetic Epidemiology, vol. 24, no. 2, pp. 150–157, 2003.

  43. M. Vlachos, C. Domeniconi, D. Gunopulos, G. Kollios, and N. Koudas, “Non-linear dimensionality reduction techniques for classification and visualization,” in Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 645–651, ACM, 2002.

  44. H. Ritter and T. Kohonen, “Self-organizing semantic maps,” Biological Cybernetics, vol. 61, no. 4, pp. 241–254, 1989.

  45. T. Kohonen, “The self-organizing map,” Proceedings of the IEEE, vol. 78, no. 9, pp. 1464–1480, 1990.

  46. R. H. Shumway and D. S. Stoffer, Time series analysis and its applications: with R examples. Springer Science & Business Media, 2010.

  47. K. Singh, S. C. Guntuku, A. Thakur, and C. Hota, “Big data analytics framework for peer-to-peer Botnet detection using random forests,” Information Sciences, vol. 278, pp. 488–497, 2014.

  48. J. Camacho, G. Maciá-Fernández, J. Diaz-Verdejo, and P. Garcia-Teodoro, “Tackling the big data 4 Vs for anomaly detection,” in Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on, pp. 500–505, IEEE, 2014.

  49. M. A. Hayes and M. A. Capretz, “Contextual anomaly detection in big sensor data,” in 2014 IEEE International Congress on Big Data, pp. 64–71, IEEE, 2014.

  50. B. Balasingam, M. Sankavaram, K. Choi, D. F. M. Ayala, D. Sidoti, K. Pattipati, P. Willett, C. Lintz, G. Commeau, F. Dorigo, et al., “Online anomaly detection in big data,” in Information Fusion (FUSION), 2014 17th International Conference on, pp. 1–8, IEEE, 2014.

  51. D. Pasupuleti, P. Mannaru, B. Balasingam, M. Baum, K. Pattipati, P. Willett, C. Lintz, G. Commeau, F. Dorigo, and J. Fahrny, “Online playtime prediction for cognitive video streaming,” in Information Fusion (Fusion), 2015 18th International Conference on, pp. 1886–1891, IEEE, 2015.

  52. J. E. Jackson, A user’s guide to principal components, vol. 587. John Wiley & Sons, 2005.

  53. D. Zumoffen and M. Basualdo, “From large chemical plant data to fault diagnosis integrated to decentralized fault-tolerant control: pulp mill process application,” Industrial & Engineering Chemistry Research, vol. 47, no. 4, pp. 1201–1220, 2008.

  54. D. García-Alvarez, “Fault detection using principal component analysis (PCA) in a wastewater treatment plant (WWTP),” in Proceedings of the International Student’s Scientific Conference, 2009.

  55. G. H. Golub and C. F. Van Loan, Matrix computations, vol. 3. JHU Press, 2012.

  56. Z. Meng, A. Wiesel, and A. Hero, “Distributed principal component analysis on networks via directed graphical models,” in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pp. 2877–2880, IEEE, 2012.

  57. M. Basseville, I. V. Nikiforov, et al., Detection of abrupt changes: theory and application, vol. 104. Prentice Hall Englewood Cliffs, 1993.

  58. E. Page, “Continuous inspection schemes,” Biometrika, pp. 100–115, 1954.

  59. A. N. Shiryaev, “The problem of the most rapid detection of a disturbance in a stationary process,” Soviet Math. Dokl., no. 2, pp. 795–799, 1961.

  60. Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with applications to tracking and navigation: theory algorithms and software. John Wiley & Sons, 2004.

Author information

Corresponding author

Correspondence to Balakumar Balasingam.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Balasingam, B., Mannaru, P., Sidoti, D., Pattipati, K., Willett, P. (2017). Online Anomaly Detection in Big Data: The First Line of Defense Against Intruders. In: Pedrycz, W., Chen, SM. (eds) Data Science and Big Data: An Environment of Computational Intelligence. Studies in Big Data, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-319-53474-9_4

  • DOI: https://doi.org/10.1007/978-3-319-53474-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53473-2

  • Online ISBN: 978-3-319-53474-9

  • eBook Packages: Engineering, Engineering (R0)
