Abstract
Cyber security is a major concern in today’s highly networked environment, and logging is the primary way of tracking compliance with security policies. However, analyzing the massive volume of logs has become a “Big Data” problem. Apache Spark is one of the latest and most notable incarnations of data flow models in cluster computing. For security log analysis, it provides an exceptional environment for both batch and interactive work. In this study, Apache Spark and its distinctive features are briefly introduced, the challenges of security log analysis are discussed, and some of Spark’s security log analysis capabilities are demonstrated through a problem involving big security logs. Finally, a sample Spark application is presented that extracts statistics relevant to the problem.
Notes
- 5. Spark uses HDFS as the distributed file system environment among cluster nodes.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Oktay, T., Sayar, A. (2017). Analyzing Big Security Logs in Cluster with Apache Spark. In: Angelov, P., Manolopoulos, Y., Iliadis, L., Roy, A., Vellasco, M. (eds) Advances in Big Data. INNS 2016. Advances in Intelligent Systems and Computing, vol 529. Springer, Cham. https://doi.org/10.1007/978-3-319-47898-2_14
DOI: https://doi.org/10.1007/978-3-319-47898-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47897-5
Online ISBN: 978-3-319-47898-2
eBook Packages: Engineering, Engineering (R0)