Automatic Graph-Based Clustering for Security Logs

  • Hudan Studiawan
  • Christian Payne
  • Ferdous Sohel
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 926)


Computer security events are recorded in several log files. It is necessary to cluster these logs to discover security threats, detect anomalies, or identify a particular error. A problem arises when large quantities of security log data need to be checked, as existing tools do not provide sufficiently sophisticated grouping results. In addition, existing methods require user-supplied input parameters, and finding optimal values for these is not trivial. Therefore, we propose a method for the automatic clustering of security logs. First, we present a new graph-theoretic approach for security log clustering based on maximal clique percolation. Second, we apply an intensity threshold to each obtained maximal clique so that edge weights are taken into account before the percolation proceeds. Third, we use the simulated annealing algorithm to optimize the number of percolations and the intensity threshold for maximal clique percolation. The entire process is automatic and does not require any user input. Experimental results on various real-world datasets show that the proposed method achieves superior clustering results compared to other methods.
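The three steps in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation and makes several assumptions: log lines are taken to be already converted into a weighted similarity graph (nodes are log lines, edge weights in (0, 1]); clique intensity is taken to be the geometric mean of a clique's edge weights, as in Farkas et al.'s weighted clique percolation; and maximal cliques of size at least k are merged when they share at least k - 1 nodes. Because the abstract does not state which objective simulated annealing maximizes, the sketch substitutes weighted Newman modularity as a stand-in. All function names, the parameter grid, and the toy graph are hypothetical.

```python
import math
import random
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch plus pivoting."""
    cliques = []
    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    expand(set(), set(adj), set())
    return cliques


def intensity(clique, weight):
    """Geometric mean of the clique's edge weights (assumed intensity measure)."""
    edges = list(combinations(sorted(clique), 2))
    return math.exp(sum(math.log(weight[e]) for e in edges) / len(edges))


def percolate(adj, weight, k, threshold):
    """Union maximal cliques (size >= k, intensity >= threshold) sharing >= k-1 nodes."""
    # The size test runs first, so intensity() never sees a single-node clique (k >= 2).
    cliques = [c for c in maximal_cliques(adj)
               if len(c) >= k and intensity(c, weight) >= threshold]
    parent = list(range(len(cliques)))
    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)
    clusters = {}
    for i, c in enumerate(cliques):
        clusters.setdefault(find(i), set()).update(c)
    return list(clusters.values())


def modularity(clusters, adj, weight):
    """Stand-in objective (assumption): weighted Newman modularity of the clusters."""
    total = sum(weight.values())
    strength = {u: sum(weight[tuple(sorted((u, v)))] for v in adj[u]) for u in adj}
    score = 0.0
    for c in clusters:
        inner = sum(weight.get(e, 0.0) for e in combinations(sorted(c), 2))
        score += inner / total - (sum(strength[u] for u in c) / (2 * total)) ** 2
    return score


def anneal(adj, weight, ks, thresholds, steps=200, t0=1.0, cooling=0.97, seed=0):
    """Simulated annealing over a grid of (k, intensity threshold) values."""
    rng = random.Random(seed)
    cache = {}
    def score(state):  # percolation is deterministic, so cache each grid point
        if state not in cache:
            k, thr = ks[state[0]], thresholds[state[1]]
            cache[state] = modularity(percolate(adj, weight, k, thr), adj, weight)
        return cache[state]
    current = (rng.randrange(len(ks)), rng.randrange(len(thresholds)))
    best, t = current, t0
    for _ in range(steps):
        i, j = current
        if rng.random() < 0.5:  # neighbour: move one parameter by one grid step
            i = min(max(i + rng.choice((-1, 1)), 0), len(ks) - 1)
        else:
            j = min(max(j + rng.choice((-1, 1)), 0), len(thresholds) - 1)
        delta = score((i, j)) - score(current)
        # Metropolis rule: accept improvements, accept worse moves with prob e^(delta/t)
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current = (i, j)
            if score(current) > score(best):
                best = current
        t *= cooling
    return (ks[best[0]], thresholds[best[1]]), score(best)


# Hypothetical toy graph: log lines 0-3 and 4-6 form two similar groups,
# joined by one weak edge (3, 4).
weight = {e: 0.9 for e in combinations(range(4), 2)}
weight.update({e: 0.8 for e in combinations(range(4, 7), 2)})
weight[(3, 4)] = 0.2
adj = {u: set() for u in range(7)}
for u, v in weight:
    adj[u].add(v)
    adj[v].add(u)

params, best_score = anneal(adj, weight, ks=[2, 3, 4, 5],
                            thresholds=[0.0, 0.25, 0.5, 0.85, 0.95])
print("best (k, threshold):", params, "score:", round(best_score, 4))
```

In this toy graph, k = 2 with no intensity threshold lets the weak bridge (3, 4) percolate everything into one cluster (modularity 0), whereas a larger k or a threshold above 0.2 keeps the two groups apart. The annealer explores that trade-off without a hand-picked setting; scores are cached because every evaluation reruns clique enumeration.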



This work is supported by the Indonesia Lecturer Scholarship (BUDI) from the Indonesia Endowment Fund for Education (LPDP), Ministry of Finance of the Republic of Indonesia.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Hudan Studiawan (1)
  • Christian Payne (1)
  • Ferdous Sohel (1)
  1. Discipline of Information Technology, Mathematics, and Statistics, Murdoch University, Perth, Australia