Identifying Suspicious Activities in Company Networks Through Data Mining and Visualization

  • Dieter Landes
  • Florian Otto
  • Sven Schumann
  • Frank Schlottke
Part of the Advanced Information and Knowledge Processing book series (AI&KP)


Company data are a precious asset: they must be authentic and must not be disclosed to unauthorized parties. In this contribution, we report on ongoing work that aims to support human IT security experts by pinpointing the alerts that really need closer inspection. We developed an experimental tool environment that supports the analysis of IT infrastructure data with data mining methods. In particular, various clustering algorithms are used to distinguish normal behavior from activities that call for intervention by IT security experts. Before clustering, the data can be pre-processed in various ways; in particular, categorical values can be mapped to numerical values in a way that preserves the semantics of the data as far as possible. The resulting clusters can be inspected visually using parallel coordinates or pixel-based techniques such as circle segments or recursive patterns.
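The abstract does not say which clustering algorithm or categorical mapping the tool uses. The following is a minimal sketch under stated assumptions, not the authors' method. Categorical attributes are frequency-encoded, a simple stand-in that at least preserves how common each value is. Numeric attributes are min-max scaled, the records are clustered with plain k-means, and members of very small clusters are flagged for closer inspection. All function names, the 15% size threshold, and the toy records are hypothetical.

```python
from collections import Counter
import math


def frequency_encode(values):
    """Map each categorical value to its relative frequency in the column.

    Hypothetical stand-in for the (unspecified) semantics-preserving mapping:
    common values land close together, rare ones stand apart.
    """
    counts = Counter(values)
    n = len(values)
    return [counts[v] / n for v in values]


def min_max(xs):
    """Scale numeric values to [0, 1] so no attribute dominates the distance."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def kmeans(points, k, iterations=100):
    """Lloyd-style k-means with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed with the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


def small_cluster_members(labels, min_fraction=0.15):
    """Indices of records in clusters smaller than min_fraction of the data (hypothetical heuristic)."""
    sizes = Counter(labels)
    n = len(labels)
    return [i for i, lab in enumerate(labels) if sizes[lab] / n < min_fraction]


# Toy records (protocol, kilobytes transferred): illustrative only.
protocols = ["tcp"] * 9 + ["icmp"]
kbytes = [1.0, 1.5, 2.0, 1.2, 1.8, 2.5, 1.1, 1.6, 2.2, 500.0]
points = list(zip(frequency_encode(protocols), min_max(kbytes)))
labels, _ = kmeans(points, k=2)
print(small_cluster_members(labels))  # [9]
```

Flagging small clusters follows the intuition that normal behavior dominates monitoring data. In practice, both k and the size threshold would need tuning. Density-based algorithms such as DBSCAN, which label noise points explicitly, are a common alternative to this heuristic.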

Preliminary results indicate that clustering is well suited to structuring monitoring data. Fairly large data volumes can also be clustered effectively and efficiently. Current work focuses on more elaborate visualization and classification techniques.
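Parallel coordinates are one of the techniques named for inspecting clusters visually. The following minimal sketch assumes nothing about the authors' tool. Each record becomes a polyline across one vertical axis per attribute, and each axis is min-max scaled. Lines are colored by cluster label so that a small, deviating cluster stands out. The function names, color palette, and toy records (destination port, kilobytes, connections) are hypothetical. The output is a plain SVG string built with the standard library only.

```python
def parallel_coordinates(records):
    """Return one polyline per record as a list of (axis, value) vertices.

    Axis i is placed at x = i, and every attribute is min-max scaled to [0, 1]
    so attributes with very different ranges share one vertical extent.
    """
    columns = list(zip(*records))
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant attribute is drawn at mid-height instead of dividing by zero.
        scaled.append([(v - lo) / span if span else 0.5 for v in col])
    return [
        [(axis, scaled[axis][row]) for axis in range(len(columns))]
        for row in range(len(records))
    ]


def to_svg(polylines, labels, width=400, height=200,
           palette=("steelblue", "crimson", "seagreen", "darkorange")):
    """Render polylines as SVG, one vertical line per axis, colored by cluster label."""
    n_axes = len(polylines[0])
    dx = width / (n_axes - 1)  # requires at least two attributes
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for axis in range(n_axes):
        x = axis * dx
        parts.append(f'<line x1="{x:.1f}" y1="0" x2="{x:.1f}" y2="{height}" stroke="black"/>')
    for line, label in zip(polylines, labels):
        # SVG's y axis points down, so flip scaled values to put high values on top.
        pts = " ".join(f"{axis * dx:.1f},{(1 - y) * height:.1f}" for axis, y in line)
        color = palette[label % len(palette)]
        parts.append(f'<polyline points="{pts}" fill="none" stroke="{color}"/>')
    parts.append("</svg>")
    return "\n".join(parts)


# Toy records (destination port, kilobytes, connections): illustrative only.
records = [(22, 1.0, 3), (80, 2.0, 5), (443, 500.0, 1)]
cluster_labels = [0, 0, 1]
lines = parallel_coordinates(records)
svg = to_svg(lines, cluster_labels)  # save as .svg and open in a browser to view
```

Scaling every axis to the same extent is what makes the polylines comparable. Without it, a single attribute with a large range, such as transferred bytes, would flatten all the others.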





The SecMine project is supported by the Bundesministerium für Bildung und Forschung (BMBF) under grant no. 17049X10. We thank Christian Bergmann, Toni Böhnlein, Sebastian Detsch, Thomas Geus, Steffen Hammer, Johannes Henninger, Matthias Herrmann, Sebastian Jakob, Daniel Klett, Adrian Köhlein, Evelyn Krüger, Benjamin Krull, Andreas Kühntopf, Hannes Müller, Marc Pieruschek, Markus Pütz, Markus Ring, Martin Rosenbaum, Manuel Schnapp, Tobias Schmidtlein, Christopher Schramm, Elena Tereshko, Melanie Westendorf, Thomas Worch, and Bernhard Sick for their contributions.



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Dieter Landes (1)
  • Florian Otto (1)
  • Sven Schumann (2)
  • Frank Schlottke (3)

  1. Coburg University of Applied Sciences and Arts, Coburg, Germany
  2. HUK COBURG, Coburg, Germany
  3. Applied Security, Stockstadt, Germany
