Leveraging Event Structure for Adaptive Machine Learning on Big Data Landscapes

  • Amir Azodi
  • Marian Gawron
  • Andrey Sapegin
  • Feng Cheng
  • Christoph Meinel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9395)


Modern machine learning techniques have been applied to many aspects of network analytics in order to discover patterns that clarify or better demonstrate the behavior of users and systems within a given network. Often the information to be processed has to be converted to a different type before machine learning algorithms can process it. To accurately process the information generated by systems within a network, the true intention and meaning behind the information must be taken into account. In this paper we propose different approaches for mapping network information, such as IP addresses, to integer values in a way that attempts to keep the relations present in the original format of the information intact. With one exception, all of the proposed mappings produce outputs of at most 64 bits, which allows atomic operations on CPUs with 64-bit registers; the output size is restricted in the interest of performance. Additionally, we demonstrate the benefits of the new mappings for one specific machine learning algorithm (k-means) and compare the algorithm's results for datasets with and without the proposed transformations.
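To make the idea of a relation-preserving mapping concrete, the following is a minimal sketch, not one of the mappings proposed in the paper (the abstract does not specify them). It assumes the simplest order-preserving choice: an IPv4 address is read as its 32-bit integer value, so addresses that share a longer prefix end up numerically closer. The function names `ipv4_to_int` and `fits_in_64_bits` are hypothetical and chosen for illustration only.

```python
import ipaddress


def ipv4_to_int(address: str) -> int:
    """Map a dotted-quad IPv4 address to its 32-bit integer value.

    Assumption for illustration: the numeric value of the address is used
    directly, so hosts in the same subnet map to nearby integers.
    """
    return int(ipaddress.IPv4Address(address))


def fits_in_64_bits(value: int) -> bool:
    """Check that a mapped value fits in an unsigned 64-bit register."""
    return 0 <= value < 2**64


# Two hosts in the same /24 subnet and one host in a different network.
a = ipv4_to_int("10.0.0.1")
b = ipv4_to_int("10.0.0.200")
c = ipv4_to_int("192.168.1.1")

# The numeric distance reflects the subnet relation between the hosts.
same_subnet_distance = abs(a - b)
other_network_distance = abs(a - c)
```

Under this assumption, a distance-based algorithm such as k-means sees the two hosts in the same subnet as close neighbours, which a hash of the address string would not guarantee.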


Keywords: Machine learning, Network monitoring, Traffic classification, Event normalization



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Amir Azodi (1), email author
  • Marian Gawron (1)
  • Andrey Sapegin (1)
  • Feng Cheng (1)
  • Christoph Meinel (1)
  1. Hasso Plattner Institute (HPI), University of Potsdam, Potsdam, Germany
