Clustering-Based Protocol Classification via Dimensionality Reduction

Cyber Security: Analytics, Technology and Automation

Part of the book series: Intelligent Systems, Control and Automation: Science and Engineering ((ISCA,volume 78))

Abstract

We propose a framework, based on diffusion processes and other methodologies, for finding meaningful geometric descriptions in high-dimensional datasets. We show that the eigenfunctions of the underlying Markov matrices can be used to construct diffusion processes that generate efficient representations of complex geometric structures for high-dimensional data analysis. This is done by non-linear transformations that identify geometric patterns in these large datasets and find the connections among them while projecting them onto low-dimensional spaces. Our methods automatically classify and recognize network protocols. The core of the proposed methodology is training the system to extract heterogeneous features that classify network protocols automatically, without supervision. Once trained, the algorithms can classify and recognize incoming network data in real time. They cluster the data into manifolds embedded in a low-dimensional space, where the data can be analyzed and visualized. In addition, the methodology parameterizes the data in the low-dimensional space.
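The abstract does not give the chapter's kernel, feature set, or parameters, so the following is only a minimal sketch of a generic diffusion-map embedding of the kind described: build a Markov matrix from pairwise affinities, take its leading eigenvectors, and use them as low-dimensional coordinates. The Gaussian kernel, the Euclidean distance, and the names `diffusion_map`, `epsilon`, `n_components`, and `t` are assumptions made for illustration, not the author's implementation.

```python
import numpy as np


def diffusion_map(X, epsilon, n_components=2, t=1):
    """Embed the rows of X into n_components diffusion coordinates.

    Assumptions (not taken from the chapter): Gaussian kernel on
    Euclidean distances with scale `epsilon`, and diffusion time `t`.
    """
    # Pairwise squared Euclidean distances, clamped against round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Gaussian affinity kernel and its row sums (node degrees).
    K = np.exp(-d2 / epsilon)
    deg = K.sum(axis=1)

    # The Markov matrix is P = D^{-1} K. Its symmetric conjugate
    # A = D^{-1/2} K D^{-1/2} has the same eigenvalues and can be
    # decomposed with a symmetric eigensolver.
    A = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(A)

    # Sort eigenpairs by decreasing eigenvalue.
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of A.
    psi = vecs / np.sqrt(deg)[:, None]

    # Drop the trivial constant eigenvector (eigenvalue 1) and scale
    # the remaining ones by their eigenvalues raised to the power t.
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t


if __name__ == "__main__":
    # Synthetic stand-in for feature vectors of two traffic types.
    rng = np.random.default_rng(0)
    a = rng.normal(0.0, 0.1, size=(20, 5))
    b = rng.normal(3.0, 0.1, size=(20, 5))
    Y = diffusion_map(np.vstack([a, b]), epsilon=10.0)
    # The sign of the first diffusion coordinate splits the two groups.
    print(np.sign(Y[:, 0]))
```

In this toy setup the two synthetic groups fall on opposite sides of zero along the first diffusion coordinate, so any simple clustering step in the embedded space would separate them. Real protocol data would need a kernel and a scale chosen for its own features.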

Author information

Corresponding author

Correspondence to Gil David.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

David, G. (2015). Clustering-Based Protocol Classification via Dimensionality Reduction. In: Lehto, M., Neittaanmäki, P. (eds) Cyber Security: Analytics, Technology and Automation. Intelligent Systems, Control and Automation: Science and Engineering, vol 78. Springer, Cham. https://doi.org/10.1007/978-3-319-18302-2_10

  • DOI: https://doi.org/10.1007/978-3-319-18302-2_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18301-5

  • Online ISBN: 978-3-319-18302-2

  • eBook Packages: Engineering, Engineering (R0)
