String Kernel Based SVM for Internet Security Implementation

  • Zbynek Michlovský
  • Shaoning Pang
  • Nikola Kasabov
  • Tao Ban
  • Youki Kadobayashi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5864)

Abstract

For network intrusion and virus detection, conventional methods detect malicious network traffic and viruses by examining packets, flow logs, or memory contents for known attack signatures. Consequently, if no signature is known or has been created in advance, detecting the attack is problematic. To address the detection of unknown attacks, this paper develops a network traffic and spam analyzer based on a string kernel SVM (support vector machine), a supervised machine learning method. The proposed method can detect network attacks without known or previously determined attack signatures, because the SVM automatically learns attack signatures from traffic data. As applications to internet security, we have implemented the proposed method for spam email detection on the SpamAssassin and E. M. Canada datasets, and for network application authentication via analysis of real connection data. The accuracies obtained, above 99%, demonstrate the usefulness of string kernel SVMs in network security, both for detecting ‘abnormal’ traffic and for protecting ‘normal’ traffic.
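
To make the idea of a string kernel concrete, the following is a minimal sketch of a p-spectrum kernel, which compares two strings by counting the length-p substrings they share. The abstract does not state which string kernel the authors used, so this particular kernel, the function names, the choice p=3, and the toy messages are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def spectrum(s, p):
    """Count every contiguous substring of length p in s."""
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))


def spectrum_kernel(s, t, p=3):
    """Unnormalised p-spectrum kernel: inner product of substring counts."""
    cs, ct = spectrum(s, p), spectrum(t, p)
    if len(cs) > len(ct):
        cs, ct = ct, cs  # iterate over the smaller count table
    return sum(n * ct[sub] for sub, n in cs.items())


def normalized_kernel(s, t, p=3):
    """Cosine-normalised kernel, so that k(s, s) == 1 for non-trivial s."""
    denom = sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
    return spectrum_kernel(s, t, p) / denom if denom else 0.0


def gram_matrix(strings, p=3):
    """Pairwise kernel values; the form a precomputed-kernel SVM expects."""
    return [[normalized_kernel(a, b, p) for b in strings] for a in strings]


# Hypothetical toy messages, not taken from the paper's datasets.
messages = ["win free money now", "free money offer", "meeting at noon today"]
K = gram_matrix(messages)
```

A Gram matrix of this form could, in principle, be passed to any SVM implementation that accepts precomputed kernels, so the classifier works directly on raw strings without hand-written signatures. In this toy example the two spam-like messages receive a higher similarity to each other than either does to the third message.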

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Zbynek Michlovský (1)
  • Shaoning Pang (1)
  • Nikola Kasabov (1)
  • Tao Ban (2)
  • Youki Kadobayashi (2)

  1. Knowledge Engineering & Discovery Research Institute, Auckland University of Technology, Auckland, New Zealand
  2. Information Security Research Center, National Institute of Information and Communications Technology, Tokyo, Japan
