A Fast Bit-Parallel Algorithm for Gapped String Kernels

  • Chuanhuan Yin
  • Shengfeng Tian
  • Shaomin Mu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4232)

Abstract

In this paper, we present a new family of gapped string kernels, called length-weighted kernels, which comprises p-length-weighted and all-length-weighted kernels. We also propose a dynamic programming algorithm, based on the suffix kernel, for computing the length-weighted kernels. Given strings s and t and a gap penalty λ, our algorithm computes the all-length-weighted kernel in O(|s||t|) time. Using the relationship between the all-length and p-length kernels, the p-length-weighted kernel can be computed in O(p|s||t|) time. Furthermore, a bit-parallel technique reduces this complexity from O(p|s||t|) to O(⌈pk/w⌉|s||t|), where w is the machine word size (e.g. 32 or 64 bits in practice) and k is determined by the longest matching subsequence of s and t. Empirical results suggest that the resulting algorithm, which combines the bit-parallel technique with dynamic programming and the suffix kernel, outperforms other approaches in cases where the condition required by the bit-parallel technique is satisfied.
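The abstract does not define the length-weighted kernels themselves, so the following is only a minimal sketch of the suffix-kernel dynamic programming idea, assuming the standard gap-weighted subsequence kernel of Lodhi et al. (2002) in place of the paper's own definitions. Here a subsequence occurrence with span l (last index minus first index plus one) contributes λ^l, and the suffix kernel KS(i, j) collects the weight of common subsequences whose occurrences end exactly at s[i] and t[j]. A discounted two-dimensional prefix sum turns one length level into the next in O(|s||t|), so all lengths 1..p cost O(p|s||t|); folding every length into one pass gives an all-lengths sum in O(|s||t|). The function names `gap_weighted_kernels` and `all_length_kernel` are hypothetical, the all-lengths variant sums lengths with equal weight (the paper's length weighting may differ), and the bit-parallel step is not shown.

```python
from typing import List


def _discounted_prefix(a: List[List[float]], lam: float) -> List[List[float]]:
    """Return d with d[i][j] = sum over i' <= i, j' <= j of
    lam**((i - i') + (j - j')) * a[i'][j'] (row/column 0 are zero padding)."""
    n, m = len(a) - 1, len(a[0]) - 1
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Inclusion-exclusion over the two shifted prefix regions.
            d[i][j] = (a[i][j] + lam * d[i - 1][j] + lam * d[i][j - 1]
                       - lam * lam * d[i - 1][j - 1])
    return d


def gap_weighted_kernels(s: str, t: str, p: int, lam: float) -> List[float]:
    """Return [K_1, ..., K_p] of the (assumed) gap-weighted subsequence kernel.

    phi_u(x) sums lam**span over every occurrence of u as a (possibly
    gapped) subsequence of x; K_q(s, t) sums phi_u(s) * phi_u(t) over all
    u of length q. Runs in O(p * |s| * |t|) time.
    """
    if p < 1:
        return []
    n, m = len(s), len(t)
    # ks[i][j] is the suffix kernel for the current length q: weight of
    # common subsequences whose occurrences end exactly at s[i-1], t[j-1].
    ks = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                ks[i][j] = lam * lam  # length 1: span 1 in each string
    results = [sum(map(sum, ks))]
    for _ in range(2, p + 1):
        dps = _discounted_prefix(ks, lam)
        nxt = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if s[i - 1] == t[j - 1]:
                    # lam**2 * dps[i-1][j-1] equals the sum of
                    # lam**((i - i') + (j - j')) * ks[i'][j'] over i' < i, j' < j,
                    # i.e. every shorter match extended to end at (i, j).
                    nxt[i][j] = lam * lam * dps[i - 1][j - 1]
        ks = nxt
        results.append(sum(map(sum, ks)))
    return results


def all_length_kernel(s: str, t: str, lam: float) -> float:
    """Sum of K_q(s, t) over all lengths q >= 1, in one O(|s| * |t|) pass."""
    n, m = len(s), len(t)
    # dps[i][j]: discounted prefix sum of the all-length suffix kernel.
    dps = [[0.0] * (m + 1) for _ in range(n + 1)]
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            ks_ij = 0.0
            if s[i - 1] == t[j - 1]:
                # Start a new length-1 match, or extend any earlier match.
                ks_ij = lam * lam * (1.0 + dps[i - 1][j - 1])
            dps[i][j] = (ks_ij + lam * dps[i - 1][j] + lam * dps[i][j - 1]
                         - lam * lam * dps[i - 1][j - 1])
            total += ks_ij
    return total


if __name__ == "__main__":
    print(gap_weighted_kernels("ab", "ab", 2, 0.5))  # [0.5, 0.0625]
    print(all_length_kernel("ab", "ab", 0.5))        # 0.5625
```

A quick consistency check between the two recursions is that `all_length_kernel(s, t, lam)` should equal `sum(gap_weighted_kernels(s, t, min(len(s), len(t)), lam))`, since no common subsequence can be longer than the shorter string.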

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Chuanhuan Yin (1)
  • Shengfeng Tian (1)
  • Shaomin Mu (1)
  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
