A Fast Bit-Parallel Algorithm for Gapped String Kernels
In this paper, we present a new kind of gapped string kernel, the length-weighted kernel, which comes in two variants: the p-length-weighted kernel and the all-length-weighted kernel. We also propose a dynamic programming algorithm, based on the suffix kernel, for computing them. Given strings s and t and a gap penalty λ, our algorithm computes the all-length-weighted kernel in O(|s||t|) time. Using the relationship between the all-length and p-length kernels, the p-length-weighted kernel can be computed in O(p|s||t|) time. Furthermore, a bit-parallel technique reduces this complexity from O(p|s||t|) to O(⌈pk/w⌉|s||t|), where w is the machine word size (e.g. 32 or 64 in practice) and k is determined by the longest matching subsequence of s and t. The empirical results suggest that the bit-parallel algorithm, combined with dynamic programming and the suffix kernel technique, outperforms the other approaches in those cases where the necessary condition for applying the bit-parallel technique is satisfied.
Keywords: Dynamic Programming, Intrusion Detection, Weighted Kernel, Suffix Tree, Word Size
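To make the O(p|s||t|) dynamic programming bound concrete, the following is a minimal Python sketch of the standard gap-weighted subsequence kernel from the string-kernel literature, computed through a suffix-kernel recursion. It is not the length-weighted kernel proposed in the paper, whose definition the abstract does not give, and it does not include the bit-parallel step. The function name `gap_weighted_kernels` and the convention that a matching subsequence contributes λ raised to the sum of its spans in s and t are assumptions made for illustration.

```python
def gap_weighted_kernels(s, t, p, lam):
    """Return [K_1, ..., K_p] for the standard gap-weighted subsequence kernel.

    Assumed weighting (illustrative, not taken from the paper): a common
    subsequence contributes lam ** (span in s + span in t), where a span
    counts every position from the first to the last matched character.
    Runs in O(p * len(s) * len(t)) time.
    """
    n, m = len(s), len(t)

    # suffix[i][j]: total weight of common subsequences of the current length
    # that end exactly at s[i-1] and t[j-1] (the "suffix kernel").
    # Row 0 and column 0 are zero padding.
    suffix = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                suffix[i][j] = lam * lam

    kernels = [sum(map(sum, suffix))]

    for _ in range(2, p + 1):
        # acc[i][j]: sum of suffix[k][l] * lam ** ((i - k) + (j - l))
        # over all k <= i and l <= j, built by inclusion-exclusion.
        acc = [[0.0] * (m + 1) for _ in range(n + 1)]
        nxt = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                acc[i][j] = (suffix[i][j]
                             + lam * acc[i - 1][j]
                             + lam * acc[i][j - 1]
                             - lam * lam * acc[i - 1][j - 1])
                if s[i - 1] == t[j - 1]:
                    # Extend every shorter match ending before (i, j) by one
                    # matched character in each string.
                    nxt[i][j] = lam * lam * acc[i - 1][j - 1]
        suffix = nxt
        kernels.append(sum(map(sum, suffix)))

    return kernels


if __name__ == "__main__":
    # Kernel values for subsequence lengths 1, 2 and 3 with gap penalty 0.5.
    print(gap_weighted_kernels("cat", "car", 3, 0.5))
```

Each of the p passes fills two |s| × |t| tables, which is where the O(p|s||t|) cost comes from. Under the assumed weighting, "cat" and "car" share only the length-2 subsequence "ca", so the length-2 value is λ⁴.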