Tuning N-gram String Kernel SVMs via Meta Learning
Even though Support Vector Machines (SVMs) are capable of identifying patterns in high-dimensional kernel spaces, their performance depends on two main factors: the SVM cost parameter and the kernel parameters. This paper identifies a mechanism for extracting meta-features from string datasets and derives an optimization method for n-gram string kernel SVMs. In this method, a meta model is trained on the string meta-features computed for each dataset in a string dataset pool, together with the learning algorithm parameters and the resulting accuracies, in order to predict the optimal parameter combination for a given string classification task. In the experiments, n-gram SVMs were optimized with the proposed algorithm on four string datasets: spam, Reuters-21578, Network Application Detection, and e-News Categorization. The results showed that the proposed algorithm produced parameter combinations that yield good string classification accuracy for n-gram SVMs on all four datasets.
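The abstract only outlines the pipeline, so a minimal Python sketch of its pieces, under stated assumptions, may help make it concrete. It shows an n-gram string kernel (here a normalized character n-gram spectrum kernel, one common form; the paper's exact kernel definition may differ), a few hypothetical dataset meta-features, and a 1-nearest-neighbour meta model that recommends the (cost C, n) pair with the best recorded accuracy on the most similar dataset in the pool. The specific meta-features, the function names `ngram_kernel`, `meta_features` and `recommend`, and the nearest-neighbour meta model are all illustrative assumptions, not the authors' method.

```python
from collections import Counter
from math import log1p, sqrt


def ngram_profile(s: str, n: int) -> Counter:
    """Count the contiguous character n-grams of s (empty if len(s) < n)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_kernel(a: str, b: str, n: int) -> float:
    """Normalized n-gram spectrum kernel: cosine similarity of n-gram count vectors."""
    pa, pb = ngram_profile(a, n), ngram_profile(b, n)
    dot = sum(count * pb[gram] for gram, count in pa.items())  # pb[missing] is 0
    norm = sqrt(sum(c * c for c in pa.values()) * sum(c * c for c in pb.values()))
    return dot / norm if norm else 0.0


def meta_features(docs: list[str], labels: list[str]) -> list[float]:
    """Hypothetical dataset meta-features, log-scaled so no single one dominates."""
    mean_len = sum(len(d) for d in docs) / len(docs)
    alphabet = len({ch for d in docs for ch in d})
    return [log1p(len(docs)), log1p(mean_len), log1p(len(set(labels))), log1p(alphabet)]


# Assumed pool layout: a dataset's meta-features paired with the accuracy
# recorded for each (C, n) parameter combination tried on it.
PoolEntry = tuple[list[float], dict[tuple[float, int], float]]


def recommend(pool: list[PoolEntry], query: list[float]) -> tuple[float, int]:
    """1-nearest-neighbour meta model (an assumption): return the best (C, n)
    recorded on the pool dataset whose meta-features are closest to `query`."""
    def dist(u: list[float], v: list[float]) -> float:
        return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

    _, accuracies = min(pool, key=lambda entry: dist(entry[0], query))
    return max(accuracies, key=accuracies.get)
```

With a pool built from datasets whose (C, n) grids have already been evaluated, `recommend(pool, meta_features(new_docs, new_labels))` returns a candidate parameter pair for a new task. A 1-nearest-neighbour meta model is used here only because it is the easiest learner to read; any model mapping (meta-features, C, n) to accuracy could take its place.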
Keywords: Meta learning; n-gram String Kernels; SVM; Text Categorization; SVM Optimization