Abstract
This paper proposes an approach called feature-weighted confidence with support vector machine (FWC–SVM) to incorporate prior knowledge into SVM with sample confidence. First, we use prior features to express prior knowledge. Second, FWC–SVM is biased to assign larger weights to prior features in the slope vector \(\omega \) than to non-prior features. Third, FWC–SVM employs an adaptive paradigm to update sample confidence and feature weights iteratively. We conduct extensive experiments comparing FWC–SVM with state-of-the-art methods, including standard SVM, WSVM, and WMSVM, on an English dataset (the Reuters-21578 text collection) and a Chinese dataset (the TanCorpV1.0 text collection). Experimental results demonstrate that, on non-noisy data, FWC–SVM outperforms the other methods when the retaining level is not larger than 0.8. On noisy data, FWC–SVM produces better performance than WSVM on the Reuters-21578 dataset when the retaining level is larger than 0.4, and on the TanCorpV1.0 dataset when the retaining level is larger than 0.5. We also discuss the strengths and weaknesses of the proposed FWC–SVM approach.
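As a rough illustration of the idea summarized above (prior features receive larger weights, and sample confidence and feature weights are updated iteratively), the sketch below combines column-wise feature weighting with per-sample weights in a standard SVM solver. The boost factor, the margin-based confidence update, and the weight renormalization are assumptions made purely for illustration and do not reproduce the paper's actual FWC–SVM formulation or optimization.

```python
# Minimal sketch in the spirit of FWC-SVM, not the authors' implementation.
# Assumptions: prior features are marked by a boolean mask, prior features get a
# fixed multiplicative boost, and sample confidence is updated from the decision
# margin via a logistic squashing; scikit-learn's SVC is used as the base learner.
import numpy as np
from sklearn.svm import SVC

def fwc_svm_sketch(X, y, prior_mask, n_iter=5, boost=2.0, C=1.0):
    """X: (n_samples, n_features) array, y: labels in {-1, +1},
    prior_mask: boolean vector marking prior features (hypothetical interface)."""
    n, d = X.shape
    feat_w = np.ones(d)
    feat_w[prior_mask] *= boost              # bias weights toward prior features (assumed scheme)
    conf = np.ones(n)                        # initial sample confidence
    clf = None
    for _ in range(n_iter):
        Xw = X * feat_w                      # apply feature weights by column scaling
        clf = SVC(kernel="linear", C=C)
        clf.fit(Xw, y, sample_weight=conf)   # sample confidence acts as a per-sample weight
        margin = y * clf.decision_function(Xw)
        # Assumed confidence update: samples classified with larger margins are trusted more.
        conf = 1.0 / (1.0 + np.exp(-margin))
        # Assumed feature-weight update: renormalize by |w| of the linear model,
        # then re-apply the bias toward prior features.
        w_abs = np.abs(clf.coef_.ravel()) + 1e-12
        feat_w = w_abs / w_abs.max()
        feat_w[prior_mask] *= boost
    return clf, feat_w, conf
```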
Acknowledgements
This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 61379046, 91318302, and 61432001, and by the Innovation Fund Project of the Xi'an Science and Technology Program (Special Series for Xi'an University, No. 2016CXWL21).