
Knowledge and Information Systems, Volume 58, Issue 2, pp 371–397

Feature weighted confidence to incorporate prior knowledge into support vector machines for classification

  • Wen Zhang
  • Lean Yu
  • Taketoshi Yoshida
  • Qing Wang
Regular Paper

Abstract

This paper proposes an approach called feature weighted confidence with support vector machine (FWC–SVM) to incorporate prior knowledge into SVM with sample confidence. First, we use prior features to express prior knowledge. Second, FWC–SVM is biased to assign larger weights in the slope vector \(\omega \) to prior features than to non-prior features. Third, FWC–SVM employs an adaptive paradigm to update sample confidence and feature weights iteratively. We conduct extensive experiments to compare FWC–SVM with state-of-the-art methods, including standard SVM, WSVM, and WMSVM, on an English dataset (the Reuters-21578 text collection) and a Chinese dataset (the TanCorpV1.0 text collection). Experimental results demonstrate that, in the case of non-noisy data, FWC–SVM outperforms the other methods when the retaining level is not larger than 0.8. In the case of noisy data, FWC–SVM produces better performance than WSVM on the Reuters-21578 dataset when the retaining level is larger than 0.4 and on the TanCorpV1.0 dataset when the retaining level is larger than 0.5. We also discuss the strengths and weaknesses of the proposed FWC–SVM approach.
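The two core ideas in the abstract — per-sample confidence that scales each example's loss, and a bias toward larger slope-vector weights for prior features — can be sketched in a minimal, illustrative way. The sketch below is an assumption-laden toy, not the authors' actual FWC–SVM formulation: it uses a plain subgradient solver on a confidence-weighted hinge loss, and it encodes the prior-feature bias as a weaker L2 penalty on prior features. All names (`train_weighted_svm`, `prior_mask`, the penalty constants) and the toy data are hypothetical.

```python
# Illustrative sketch only: confidence-weighted hinge loss plus a
# feature-dependent L2 penalty that is weaker for prior features, so the
# learned slope vector w tends to give prior features larger weights.
import random

def train_weighted_svm(X, y, confidence, prior_mask,
                       lam_prior=0.01, lam_other=1.0,
                       lr=0.05, epochs=200, seed=0):
    """Subgradient descent on sum_i c_i * hinge(y_i, w.x_i + b)
    with per-feature regularization lam[j] (smaller for prior features)."""
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    lam = [lam_prior if prior_mask[j] else lam_other for j in range(d)]
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            margin = y[i] * (sum(w[j] * X[i][j] for j in range(d)) + b)
            for j in range(d):
                grad = lam[j] * w[j]          # regularization subgradient
                if margin < 1:                # hinge loss is active
                    grad -= confidence[i] * y[i] * X[i][j]
                w[j] -= lr * grad
            if margin < 1:                    # bias is not regularized
                b += lr * confidence[i] * y[i]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy data: feature 0 plays the role of a prior feature and separates
# the classes; feature 1 is noise. The last sample gets lower confidence.
X = [[1.0, 0.3], [0.9, -0.2], [-1.0, 0.4], [-0.8, -0.5]]
y = [1, 1, -1, -1]
confidence = [1.0, 1.0, 1.0, 0.5]
prior_mask = [True, False]

w, b = train_weighted_svm(X, y, confidence, prior_mask)
# The prior feature should end up with the larger absolute weight.
print(abs(w[0]) > abs(w[1]))
```

In the paper's adaptive paradigm, the confidence values themselves would also be re-estimated between training rounds; here they are held fixed to keep the sketch short.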

Keywords

Feature weighted confidence · Prior knowledge · Support vector machine · Classification


Acknowledgements

This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 61379046, 91318302, and 61432001 and by the Innovation Fund Project of the Xi’an Science and Technology Program (Special Series for Xi’an University, No. 2016CXWL21).


Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. School of Economics and Management, Beijing University of Technology, Beijing, People’s Republic of China
  2. Center for Big Data Sciences, Beijing University of Chemical Technology, Beijing, People’s Republic of China
  3. School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi, Japan
  4. State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, People’s Republic of China
