Analytic Feature Selection for Support Vector Machines

  • Carly Stambaugh
  • Hui Yang
  • Felix Breuer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7988)

Abstract

Support vector machines (SVMs) rely on the inherent geometry of a data set to classify training data. Because of this, we believe SVMs are an excellent candidate to guide the development of an analytic feature selection algorithm, as opposed to the more commonly used heuristic methods. We propose a filter-based feature selection algorithm based on the inherent geometry of a feature set. Through observation, we identified six geometric properties that differ between optimal and suboptimal feature sets and have statistically significant correlations with classifier performance. Our algorithm is based on logistic and linear regression models that use these six geometric properties as predictor variables. The proposed algorithm achieves excellent results on high-dimensional text data sets whose features can be organized into a handful of feature types, for example, unigrams, bigrams, or semantic structural features. We believe this algorithm is a novel and effective approach to solving the feature selection problem for linear SVMs.
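The abstract describes a filter-based pipeline: compute geometric properties of each candidate feature set, then fit a regression model that uses those properties to predict whether the feature set will yield a strong classifier. The paper's six specific geometric properties are not listed in the abstract, so the sketch below uses three illustrative stand-in descriptors (point-cloud spread, top-singular-value variance share, and sparsity) purely to show the shape of the approach; the function and variable names are our own, not the authors'.

```python
# Hedged sketch of the filter-based idea from the abstract: score each
# candidate feature set by geometric descriptors of its data matrix, then
# fit a logistic regression over those descriptors to predict whether the
# feature set is "optimal". The three descriptors below are illustrative
# stand-ins, NOT the six properties identified in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def geometric_descriptors(X):
    """Illustrative geometric summaries of a feature set's data matrix X
    (rows = samples, columns = features)."""
    centroid = X.mean(axis=0)
    # Spread of the point cloud: mean distance to the centroid.
    spread = np.linalg.norm(X - centroid, axis=1).mean()
    # Share of total variance carried by the top singular direction.
    s = np.linalg.svd(X - centroid, compute_uv=False)
    top_var = s[0] ** 2 / max(np.sum(s ** 2), 1e-12)
    # Sparsity of the representation (fraction of exact zeros).
    sparsity = np.mean(X == 0)
    return np.array([spread, top_var, sparsity])

rng = np.random.default_rng(0)
# Synthetic stand-in data: 40 candidate feature sets, alternately labeled
# "suboptimal" (0) and "optimal" (1); labels are fabricated for illustration.
descriptors, labels = [], []
for i in range(40):
    X = rng.normal(scale=1.0 + (i % 2), size=(30, 10))
    descriptors.append(geometric_descriptors(X))
    labels.append(i % 2)

# Logistic regression on the geometric descriptors, mirroring the abstract's
# use of regression models with geometric properties as predictor variables.
model = LogisticRegression().fit(np.array(descriptors), labels)
print(model.score(np.array(descriptors), labels))
```

In the paper's setting the labels would come from observed SVM performance on each feature set, and a linear regression variant could predict the performance score directly rather than an optimal/suboptimal class.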



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Carly Stambaugh (1)
  • Hui Yang (2)
  • Felix Breuer (1)
  1. Department of Mathematics, San Francisco State University, San Francisco, USA
  2. Department of Computer Science, San Francisco State University, San Francisco, USA
