Control of Sparseness for Feature Selection

  • Erinija Pranckeviciene
  • Richard Baumgartner
  • Ray Somorjai
  • Christopher Bowman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3138)


In linear discriminant (LD) analysis, a high sample-size-to-feature ratio is desirable. The linear programming (LP) procedure for LD identification handles the curse of dimensionality by simultaneously minimizing the L1 norms of the classification errors and of the LD weights. The sparseness of the solution – the fraction of features retained – can be controlled by a parameter in the objective function. By qualitatively analyzing the objective function and the constraints of the problem, we show why sparseness arises. In a sparse solution, the components of the LD weight vector with large magnitudes reveal the individual features most important for the decision boundary.
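The LP formulation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: it minimizes the average classification error plus `lam` times the L1 norm of the weights, with the absolute values linearized through auxiliary variables so the whole problem is a linear program. The function name `sparse_ld_lp` and the specific parameterization are assumptions for this sketch; `lam` plays the role of the sparseness-controlling parameter mentioned in the abstract.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_ld_lp(X, y, lam=0.1):
    """Fit a sparse linear discriminant (w, b) by linear programming.

    Minimizes  (1/n) * sum(e_i) + lam * ||w||_1
    subject to y_i * (w . x_i + b) >= 1 - e_i,  e_i >= 0,
    where |w_j| <= s_j is linearized via auxiliary variables s_j >= 0.
    Larger lam drives more weights w_j to exactly zero (sparser solution).
    """
    n, d = X.shape
    # variable vector z = [w (d), b (1), s (d), e (n)]
    c = np.concatenate([np.zeros(d), [0.0], lam * np.ones(d), np.ones(n) / n])

    # margin constraints rewritten as: -y_i * (x_i . w + b) - e_i <= -1
    A_margin = np.hstack([-y[:, None] * X, -y[:, None],
                          np.zeros((n, d)), -np.eye(n)])
    b_margin = -np.ones(n)

    # |w_j| <= s_j  split into  w_j - s_j <= 0  and  -w_j - s_j <= 0
    I = np.eye(d)
    A_abs = np.vstack([
        np.hstack([ I, np.zeros((d, 1)), -I, np.zeros((d, n))]),
        np.hstack([-I, np.zeros((d, 1)), -I, np.zeros((d, n))]),
    ])
    b_abs = np.zeros(2 * d)

    bounds = ([(None, None)] * (d + 1)        # w and b are free
              + [(0, None)] * (d + n))        # s and e are nonnegative
    res = linprog(c,
                  A_ub=np.vstack([A_margin, A_abs]),
                  b_ub=np.concatenate([b_margin, b_abs]),
                  bounds=bounds, method="highs")
    w, b = res.x[:d], res.x[d]
    return w, b
```

On a toy problem where only the first feature separates the classes, increasing `lam` shrinks the weight on the uninformative feature toward zero while the informative feature retains a large weight, which is the feature-selection behavior the paper analyzes.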


Keywords: Objective Function, Support Vector Machine, Feature Selection, Feasible Region, Decision Boundary



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Erinija Pranckeviciene (1, 2)
  • Richard Baumgartner (1)
  • Ray Somorjai (1)
  • Christopher Bowman (1)
  1. Institute for Biodiagnostics, National Research Council Canada, Winnipeg, Canada
  2. Kaunas University of Technology, Kaunas, Lithuania
