Fast Sparse Multinomial Regression Applied to Hyperspectral Data

  • Janete S. Borges
  • José M. Bioucas-Dias
  • André R. S. Marçal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4142)


Methods for learning sparse classifiers are among the state of the art in supervised learning. Sparsity, essential to achieving good generalization, can be enforced by placing heavy-tailed priors/regularizers on the weights of the linear combination of functions. These priors/regularizers favour a few large weights while driving many others to exactly zero. The Sparse Multinomial Logistic Regression algorithm [1] is one such method; it adopts a Laplacian prior to enforce sparseness. Its applicability to large datasets remains delicate from a computational point of view, however, and is sometimes even infeasible. This work implements an iterative procedure to compute the weights of the decision function that is O(m²) faster than the original method introduced in [1] (m is the number of classes). The benchmark dataset Indian Pines is used to test this modification. Results over subsets of this dataset are presented and compared with results computed with support vector machines.
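The sparsity mechanism described above can be illustrated with a small sketch: a Laplacian (L1) prior on the weights of a multinomial logistic regression, fitted here by plain proximal-gradient (ISTA) steps, whose soft-thresholding operator sets small weights to exactly zero. This is not the authors' bound-optimization algorithm from [1], only a minimal NumPy illustration of the same prior; the function names and parameter values are our own assumptions.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax, shifted for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def sparse_mlr(X, y, m, lam=0.3, step=0.2, iters=300):
    """Multinomial logistic regression with a Laplacian (L1) prior,
    fitted by proximal-gradient descent (bias term omitted for brevity).

    X : (n, d) features, y : (n,) integer labels in {0..m-1},
    lam : strength of the L1 penalty, step : gradient step size.
    Returns a (d, m) weight matrix with many entries exactly zero.
    """
    n, d = X.shape
    Y = np.eye(m)[y]                  # one-hot labels, (n, m)
    W = np.zeros((d, m))
    for _ in range(iters):
        P = softmax(X @ W)            # class posteriors, (n, m)
        grad = X.T @ (P - Y) / n      # gradient of the mean log-loss
        W = W - step * grad
        # Soft-thresholding: the proximal operator of the L1 penalty.
        # This is what drives small weights to exactly zero.
        W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)
    return W
```

On a toy two-class problem where only the first feature is informative, the weights attached to the irrelevant feature are thresholded to exactly zero, which is the sparseness behaviour the Laplacian prior is meant to produce.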


Keywords: Training Sample, Linear Discriminant Analysis, Hyperspectral Image, Linear Kernel, Hyperspectral Data




  1. Krishnapuram, B., Carin, L., Figueiredo, M.A.T., Hartemink, A.J.: Sparse Multinomial Logistic Regression: Fast Algorithms and Generalization Bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(6), 957–968 (2005)
  2. Landgrebe, D.A.: Signal Theory Methods in Multispectral Remote Sensing. John Wiley and Sons, Inc., Hoboken, New Jersey (2003)
  3. Vapnik, V.: Statistical Learning Theory. John Wiley, New York (1998)
  4. Camps-Valls, G., Bruzzone, L.: Kernel-based methods for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing 43(6), 1351–1362 (2005)
  5. Tipping, M.: Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1, 211–244 (2001)
  6. Figueiredo, M.: Adaptive Sparseness for Supervised Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(9), 1150–1159 (2003)
  7. Csato, L., Opper, M.: Sparse online Gaussian processes. Neural Computation 14(3), 641–668 (2002)
  8. Lawrence, N.D., Seeger, M., Herbrich, R.: Fast sparse Gaussian process methods: The informative vector machine. In: Becker, S., Thrun, S., Obermayer, K. (eds.) Advances in Neural Information Processing Systems 15, pp. 609–616. MIT Press, Cambridge (2003)
  9. Krishnapuram, B., Carin, L., Hartemink, A.J.: Joint classifier and feature optimization for cancer diagnosis using gene expression data. In: Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB 2003), Berlin, Germany (2003)
  10. Krishnapuram, B., Carin, L., Hartemink, A.J., Figueiredo, M.A.T.: A Bayesian approach to joint feature selection and classifier design. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 1105–1111 (2004)
  11. Quarteroni, A., Sacco, R., Saleri, F.: Numerical Mathematics. TAM Series, vol. 37. Springer, New York (2000)
  12. Bioucas-Dias, J.M.: Fast Sparse Multinomial Logistic Regression - Technical Report. Instituto Superior Técnico (2006), Available at:
  13. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2001)
  14. Lange, K., Hunter, D., Yang, I.: Optimizing transfer using surrogate objective functions. Journal of Computational and Graphical Statistics 9, 1–59 (2000)
  15. Landgrebe, D.A.: NW Indiana’s Indian Pine (1992), Available at:
  16. The MathWorks: MATLAB - The Language of Technical Computing - Using MATLAB: version 6. The MathWorks, Inc. (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Janete S. Borges (1)
  • José M. Bioucas-Dias (2)
  • André R. S. Marçal (1)
  1. Faculdade de Ciências, Universidade do Porto, DMA, Porto, Portugal
  2. Instituto de Telecomunicações – Instituto Superior Técnico, Lisboa, Portugal
