# Dictionary Learning for Fast Classification Based on Soft-thresholding

## Abstract

Classifiers based on sparse representations have recently been shown to provide excellent results in many visual recognition and classification tasks. However, the high cost of computing sparse representations at test time is a major obstacle that limits the applicability of these methods in large-scale problems, or in scenarios where computational power is restricted. We consider in this paper a simple yet efficient alternative to sparse coding for feature extraction. We study a classification scheme that applies the *soft-thresholding* nonlinear mapping in a dictionary, followed by a linear classifier. A novel supervised dictionary learning algorithm tailored for this low complexity classification architecture is proposed. The dictionary learning problem, which *jointly* learns the dictionary and linear classifier, is cast as a *difference of convex* (DC) program and solved efficiently with an iterative DC solver. We conduct experiments on several datasets, and show that our learning algorithm that leverages the structure of the classification problem outperforms generic learning procedures. Our simple classifier based on soft-thresholding also competes with the recent sparse coding classifiers, when the dictionary is learned appropriately. The adopted classification scheme further requires less computational time at the testing stage, compared to other classifiers. The proposed scheme shows the potential of the adequately trained soft-thresholding mapping for classification and paves the way towards the development of very efficient classification methods for vision problems.
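The classification scheme described in the abstract (a soft-thresholding map applied to dictionary responses, followed by a linear classifier) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes the one-sided form of soft-thresholding, `max(0, z - alpha)`, which is the rectifier-like variant hinted at by the keywords; the two-sided form `sign(z) * max(|z| - alpha, 0)` is the other common definition. The names `soft_threshold`, `classify`, `D`, `w`, and `alpha` are hypothetical, and the toy dictionary and weights are hand-picked rather than learned.

```python
import numpy as np


def soft_threshold(z, alpha):
    """One-sided soft-thresholding, applied elementwise: max(0, z - alpha).

    Assumption: the one-sided variant is used here for illustration.
    """
    return np.maximum(0.0, z - alpha)


def classify(X, D, w, alpha):
    """Predict binary labels in {-1, +1} for the columns of X.

    X: (n_features, n_samples) data matrix, one sample per column
    D: (n_features, n_atoms) dictionary, one atom per column
    w: (n_atoms,) weights of the linear classifier
    alpha: non-negative threshold
    """
    # Feature extraction: one matrix product and an elementwise threshold.
    features = soft_threshold(D.T @ X, alpha)
    # Linear classifier on top of the thresholded features.
    scores = w @ features
    return np.where(scores >= 0, 1, -1)


# Toy example with hand-picked (not learned) dictionary and classifier.
D = np.array([[1.0, 0.0],
              [0.0, 1.0]])
w = np.array([1.0, -1.0])
X = np.array([[2.0, 0.0],
              [0.0, 2.0]])
labels = classify(X, D, w, alpha=0.5)
print(labels)
```

At test time the cost is one matrix product plus an elementwise maximum, which is why this feature map is cheaper than solving a sparse coding problem for every test sample.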

## Keywords

Dictionary learning, Soft-thresholding, Sparse coding, Rectifier linear units, Neural networks

## Notes

### Acknowledgments

The authors would like to thank the associate editor and the anonymous reviewers for their valuable comments and references that helped to improve the quality of this paper.
