# Efficient Dictionary Learning with Sparseness-Enforcing Projections

## Abstract

Learning dictionaries suitable for sparse coding instead of using engineered bases has proven effective in a variety of image processing tasks. This paper studies the optimization of dictionaries on image data where the representation is explicitly constrained to be sparse with respect to a smooth, normalized sparseness measure. This involves the computation of Euclidean projections onto level sets of the sparseness measure. While previous algorithms for this optimization problem had at least quasi-linear time complexity, here the first algorithm with linear time complexity and constant space complexity is proposed. The key to this is the mathematically rigorous derivation of a characterization of the projection's result based on a soft-shrinkage function. This theory is applied in an original algorithm called Easy Dictionary Learning (EZDL), which learns dictionaries with a simple and fast-to-compute Hebbian-like learning rule. The new algorithm is efficient, expressive and particularly simple to implement. It is demonstrated that despite its simplicity, the proposed learning algorithm is able to generate a rich variety of dictionaries, in particular a topographic organization of atoms or separable atoms. Further, the dictionaries are as expressive as those of benchmark learning algorithms in terms of the reproduction quality on entire images, and result in an equivalent denoising performance. EZDL learns approximately 30% faster than the already very efficient Online Dictionary Learning algorithm, and is therefore well suited to rapid data set analysis and problems with vast quantities of learning samples.
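To make the link between a normalized sparseness measure and soft-shrinkage concrete, the following is a minimal sketch and not the linear-time algorithm of the paper. It assumes the sparseness measure is Hoyer's, σ(x) = (√n − ‖x‖₁/‖x‖₂)/(√n − 1), which the abstract does not name explicitly. It locates the shrinkage offset by plain bisection, and it assumes the input is nonzero, less sparse than the target, and that the target is below 1. The function names `sparseness`, `soft_shrink` and `shrink_to_sparseness` are hypothetical and chosen only for illustration.

```python
import math

def sparseness(x):
    """Assumed measure (Hoyer's): 0 if all magnitudes are equal, 1 if only one entry is nonzero."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)

def soft_shrink(x, alpha):
    """Soft-shrinkage: reduce every magnitude by alpha, clip at zero, keep the sign."""
    return [math.copysign(max(abs(v) - alpha, 0.0), v) for v in x]

def shrink_to_sparseness(x, target, iters=60):
    """Find an offset alpha by bisection so that the shrunk, normalized vector
    has approximately the target sparseness. Illustrative only: the cost is
    O(n * iters), not the linear-time scheme described in the abstract."""
    lo, hi = 0.0, max(abs(v) for v in x)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        y = soft_shrink(x, mid)
        # An all-zero result means the offset is too large.
        if not any(y) or sparseness(y) >= target:
            hi = mid
        else:
            lo = mid
    y = soft_shrink(x, lo)
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y]

x = [3.0, 1.0, 2.0, 0.5, -4.0, 0.1]
y = shrink_to_sparseness(x, 0.8)
print(sparseness(x), sparseness(y))
```

Because the assumed measure is invariant to rescaling, normalizing the shrunk vector to unit length does not change its sparseness; the small entries of `x` are set exactly to zero while the signs of the surviving entries are preserved.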

## Keywords

Sparse coding · Sparse representations · Dictionary learning · Explicit sparseness constraints · Sparseness-enforcing projections

## Notes

### Acknowledgments

The authors are grateful to Heiko Neumann, Florian Schüle, and Michael Gabb for helpful discussions. We would like to thank Julien Mairal and Karl Skretting for making implementations of their algorithms available. Parts of this work were performed on the computational resource bwUniCluster funded by the Ministry of Science, Research and Arts and the Universities of the State of Baden-Württemberg, Germany, within the framework program bwHPC. This work was supported by Daimler AG, Germany.
