International Journal of Computer Vision

, Volume 114, Issue 2–3, pp 168–194 | Cite as

Efficient Dictionary Learning with Sparseness-Enforcing Projections

  • Markus Thom
  • Matthias Rapp
  • Günther Palm


Learning dictionaries suitable for sparse coding instead of using engineered bases has proven effective in a variety of image processing tasks. This paper studies the optimization of dictionaries on image data where the representation is enforced to be explicitly sparse with respect to a smooth, normalized sparseness measure. This involves the computation of Euclidean projections onto level sets of the sparseness measure. While previous algorithms for this optimization problem had at least quasi-linear time complexity, here the first algorithm with linear time complexity and constant space complexity is proposed. The key for this is the mathematically rigorous derivation of a characterization of the projection’s result based on a soft-shrinkage function. This theory is applied in an original algorithm called Easy Dictionary Learning (EZDL), which learns dictionaries with a simple and fast-to-compute Hebbian-like learning rule. The new algorithm is efficient, expressive and particularly simple to implement. It is demonstrated that despite its simplicity, the proposed learning algorithm is able to generate a rich variety of dictionaries, in particular a topographic organization of atoms or separable atoms. Further, the dictionaries are as expressive as those of benchmark learning algorithms in terms of the reproduction quality on entire images, and result in an equivalent denoising performance. EZDL learns approximately 30 % faster than the already very efficient Online Dictionary Learning algorithm, and is therefore eligible for rapid data set analysis and problems with vast quantities of learning samples.


Sparse coding Sparse representations Dictionary learning Explicit sparseness constraints Sparseness-enforcing projections 



The authors are grateful to Heiko Neumann, Florian Schüle, and Michael Gabb for helpful discussions. We would like to thank Julien Mairal and Karl Skretting for making implementations of their algorithms available. Parts of this work were performed on the computational resource bwUniCluster funded by the Ministry of Science, Research and Arts and the Universities of the State of Baden-Württemberg, Germany, within the framework program bwHPC. This work was supported by Daimler AG, Germany.


  1. Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.CrossRefGoogle Scholar
  2. Bauer, F., & Memisevic, R. (2013). Feature grouping from spatially constrained multiplicative interaction. In Proceedings of the International Conference on Learning Representations. arXiv:1301.3391v3.
  3. Bell, A. J., & Sejnowski, T. J. (1997). The "independent components" of natural scenes are edge filters. Vision Research, 37(23), 3327–3338.CrossRefGoogle Scholar
  4. Bertsekas, D. P. (1999). Nonlinear programming (2nd ed.). Belmont: Athena Scientific.zbMATHGoogle Scholar
  5. Bishop, C. M. (1995). Neural networks for pattern recognition. Oxford: Clarendon Press.Google Scholar
  6. Blackford, L. S., et al. (2002). An updated set of basic linear algebra subprograms (BLAS). ACM Transactions on Mathematical Software, 28(2), 135–151.CrossRefMathSciNetGoogle Scholar
  7. Bottou, L., & LeCun, Y. (2004). Large scale online learning. In Advances in Neural Information Processing Systems (Vol. 16, pp. 217–224).Google Scholar
  8. Bredies, K., & Lorenz, D. A. (2008). Linear convergence of iterative soft-thresholding. Journal of Fourier Analysis and Applications, 14(5–6), 813–837.CrossRefMathSciNetzbMATHGoogle Scholar
  9. Coates, A., & Ng, A. Y. (2011). The importance of encoding versus training with sparse coding and vector quantization. In Proceedings of the International Conference on Machine Learning (pp. 921–928).Google Scholar
  10. Deutsch, F. (2001). Best approximation in inner product spaces. New York: Springer.CrossRefzbMATHGoogle Scholar
  11. Dong, W., Zhang, L., Shi, G., & Wu, X. (2011). Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Transactions on Image Processing, 20(7), 1838–1857.CrossRefMathSciNetGoogle Scholar
  12. Donoho, D. L. (1995). De-noising by soft-thresholding. IEEE Transactions on Information Theory, 41(3), 613–627.CrossRefMathSciNetzbMATHGoogle Scholar
  13. Donoho, D. L. (2006). For most large underdetermined systems of linear equations the minimal \(\ell _1\)-norm solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 59(6), 797–829.CrossRefMathSciNetzbMATHGoogle Scholar
  14. Duarte-Carvajalino, J. M., & Sapiro, G. (2009). Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization. IEEE Transactions on Image Processing, 18(7), 1395–1408.CrossRefMathSciNetGoogle Scholar
  15. Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1(3), 211–218.CrossRefzbMATHGoogle Scholar
  16. Elad, M. (2006). Why simple shrinkage is still relevant for redundant representations? IEEE Transactions on Information Theory, 52(12), 5559–5569.CrossRefMathSciNetzbMATHGoogle Scholar
  17. Foucart, S., & Rauhut, H. (2013). Mathematical introduction to compressive sensing. New York: Birkhäuser.CrossRefzbMATHGoogle Scholar
  18. Galassi, M., Davies, J., Theiler, J., Gough, B., Jungman, G., Alken, P., et al. (2009). GNU scientific library reference manual (3rd ed.). Bristol: Network Theory Ltd.Google Scholar
  19. Gharavi-Alkhansari, M., & Huang, T. S. (1998). A fast orthogonal matching pursuit algorithm. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (Vol. III, pp. 1389–1392).Google Scholar
  20. Goldberg, D. (1991). What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys, 23(1), 5– 48.CrossRefGoogle Scholar
  21. Hawe, S., Seibert, M., & Kleinsteuber, M. (2013). Separable dictionary learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 438–445).Google Scholar
  22. Hoggar, S. G. (2006). Mathematics of digital images: Creation, compression, restoration, recognition. Cambridge: Cambridge University Press.CrossRefGoogle Scholar
  23. Horev, I., Bryt, O., & Rubinstein, R. (2012). Adaptive image compression using sparse dictionaries. In Proceedings of the International Conference on Systems, Signals and Image Processing (pp. 592–595).Google Scholar
  24. Hoyer, P. O. (2004). Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5, 1457– 1469.MathSciNetzbMATHGoogle Scholar
  25. Hoyer, P. O., & Hyvärinen, A. (2000). Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems, 11(3), 191–210.Google Scholar
  26. Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat’s striate cortex. Journal of Physiology, 148(3), 574–591.CrossRefGoogle Scholar
  27. Hurley, N., & Rickard, S. (2009). Comparing measures of sparsity. IEEE Transactions on Information Theory, 55(10), 4723–4741.CrossRefMathSciNetGoogle Scholar
  28. Hyvärinen, A. (1999). Sparse code shrinkage: Denoising of nongaussian data by maximum likelihood estimation. Neural Computation, 11(7), 1739–1768.CrossRefGoogle Scholar
  29. Hyvärinen, A., & Hoyer, P. O. (2000). Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces. Neural Computation, 12(7), 1705–1720.CrossRefGoogle Scholar
  30. Hyvärinen, A., Hoyer, P. O., & Inki, M. (2001). Topographic independent component analysis. Neural Computation, 13(7), 1527–1558.CrossRefzbMATHGoogle Scholar
  31. Hyvärinen, A., Hurri, J., & Hoyer, P. O. (2009). Natural image statistics–A probabilistic approach to early computational vision. London: Springer.zbMATHGoogle Scholar
  32. Jones, J. P., & Palmer, L. A. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology, 58(6), 1233– 1258.Google Scholar
  33. Kavukcuoglu, K., Ranzato, M., Fergus, R., & LeCun, Y. (2009). Learning invariant features through topographic filter maps. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1605–1612).Google Scholar
  34. Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9), 1464–1480.CrossRefGoogle Scholar
  35. Kreutz-Delgado, K., Murray, J. F., Rao, B. D., Engan, K., Lee, T.-W., & Sejnowski, T. J. (2003). Dictionary learning algorithms for sparse representation. Neural Computation, 15(2), 349–396.CrossRefzbMATHGoogle Scholar
  36. Laughlin, S. B., & Sejnowski, T. J. (2003). Communication in neuronal networks. Science, 301(5641), 1870–1874.CrossRefGoogle Scholar
  37. Liu, J., & Ye, J. (2009). Efficient Euclidean projections in linear time. In Proceedings of the International Conference on Machine Learning (pp. 657–664).Google Scholar
  38. Lopes, M. E. (2013). Estimating unknown sparsity in compressed sensing. In Proceedings of the International Conference on Machine Learning (pp. 217–225).Google Scholar
  39. Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2009a). Online dictionary learning for sparse coding. In Proceedings of the International Conference on Machine Learning (pp. 689–696).Google Scholar
  40. Mairal, J., Bach, F., Ponce, J., Sapiro, G., & Zisserman, A. (2009b). Non-local sparse models for image restoration. In Proceedings of the International Conference on Computer Vision (pp. 2272–2279).Google Scholar
  41. Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4), 308–313.CrossRefzbMATHGoogle Scholar
  42. Neudecker, H. (1969). Some theorems on matrix differentiation with special reference to Kronecker matrix products. Journal of the American Statistical Association, 64(327), 953–963.CrossRefzbMATHGoogle Scholar
  43. Olmos, A., & Kingdom, F. A. A. (2004). A biologically inspired algorithm for the recovery of shading and reflectance images. Perception, 33(12), 1463–1473.CrossRefGoogle Scholar
  44. Olshausen, B. A. (2003). Learning sparse, overcomplete representations of time-varying natural images. In Proceedings of the International Conference on Image Processing (Vol. I, pp. 41–44).Google Scholar
  45. Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.CrossRefGoogle Scholar
  46. Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23), 3311–3325.Google Scholar
  47. Potluru, V. K., Plis, S. M., Le Roux, J., Pearlmutter, B. A., Calhoun, V. D., & Hayes, T. P. (2013). Block coordinate descent for sparse NMF. In Proceedings of the International Conference on Learning Representations. arXiv:1301.3527v2.
  48. Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical recipes: The art of scientific computing (3rd ed.). Cambridge: Cambridge University Press.Google Scholar
  49. Rigamonti, R., Sironi, A., Lepetit, V., & Fua, P. (2013). Learning separable filters. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2754–2761).Google Scholar
  50. Ringach, D. L. (2002). Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex. Journal of Neurophysiology, 88(1), 455–463.Google Scholar
  51. Rodgers, J. L., & Nicewander, W. A. (1988). Thirteen ways to look at the correlation coefficient. The American Statistician, 42(1), 59–66.CrossRefGoogle Scholar
  52. Rozell, C. J., Johnson, D. H., Baraniuk, R. G., & Olshausen, B. A. (2008). Sparse coding via thresholding and local competition in neural circuits. Neural Computation, 20(10), 2526–2563.CrossRefMathSciNetGoogle Scholar
  53. Skretting, K., & Engan, K. (2010). Recursive least squares dictionary learning algorithm. IEEE Transactions on Signal Processing, 58(4), 2121–2130.CrossRefMathSciNetGoogle Scholar
  54. Skretting, K., & Engan, K. (2011). Image compression using learned dictionaries by RLS-DLA and compared with K-SVD. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 1517–1520).Google Scholar
  55. Society of Motion Picture and Television Engineers (SMPTE). (1993). Recommended practice RP 177–193: Derivation of basic television color equations.Google Scholar
  56. Theis, F. J., Stadlthanner, K., & Tanaka, T. (2005). First results on uniqueness of sparse non-negative matrix factorization. In Proceedings of the European Signal Processing Conference (Vol. 3, pp. 1672–1675)Google Scholar
  57. Thom, M., & Palm, G. (2013). Sparse activity and sparse connectivity in supervised learning. Journal of Machine Learning Research, 14, 1091–1143.Google Scholar
  58. Tošić, I., Olshausen, B. A., & Culpepper, B. J. (2011). Learning sparse representations of depth. IEEE Journal of Selected Topics in Signal Processing, 5(5), 941–952.Google Scholar
  59. Traub, J. F. (1964). Iterative methods for the solution of equations. Englewood Cliffs: Prentice-Hall.zbMATHGoogle Scholar
  60. van Hateren, J. H., & Ruderman, D. L. (1998). Independent component analysis of natural image sequences yields spatio-temporal filters similar to simple cells in primary visual cortex. Proceedings of the Royal Society B, 265(1412), 2315–2320.CrossRefGoogle Scholar
  61. Wang, Z., & Bovik, A. C. (2009). Mean squared error: Love it or leave it? A new look at signal fidelity measures. IEEE Signal Processing Magazine, 26(1), 98–117.CrossRefGoogle Scholar
  62. Watson, A. B. (1994). Image compression using the discrete cosine transform. The Mathematica Journal, 4(1), 81–88.Google Scholar
  63. Willmore, B., & Tolhurst, D. J. (2001). Characterizing the sparseness of neural codes. Network: Computation in Neural Systems, 12(3), 255–270.Google Scholar
  64. Wilson, D. R., & Martinez, T. R. (2003). The general inefficiency of batch training for gradient descent learning. Neural Networks, 16(10), 1429–1451.CrossRefGoogle Scholar
  65. Yang, J., Wang, Z., Lin, Z., Cohen, S., & Huang, T. (2012). Coupled dictionary training for image super-resolution. IEEE Transactions on Image Processing, 21(8), 3467–3478.CrossRefMathSciNetGoogle Scholar
  66. Yang, J., Wright, J., Huang, T., & Ma, Y. (2010). Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11), 2861–2873.CrossRefMathSciNetGoogle Scholar
  67. Zelnik-Manor, L., Rosenblum, K., & Eldar, Y. C. (2012). Dictionary optimization for block-sparse representations. IEEE Transactions on Signal Processing, 60(5), 2386–2395.CrossRefMathSciNetGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. 1.driveU / Institute of Measurement, Control and MicrotechnologyUlm UniversityUlmGermany
  2. 2.Institute of Neural Information ProcessingUlm UniversityUlmGermany

Personalised recommendations