Majorization Minimization Technique for Optimally Solving Deep Dictionary Learning


The concept of deep dictionary learning (DDL) has recently been proposed. Unlike shallow dictionary learning, which learns a single level of dictionary to represent the data, DDL uses multiple layers of dictionaries. So far, the problem could only be solved in a greedy fashion: a single layer of dictionary was learned at each stage, with the coefficients from the previous layer acting as inputs to the subsequent layer (only the first layer used the training samples as inputs). This is suboptimal; information flows from the shallower to the deeper layers, but there is no feedback in the other direction. This work proposes an optimal solution to DDL in which all the layers of dictionaries are solved for simultaneously, using the Majorization Minimization approach. Experiments on benchmark datasets show that joint optimal learning indeed improves over greedy piecemeal learning. Comparisons with other unsupervised deep learning tools (stacked denoising autoencoder, deep belief network, contractive autoencoder and K-sparse autoencoder) show that our method surpasses them in both accuracy and speed.
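For readers unfamiliar with the baseline, the greedy layer-wise scheme described above can be sketched as follows, with each single-layer subproblem solved by Majorization-Minimization (a Landweber-style gradient step followed by soft thresholding). This is an illustrative sketch under assumed settings (function names, layer sizes, the regularization weight `lam`, and the iteration count are all hypothetical), not the authors' exact algorithm.

```python
import numpy as np

def mm_dictionary_learning(X, n_atoms, n_iter=100, lam=0.1, seed=0):
    """Single-layer dictionary learning X ~ D @ Z via MM: the quadratic
    data-fit term is majorized by a separable surrogate, which yields
    Landweber-type updates with step size 1/c, c >= spectral norm squared."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
    Z = rng.standard_normal((n_atoms, X.shape[1]))
    for _ in range(n_iter):
        # Z-step: majorize ||X - D Z||_F^2, then shrink for an l1 penalty
        c = np.linalg.norm(D, 2) ** 2 + 1e-12
        Z = Z + (D.T @ (X - D @ Z)) / c
        Z = np.sign(Z) * np.maximum(np.abs(Z) - lam / c, 0.0)
        # D-step: same surrogate trick with c' >= ||Z||_2^2
        cp = np.linalg.norm(Z, 2) ** 2 + 1e-12
        D = D + ((X - D @ Z) @ Z.T) / cp
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # renormalize atoms
    return D, Z

def greedy_deep_dictionary_learning(X, layer_sizes):
    """Greedy (layer-wise) DDL: coefficients learned at one layer become
    the 'data' for the next layer; no feedback to earlier layers."""
    dictionaries, rep = [], X
    for k in layer_sizes:
        D, Z = mm_dictionary_learning(rep, k)
        dictionaries.append(D)
        rep = Z                                    # feed coefficients forward
    return dictionaries, rep

# Toy run: 20-dimensional data, two dictionary layers of 15 and 8 atoms.
X = np.random.default_rng(1).standard_normal((20, 50))
Ds, Z = greedy_deep_dictionary_learning(X, [15, 8])
print([D.shape for D in Ds], Z.shape)  # [(20, 15), (15, 8)] (8, 50)
```

The paper's contribution is to replace this layer-by-layer loop with a single MM scheme over all dictionaries jointly, so that errors made at deeper layers can influence the shallower ones.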


Figures 1–6 appear in the full text.




Acknowledgements

The authors thank the Infosys Center for Artificial Intelligence for partial support.

Author information



Corresponding author

Correspondence to Angshul Majumdar.


About this article


Cite this article

Singhal, V., Majumdar, A. Majorization Minimization Technique for Optimally Solving Deep Dictionary Learning. Neural Process Lett 47, 799–814 (2018).



Keywords

  • Deep learning
  • Dictionary learning
  • Optimization