Majorization Minimization Technique for Optimally Solving Deep Dictionary Learning

Abstract

The concept of deep dictionary learning (DDL) has recently been proposed. Unlike shallow dictionary learning, which learns a single level of dictionary to represent the data, DDL uses multiple layers of dictionaries. So far, the problem could only be solved in a greedy fashion: a single layer of dictionary was learnt at each stage, with the coefficients from the previous layer acting as inputs to the subsequent layer (only the first layer used the training samples as inputs). This greedy scheme is sub-optimal, because information flows from the shallower layers to the deeper ones but there is no feedback in the other direction. This work proposes an optimal solution to DDL in which all the layers of dictionaries are solved simultaneously; we employ the majorization-minimization (MM) approach. Experiments carried out on benchmark datasets show that optimal joint learning indeed improves over greedy piecemeal learning. Comparison with other unsupervised deep learning tools (stacked denoising autoencoder, deep belief network, contractive autoencoder and k-sparse autoencoder) shows that our method surpasses them in both accuracy and speed.
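To make the joint-versus-greedy distinction concrete, below is a minimal, hypothetical sketch (not the paper's implementation) of joint learning for a two-layer linear DDL model X ≈ D1 D2 Z, where every sweep updates all factors with MM (Landweber-type) steps. The function names (`mm_step_right`, `mm_step_left`, `joint_ddl`) and all parameters are illustrative assumptions, and the sketch omits the activation functions and sparsity penalties a full DDL formulation would include.

```python
import numpy as np

def mm_step_right(A, X, B):
    # One MM (Landweber) step on min_B ||X - A B||_F^2: majorize the
    # quadratic with step size c >= sigma_max(A)^2, giving the update
    # B <- B + (1/c) A^T (X - A B), which never increases the cost.
    c = np.linalg.norm(A, 2) ** 2 + 1e-12
    return B + (A.T @ (X - A @ B)) / c

def mm_step_left(A, X, B):
    # The analogous MM step on min_A ||X - A B||_F^2 (left factor):
    # A <- A + (1/c) (X - A B) B^T with c >= sigma_max(B)^2.
    c = np.linalg.norm(B, 2) ** 2 + 1e-12
    return A + ((X - A @ B) @ B.T) / c

def joint_ddl(X, k1, k2, n_sweeps=200, seed=0):
    # Jointly fit X ~ D1 D2 Z. Unlike the greedy scheme, every sweep
    # revisits ALL layers, so deeper layers feed back into shallower ones.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    D1 = rng.standard_normal((m, k1))
    D2 = rng.standard_normal((k1, k2))
    Z = rng.standard_normal((k2, n))
    for _ in range(n_sweeps):
        Z = mm_step_right(D1 @ D2, X, Z)   # deepest coefficients
        # MM step on D2 with D1, Z fixed: the gradient is
        # D1^T (D1 D2 Z - X) Z^T with Lipschitz constant
        # sigma_max(D1)^2 * sigma_max(Z)^2, so that product is a valid c.
        c = (np.linalg.norm(D1, 2) * np.linalg.norm(Z, 2)) ** 2 + 1e-12
        D2 = D2 + (D1.T @ (X - D1 @ D2 @ Z) @ Z.T) / c
        D1 = mm_step_left(D1, X, D2 @ Z)   # shallowest dictionary
    return D1, D2, Z

# Toy usage: the reconstruction cost decreases monotonically across sweeps.
X = np.random.default_rng(1).standard_normal((64, 500))
D1, D2, Z = joint_ddl(X, k1=40, k2=20)
print(np.linalg.norm(X - D1 @ D2 @ Z) / np.linalg.norm(X))
```

By contrast, a greedy variant would learn D1 from X first, freeze it, and only then learn D2 from the first-layer coefficients; that is the feedback-free pipeline the joint scheme improves upon.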

Keywords

Deep learning · Dictionary learning · Optimization

Acknowledgements

The authors thank the Infosys Center for Artificial Intelligence for partial support.

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

Indraprastha Institute of Information Technology, Delhi, India