Imposing Class-Wise Feature Similarity in Stacked Autoencoders by Nuclear Norm Regularization

Abstract

This work proposes a new formulation for the supervised stacked autoencoder. We argue that features from the same class should be similar to each other and hence approximately linearly dependent; consequently, when stacked as columns, the feature matrix for each class is rank deficient (low rank). We impose this constraint on the stacked autoencoder formulation in the form of nuclear norm penalties on the class-wise feature matrices at each layer. The nuclear norm is the convex surrogate of rank and promotes the low-rank solutions our proposal requires. Owing to the nuclear norm penalties, the cost function is non-smooth and therefore cannot be minimized directly by gradient-based techniques such as backpropagation. Moreover, we learn the stacked autoencoder in a single stage, without the usual pre-training followed by fine-tuning. Both requirements (a non-smooth cost function and simultaneous training of all layers) are met by variable splitting followed by the augmented Lagrangian method of alternating directions. Two sets of experiments are reported. The first, on a variety of benchmark datasets, shows that our method outperforms the deep learning models it is compared against: the class-sparse stacked autoencoder, the deep belief network, and the discriminative deep belief network. The second, on a brain-computer interface classification problem, shows that our method outperforms prior deep learning solutions for this task.
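To make the key computational step concrete, the sketch below (not the authors' code; the names svt, classwise_lowrank_prox, and the threshold tau are illustrative) shows the nuclear-norm proximal update that a variable-splitting/ADMM solver of this kind would apply. After splitting, the nuclear-norm subproblem decouples across classes, and each class-wise feature matrix is updated by singular value thresholding: every singular value is shrunk by tau and those below tau are zeroed, which is precisely the mechanism that promotes low rank.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of
    tau * (nuclear norm). Shrinks every singular value by tau
    and zeroes those that fall below it."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt   # equivalent to U @ diag(s) @ Vt

def classwise_lowrank_prox(Z, labels, tau):
    """Apply SVT separately to the columns of Z belonging to each
    class, pushing every class-wise feature matrix (features
    stacked as columns) toward a low-rank solution."""
    Z = Z.copy()
    for c in np.unique(labels):
        idx = labels == c
        Z[:, idx] = svt(Z[:, idx], tau)
    return Z

# Toy usage: 16-dimensional features for 12 samples from 3 classes.
rng = np.random.default_rng(0)
Z = rng.standard_normal((16, 12))
labels = np.repeat(np.arange(3), 4)        # 4 samples per class
Z_lr = classwise_lowrank_prox(Z, labels, tau=3.0)

# Singular values below tau are zeroed, so each class block becomes
# rank deficient, as the penalty is designed to encourage.
print(np.linalg.svd(Z_lr[:, labels == 0], compute_uv=False))
```

Raising tau zeroes more singular values, trading reconstruction fidelity for stronger class-wise rank deficiency; in the full alternating-directions scheme this update would alternate with smooth least-squares updates of the autoencoder weights.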

Author information

Correspondence to Angshul Majumdar.

Cite this article

Gupta, K., Majumdar, A. Imposing Class-Wise Feature Similarity in Stacked Autoencoders by Nuclear Norm Regularization. Neural Process Lett 48, 615–629 (2018). https://doi.org/10.1007/s11063-017-9731-2
