Imposing Class-Wise Feature Similarity in Stacked Autoencoders by Nuclear Norm Regularization

Abstract

This work proposes a new formulation for the supervised stacked autoencoder. We argue that features from the same class should be similar to each other and hence approximately linearly dependent; consequently, when stacked as columns, the feature matrix for each class is rank deficient (low rank). We impose this constraint on the stacked autoencoder formulation in the form of nuclear norm penalties on the class-wise feature matrices at each layer. The nuclear norm is the convex surrogate of rank and promotes the low-rank solutions our proposal requires. Owing to the nuclear norm penalties, the cost function is non-smooth and therefore cannot be minimized directly by gradient-based techniques such as backpropagation. Moreover, we learn the stacked autoencoder in a single stage, without the usual pre-training followed by fine-tuning. Both requirements (a non-smooth cost function and simultaneous training of all layers) are met by variable splitting followed by the augmented Lagrangian method of alternating directions. Two sets of experiments are reported. The first, on a variety of benchmark datasets, shows that our method outperforms the deep learning models it is compared against: the class-sparse stacked autoencoder, the deep belief network, and the discriminative deep belief network. The second, on a brain-computer interface classification problem, shows that our method outperforms prior deep learning solutions for this task.
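To make the key computational step concrete, the sketch below (not the authors' code; the names svt, classwise_lowrank_prox, and the threshold tau are illustrative) shows the nuclear-norm proximal update that a variable-splitting/ADMM solver of this kind would apply. After splitting, the nuclear-norm subproblem decouples across classes, and each class-wise feature matrix is updated by singular value thresholding: every singular value is shrunk by tau and those below tau are zeroed, which is precisely the mechanism that promotes low rank.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of
    tau * (nuclear norm). Shrinks every singular value by tau
    and zeroes those that fall below it."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt   # equivalent to U @ diag(s) @ Vt

def classwise_lowrank_prox(Z, labels, tau):
    """Apply SVT separately to the columns of Z belonging to each
    class, pushing every class-wise feature matrix (features
    stacked as columns) toward a low-rank solution."""
    Z = Z.copy()
    for c in np.unique(labels):
        idx = labels == c
        Z[:, idx] = svt(Z[:, idx], tau)
    return Z

# Toy usage: 16-dimensional features for 12 samples from 3 classes.
rng = np.random.default_rng(0)
Z = rng.standard_normal((16, 12))
labels = np.repeat(np.arange(3), 4)        # 4 samples per class
Z_lr = classwise_lowrank_prox(Z, labels, tau=3.0)

# Singular values below tau are zeroed, so each class block becomes
# rank deficient, as the penalty is designed to encourage.
print(np.linalg.svd(Z_lr[:, labels == 0], compute_uv=False))
```

Raising tau zeroes more singular values, trading reconstruction fidelity for stronger class-wise rank deficiency; in the full alternating-directions scheme this update would alternate with smooth least-squares updates of the autoencoder weights.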

Author information

Correspondence to Angshul Majumdar.

Cite this article

Gupta, K., Majumdar, A. Imposing Class-Wise Feature Similarity in Stacked Autoencoders by Nuclear Norm Regularization. Neural Process Lett 48, 615–629 (2018). https://doi.org/10.1007/s11063-017-9731-2
