Abstract
The relationship between samples is often ignored when training neural networks for classification tasks. If properly utilized, this information can bring many benefits to the trained models. On the one hand, networks trained without regard for similarities between samples may represent different samples closely even when they belong to different classes, which undermines the discriminative ability of the trained models. On the other hand, regularizing inter-class and intra-class similarities in the feature space during training can effectively disentangle the representations of different classes and make the representations sparse. To this end, a new regularization method is proposed that penalizes positive inter-class similarities and negative intra-class similarities in the feature space. Experimental results show that the proposed method not only yields sparse and disentangled representations but also improves the performance of the trained models on many datasets.
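Since the abstract describes the penalty only at a high level, the following is a minimal sketch of how such a term could be computed from a batch of features. The function name gram_regularization, the cosine-normalized Gram matrix, and the hinge-style clamping are illustrative assumptions; the paper's exact formulation may differ.

```python
# A minimal sketch of the penalty described above, assuming PyTorch
# features from the penultimate layer. Names and the exact form of the
# penalty are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F

def gram_regularization(features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Penalize positive inter-class and negative intra-class
    cosine similarities within a batch.

    features: (batch, dim) feature vectors.
    labels:   (batch,) integer class labels.
    """
    f = F.normalize(features, dim=1)   # unit-norm rows
    gram = f @ f.t()                   # (batch, batch) cosine similarities

    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    off_diag = ~torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    # Inter-class pairs should not be positively similar ...
    inter = gram[~same & off_diag].clamp(min=0.0)
    # ... and intra-class pairs should not be negatively similar.
    intra = (-gram[same & off_diag]).clamp(min=0.0)

    return inter.sum() + intra.sum()
```

In training, a term like this would typically be added to the standard cross-entropy loss with a weighting coefficient, e.g. loss = F.cross_entropy(logits, labels) + lam * gram_regularization(feats, labels).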
Funding
This work was supported by the National Natural Science Foundation of China [Grant No. 61432012].
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Gao, Z., Chen, Y., Guo, Q. et al. Gram regularization for sparse and disentangled representation. Pattern Anal Applic 25, 337–349 (2022). https://doi.org/10.1007/s10044-021-01033-4