
Unlabeled PCA-shuffling initialization for convolutional neural networks

Published in Applied Intelligence

A Correction to this article was published on 25 August 2018

This article has been updated

Abstract

To obtain prominent recognition accuracy, convolutional neural networks (CNNs) need large amounts of labeled data to initialize network parameters. However, two open problems remain, i.e., the uncertainty of the initialization effects and the limited availability of labeled data. To address these problems, we propose a novel method named UPSCNNs, which uses unlabeled data to perform Principal Component Analysis (PCA) and shuffling initialization for CNNs. The method is composed of four steps, i.e., sampling the input images, calculating the sampling sets with PCA, initializing the convolutional kernels, and shuffling the convolutional kernels. With the same network architecture and activation function, i.e., Rectified Linear Units, we conduct comparative experiments on three image datasets, i.e., STL-10, CIFAR-10(I) and CIFAR-10(II). In terms of accuracy, we find that (1) the novel method improves accuracy by 4-20 percent in comparison with other weight initialization methods, e.g., Msra initialization, Xavier initialization and Random initialization, and (2) an increase of 1-3 percent is obtained with unlabeled data compared with only labeled data. The results indicate that our method can make full use of unlabeled data to initialize CNNs and achieve good recognition effectiveness.
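To make the four-step procedure in the abstract concrete, the following is a minimal, hypothetical NumPy sketch of PCA-based kernel initialization with shuffling. It assumes single-channel images and uses the top principal components of randomly sampled patches as first-layer convolutional kernels; the function name pca_shuffle_init and all parameter values are illustrative and not taken from the paper.

```python
# Hypothetical sketch of the four-step idea described in the abstract:
# (1) sample patches from unlabeled images, (2) run PCA on the patch set,
# (3) use the leading principal components as convolutional kernels, and
# (4) shuffle the kernels. Names and parameters are illustrative only.
import numpy as np

def pca_shuffle_init(images, kernel_size=5, n_kernels=16, n_patches=10000, seed=0):
    # n_kernels must not exceed kernel_size**2 (the patch dimensionality).
    rng = np.random.default_rng(seed)
    h, w = images.shape[1], images.shape[2]

    # Step 1: randomly sample patches from the unlabeled images.
    patches = np.empty((n_patches, kernel_size * kernel_size))
    for i in range(n_patches):
        img = images[rng.integers(len(images))]
        y = rng.integers(h - kernel_size + 1)
        x = rng.integers(w - kernel_size + 1)
        patches[i] = img[y:y + kernel_size, x:x + kernel_size].ravel()

    # Step 2: PCA on the centered patch matrix.
    patches -= patches.mean(axis=0)
    cov = patches.T @ patches / (n_patches - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Step 3: keep the components with the largest eigenvalues as
    # the initial convolutional kernels.
    order = np.argsort(eigvals)[::-1][:n_kernels]
    kernels = eigvecs[:, order].T.reshape(n_kernels, kernel_size, kernel_size)

    # Step 4: shuffle the kernels in place so the eigenvalue ordering
    # does not bias particular output channels.
    rng.shuffle(kernels)
    return kernels

# Example: derive first-layer kernels from unlabeled 32x32 grayscale images.
unlabeled = np.random.rand(1000, 32, 32)  # stand-in for, e.g., STL-10 data
w0 = pca_shuffle_init(unlabeled)          # shape: (16, 5, 5)
```

The shuffling step here is one plausible reading of the method: it randomizes the channel order of the PCA-derived kernels so that no output channel is systematically assigned the highest-variance component.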


Change history

  • 25 August 2018

    The original version of this article contained a mistake: in the 2nd row of Column 8 (Rel(%)) in Table 5, the number "54814" should be changed to "5.4814".


Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61175004 and the Natural Science Foundation of Beijing Municipality under Grant 4112009.

Author information


Corresponding author

Correspondence to Jun Ou.

Additional information

The original version of this article was revised: incorrect data found in Tables 5, 6 and 8 and in Figures 7, 9, 10 and 11 were corrected.

About this article

Cite this article

Ou, J., Li, Y. & Shen, C. Unlabeled PCA-shuffling initialization for convolutional neural networks. Appl Intell 48, 4565–4576 (2018). https://doi.org/10.1007/s10489-018-1230-2
