A Simple Convolutional Transfer Neural Network for Vision Tasks
Convolutional neural networks (ConvNets) are multi-stage trainable architectures that can learn invariant features for many vision tasks. Real-world applications of ConvNets are often limited by the expensive and time-consuming generation of labels for each specific task, so the core challenge can be summarized as: labeled data is scarce while unlabeled data is abundant. Traditional ConvNets ignore the information hidden in large-scale unlabeled data. In this work, a very simple convolutional transfer neural network (CTNN) is proposed to address this challenge by introducing the idea of unsupervised transfer learning into ConvNets. We build our model on LeNet-5, one of the simplest ConvNet architectures, and introduce an efficient unsupervised reconstruction-based pre-training strategy that learns the kernels from both labeled and unlabeled data, i.e., from both the training and testing data. The contribution of the proposed model is that it can fully exploit all available data, training and testing alike, so performance improves when labeled training data is insufficient. The widely used handwritten-digit dataset MNIST, together with two retinal vessel datasets, DRIVE and STARE, is employed to validate the proposed work. The classification results demonstrate that the proposed CTNN reduces the requirement for large amounts of labeled training samples in real-world applications.
Keywords: Convolutional neural networks · Transfer learning · Unsupervised pre-training · PCA
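The reconstruction-based, unsupervised kernel pre-training named in the abstract (with PCA as a keyword) can be sketched as follows: pool patches from all available images, labeled or not, and take the top principal components as initial convolution kernels. This is a minimal illustrative sketch, assuming a PCANet-style patch-PCA procedure; the function name `pca_init_kernels` and all parameters are hypothetical, and the paper's exact training pipeline may differ.

```python
import numpy as np

def pca_init_kernels(images, kernel_size=5, num_kernels=6,
                     num_patches=10000, seed=0):
    """Hypothetical helper: initialize conv kernels from the top
    principal components of random image patches. Labels are never
    used, so training and testing images can be pooled together."""
    rng = np.random.default_rng(seed)
    k = kernel_size
    patches = []
    for _ in range(num_patches):
        img = images[rng.integers(len(images))]
        y = rng.integers(img.shape[0] - k + 1)
        x = rng.integers(img.shape[1] - k + 1)
        patches.append(img[y:y + k, x:x + k].ravel())
    X = np.asarray(patches, dtype=np.float64)
    X -= X.mean(axis=0)                        # remove the mean patch
    # Principal directions = eigenvectors of the patch covariance matrix,
    # sorted by descending variance (reconstruction-optimal basis).
    cov = X.T @ X / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    top = eigvecs[:, order[:num_kernels]]      # (k*k, num_kernels)
    return top.T.reshape(num_kernels, k, k)    # one kernel per component

# Usage: pool unlabeled patches from training AND testing images.
images = np.random.rand(100, 28, 28)           # stand-in for MNIST images
kernels = pca_init_kernels(images)
print(kernels.shape)                           # (6, 5, 5)
```

Because the leading principal components minimize patch reconstruction error, these kernels capture the dominant local structure of the whole dataset before any supervised fine-tuning is applied.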
This work was supported in part by the National Natural Science Foundation of China under Grants 81671766, 81301278, 61172179, 61103121, 61571382, and 61571005, in part by the Guangdong Natural Science Foundation under Grant 2015A030313007, in part by the Fundamental Research Funds for the Central Universities under Grants 20720160075, 20720150169, and 20720150093, in part by the Natural Science Foundation of Fujian Province, China under Grant 2017J01126, and in part by the CCF-Tencent research fund.