Neural Processing Letters, Volume 47, Issue 3, pp 783–797

Feature Analysis of Unsupervised Learning for Multi-task Classification Using Convolutional Neural Network



This study analyzes the characteristics of unsupervised feature learning with a convolutional neural network (CNN) to investigate its efficiency for multi-task classification, and compares the learned features with those obtained by supervised learning. To allow a fair comparison, we keep the conventional CNN structure and modify the convolutional auto-encoder design to accommodate a subsampling layer. We further introduce non-maximum suppression and dropout to improve feature extraction and to impose sparsity constraints; the experimental results confirm the effectiveness of these constraints. We also analyze the efficiency of the unsupervised features using t-SNE and the variance ratio. The experiments show that the feature representation obtained by unsupervised learning is more advantageous for multi-task learning than that obtained by supervised learning.
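The variance ratio mentioned above quantifies how well a feature representation separates classes. As a sketch of one common formulation (a Fisher-style criterion: between-class variance divided by within-class variance; the paper's exact definition may differ), assuming features are row vectors in a NumPy array:

```python
import numpy as np

def variance_ratio(features, labels):
    """Between-class over within-class variance of a feature set.

    Higher values indicate features that separate the classes better.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    overall_mean = features.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        # Between-class scatter: class-mean deviation weighted by class size.
        between += len(class_feats) * np.sum((class_mean - overall_mean) ** 2)
        # Within-class scatter: spread of samples around their class mean.
        within += np.sum((class_feats - class_mean) ** 2)
    return between / within

# Two well-separated synthetic clusters yield a large ratio.
rng = np.random.RandomState(0)
a = rng.randn(50, 2)
x = np.vstack([a, a + 10.0])
y = np.array([0] * 50 + [1] * 50)
print(variance_ratio(x, y))
```

A representation with a higher ratio on held-out data would, under this criterion, be considered more discriminative for the classification tasks.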


Keywords: Unsupervised learning · Convolutional neural networks · Multi-task learning · Auto-encoder · Deep learning



This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea Government (MSIT) (No. NRF-2016R1A2A2A05921679) (50%) and the Institute for Information and Communications Technology Promotion (IITP) grant funded by the Korea Government (MSIT) (2016-0-00564, Development of Intelligent Interaction Technology Based on Context Awareness and Human Intention Understanding) (50%).


Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. School of Electronics Engineering, Kyungpook National University, Taegu, South Korea
