Neural Computing and Applications, Volume 27, Issue 5, pp 1361–1367

Learning a good representation with unsymmetrical auto-encoder

  • Yanan Sun
  • Hua Mao
  • Quan Guo
  • Zhang Yi
Original Article


Auto-encoders play a fundamental role in unsupervised feature learning and in learning the initial parameters of deep architectures for supervised tasks. Given input samples, robust features yield representations that are robust in two respects: (1) they are invariant to small variations of the samples, and (2) decoders can reconstruct the samples from them with minimal error. Traditional auto-encoders, whatever their regularization terms, have symmetrical numbers of encoder and decoder layers, and sometimes symmetrical parameters. We investigate the relation between the numbers of encoder and decoder layers and propose an unsymmetrical structure, the unsymmetrical auto-encoder (UAE), to learn more effective features. We present empirical results of feature learning with the UAE and with state-of-the-art auto-encoders on classification tasks over a range of datasets. We also analyze the gradient vanishing problem mathematically and suggest an appropriate number of layers for UAEs with a logistic activation function. In our experiments, UAEs outperformed the other auto-encoders under the same configuration.
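The abstract does not spell out the UAE architecture, so the following is only a minimal sketch under stated assumptions: a plain-Python auto-encoder whose encoder and decoder stacks have different numbers of logistic layers. The class name `UnsymmetricalAutoEncoder`, the layer sizes, the uniform initialization, and the squared-error loss are hypothetical choices for illustration, not the authors' configuration. The sketch also encodes the standard bound behind the gradient-vanishing remark: because the logistic derivative σ'(z) = σ(z)(1 − σ(z)) never exceeds 1/4, each layer an error signal passes back through can scale its norm by at most 0.25‖W_l‖ (spectral norm of that layer's weights). The paper's own analysis and recommended layer counts are not reproduced here.

```python
import math
import random


def sigmoid(z):
    # Logistic activation.
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z)); its maximum is 0.25 at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)


def init_layer(n_in, n_out, rng):
    # Hypothetical initialization: small uniform weights, zero biases.
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(weights, bias, x):
    # One fully connected layer followed by the logistic activation.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


class UnsymmetricalAutoEncoder:
    """Hypothetical sketch: encoder and decoder may have different depths."""

    def __init__(self, encoder_sizes, decoder_sizes, seed=0):
        # The code size must match between the two stacks, and the decoder
        # must map back to the input dimension.
        assert encoder_sizes[-1] == decoder_sizes[0]
        assert encoder_sizes[0] == decoder_sizes[-1]
        rng = random.Random(seed)
        self.encoder = [init_layer(a, b, rng) for a, b in zip(encoder_sizes, encoder_sizes[1:])]
        self.decoder = [init_layer(a, b, rng) for a, b in zip(decoder_sizes, decoder_sizes[1:])]

    def encode(self, x):
        for weights, bias in self.encoder:
            x = dense(weights, bias, x)
        return x

    def decode(self, h):
        for weights, bias in self.decoder:
            h = dense(weights, bias, h)
        return h

    def reconstruction_error(self, x):
        # Squared reconstruction error (one common choice, assumed here).
        x_hat = self.decode(self.encode(x))
        return 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))


def backprop_scale_bound(weight_norms):
    # Upper bound on the factor by which an error signal's norm can change
    # when propagated back through logistic layers: prod(0.25 * ||W_l||).
    bound = 1.0
    for norm in weight_norms:
        bound *= 0.25 * norm
    return bound
```

For example, `UnsymmetricalAutoEncoder([8, 4], [4, 6, 8])` has one encoder layer and two decoder layers, and `backprop_scale_bound([1.0, 1.0, 1.0])` shows that with unit-norm weights three logistic layers already cap the back-propagated signal at 0.25³ = 0.015625 of its original size, which is why depth matters for logistic networks.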


Keywords: Auto-encoder; Neural networks; Feature learning; Deep learning; Unsupervised learning



This work was supported by the National Science Foundation of China under Grant 61432012.


Copyright information

© The Natural Computing Applications Forum 2015

Authors and Affiliations

  1. Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, PR China
