Multimedia Tools and Applications, Volume 77, Issue 20, pp 26563–26580

Stacked multichannel autoencoder – an efficient way of learning from synthetic data

  • Xi Zhang
  • Yanwei Fu
  • Shanshan Jiang
  • Xiangyang Xue
  • Yu-Gang Jiang
  • Gady Agam


Learning from synthetic data has many important applications in cases where sufficient amounts of labeled data are not available. Using synthetic data is challenging due to differences in feature distributions between synthetic and actual data, a phenomenon we term the synthetic gap. In this paper, we investigate and formalize a general framework, the Stacked Multichannel Autoencoder (SMCAE), that bridges the synthetic gap and enables learning from synthetic data more efficiently. In particular, we show that our SMCAE can not only transform and use synthetic data on a challenging face-sketch recognition task, but that it can also help simulate real images which can be used for training classifiers for recognition. Preliminary experiments validate the effectiveness of the proposed framework.
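The core idea of a multichannel autoencoder, a shared latent code decoded into one reconstruction per data channel, can be sketched as follows. This is a minimal single-layer illustration, not the authors' implementation: the paired synthetic/real toy data, layer sizes, and plain gradient-descent training are all assumptions made for the example; the paper's SMCAE stacks several such layers.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 8, 4, 64                     # feature dim, latent dim, number of pairs

# Toy paired data: "real" features are a distorted copy of the "synthetic"
# ones, mimicking the synthetic gap between the two channels.
X_syn = rng.normal(size=(n, d))
X_real = X_syn @ rng.normal(scale=0.3, size=(d, d)) + 0.1 * rng.normal(size=(n, d))

W_enc = rng.normal(scale=0.1, size=(d, h))   # shared encoder
W_syn = rng.normal(scale=0.1, size=(h, d))   # decoder for the synthetic channel
W_real = rng.normal(scale=0.1, size=(h, d))  # decoder for the real channel

def loss():
    z = np.tanh(X_syn @ W_enc)               # encode the synthetic channel only
    return (np.mean((z @ W_syn - X_syn) ** 2)
            + np.mean((z @ W_real - X_real) ** 2))

lr = 0.05
before = loss()
for _ in range(200):                         # plain gradient descent (illustrative)
    z = np.tanh(X_syn @ W_enc)
    g_syn = 2 * (z @ W_syn - X_syn) / n      # gradient w.r.t. synthetic reconstruction
    g_real = 2 * (z @ W_real - X_real) / n   # gradient w.r.t. real reconstruction
    g_z = g_syn @ W_syn.T + g_real @ W_real.T
    W_enc -= lr * X_syn.T @ (g_z * (1 - z ** 2))   # backprop through tanh
    W_syn -= lr * z.T @ g_syn
    W_real -= lr * z.T @ g_real
after = loss()
print(f"reconstruction loss: {before:.3f} -> {after:.3f}")
```

Because both decoders share one latent code, minimizing both reconstruction losses forces the representation of synthetic inputs to also explain the paired real features, which is the sense in which the latent space bridges the synthetic gap.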


Keywords: Multimodal autoencoder · Synthetic gap · Satellite image classification · Learning from synthetic data · Face-sketch recognition



This work is supported by the Fudan University-CIOMP Joint Fund (FC2017-006). Yanwei Fu is supported by the Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. TP2017006). Yanwei Fu is the corresponding author.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Xi Zhang (1)
  • Yanwei Fu (2, 4) (corresponding author)
  • Shanshan Jiang (1)
  • Xiangyang Xue (3)
  • Yu-Gang Jiang (3, 4)
  • Gady Agam (1)

  1. Illinois Institute of Technology, Chicago, USA
  2. School of Data Science, Fudan University, Shanghai, China
  3. School of Computer Science, Fudan University, Shanghai, China
  4. The Academy for Engineering and Technology, Fudan University, Shanghai, China
