Multimedia Tools and Applications, Volume 78, Issue 1, pp 573–589

Coupled-learning convolutional neural networks for object recognition

  • Chunyan Xu
  • Jian Yang
  • Junbin Gao


Recently, convolutional neural networks (CNNs) have attracted considerable attention in various computer vision tasks. Motivated by neuroscience, CNNs share several properties with the learning process of the human brain. A prominent difference is that each CNN learns independently, whereas effective interaction and communication between people can play an important role in the human visual system. Inspired by this fact, we propose a novel Coupled-learning Convolutional Neural Network (Co-CNN) for object recognition, which boosts discriminative capability by employing dynamic interaction between neural networks. In contrast to existing network architectures, which pose network optimization as an isolated learning process, the intuition behind the Co-CNN framework is that the coupled-learning mechanism may keep the algorithm from over-fitting to one or more particular objective functions. The proposed Co-CNN framework has three distinguishing characteristics: (1) Co-CNN is a novel deep network learning framework that can simultaneously optimize both neural networks, whether they have the same or different structures. (2) The learned semantic information, which is gradually mined from the neural networks, is employed to guide the communication between them. (3) Co-CNN incorporates the coupled-learning mechanism into the training of the neural networks and further improves their recognition performance by adopting the learned semantic information. Comprehensive evaluations on five benchmark datasets (CIFAR-10, CIFAR-100, MNIST, SVHN and ImageNet) demonstrate the superiority of the proposed Co-CNN framework over existing algorithms.
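The idea of two learners that are optimized at the same time and exchange information can be illustrated with a minimal sketch. The abstract does not specify how Co-CNN exchanges its semantic information, so everything below is an assumption made for illustration: the learners are tiny linear softmax classifiers rather than CNNs, the data are synthetic 2-D blobs, and the coupling term is a KL divergence that pulls each learner's predicted distribution towards the other's. The names `coupled_epoch`, `train_coupled`, `lam` and `make_blobs` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Class probabilities of a linear classifier; each row is [w_1, ..., w_d, bias]."""
    # zip stops at len(x), so the trailing bias entry is added separately.
    return softmax([sum(w * xi for w, xi in zip(row, x)) + row[-1] for row in weights])


def coupled_epoch(w_a, w_b, data, lr=0.1, lam=0.5):
    """One SGD pass that updates both learners on every sample.

    Assumed per-learner loss: cross-entropy(y, p) + lam * KL(q || p), where q is
    the other learner's prediction, treated as a constant. Its gradient with
    respect to the logits is (p - onehot(y)) + lam * (p - q).
    """
    for x, y in data:
        p_a, p_b = predict(w_a, x), predict(w_b, x)  # both computed before any update
        for w, p, q in ((w_a, p_a, p_b), (w_b, p_b, p_a)):
            for k in range(len(w)):
                g = (p[k] - (1.0 if k == y else 0.0)) + lam * (p[k] - q[k])
                for j, xj in enumerate(x):
                    w[k][j] -= lr * g * xj
                w[k][-1] -= lr * g


def make_blobs(n_per_class=50, seed=0):
    """Synthetic two-class data: Gaussian blobs centred at (-1, -1) and (1, 1)."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(-1.0, -1.0), (1.0, 1.0)]):
        for _ in range(n_per_class):
            data.append(([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)], label))
    rng.shuffle(data)
    return data


def accuracy(weights, data):
    """Fraction of samples whose most probable class matches the label."""
    hits = 0
    for x, y in data:
        p = predict(weights, x)
        hits += int(p.index(max(p)) == y)
    return hits / len(data)


def train_coupled(epochs=20, seed=0):
    """Train two differently initialised learners jointly; return their accuracies."""
    rng = random.Random(seed + 1)
    data = make_blobs(seed=seed)
    w_a = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]
    w_b = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]
    for _ in range(epochs):
        coupled_epoch(w_a, w_b, data)
    return accuracy(w_a, data), accuracy(w_b, data)


if __name__ == "__main__":
    acc_a, acc_b = train_coupled()
    print("learner A accuracy:", acc_a)
    print("learner B accuracy:", acc_b)
```

Setting `lam=0` in this sketch recovers two independently trained learners, which is the isolated setting the abstract contrasts against; a positive `lam` adds the illustrative coupling term to both objectives.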


Keywords: Coupled-learning mechanism; Convolutional neural network; Object recognition; Semantic information



This work is supported by the National Natural Science Foundation of China (Grant Nos. 61602244 and 61502235) and partially sponsored by the CCF-Tencent Open Research Fund.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
  2. Discipline of Business Analytics, University of Sydney Business School, University of Sydney, Sydney, Australia
