Learning two-pathway convolutional neural networks for categorizing scene images

Bai, Shuang; Li, Zhaohong; Hou, Jianjun

doi:10.1007/s11042-016-3900-6


Published: 21 September 2016

Multimedia Tools and Applications, volume 76, pages 16145–16162 (2017)

Shuang Bai¹,
Zhaohong Li¹ &
Jianjun Hou¹


Abstract

Scenes are closely related to the kinds of objects that may appear in them, and objects are widely used as features for scene categorization. On the other hand, landscapes, which carry more of the spatial structure of scenes, are also representative of scene categories. In this paper, we propose a deep learning based algorithm for scene categorization. Specifically, we design two-pathway convolutional neural networks that exploit both the object attributes and the spatial structures of scene images. Unlike conventional deep learning methods, which usually focus on only one aspect of images, each pathway of the proposed architecture is tuned to capture a different aspect of images. As a result, complementary information about image contents can be utilized effectively. In addition, to deal with the feature redundancy caused by combining features from different sources, we adopt the ℓ2,1 norm during classifier training to control the selectivity of each type of feature. Extensive experiments are conducted to evaluate the proposed method. The results demonstrate that the proposed approach achieves superior performance over conventional methods. Moreover, the proposed method is a general framework that can easily be extended to more pathways and applied to other problems.
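The abstract does not give the exact regularizer used in the paper, so the following is only a minimal sketch of the standard ℓ2,1 norm of a weight matrix: the sum of the Euclidean norms of its rows. The matrix `W`, its shape, and the assignment of rows to pathways are hypothetical values chosen for illustration, not taken from the paper.

```python
import math


def row_norms(W):
    """Euclidean (l2) norm of each row of W, where W is a list of rows."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]


def l21_norm(W):
    """Standard l2,1 norm: the sum of the l2 norms of the rows of W."""
    return sum(row_norms(W))


# Hypothetical classifier weights: 4 features (rows) x 2 classes (columns).
# Assumption for illustration only: rows 0-1 come from one pathway and
# rows 2-3 from the other, after the two feature vectors are concatenated.
W = [
    [3.0, 4.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [6.0, 8.0],
]

print(row_norms(W))  # [5.0, 0.0, 0.0, 10.0]
print(l21_norm(W))   # 15.0
```

Because the penalty sums unsquared row norms, minimizing it tends to drive entire rows to zero, which removes a feature for all classes at once. That row-wise sparsity is what makes this norm a natural tool for limiting redundancy among concatenated features.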



Acknowledgments

This work was supported in part by the Fundamental Research Funds for the Central Universities (2014JBM017), in part by A Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and in part by Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).

Author information

Authors and Affiliations

School of Electronic and Information Engineering, Beijing Jiaotong University, No.3 Shang Yuan Cun, Hai Dian District, Beijing, China
Shuang Bai, Zhaohong Li & Jianjun Hou


Corresponding author

Correspondence to Shuang Bai.


About this article

Cite this article

Bai, S., Li, Z. & Hou, J. Learning two-pathway convolutional neural networks for categorizing scene images. Multimed Tools Appl 76, 16145–16162 (2017). https://doi.org/10.1007/s11042-016-3900-6


Received: 25 January 2016
Revised: 11 July 2016
Accepted: 24 August 2016
Published: 21 September 2016
Issue Date: August 2017
DOI: https://doi.org/10.1007/s11042-016-3900-6
