Learning two-pathway convolutional neural networks for categorizing scene images

Multimedia Tools and Applications

Abstract

Scenes are closely related to the kinds of objects that may appear in them, so objects are widely used as features for scene categorization. On the other hand, landscapes with more spatial structure are also representative of scene categories. In this paper, we propose a deep learning based algorithm for scene categorization. Specifically, we design two-pathway convolutional neural networks that exploit both the object attributes and the spatial structures of scene images. Unlike conventional deep learning methods, which usually focus on only one aspect of images, each pathway of the proposed architecture is tuned to capture a different aspect. As a result, complementary information about image content can be utilized effectively. In addition, to deal with the feature redundancy caused by combining features from different sources, we adopt the ℓ2,1 norm during classifier training to control the selectivity of each type of feature. Extensive experiments are conducted to evaluate the proposed method. The results demonstrate that the proposed approach achieves superior performance over conventional methods. Moreover, the proposed method is a general framework that can easily be extended to more pathways and applied to other problems.
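To make the role of the ℓ2,1 norm concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard definition (the sum of the Euclidean norms of the rows of a weight matrix) and assumes that rows index features and columns index classes; the abstract does not state these conventions. The function names, the toy matrix `W`, and the threshold `t` are hypothetical. The row-wise soft-thresholding step is the usual proximal operator for this penalty, shown here only to illustrate how whole features can be switched off.

```python
import math

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (a list of lists)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def prox_l21(W, t):
    """Row-wise soft-thresholding, the proximal operator of t * ||W||_{2,1}.

    Each row is shrunk toward zero; rows whose norm is at most t become
    exactly zero, which removes that feature for every class at once.
    """
    out = []
    for row in W:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - t / norm) if norm > 0 else 0.0
        out.append([scale * w for w in row])
    return out

# Hypothetical classifier weights: 4 features (rows) x 2 classes (columns).
W = [
    [3.0, 4.0],  # strong feature, row norm 5
    [0.3, 0.4],  # weak feature, row norm 0.5
    [0.0, 2.0],  # moderate feature, row norm 2
    [0.0, 0.0],  # unused feature
]

penalty = l21_norm(W)    # 5 + 0.5 + 2 + 0
shrunk = prox_l21(W, 1.0)

# Indices of features that still have a nonzero weight after shrinkage.
kept = [i for i, row in enumerate(shrunk) if any(w != 0.0 for w in row)]
print(kept)  # [0, 2]
```

Because the penalty acts on whole rows, a redundant feature is dropped for all classes together, which is the kind of feature-level selectivity that matters when features from two pathways are concatenated.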

Acknowledgments

This work was supported in part by the Fundamental Research Funds for the Central Universities (2014JBM017), in part by A Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and in part by Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology (CICAEET).

Author information

Corresponding author

Correspondence to Shuang Bai.

About this article

Cite this article

Bai, S., Li, Z. & Hou, J. Learning two-pathway convolutional neural networks for categorizing scene images. Multimed Tools Appl 76, 16145–16162 (2017). https://doi.org/10.1007/s11042-016-3900-6

  • DOI: https://doi.org/10.1007/s11042-016-3900-6
