AI 2016: Advances in Artificial Intelligence, pp. 102–108
Object-Based Representation for Scene Classification
Abstract
How to encode and represent a scene remains a critical problem in both human and computer vision. Traditional local and global features are useful and have had some success; however, many observations of human scene perception seem to point to an object-based representation. In this paper, we propose a high-level, object-based representation for scene categorization. First, we use semantic segmentation to obtain semantic regions, and we build an object histogram representation of the scene by summation pooling over all regions. Second, we build spatial and geometrical priors for each object and for each pair of co-occurring objects from training scenes, and integrate this spatial and geometrical information into the scene representation. Experimental results on two datasets demonstrate that the proposed representation is effective and competitive.
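The object histogram step can be illustrated with a minimal sketch. It assumes, hypothetically, that the segmentation stage yields one class-score vector per region; the function name `object_histogram`, the array layout, and the L1 normalization are illustrative choices and are not taken from the paper.

```python
import numpy as np


def object_histogram(region_scores, normalize=True):
    """Sum-pool per-region class scores into one scene-level histogram.

    region_scores: array-like of shape (num_regions, num_classes). Row r holds
    the class scores of region r (a hypothetical segmenter output format).
    """
    scores = np.asarray(region_scores, dtype=float)
    if scores.ndim != 2:
        raise ValueError("expected shape (num_regions, num_classes)")

    # Summation pooling: add the score vectors of all regions.
    hist = scores.sum(axis=0)

    # L1 normalization is an assumption made here so that scenes with
    # different numbers of regions stay comparable.
    total = hist.sum()
    if normalize and total > 0:
        hist = hist / total
    return hist


# Toy scene: 3 regions scored against 4 object classes.
regions = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
hist = object_histogram(regions)
raw_hist = object_histogram(regions, normalize=False)
```

The resulting vector has one entry per object class and discards where the regions lie in the image, which is why the abstract adds spatial and geometrical priors as a second step.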
Keywords
Gaussian Mixture Model, Convolutional Neural Network, Scene Categorization, High Level Representation, Indoor Scene
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Project 61175116, the Science and Technology Commission of Shanghai Municipality under research grant no. 14DZ2260800, and the Shanghai Knowledge Service Platform for Trustworthy Internet of Things (No. ZF1213).