Abstract
In this paper, we propose a novel gaze shifting kernel for scene image categorization, focusing on how humans perceive visually or semantically salient regions in a scene. First, a weakly supervised embedding algorithm projects the local image descriptors (i.e., graphlets) into a pre-specified semantic space. Each graphlet can then be represented by multiple visual features at both the low level and the high level. Because humans typically attend to only a small fraction of the regions in a scene, we propose a sparsity-constrained graphlet ranking algorithm that dynamically integrates the low-level and high-level visual cues. The top-ranked graphlets are visually or semantically salient according to human perception, and they are linked into a path to simulate human gaze shifting. Finally, we calculate the gaze shifting kernel (GSK) from the paths discovered in a set of images. Experiments on the USC scene and the ZJU aerial image data sets demonstrate the competitiveness of our GSK, as well as the high consistency of the predicted paths with real human gaze shifting paths.
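The pipeline described above (rank graphlets, link the top-ranked ones into a path, compare paths across images) can be illustrated with a minimal sketch. It is not the paper's formulation: the sparsity-constrained ranking is replaced here by a plain top-k sort over saliency scores that are assumed to be given, and the path similarity is an assumed position-wise RBF average. The names `top_k_path`, `rbf`, and `path_kernel`, and the parameter `gamma`, are hypothetical.

```python
import math


def top_k_path(scores, k):
    """Return the indices of the k highest-scoring graphlets, best first.

    Assumption: `scores` are precomputed saliency scores, one per graphlet.
    A plain sort stands in for the sparsity-constrained ranking.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def rbf(u, v, gamma=1.0):
    """Gaussian (RBF) similarity between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * d2)


def path_kernel(feats_a, path_a, feats_b, path_b, gamma=1.0):
    """Average RBF similarity of graphlets at the same position on two paths.

    Assumption: both paths have the same length k, so the i-th fixation of
    one image is compared with the i-th fixation of the other.
    """
    if len(path_a) != len(path_b):
        raise ValueError("paths must have the same length")
    if not path_a:
        return 0.0
    total = sum(rbf(feats_a[i], feats_b[j], gamma)
                for i, j in zip(path_a, path_b))
    return total / len(path_a)


# Toy usage: two images with four graphlets each, 2-D features.
feats_1 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feats_2 = [[0.1, 0.0], [0.9, 0.1], [0.0, 0.8], [1.0, 1.2]]
path_1 = top_k_path([0.1, 0.9, 0.5, 0.7], k=3)
path_2 = top_k_path([0.2, 0.8, 0.4, 0.6], k=3)
similarity = path_kernel(feats_1, path_1, feats_2, path_2)
```

Fixing the path length k for every image keeps the comparison position-wise; a similarity matrix built this way over a set of images could then be passed to a kernel classifier.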
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, L., Hong, R., Wang, M. (2015). Gaze Shifting Kernel: Engineering Perceptually-Aware Features for Scene Categorization. In: Ho, YS., Sang, J., Ro, Y., Kim, J., Wu, F. (eds) Advances in Multimedia Information Processing -- PCM 2015. PCM 2015. Lecture Notes in Computer Science, vol 9314. Springer, Cham. https://doi.org/10.1007/978-3-319-24075-6_25
DOI: https://doi.org/10.1007/978-3-319-24075-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24074-9
Online ISBN: 978-3-319-24075-6
eBook Packages: Computer Science, Computer Science (R0)