Abstract
Most existing image classification algorithms focus on images that carry only “object” concepts. In real-world cases, however, a great variety of images contain “verb–object” concepts rather than “object” concepts alone. The hierarchical structure embedded in these “verb–object” concepts can enhance classification, but traditional feature representation methods cannot exploit it. To tackle this problem, we present in this paper a novel approach, called inductive hierarchical nonnegative graph embedding. Assuming that “verb–object” concept images which share the same “object” part but differ in the “verb” part form a specific hierarchical structure, we integrate this hierarchical structure into the nonnegative graph embedding technique, together with the definition of an inductive matrix, to (1) extract effective features from the hierarchical structure, (2) easily map each new testing sample to its low-dimensional nonnegative representation, and (3) classify “verb–object” concept images. Extensive experiments against state-of-the-art nonnegative data factorization algorithms demonstrate the classification power of the proposed approach on “verb–object” concept image classification.
Notes
The superscript numbers on matrices (1, 2, 11, 12, etc.) are labels, not exponents.
References
Belhumeur, P., Hespanha, J.: Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 711–720 (1997)
Carneiro, G., Chan, A., Moreno, P., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 29(3), 394–410 (2007)
Ding, C.H., Li, T., Jordan, M.I.: Convex and semi-nonnegative matrix factorizations. IEEE Trans. Pattern Anal. Mach. Intell. 32(1), 45–55 (2010)
Gao, Y., Fan, J., Xue, X., Jain, R.: Automatic image annotation by incorporating feature hierarchy and boosting to scale up SVM classifiers. In: Proceedings of the 14th Annual ACM International Conference on Multimedia, pp. 901–910. ACM, New York (2006)
Heger, A., Holm, L.: Sensitive pattern discovery with fuzzy alignments of distantly related proteins. Bioinformatics 19(suppl 1), i130–i137 (2003)
Hong, R., Tang, J., Tan, H.-K., Ngo, C.-W., Yan, S., Chua, T.-S.: Beyond search: event-driven summarization for web videos. TOMCCAP 7(4), 35 (2011)
Hong, R., Wang, M., Li, G., Nie, L., Zha, Z.-J., Chua, T.-S.: Multimedia question answering. IEEE Multimed. 19(4), 72–78 (2012)
Hoyer, P.O.: Non-negative matrix factorization with sparseness constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)
Hu, C., Zhang, B., Yan, S., Yang, Q., Yan, J., Chen, Z., Ma, W.: Mining ratio rules via principal sparse non-negative matrix factorization. In: Fourth IEEE International Conference on Data Mining, 2004. ICDM’04, pp. 407–410. IEEE (2004)
Kim, P., Tidor, B.: Subsystem identification through dimensionality reduction of large-scale gene expression data. Genome Res. 13(7), 1706–1718 (2003)
Kuhn, H.W., Tucker, A.W.: Nonlinear programming. In: Second Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 481–492 (1951)
Lee, D., Seung, H., et al.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999)
Li, L., Jiang, S., Huang, Q.: Learning hierarchical semantic description via mixed-norm regularization for image understanding. IEEE Trans. Multimed. 14(5), 1401–1413 (2012)
Li, L.-J., Su, H., Fei-Fei, L., Xing, E.P.: Object bank: a high-level image representation for scene classification & semantic feature sparsification. In: Advances in Neural Information Processing Systems, pp. 1378–1386 (2010)
Li, S.Z., Hou, X.W., Zhang, H.J., Cheng, Q.S.: Learning spatially localized, parts-based representation. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001. CVPR 2001, vol. 1, pp. I-207. IEEE (2001)
Liu, X., Yan, S., Jin, H.: Projective nonnegative graph embedding. IEEE Trans. Image Process. 19(5), 1126–1137 (2010)
Ramanath, R., Kuehni, R., Snyder, W., Hinks, D.: Spectral spaces and color spaces. Color Res. Appl. 29(1), 29–37 (2004)
Ramanath, R., Snyder, W., Qi, H.: Eigenviews for object recognition in multispectral imaging systems. In: Applied Imagery Pattern Recognition Workshop, 2003. Proceedings. 32nd, pp. 33–38. IEEE (2003)
Sun, C., Bao, B.-K., Xu, C.: Verb-object concepts image classification via hierarchical nonnegative graph embedding. In: Proceeding of 19th International Conference on Multimedia Modeling (MMM), pp. 58–69 (2013)
Wang, C., Song, Z., Yan, S., Zhang, L., Zhang, H.: Multiplicative nonnegative graph embedding. In: IEEE Conference on Computer Vision and Pattern Recognition, 2009. CVPR 2009, pp. 389–396. IEEE (2009)
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 3360–3367. IEEE (2010)
Wang, M., Hong, R., Li, G., Zha, Z.-J., Yan, S., Chua, T.-S.: Event driven web video summarization by tag localization and key-shot identification. IEEE Trans. Multimed. 14(4), 975–985 (2012)
Wang, Y., Jia, Y.: Fisher non-negative matrix factorization for learning local features. In: Proceedings of the Asian Conference on Computer Vision (2004)
Yan, S., Xu, D., Zhang, B., Zhang, H., Yang, Q., Lin, S.: Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 29(1), 40–51 (2007)
Yang, J., Yang, S., Fu, Y., Li, X., Huang, T.: Non-negative graph embedding. In: IEEE Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008, pp. 1–8. IEEE (2008)
Yao, B., Jiang, X., Khosla, A., Lin, A., Guibas, L., Fei-Fei, L.: Human action recognition by learning bases of action attributes and parts. In: IEEE International Conference on Computer Vision (ICCV), 2011, pp. 1331–1338. IEEE (2011)
Yun, X.: Non-negative matrix factorization for face recognition. PhD thesis, Hong Kong Baptist University (2007)
Zhang, X., Zha, Z., Xu, C.: Learning verb-object concepts for semantic image annotation. In: Proceedings of the 19th ACM International Conference on Multimedia, pp. 1077–1080. ACM, New York (2011)
Acknowledgments
This work is supported in part by National Basic Research Program of China (No. 2012CB316304), National Natural Science Foundation of China (No. 61225009, No. 61201374) and Beijing Natural Science Foundation (No. 4131004). This work is also supported by the Singapore National Research Foundation under its International Research Centre@Singapore Funding Initiative and administered by the IDM Programme Office.
Author information
Authors and Affiliations
Corresponding author
Appendix
Here we present the convergence proof of the update rules for both matrix \(W\) and matrix \(C\).
1.1 Preliminaries
First of all, we introduce the concept of auxiliary function and the lemma which will be used for algorithmic derivation.
Definition 1
Function \(G(A,A')\) is an auxiliary function for function \(F(A)\) if the following conditions are satisfied: \(G(A,A') \ge F(A)\) and \(G(A,A) = F(A)\).
From this definition, we have the following lemma with proof omitted [12].
Lemma 1
If \(G\) is an auxiliary function, then \(F\) is non-increasing under the update
\[
A^{t+1} = \arg\min_{A} G(A, A^{t}),
\]
where \(t\) denotes the \(t\)th iteration.
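The omitted one-line argument behind Lemma 1 is the standard one from the nonnegative matrix factorization literature [18]: since \(A^{t+1}\) is chosen to minimize \(G(\cdot, A^{t})\),

```latex
F(A^{t+1}) \le G(A^{t+1}, A^{t}) \le G(A^{t}, A^{t}) = F(A^{t}),
```

where the first inequality follows from \(G(A,A') \ge F(A)\), the second from the minimality of \(A^{t+1}\), and the final equality from \(G(A,A) = F(A)\).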
1.2 Convergence proof of update rule for W
Letting \(F_{ij}\) denote the part of \(F(W)\) relevant to \(W_{ij}\), we have
The auxiliary function of \(F_{ij}\) is then designed as
Lemma 2
Equation (48) is an auxiliary function for \(F_{ij}\), which is the part of \(F(W)\) relevant to \(W_{ij}\).
Proof
Obviously, \(G(W_{ij}, W_{ij}) = F_{ij}(W_{ij})\). We only need to prove that \(G(W_{ij}, W^t_{ij}) \ge F_{ij}(W_{ij})\).
First, we have the Taylor series expansion of \(F_{ij}\)
Then, it is easy to verify that
Thus we have
Then, \(G(W_{ij}, W^t_{ij}) \ge F_{ij}(W_{ij})\) holds.
Lemma 3
Equation (34) can be obtained by minimizing the auxiliary function \(G(W_{ij}, W^t_{ij})\).
Proof
Setting \(\partial G(W_{ij}, W^t_{ij}) / \partial W_{ij} = 0\), we have
Finally we can obtain the update rule for \(W\)
and the lemma is proved.
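Since Eqs. (48) and (34) are not reproduced here, a worked instance of the same construction may clarify it. Note this uses the classic least-squares NMF objective of Lee and Seung [18] as an illustration, not the \(F(W)\) of this paper: for \(F(h) = \frac{1}{2}\|v - Wh\|^{2}\) with fixed nonnegative \(W\), the quadratic auxiliary function

```latex
G(h, h^{t}) = F(h^{t}) + (h - h^{t})^{\top} \nabla F(h^{t})
            + \tfrac{1}{2}\, (h - h^{t})^{\top} K(h^{t})\, (h - h^{t}),
\qquad
K(h^{t})_{aa} = \frac{(W^{\top} W h^{t})_{a}}{h^{t}_{a}},
```

dominates the exact Hessian \(W^{\top}W\) on the nonnegative orthant, and setting \(\partial G / \partial h = 0\) with \(\nabla F(h^{t}) = W^{\top}W h^{t} - W^{\top}v\) yields the multiplicative rule \(h^{t+1}_{a} = h^{t}_{a}\,(W^{\top} v)_{a} / (W^{\top} W h^{t})_{a}\). The update rule for \(W\) in Eq. (34) follows the same pattern applied to the specific objective of this paper.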
1.3 Convergence proof of update rule for C
Letting \(F_{ij}\) denote the part of \(F(C)\) relevant to \(C_{ij}\), we have
The auxiliary function of \(F_{ij}\) is then designed as
Lemma 4
Equation (57) is an auxiliary function for \(F_{ij}\), which is the part of \(F(C)\) relevant to \(C_{ij}\).
Proof
Obviously, \(G(C_{ij}, C_{ij}) = F_{ij}(C_{ij})\). We only need to prove that \(G(C_{ij}, C^t_{ij}) \ge F_{ij}(C_{ij})\).
First, we have the Taylor series expansion of \(F_{ij}\)
Then, it is easy to verify that
Thus we have \(G(C_{ij}, C^t_{ij}) \ge F_{ij}(C_{ij})\).
Lemma 5
Equation (43) can be obtained by minimizing the auxiliary function \(G(C_{ij}, C^t_{ij})\).
Proof
Setting \(\partial G(C_{ij}, C^t_{ij}) / \partial C_{ij} = 0\), we have
Finally we can obtain the update rule for \(C\)
and the lemma is proved. \(\square \)
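The monotonicity that Lemmas 1–5 establish can be observed numerically. The sketch below runs the classic Lee–Seung multiplicative updates [18] for \(\|V - WH\|_F^2\) — an illustration of the same auxiliary-function mechanism, not the paper's specific rules (34) and (43) — and checks that the objective never increases:

```python
import numpy as np

# Classic Lee-Seung multiplicative updates for F(W, H) = ||V - W H||_F^2.
# The auxiliary-function argument guarantees F is non-increasing per update.
rng = np.random.default_rng(0)
V = rng.random((20, 30))          # nonnegative data matrix
W = rng.random((20, 5)) + 1e-3    # nonnegative factor, kept strictly positive
H = rng.random((5, 30)) + 1e-3
eps = 1e-12                       # guard against division by zero

objectives = []
for _ in range(100):
    H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative update for H
    W *= (V @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for W
    objectives.append(np.linalg.norm(V - W @ H) ** 2)

# Monotone non-increase, as Lemma 1 predicts (up to floating-point noise)
assert all(a >= b - 1e-9 for a, b in zip(objectives, objectives[1:]))
```

Because the updates multiply by nonnegative ratios, \(W\) and \(H\) stay nonnegative throughout, which is the same property the rules for \(W\) and \(C\) above rely on.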
Cite this article
Sun, C., Bao, B.-K., Xu, C.: Inductive hierarchical nonnegative graph embedding for “verb–object” image classification. Machine Vision and Applications 25, 1647–1659 (2014). https://doi.org/10.1007/s00138-013-0548-3