MILC²: A Multi-Layer Multi-Instance Learning Approach to Video Concept Detection

Gu, Zhiwei; Mei, Tao; Tang, Jinhui; Wu, Xiuqing; Hua, Xian-Sheng

doi:10.1007/978-3-540-77409-9_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4903)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Video is a kind of structured data with multi-layer (ML) information; for example, a shot consists of three layers: shot, key-frame, and region. Moreover, a multi-instance (MI) relation is embedded between consecutive layers. Both the ML structure and the MI relation are essential for video concept detection. Previous work [5] handled the ML structure and the MI relation by constructing an MLMI kernel in which each layer is assumed to contribute equally. However, such equal weighting can neither model the MI relation well nor handle the ambiguity propagation problem, i.e., the propagation of uncertainty in sub-layer labels through multiple layers, since it has been shown that different layers contribute differently to the kernel. In this paper, we propose a novel algorithm named MILC² (Multi-Layer Multi-Instance Learning with Inter-layer Consistency Constraint) to tackle the ambiguity propagation problem. An inter-layer consistency constraint is explicitly introduced to measure the disagreement between layers, so that the MI relation is better modeled. The learning task is formulated in a regularization framework with three components: hyper-bag prediction error, inter-layer inconsistency measure, and classifier complexity. We apply MILC² to video concept detection on the TRECVID 2005 development corpus and report better performance than both standard Support Vector Machine based methods and the MLMI kernel method.
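
The abstract names the three components of the objective but does not give their formulas. The following is a minimal sketch of how such a three-term objective could be evaluated, not the authors' formulation. Everything concrete in it is an assumption made for illustration: the linear scorer (standing in for a kernel classifier), the hinge loss for the hyper-bag prediction error, the rule that a parent's score should match the maximum score of its children, the absolute gap as the inconsistency measure, the squared norm as classifier complexity, and the trade-off weights `lam` and `gamma`.

```python
import numpy as np

def score(w, x):
    """Linear scorer f(x) = w . x (assumed stand-in for the real classifier)."""
    return float(np.dot(w, x))

def inter_layer_inconsistency(w, node):
    """Sum, over the whole tree, of |f(parent) - max over children f(child)|.

    Assumes the MI relation is 'a parent is positive iff at least one child
    is positive', approximated here by matching the parent score to the
    maximum child score.
    """
    children = node.get("children", [])
    if not children:
        return 0.0
    gap = abs(score(w, node["x"]) - max(score(w, c["x"]) for c in children))
    return gap + sum(inter_layer_inconsistency(w, c) for c in children)

def objective(w, shots, labels, lam=1.0, gamma=0.1):
    """Prediction error + lam * inconsistency + gamma * complexity."""
    # Hinge loss on the top (shot) layer, where the labels live.
    pred_err = np.mean([max(0.0, 1.0 - y * score(w, s["x"]))
                        for s, y in zip(shots, labels)])
    # Average disagreement between consecutive layers.
    incons = np.mean([inter_layer_inconsistency(w, s) for s in shots])
    # Squared L2 norm as a simple complexity penalty.
    complexity = float(np.dot(w, w))
    return float(pred_err + lam * incons + gamma * complexity)

# Toy hierarchy: shot -> key-frames -> regions, with made-up 2-D features.
shots = [
    {"x": np.array([2.0, 0.0]), "children": [
        {"x": np.array([2.0, 0.0]), "children": [
            {"x": np.array([2.0, 0.0])},
            {"x": np.array([0.0, 1.0])},
        ]},
    ]},
    {"x": np.array([-1.0, 0.0]), "children": [
        {"x": np.array([1.0, 0.0])},
    ]},
]
labels = [1, -1]
w = np.array([1.0, 0.0])

total = objective(w, shots, labels)
print(total)
```

In this toy data the first shot is fully consistent across layers, while the second shot is scored negative although its key-frame is scored positive, so only the second shot contributes to the inconsistency term. Under these assumptions, a uniform layer weighting would have no term that penalizes that kind of disagreement.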

This work was performed when the first author visited Microsoft Research Asia as an intern.

References

1. TRECVID: TREC Video Retrieval Evaluation. http://www-nlpir.nist.gov/projects/TRECVID
2. Chen, Y., Bi, J., Wang, J.Z.: MILES: Multiple-instance learning via embedded instance selection. IEEE Trans. on Pattern Analysis and Machine Intelligence 28(12), 1931–1947 (2006)
3. Cheung, P.-M., Kwok, J.T.: A regularization framework for multiple-instance learning. In: Proceedings of International Conference on Machine Learning, pp. 193–200. ACM Press, New York (2006)
4. Deng, Y., Manjunath, B.S.: Unsupervised segmentation of color-texture regions in images and video. IEEE Trans. on Pattern Analysis and Machine Intelligence 23(8), 800–810 (2001)
5. Gu, Z., Mei, T., Hua, X.-S., Tang, J., Wu, X.: Multi-layer multi-instance kernel for video concept detection. In: Proceedings of ACM Multimedia, Augsburg, Germany (September 2007)
6. Kwok, J., Cheung, P.-M.: Marginalized multi-instance kernels. In: Proceedings of International Joint Conference on Artificial Intelligence, Hyderabad, India, pp. 901–906 (January 2007)
7. Maron, O., Ratan, A.L.: Multiple-instance learning for natural scene classification. In: Proceedings of International Conference on Machine Learning, pp. 341–349. Morgan Kaufmann, San Francisco (1998)
8. Naphade, M., Smith, J.R., Tesic, J., Chang, S.-F., Hsu, W., Kennedy, L., Hauptmann, A., Curtis, J.: Large-scale concept ontology for multimedia. IEEE MultiMedia 13(3), 86–91 (2006)
9. Qi, G.-J., Hua, X.-S., Rui, Y., Tang, J., Mei, T., Zhang, H.-J.: Correlative multi-label video annotation. In: Proceedings of ACM Multimedia, Augsburg, Germany (September 2007)
10. Smola, A., Vishwanathan, S., Hofmann, T.: Kernel methods for missing variables. In: Proceedings of International Workshop on Artificial Intelligence and Statistics, Barbados (2005)
11. Gartner, T., Flach, P.A., Kowalczyk, A., Smola, A.J.: Multi-instance kernels. In: Proceedings of International Conference on Machine Learning, pp. 179–186. Morgan Kaufmann, San Francisco (2002)
12. Yan, R., Naphade, M.: Semi-supervised cross feature learning for semantic concept detection in video. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2005)
13. Smeaton, A.F., Over, P., Kraaij, W.: Evaluation Campaigns and TRECVID. In: Proceedings of ACM SIGMM International Workshop on Multimedia Information Retrieval (2006)

Author information

Authors and Affiliations

Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, 230027, China
Zhiwei Gu, Jinhui Tang & Xiuqing Wu
Microsoft Research Asia, Beijing, 100080, China
Tao Mei & Xian-Sheng Hua

Editor information

Shin’ichi Satoh, Frank Nack, Minoru Etoh

Cite this paper

Gu, Z., Mei, T., Tang, J., Wu, X., Hua, XS. (2008). MILC²: A Multi-Layer Multi-Instance Learning Approach to Video Concept Detection. In: Satoh, S., Nack, F., Etoh, M. (eds) Advances in Multimedia Modeling. MMM 2008. Lecture Notes in Computer Science, vol 4903. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77409-9_3

DOI: https://doi.org/10.1007/978-3-540-77409-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77407-5
Online ISBN: 978-3-540-77409-9
eBook Packages: Computer Science, Computer Science (R0)
