Video Annotation by Incremental Learning from Grouped Heterogeneous Sources

Wang, Han; Song, Hao; Wu, Xinxiao; Jia, Yunde

doi:10.1007/978-3-319-16814-2_32

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9007)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Transfer learning has shown promising results in leveraging loosely labeled Web images (source domain) to learn a robust classifier for unlabeled consumer videos (target domain). Existing transfer learning methods typically use source domain data to learn a fixed model that predicts target domain data once and for all, ignoring rapidly updating Web data and continuously changing user requirements. We propose an incremental transfer learning framework in which heterogeneous knowledge is integrated and incrementally added to update the target classifier during the learning process. Under the framework, images (image source domain) queried from a Web image search engine and videos (video source domain) from existing action datasets provide static information and motion information about the target videos, respectively. Images in the image source domain are partitioned into several groups according to their semantic information, and videos in the video source domain are divided in the same way. Unlike traditional methods, which measure the relevance between a source group and the target domain videos as a whole, the group weights in this paper are treated as latent variables for each target domain video and are learned automatically according to the difference in probability distribution between each individual source group and the target domain videos. Experimental results on two challenging video datasets (CCV and Kodak) demonstrate the effectiveness of the proposed method.
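The abstract does not say which measure of distribution difference is used or how the weights are optimized. As a rough illustration only, the sketch below assumes the maximum mean discrepancy (MMD) with a Gaussian RBF kernel as the distance, and turns the distances into group weights with a softmax. The function names, the `gamma` and `temperature` parameters, and the toy feature vectors are all hypothetical, and the sketch computes one weight per source group against the whole target set, which is simpler than the per-video latent weights the paper describes.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    return (mean_kernel(source, source, gamma)
            - 2.0 * mean_kernel(source, target, gamma)
            + mean_kernel(target, target, gamma))

def group_weights(groups, target, gamma=1.0, temperature=1.0):
    """Softmax over negative MMD: groups closer to the target get larger weights.

    This weighting rule is an assumption for illustration, not the paper's
    learning procedure.
    """
    scores = [-mmd_squared(g, target, gamma) / temperature for g in groups]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

# Toy 2-D features (hypothetical): one source group near the target, one far away.
target = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05]]
near_group = [[0.0, 0.0], [0.1, 0.1]]
far_group = [[3.0, 3.0], [3.1, 2.9]]
weights = group_weights([near_group, far_group], target)
```

In this toy setup the group whose features lie close to the target sample receives the larger weight, which mirrors the intuition that source groups with a smaller distribution difference should contribute more to the target classifier.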

Acknowledgements

The research was supported in part by the Natural Science Foundation of China (NSFC) under Grant No. 61203274, the Specialized Research Fund for the Doctoral Program of Higher Education of China (20121101120029), the Specialized Fund for Joint Building Program of Beijing Municipal Education Commission, and the Excellent Young Scholars Research Fund of BIT.

Author information

Authors and Affiliations

Beijing Lab of Intelligent Information Technology and the School of Computer Science, Beijing Institute of Technology, Beijing, 100081, China
Han Wang, Hao Song, Xinxiao Wu & Yunde Jia

Corresponding author

Correspondence to Han Wang.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

About this paper

Cite this paper

Wang, H., Song, H., Wu, X., Jia, Y. (2015). Video Annotation by Incremental Learning from Grouped Heterogeneous Sources. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_32

DOI: https://doi.org/10.1007/978-3-319-16814-2_32
Published: 17 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16813-5
Online ISBN: 978-3-319-16814-2
eBook Packages: Computer Science, Computer Science (R0)
