Video Annotation by Incremental Learning from Grouped Heterogeneous Sources

  • Conference paper
Computer Vision -- ACCV 2014 (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9007)

Abstract

Transfer learning has shown promising results in leveraging loosely labeled Web images (source domain) to learn a robust classifier for unlabeled consumer videos (target domain). Existing transfer learning methods typically use source domain data to learn a fixed model that predicts target domain data once and for all, ignoring rapidly updating Web data and continuously changing user requirements. We propose an incremental transfer learning framework in which heterogeneous knowledge is integrated and incrementally added to update the target classifier during the learning process. Under the framework, images (image source domain) queried from a Web image search engine and videos (video source domain) from existing action datasets provide static information and motion information about the target video, respectively. Images in the image source domain are partitioned into several groups according to their semantic information, and videos in the video source domain are divided in the same way. Unlike traditional methods, which measure relevance between a source group and the whole set of target domain videos, the group weights in this paper are treated as latent variables for each target domain video and learned automatically according to the difference in probability distribution between the individual source group and the target domain videos. Experimental results on two challenging video datasets (CCV and Kodak) demonstrate the effectiveness of the proposed method.
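The idea of weighting source groups by how far their distribution lies from the target videos can be illustrated with a small sketch. It is not the authors' method: the abstract treats group weights as latent variables learned per target video, whereas this example assumes a fixed rule, namely a biased estimate of the squared maximum mean discrepancy (MMD) with an RBF kernel, turned into weights by a softmax. The function names and the `gamma` and `temperature` parameters are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel between two feature vectors of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two sample sets."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))


def group_weights(groups, target, gamma=1.0, temperature=1.0):
    """Assumed rule: softmax over negative MMD, so closer groups weigh more."""
    scores = [-mmd_squared(g, target, gamma) / temperature for g in groups]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


# Toy data: two source groups and one set of target features (all made up).
near_group = [(0.0, 0.0), (0.1, 0.0)]
far_group = [(5.0, 5.0), (5.1, 5.0)]
target_videos = [(0.0, 0.1), (0.1, 0.1)]

weights = group_weights([near_group, far_group], target_videos)
print(weights)  # the first (nearer) group receives the larger weight
```

A smaller `temperature` sharpens the weights toward the closest group; a per-video variant would pass a single target feature vector in place of the whole target set.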

Acknowledgements

The research was supported in part by the Natural Science Foundation of China (NSFC) under Grant No. 61203274, the Specialized Research Fund for the Doctoral Program of Higher Education of China (20121101120029), the Specialized Fund for Joint Building Program of Beijing Municipal Education Commission, and the Excellent Young Scholars Research Fund of BIT.

Author information

Corresponding author

Correspondence to Han Wang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, H., Song, H., Wu, X., Jia, Y. (2015). Video Annotation by Incremental Learning from Grouped Heterogeneous Sources. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_32

  • DOI: https://doi.org/10.1007/978-3-319-16814-2_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16813-5

  • Online ISBN: 978-3-319-16814-2

  • eBook Packages: Computer Science, Computer Science (R0)
