Abstract
Transfer learning has shown promising results in leveraging loosely labeled Web images (source domain) to learn a robust classifier for unlabeled consumer videos (target domain). Existing transfer learning methods typically use source domain data to learn a fixed model that predicts target domain data once and for all, ignoring both rapidly updated Web data and continuous changes in user requirements. We propose an incremental transfer learning framework in which heterogeneous knowledge is integrated and incrementally added to update the target classifier during the learning process. Under this framework, images queried from a Web image search engine (the image source domain) and videos from existing action datasets (the video source domain) provide the static information and the motion information of the target videos, respectively. Images in the image source domain are partitioned into several groups according to their semantic information, and videos in the video source domain are grouped in the same way. Unlike traditional methods, which measure the relevance between a source group and the whole set of target domain videos, our method treats the group weights as latent variables for each individual target domain video and learns them automatically according to the probability distribution difference between each source group and the target domain videos. Experimental results on two challenging video datasets (CCV and Kodak) demonstrate the effectiveness of the proposed method.
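The abstract describes per-video group weights derived from how far each source group's feature distribution lies from a target video, plus a target classifier that grows as new groups are added. The sketch below illustrates that idea only, under assumptions the abstract does not state: the distribution difference is measured with the biased empirical squared maximum mean discrepancy (MMD) under a Gaussian RBF kernel; the weights come from a closed-form softmax over negative MMD, whereas the paper learns them as latent variables; each group contributes a hypothetical pre-trained scorer; and a video is a list of frame-level feature vectors. `IncrementalGroupEnsemble`, `add_group`, `gamma` and `temperature` are illustrative names, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]  # hypothetical per-group classifier


def rbf(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float = 1.0) -> float:
    """Biased empirical squared MMD between two non-empty samples."""
    kxx = sum(rbf(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


class IncrementalGroupEnsemble:
    """Hypothetical target classifier built from grouped source data.

    Each group is stored with its feature samples and a scorer; new groups
    can be appended at any time without retraining the existing ones.
    """

    def __init__(self, gamma: float = 1.0, temperature: float = 1.0) -> None:
        self.gamma = gamma
        self.temperature = temperature
        self.groups: List[Tuple[List[Vector], Scorer]] = []

    def add_group(self, samples: Sequence[Vector], scorer: Scorer) -> None:
        # Incremental step: only the new group is stored; nothing else changes.
        self.groups.append((list(samples), scorer))

    def group_weights(self, video: Sequence[Vector]) -> List[float]:
        # Per-video weights (requires at least one group): softmax over
        # negative MMD, so groups closer to this video get larger weights.
        logits = [-mmd2(s, video, self.gamma) / self.temperature for s, _ in self.groups]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def decision(self, video: Sequence[Vector]) -> float:
        # Weighted sum of each group's mean frame-level score for this video.
        weights = self.group_weights(video)
        return sum(
            w * sum(scorer(f) for f in video) / len(video)
            for w, (_, scorer) in zip(weights, self.groups)
        )


if __name__ == "__main__":
    ens = IncrementalGroupEnsemble(gamma=0.5)
    ens.add_group([[0.0, 0.0], [0.1, 0.0]], lambda f: 1.0)   # toy "image" group
    ens.add_group([[5.0, 5.0], [5.1, 4.9]], lambda f: -1.0)  # toy "video" group
    video = [[0.05, 0.0], [0.0, 0.05]]                       # toy frame features
    print(ens.group_weights(video), ens.decision(video))
```

Because `add_group` only appends, extending the classifier with a new image or video group requires no retraining, only one extra MMD evaluation per group at prediction time. Lowering `temperature` toward zero makes the weighting approach a hard assignment to the closest group.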
Acknowledgements
The research was supported in part by the Natural Science Foundation of China (NSFC) under Grant No. 61203274, the Specialized Research Fund for the Doctoral Program of Higher Education of China (20121101120029), the Specialized Fund for Joint Building Program of Beijing Municipal Education Commission, and the Excellent Young Scholars Research Fund of BIT.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, H., Song, H., Wu, X., Jia, Y. (2015). Video Annotation by Incremental Learning from Grouped Heterogeneous Sources. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision – ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9007. Springer, Cham. https://doi.org/10.1007/978-3-319-16814-2_32
DOI: https://doi.org/10.1007/978-3-319-16814-2_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16813-5
Online ISBN: 978-3-319-16814-2
eBook Packages: Computer Science, Computer Science (R0)