Pattern Analysis and Applications, Volume 16, Issue 1, pp 117–124

Transductive multi-distance learning for video search

Short Paper

Abstract

Graph-based semi-supervised learning approaches have proven effective and efficient at coping with the scarcity of labeled training data in many real-world application areas, such as video annotation. However, a significant component of these algorithms, the pair-wise similarity metric between samples, has not been fully investigated. Specifically, in existing approaches the estimation of pair-wise similarity between two samples relies only on the spatial property of video data; the temporal property, an essential characteristic of video data, is not embedded into the pair-wise similarity measure. Accordingly, in this paper a novel framework for video annotation, called Joint Spatio-Temporal Correlation Learning (JSTCL), is proposed. This framework is characterized by simultaneously taking into account both the spatial and temporal properties of video data to improve the estimation of pair-wise similarity. We apply the proposed framework to video annotation and report superior performance compared to key existing approaches on the benchmark TRECVID data set.
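The two ingredients the abstract names, a pair-wise similarity measure over video samples and graph-based label propagation, can be sketched in a toy form. The sketch below is a generic illustration under stated assumptions, not the paper's actual JSTCL formulation: it mixes a spatial Gaussian kernel on feature vectors with a temporal Gaussian kernel on timestamps, and the kernel widths `sigma_s`, `sigma_t` and mixing weight `alpha` are hypothetical parameters introduced here for illustration.

```python
import numpy as np

def pairwise_similarity(feats, times, sigma_s=1.0, sigma_t=1.0, alpha=0.5):
    """Illustrative combined affinity: a convex mix of a spatial Gaussian
    kernel on feature vectors and a temporal Gaussian kernel on timestamps.
    (alpha, sigma_s, sigma_t are assumed parameters, not from the paper.)"""
    d_spatial = np.square(feats[:, None, :] - feats[None, :, :]).sum(-1)
    d_temporal = np.square(times[:, None] - times[None, :])
    return (alpha * np.exp(-d_spatial / (2 * sigma_s ** 2))
            + (1 - alpha) * np.exp(-d_temporal / (2 * sigma_t ** 2)))

def propagate_labels(W, y, labeled, n_iter=100):
    """Generic iterative label propagation on the affinity graph W:
    repeatedly average each node's score over its neighbors while
    clamping the labeled samples to their given labels."""
    P = W / W.sum(axis=1, keepdims=True)   # row-normalize to a transition matrix
    f = y.astype(float).copy()
    for _ in range(n_iter):
        f = P @ f
        f[labeled] = y[labeled]            # clamp labeled samples
    return f
```

On a toy set where the first two samples are close in both feature space and time, and the last two form a second cluster, labeling only the endpoints (+1 and -1) lets the unlabeled middle samples inherit the label of their cluster. This clamped-iteration scheme converges to the harmonic solution of Zhu et al. (2003) when the unlabeled sub-graph is connected to the labeled nodes.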

Keywords

Graph-based semi-supervised learning · Pair-wise similarity measure · Spatio-temporal correlation


Acknowledgments

This work was supported by NTFC of 20103223120003, NSFC Program of 61001077, Jiangsu NSF Program of 08KJB510015, and Jiangsu College NSF Program of 10KJB510014. It was also supported by the NJUPT Programs of NY207090, NY209018, and NY209020.


Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  1. School of Automation Engineering, Nanjing University of Posts and Telecommunications, Nanjing, People's Republic of China
  2. School of Electronic Information and Electric Engineering, Shanghai Jiao Tong University, Shanghai, People's Republic of China
