Multimedia Tools and Applications

Volume 70, Issue 2, pp 647–660

Typicality ranking: beyond accuracy for video semantic annotation



In video annotation, relevant samples generally differ in their typicality, or degree of relevance, to a given concept. We therefore argue that it is more reasonable to rank typical relevant samples higher than non-typical ones. However, the labels of the training data generally only distinguish relevant from irrelevant samples; that is, typical and non-typical training samples contribute equally to the learning process. As a result, the learned scores of the unlabeled data cannot measure typicality well. Accordingly, three pre-processing approaches are proposed to relax the labels of the training data to real-valued typicality scores. The typicality scores of the training data are then propagated to the unlabeled data using manifold ranking. We also propose a novel criterion, Average Typicality Precision (ATP), to replace the frequently used Average Precision (AP) for evaluating the performance of video typicality ranking algorithms. Although AP accounts for the number of relevant samples at the top of the annotation rank list, it does not account for the typicality order of these samples, which ATP takes into consideration. Experiments conducted on the TRECVID data set demonstrate that this typicality ranking scheme is more consistent with human perception than conventional accuracy-based ranking schemes.
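The two technical ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard manifold ranking iteration f ← αSf + (1 − α)y with a Gaussian affinity graph, and the textbook definition of AP over binary labels. The function names, the one-dimensional toy points, and the typicality values 1.0 and 0.3 are hypothetical; the abstract does not specify the three pre-processing approaches or the ATP formula, so neither is reproduced here.

```python
import math


def manifold_ranking(points, y, alpha=0.9, sigma=1.0, iters=200):
    """Propagate initial scores y over a Gaussian-affinity graph.

    Iterates f <- alpha * S f + (1 - alpha) * y, where
    S = D^{-1/2} W D^{-1/2} is the symmetrically normalized affinity.
    All parameter defaults are illustrative assumptions.
    """
    n = len(points)
    # Pairwise Gaussian affinities with a zero diagonal.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    deg = [sum(row) for row in w]
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


def average_precision(ranked_labels):
    """Standard AP over a ranked list of binary relevance labels."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy data (hypothetical): two labeled samples with real-valued typicality
# scores (indices 0 and 3) and two unlabeled samples (indices 1 and 2).
points = [[0.0], [0.2], [2.8], [3.0]]
typicality = [1.0, 0.0, 0.0, 0.3]
scores = manifold_ranking(points, typicality)

# The unlabeled sample near the typical one receives the higher score.
print(scores[1] > scores[2])

# AP sees only binary labels, so swapping a typical and a non-typical
# relevant sample at the top of the list leaves it unchanged.
relevance = {"typical": 1, "atypical": 1, "irrelevant": 0}
ap_a = average_precision([relevance[k] for k in ("typical", "atypical", "irrelevant")])
ap_b = average_precision([relevance[k] for k in ("atypical", "typical", "irrelevant")])
print(ap_a == ap_b)
```

With binary labels in place of the real-valued scores, both labeled samples would seed the propagation equally, which is the limitation the abstract describes.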


Video Annotation, Typicality Ranking, Average Typicality Precision



The work presented in this paper was partially supported by the National Natural Science Foundation of China (NSFC) under grants 61103059 and 61173104, and by the Jiangsu Natural Science Foundation under grant BK2011700.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, People’s Republic of China
  2. Microsoft Bing Media Search, Redmond, USA
