International Conference on Multimedia Modeling

MultiMedia Modeling pp 251-263

Towards Training-Free Refinement for Semantic Indexing of Visual Media

  • Peng Wang
  • Lifeng Sun
  • Shiqang Yang
  • Alan F. Smeaton
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9516)


Indexing of visual media based on content analysis has moved beyond individual concept detectors, and the focus is now on combining concepts or post-processing the outputs of individual concept detectors. Because training corpora are limited in availability and are usually sparsely and imprecisely labeled, training-based refinement methods for semantic indexing of visual media struggle to capture relationships between concepts correctly, including co-occurrence and ontological relationships. In contrast to the training-dependent methods that dominate this field, this paper presents a training-free refinement (TFR) algorithm that enhances semantic indexing of visual media based purely on concept detection results, which makes refinement of the initial concept detections practical and flexible. The algorithm uses global and temporal neighbourhood information inferred from the original concept detections, by means of weighted non-negative matrix factorization and neighbourhood-based graph propagation, respectively. Any available ontological concept relationships can also be integrated into the model as an additional source of external a priori knowledge. Experiments on two datasets demonstrate the efficacy of the proposed TFR solution.
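As a rough illustration of the weighted non-negative matrix factorization step named in the abstract, the sketch below factorizes a small matrix of detection confidences using the standard multiplicative updates for a weighted squared-error objective. It is a minimal sketch and not the authors' exact formulation: the toy matrix `C`, the confidence-based weighting `W`, the rank, and the function names `weighted_nmf` and `weighted_error` are all assumptions made for illustration.

```python
import numpy as np

def weighted_nmf(C, W, rank=2, n_iter=200, eps=1e-9, seed=0):
    """Approximate C by U @ V with U, V >= 0, minimizing sum(W * (C - U @ V)**2).

    Uses the standard multiplicative updates for weighted NMF.
    """
    rng = np.random.default_rng(seed)
    n, m = C.shape
    U = rng.random((n, rank)) + eps   # sample-to-latent factors
    V = rng.random((rank, m)) + eps   # latent-to-concept factors
    WC = W * C
    for _ in range(n_iter):
        U *= (WC @ V.T) / ((W * (U @ V)) @ V.T + eps)
        V *= (U.T @ WC) / (U.T @ (W * (U @ V)) + eps)
    return U, V

def weighted_error(C, W, U, V):
    """Weighted squared reconstruction error."""
    return float(np.sum(W * (C - U @ V) ** 2))

# Hypothetical detection confidences: rows are samples, columns are concepts.
C = np.array([[0.9, 0.8, 0.1],
              [0.8, 0.2, 0.1],
              [0.1, 0.1, 0.9],
              [0.2, 0.1, 0.8]])

# Assumed weighting: trust detections more the further they are from 0.5.
W = 2.0 * np.abs(C - 0.5)

U, V = weighted_nmf(C, W)
refined = np.clip(U @ V, 0.0, 1.0)   # refined confidences, kept in [0, 1]
print(np.round(refined, 2))
```

The low-rank product `U @ V` pulls each detection toward the pattern shared by similar samples, and entries with small weights are the ones most free to change. No training labels are involved, only the detection matrix itself.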


Semantic indexing; Refinement; Concept detection enhancement; Context fusion; Factorization; Propagation



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Peng Wang (1)
  • Lifeng Sun (1)
  • Shiqang Yang (1)
  • Alan F. Smeaton (2)
  1. National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China
  2. Insight Centre for Data Analytics, Dublin City University, Glasnevin, Dublin 9, Ireland
