International Conference on Multimedia Modeling

MultiMedia Modeling pp 874-885

Ordering of Visual Descriptors in a Classifier Cascade Towards Improved Video Concept Detection

  • Foteini Markatopoulou
  • Vasileios Mezaris
  • Ioannis Patras
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9516)


Concept detection for semantic annotation of video fragments (e.g. keyframes) is a popular and challenging problem. A variety of visual features is typically extracted and combined in order to learn the relation between feature-based keyframe representations and semantic concepts. In recent years the available pool of features has increased rapidly, and features based on deep convolutional neural networks in combination with other visual descriptors have significantly contributed to improved concept detection accuracy. This work proposes an algorithm that dynamically selects, orders and combines many base classifiers, trained independently with different feature-based keyframe representations, in a cascade architecture for video concept detection. The proposed cascade is more accurate and computationally more efficient, in terms of classifier evaluations, than state-of-the-art classifier combination approaches.
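To make the cascade idea concrete, the following is a minimal sketch of early rejection in a classifier cascade. It is not the authors' algorithm: the abstract does not specify how stages are selected, ordered, or thresholded, so the function name `run_cascade`, the running-average combination rule, and the per-stage thresholds are all illustrative assumptions. The sketch only shows why a cascade can need fewer classifier evaluations than a scheme that always evaluates every base classifier.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a base classifier maps a keyframe's feature vector
# to a confidence score in [0, 1] for one concept.
Classifier = Callable[[Sequence[float]], float]


def run_cascade(
    keyframe: Sequence[float],
    stages: List[Classifier],
    thresholds: List[float],
) -> Tuple[float, int]:
    """Evaluate stages in order and stop at the first rejection.

    Returns (combined score, number of classifiers evaluated).
    The combination rule (running average) and the rejection rule
    (average below the stage threshold) are hypothetical choices.
    """
    if len(stages) != len(thresholds):
        raise ValueError("one threshold per stage is required")

    total = 0.0
    evaluated = 0
    score = 0.0
    for classifier, threshold in zip(stages, thresholds):
        total += classifier(keyframe)
        evaluated += 1
        score = total / evaluated      # running average of stage scores
        if score < threshold:          # early rejection: skip later stages
            break
    return score, evaluated


# Hypothetical stand-ins for classifiers trained on different descriptors:
# each one simply reads a precomputed score from the input vector.
stages = [lambda x: x[0], lambda x: x[1], lambda x: x[2]]
thresholds = [0.3, 0.3, 0.0]

rejected = run_cascade([0.1, 0.9, 0.9], stages, thresholds)
accepted = run_cascade([0.8, 0.6, 0.7], stages, thresholds)
print(rejected, accepted)
```

In this toy run the first keyframe is rejected after a single evaluation, while the second passes through all three stages. Under these assumptions the saving depends entirely on how early the unpromising keyframes are rejected, which is why the order of the stages matters.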


Keywords: Concept detection, Video analysis, Cascade architecture, Classifier ordering



This work was supported by the European Commission under contract FP7-600826 ForgetIT.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Foteini Markatopoulou (1, 2)
  • Vasileios Mezaris (1)
  • Ioannis Patras (2)
  1. Information Technologies Institute (ITI), CERTH, Thermi, Greece
  2. Queen Mary University of London, London, UK
