
Compressing Visual Descriptors of Image Sequences

Part of the Lecture Notes in Computer Science book series (LNISA, volume 10133)


In recent years, there has been significant progress in developing more compact visual descriptors, typically by aggregating local descriptors. However, all these methods are descriptors for still images, and are typically applied independently to (key) frames when used in tasks such as instance search in video. Thus, they do not make use of the temporal redundancy of the video, which has negative impacts on the descriptor size and the matching complexity. We propose a compressed descriptor for image sequences, which encodes a segment of video using a single descriptor. The proposed approach is a framework that can be used with different local descriptors, including compact descriptors. We describe the extraction and matching process for the descriptor and provide evaluation results on a large video data set.
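The extraction and matching steps of the proposed descriptor are not reproduced on this page. As a rough illustration of what temporal redundancy means in this setting, the following minimal sketch merges the local descriptors of consecutive frames into one segment-level set, storing a descriptor only the first time it appears. It is not the authors' method: the function name `compress_segment`, the tolerance `tol`, and the toy two-dimensional descriptors are all hypothetical, chosen only to keep the example short.

```python
from math import dist


def compress_segment(frames, tol=0.1):
    """Merge per-frame local descriptors into one segment-level set.

    frames: list of frames, each a list of descriptor vectors (tuples of floats).
    tol:    hypothetical Euclidean distance at or below which two descriptors
            are treated as the same feature observed again.
    Returns (kept, refs): the unique descriptors, and for every frame the
    indices into `kept` that its descriptors map to.
    """
    kept = []
    refs = []
    for descriptors in frames:
        frame_refs = []
        for d in descriptors:
            # Linear scan for an already stored near-duplicate.
            match = next(
                (i for i, k in enumerate(kept) if dist(d, k) <= tol), None
            )
            if match is None:
                kept.append(d)
                match = len(kept) - 1
            frame_refs.append(match)
        refs.append(frame_refs)
    return kept, refs


# Toy segment: three frames in which most features persist over time.
frames = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.01, 0.0), (1.0, 1.02)],
    [(0.0, 0.01), (5.0, 5.0)],
]
kept, refs = compress_segment(frames)
stored_independently = sum(len(f) for f in frames)
print(len(kept), "descriptors stored instead of", stored_independently)
```

Describing each frame independently stores every descriptor again, while the merged set grows only when new content enters the segment. A practical system would replace the linear scan with an index and would also have to encode the per-frame references compactly.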

  • DOI: 10.1007/978-3-319-51814-5_11
  • Chapter length: 12 pages


The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007–2013) under grant agreement no. 610370, ICoSOLE, and from the Austrian Research Promotion Agency under the KIRAS grant E.V.A.

Author information

Corresponding author

Correspondence to Werner Bailer.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bailer, W., Wechtitsch, S., Thaler, M. (2017). Compressing Visual Descriptors of Image Sequences. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol 10133. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51813-8

  • Online ISBN: 978-3-319-51814-5

  • eBook Packages: Computer Science, Computer Science (R0)