Fusing Keyword Search and Visual Exploration for Untagged Videos

  • Kai Uwe Barthel
  • Nico Hezel
  • Klaus Jung
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10705)


Video collections often cannot be searched by keywords because most videos are poorly annotated. We present a system for searching untagged videos by sketches, example images, and keywords. By analyzing the most frequent search terms and the corresponding images from the Pixabay stock photo agency, we derived visual features that make 20,000 keywords searchable. For each keyword we use several image features to cope with large visual and conceptual variations. Because the intention of a user searching for an image is unknown, we retrieve thousands of result images (video scenes) and show them as a visually sorted hierarchical image map. The user can easily find images of interest by dragging and zooming. The images are arranged with an improved version of a self-sorting map, which organizes thousands of images in a fraction of a second. Once an image similar to the search query has been found, further zooming shows more related images, retrieved from a precomputed image graph. The new approach helps to find untagged images very quickly in an exploratory, incremental way.
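The idea of describing each keyword by several image features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature type or the distance measure, so the cosine distance, the function names (`cosine_distance`, `search_by_keyword`), and the toy three-dimensional vectors are all assumptions chosen for illustration. Each scene is scored by its distance to the closest of the keyword's feature vectors, so a keyword with several visual meanings can match scenes of either kind.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero feature vectors (an assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search_by_keyword(keyword, keyword_prototypes, scene_features, top_k=1000):
    """Rank untagged scenes by their distance to the nearest keyword prototype.

    keyword_prototypes: hypothetical dict mapping a keyword to a list of
        feature vectors, one per visual variant of that keyword.
    scene_features: hypothetical dict mapping a scene id to its feature vector.
    """
    prototypes = keyword_prototypes[keyword]
    scored = []
    for scene_id, feature in scene_features.items():
        # A scene matches if it is close to ANY variant of the keyword.
        best = min(cosine_distance(feature, p) for p in prototypes)
        scored.append((best, scene_id))
    scored.sort()
    return [scene_id for _, scene_id in scored[:top_k]]


# Toy data: "jaguar" has two visual variants (say, the animal and the car).
keyword_prototypes = {"jaguar": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]}
scene_features = {
    "scene_cat": [0.9, 0.1, 0.0],
    "scene_car": [0.1, 0.9, 0.1],
    "scene_sea": [0.0, 0.1, 1.0],
}

results = search_by_keyword("jaguar", keyword_prototypes, scene_features, top_k=2)
print(results)
```

In this toy setup both the animal-like and the car-like scene rank ahead of the unrelated one. A large `top_k` mirrors the abstract's choice of retrieving thousands of candidates and leaving the final selection to visual exploration.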


Content-based video retrieval, Exploration, Image browsing, Visualization, Navigation, Convolutional neural networks



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Visual Computing Group, HTW Berlin, University of Applied Sciences, Berlin, Germany
