Mental Visual Browsing

  • Jun He
  • Xindi Shang
  • Hanwang Zhang
  • Tat-Seng Chua
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9517)


We present a surprisingly easy-to-use video browser that helps users pinpoint a specific video shot they have in mind within a long video. At each interactive iteration, the only user effort required is to click one shot, the one most visually related to the user’s mental target, out of 8 displayed shots. The system then updates the browsing model and displays another 8 shots for the next iteration. The proposed system is underpinned by a theoretically sound Bayesian framework that maintains the probabilities of all the video shots segmented from the long video. This framework guarantees that we can find the target shot in a video of around 1 hour within 3–5 iterations. We believe that our system will perform well in the Video Browser Showdown game of MMM 2016.
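The abstract does not spell out the model, so the following is only a minimal sketch of one possible Bayesian relevance-feedback loop of this kind, not the authors' implementation. The softmax user model over Euclidean distances, the `temperature` parameter, the rule of displaying the 8 most probable shots, the zeroing of already-displayed shots, and all function names are assumptions introduced for illustration.

```python
import numpy as np


def click_likelihood(features, displayed, clicked_pos, temperature=1.0):
    """P(click on displayed[clicked_pos] | target = k) for every shot k.

    Assumed user model: the user tends to click the displayed shot that is
    closest to the target, with softmax noise over negative distances.
    """
    displayed = np.asarray(displayed)
    # dists[k, j]: distance between candidate target k and displayed shot j
    diff = features[:, None, :] - features[displayed][None, :, :]
    dists = np.linalg.norm(diff, axis=2)
    logits = -dists / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, clicked_pos]


def update_posterior(posterior, features, displayed, clicked_pos, temperature=1.0):
    """Bayes rule: posterior is proportional to prior times click likelihood."""
    updated = posterior * click_likelihood(features, displayed, clicked_pos, temperature)
    # Assumption: had the target been on screen, the search would have ended,
    # so shots that were displayed are ruled out as targets.
    updated[np.asarray(displayed)] = 0.0
    return updated / updated.sum()


def choose_display(posterior, n_display=8):
    """Assumed heuristic: show the currently most probable shots."""
    return np.argsort(posterior)[::-1][:n_display]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(500, 16))  # hypothetical shot descriptors
    target = 123                            # shot the simulated user has in mind
    posterior = np.full(len(features), 1.0 / len(features))
    displayed = rng.choice(len(features), size=8, replace=False)
    for iteration in range(1, 11):
        if target in displayed:
            print("target displayed at iteration", iteration)
            break
        # Simulated user: click the displayed shot nearest to the target.
        dists = np.linalg.norm(features[displayed] - features[target], axis=1)
        posterior = update_posterior(posterior, features, displayed, int(np.argmin(dists)))
        displayed = choose_display(posterior)
```

How many iterations such a loop needs depends on the shot features, the user model, and the display rule; this sketch makes no claim to reproduce the 3–5 iterations reported above.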


Keywords: Relevance feedback, Bayesian system, Video browsing, Mental search



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Jun He
    • 1
  • Xindi Shang
    • 1
  • Hanwang Zhang
    • 1
  • Tat-Seng Chua
    • 1
  1. School of Computing, National University of Singapore, Singapore, Singapore
