Skip to main content

Mental Visual Browsing

Part of the Lecture Notes in Computer Science book series (LNISA,volume 9517)


We present a surprisingly easy-to-use video browser for helping users to pinpoint a specific video shot in mind, within a long video. At each interactive iteration, the only user effort required is to click 1 shot, which most visually relates to the user’s mental target, out of 8 displayed shots. Then, the system updates the browsing model and display another 8 shots for the next iteration. The proposed system is underpinned by a theoretically-sound Bayesian framework that maintains the probabilities of all the video shots segmented from the long video. This framework guarantees that we can find the target shot out of around 1-h video within 3–5 iterations. We believe that our system will perform well in the Video Broswer Showdown game of MMM 2016.


  • Relevance feedback
  • Bayesian system
  • Video browsing
  • Mental search

This is a preview of subscription content, access via your institution.

Buying options

USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-319-27674-8_44
  • Chapter length: 5 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
USD   64.99
Price excludes VAT (USA)
  • ISBN: 978-3-319-27674-8
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   84.99
Price excludes VAT (USA)
Fig. 1.
Fig. 2.


  1. Arandjelovic, R., Zisserman, A.: All about VLAD. In: CVPR (2013)

    Google Scholar 

  2. Ferecatu, M., Geman, D.: A statistical framework for image category search from a mental picture. TPAMI 31(6), 1087–1101 (2009)

    CrossRef  Google Scholar 

  3. Jégou, H., Perronnin, F., Douze, M., Sanchez, J., Perez, P., Schmid, C.: Aggregating local image descriptors into compact codes. TPAMI 34(9), 1704–1716 (2012)

    CrossRef  Google Scholar 

  4. Jia, Y.: Caffe: an open source convolutional architecture for fast feature embedding (2013).

  5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS (2012)

    Google Scholar 

  6. Over, P., Awad, G., Michel, M., Fiscus, J., Sanders, G., Shaw, B., Kraaij, W., Smeaton, A.F., Quenot, G.: Trecvid 2012 - an overview of the goals, tasks, data, evaluation mechanisms and metrics. In: TRECVID (2012)

    Google Scholar 

  7. Schoeffmann, K.: A user-centric media retrieval competition: the video browser showdown 2012–2014. IEEE MultiMedia 21, 8–13 (2014)

    CrossRef  Google Scholar 

  8. Schoeffmann, K., Ahlström, D., Bailer, W., Cobârzan, C., Hopfgartner, F., McGuinness, K., Gurrin, C., Frisson, C., Le, D.-D., Del Fabro, M., et al.: The video browser showdown: a live evaluation of interactive video search tools. IJMIR 3(2), 113–127 (2014)

    Google Scholar 

  9. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556

  10. Xu, Z., Yang, Y., Hauptmann, A.G.: A discriminative cnn video representation for event detection (2014). arXiv preprint arXiv:1411.4006

Download references

Author information

Authors and Affiliations


Corresponding author

Correspondence to Xindi Shang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

He, J., Shang, X., Zhang, H., Chua, TS. (2016). Mental Visual Browsing. In: Tian, Q., Sebe, N., Qi, GJ., Huet, B., Hong, R., Liu, X. (eds) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science(), vol 9517. Springer, Cham.

Download citation

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27673-1

  • Online ISBN: 978-3-319-27674-8

  • eBook Packages: Computer ScienceComputer Science (R0)