Welcome to the special issue on video retrieval. Recently there has been an explosion in digital video, driven by the prevalence of digital television, movie streaming, digital surveillance, and massive video collections such as Netflix, iTunes, and Amazon. With these immense amounts of video come the critical tasks of analyzing, browsing, and searching video. In this special issue we present new ideas and developments in the field of video retrieval through peer-reviewed papers recommended and carefully reviewed by the editorial board and prominent experts (special thanks to Mohan Kankanhalli, Stefan Rueger, and R. Manmatha).

One approach to bridging the semantic gap (the gap between low-level machine features and high-level human language) is to search through storylines that are matched and aligned to the video. The paper “Aligning Plot Synopses to Videos for Story-based Retrieval” by Makarand Tapaswi, Martin Bäuml, and Rainer Stiefelhagen aligns plot synopses to video shots by treating the alignment as an optimization problem solved with dynamic programming. The method shows promising results on real-world television datasets.
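
To make the idea concrete, below is a minimal sketch of a monotonic dynamic-programming alignment between synopsis sentences and video shots. It is not the authors' formulation: the similarity matrix, the assignment scheme (each shot belongs to exactly one sentence, in story order), and all values are illustrative assumptions.

```python
# Illustrative sketch only, NOT the paper's actual method: a generic
# monotonic DP alignment of synopsis sentences (rows) to shots (cols).
import numpy as np

def align(sim):
    """Assign each shot to one sentence, in order, maximizing total similarity."""
    n_sent, n_shot = sim.shape
    dp = np.full((n_sent, n_shot), -np.inf)  # dp[i, j]: best score using
    back = np.zeros((n_sent, n_shot), dtype=int)  # sentences 0..i, shots 0..j
    dp[0, 0] = sim[0, 0]
    for j in range(1, n_shot):
        dp[0, j] = dp[0, j - 1] + sim[0, j]
    for i in range(1, n_sent):
        for j in range(1, n_shot):
            stay = dp[i, j - 1]          # shot j continues sentence i
            advance = dp[i - 1, j - 1]   # shot j starts sentence i
            dp[i, j] = max(stay, advance) + sim[i, j]
            back[i, j] = 0 if stay >= advance else 1
    # Trace back the shot-to-sentence assignment.
    assignment, i = [0] * n_shot, n_sent - 1
    for j in range(n_shot - 1, 0, -1):
        assignment[j] = i
        i -= back[i, j]
    return assignment

# Toy similarity scores; in practice these would come from text/visual matching.
sim = np.array([[0.9, 0.7, 0.1, 0.0],
                [0.1, 0.2, 0.8, 0.9]])
print(align(sim))  # -> [0, 0, 1, 1]
```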

The semantic gap is also being bridged in video event detection and video classification. The paper “Weakly Supervised Detection of Video Events Using Hidden Conditional Random Fields” by Kimiaki Shirahama, Marcin Grzegorzek, and Kuniaki Uehara addresses two well-known problems in event detection: the weakly supervised setting and unclear event structure. The authors describe an approach based on Hidden Conditional Random Fields that yields significant performance improvements.

The paper “Video Classification with Densely Extracted HOG/HOF/MBH Features: An Evaluation of the Accuracy/Computational Efficiency Trade-off” by J. Uijlings, I. C. Duta, E. Sangineto, and N. Sebe compares well-known approaches to bag-of-words based video classification and proposes several improvements. The authors give valuable insights into the trade-off between accuracy and computational efficiency across vector quantization techniques.
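
As a rough illustration of the bag-of-words pipeline being compared, the sketch below quantizes local descriptors with k-means, encodes each video as a codeword histogram, and trains a linear classifier. The random descriptors stand in for HOG/HOF/MBH features; the codebook size, data, and classifier choice are assumptions for the example, not the paper's settings.

```python
# Minimal bag-of-words video classification sketch (not the paper's
# implementation); random data stands in for HOG/HOF/MBH descriptors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
K = 32  # assumed codebook size

# Pretend descriptors: one (n_descriptors, dim) array per video.
videos = [rng.normal(loc=c % 2, size=(200, 64)) for c in range(20)]
labels = [c % 2 for c in range(20)]

# 1. Vector quantization: learn a visual codebook over all descriptors.
codebook = KMeans(n_clusters=K, n_init=4, random_state=0)
codebook.fit(np.vstack(videos))

# 2. Encode each video as a normalized histogram of codeword counts.
def encode(desc):
    words = codebook.predict(desc)
    hist = np.bincount(words, minlength=K).astype(float)
    return hist / hist.sum()

X = np.array([encode(v) for v in videos])

# 3. Train a linear classifier on the histograms.
clf = LinearSVC().fit(X, labels)
print(clf.score(X, labels))
```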

Another way in which video analysis is important to society is content-based video copy detection (CBCD), both for retrieval and for detecting illegal copies. In the paper “A novel framework for CBCD using integrated color and acoustic features,” R. Roopalakshmi proposes a novel multimodal fingerprint based on spatio-temporal visual features and mel-frequency cepstral coefficients. Experiments on TRECVID datasets show promising results.
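
The general flavor of such multimodal matching can be sketched as follows: assuming precomputed visual and MFCC fingerprints represented as plain vectors, distances from the two modalities are fused with a weighted sum and thresholded. The fingerprints, weights, and threshold here are all illustrative assumptions, not the paper's design.

```python
# Hedged sketch of multimodal copy detection (not the paper's fingerprint):
# fuse visual and acoustic distances and declare a copy below a threshold.
import numpy as np

def fused_distance(vis_q, vis_r, mfcc_q, mfcc_r, w_visual=0.6):
    """Weighted combination of visual and acoustic fingerprint distances."""
    d_vis = np.linalg.norm(vis_q - vis_r)
    d_aud = np.linalg.norm(mfcc_q - mfcc_r)
    return w_visual * d_vis + (1.0 - w_visual) * d_aud

rng = np.random.default_rng(1)
ref_vis, ref_mfcc = rng.normal(size=32), rng.normal(size=13)
query_vis = ref_vis + rng.normal(scale=0.05, size=32)    # slightly distorted copy
query_mfcc = ref_mfcc + rng.normal(scale=0.05, size=13)

d = fused_distance(query_vis, ref_vis, query_mfcc, ref_mfcc)
print("copy detected" if d < 1.0 else "no match", round(d, 3))
```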

The ability to find similar videos enables intuitive browsing and automatic video classification. The work “VIDCAR: an unsupervised CBVR framework for identifying similar videos with prominent object motion” by Chiranjoy Chattopadhyay proposes a feature descriptor that uses the temporal curvature scale space to represent the evolution of the video content over time, which allows the system to handle video shots with camera movement. The system was evaluated on well-known datasets such as KTH and VPLAB and compared against competitive methods from the research literature.