
Multimedia Tools and Applications, Volume 69, Issue 2, pp 247–251

Guest editorial: Event-based video analysis/retrieval

  • Anastasios Doulamis
  • Nikolaos Doulamis
  • Luc van Gool
  • Mark Nixon
Guest Editorial

Identification of events from visual cues is in general an arduous task because of complex motion, cluttered backgrounds, occlusions, and geometric and photometric variations of the physical objects. It becomes even more challenging when the goal is to detect a logical chain of events, i.e., a sequence of events called a workflow, and when multiple workflows are present in the environment, interacting with one another and affecting one another's outcomes.

Recent research advances in the computer vision and pattern recognition community have stimulated the development of a series of innovative algorithms, tools and methods for salient object detection and tracking in still images and video streams. These techniques are built on appropriate descriptors (usually with invariance properties) such as the Scale-Invariant Feature Transform (SIFT), the Speeded Up Robust Features (SURF), or the MPEG-7 visual descriptors. All these methods can be considered initial steps towards the ultimate goal of behavior/event understanding. However, automatic comprehension of someone’s behavior within a scene, or even automatic supervision of workflows (e.g., industrial processes), remains a complex research field that has attracted great attention but produced limited results so far.
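To make the role of such descriptors concrete, the following minimal sketch extracts SIFT keypoints and descriptors from a single frame with OpenCV. It assumes opencv-python version 4.4 or later (where SIFT is included in the main package) and a hypothetical input image frame.png; it illustrates only the kind of low-level representation these methods build on, not any specific method from this issue:

    # Minimal sketch: extract scale- and rotation-invariant SIFT descriptors
    # from one video frame. Assumes opencv-python >= 4.4 and a hypothetical
    # input image "frame.png".
    import cv2

    frame = cv2.imread("frame.png")                 # load one frame (BGR)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # SIFT works on grayscale

    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(gray, None)

    # Each keypoint carries a location, scale and orientation; each descriptor
    # is a 128-dimensional vector, largely invariant to scale and rotation.
    print(len(keypoints), "keypoints; descriptor matrix:", descriptors.shape)

Such descriptors are only a first step: event and workflow understanding requires reasoning over how these local features evolve across frames.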

Most current approaches involve machine learning techniques, such as supervised or semi-supervised methods, object tracking algorithms, adaptation mechanisms for handling complex, dynamic and abruptly changing visual conditions, and application-specific analysis.

On the other hand, during the past few years, more and more people have been coping with the so-called “information overload” phenomenon. Given (i) the diversity and abundance of media information currently available on the web, and (ii) the gradual but rapid shift of users from being solely content consumers to acting as both content consumers and content providers, a new challenge is emerging: how to make these vast amounts of media information quickly searchable, often in a personalized manner.

This issue of the Multimedia Tools and Applications (MTAP) journal covers these two challenging research issues in computer vision and multimedia. The issue focuses mainly on visual event analysis and understanding, with applications to visual retrieval. The call was published in the MTAP journal; in parallel, the Guest Editors disseminated it to several groups and societies researching this topic.

The call attracted many submissions from all over the world: from Europe, America (mainly the USA and Canada), Australia, and Asia (mainly China and Japan). All articles underwent peer review by at least three reviewers, and most received four or even five reviews. All non-rejected articles were revised to address the reviewers’ and editors’ comments, and some were sent for a second or even a third round of revision until all comments were carefully addressed. Failure to meet these strict requirements led to rejection.

After completion of this rigorous process, only eight papers were finally accepted for publication in MTAP, while most of the submissions were rejected. The accepted papers cover most of the research components needed for action recognition and event detection from visual data. In particular, the papers of this issue concern (i) the use and/or fusion of new features able to improve event detection performance, (ii) visual event analysis under complex conditions such as industrial environments, (iii) environmental event detection exploiting depth-sensing cameras, (iv) extraction of song sequences from movies, and finally (v) multimedia content search based on crowdsourced information and social awareness.

In particular, the work of Benmokhtar et al. proposes a new feature fusion framework able to analyze and detect visual events. Another approach is given in the article entitled “Efficient tracking using a robust motion estimation technique”. Instead of fusing visual features, the authors of this article introduce a new motion feature able to track salient objects within complex visual scenes while leaving the non-salient ones intact. The work entitled “A top-down event-driven approach for concurrent activity recognition” proposes a top-down algorithm for identifying complex industrial workflows from a set of visual cameras; great research challenges arise even from the content of the sequences used. In addition, a feedback mechanism is introduced to improve event detection in such environments. Verstockt et al. propose a scheme for detecting fires using depth sensors such as multimodal time-of-flight cameras.

The remaining four articles focus more on retrieving visual content from large datasets. Content retrieval exploits visual analysis of salient actions taking place within imagery data. The article “Event-driven video adaptation: A powerful tool for industrial video supervision” can be considered a connecting link between classical event detection algorithms and those proposed for visual retrieval. In this work, Doulamis et al. propose a new video adaptation scheme able to adapt visual content to users’ needs and contextual preferences in terms of network capacity, location, terminal device properties and accessibility.

The article “Mining movie archives for song sequences” extracts song sequences from movie archives. Chorianopoulos et al. propose in “VideoSkip: event detection in social web videos with an implicit user heuristic” an event-based social web interface for videos. Finally, the work entitled “Automatic annotation of image databases based on implicit crowdsourcing, visual concept modeling and evolution” introduces new methods for annotating large image databases by exploiting crowdsourced information from users who implicitly evaluate the visual content.

The Guest Editors express their thanks to the Editor-in-Chief of the Multimedia Tools and Applications (MTAP) journal, Prof. Borko Furht, for supporting this call. The Guest Editors would also like to give special thanks to all the anonymous reviewers for their excellent work in reading and judging the quality of the papers.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Anastasios Doulamis (1)
  • Nikolaos Doulamis (2)
  • Luc van Gool (3)
  • Mark Nixon (4)

  1. Technical University of Crete, Kounoupidiana, Chania, Greece
  2. National Technical University of Athens, Zografou, Greece
  3. Eidgenoessische Technische Hochschule Zuerich (ETHZ), Zuerich, Switzerland
  4. University of Southampton, Southampton, UK
