Computer Vision – ECCV 2014

Volume 8693 of the series Lecture Notes in Computer Science, pp. 533–548

Stixmantics: A Medium-Level Model for Real-Time Semantic Scene Understanding

  • Timo Scharwächter (Environment Perception, Daimler R&D; Department of Computer Science, TU Darmstadt)
  • Markus Enzweiler (Environment Perception, Daimler R&D)
  • Uwe Franke (Environment Perception, Daimler R&D)
  • Stefan Roth (Department of Computer Science, TU Darmstadt)




In this paper, we present Stixmantics, a novel medium-level scene representation for real-time visual semantic scene understanding. Relevant scene structure, motion, and object class information is encoded using so-called Stixels as primitive elements. Sparse feature-point trajectories are used to estimate the 3D motion field and to enforce temporal consistency of semantic labels. Spatial label coherency is obtained using a conditional random field (CRF) framework.
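The abstract does not spell out the structure of the CRF. As a minimal sketch of how a CRF can enforce spatial label coherency over Stixels, assume (purely for illustration) that the Stixels of one image form a left-to-right chain, that each Stixel has per-class unary costs (for example, negative log scores from a region classifier), and that neighbouring Stixels pay a Potts penalty whenever their labels differ. On a chain, the exact MAP labelling can then be found with the Viterbi (min-sum) algorithm. The function name `chain_crf_map`, the chain topology, the Potts term, and the class indices are hypothetical choices, not the paper's model; the temporal component (label propagation along feature-point trajectories) is not modelled here.

```python
from typing import List, Sequence


def chain_crf_map(unary: Sequence[Sequence[float]], potts_weight: float) -> List[int]:
    """Exact MAP labelling of a 1D chain CRF via Viterbi (min-sum).

    unary[i][c] is the cost of giving class c to stixel i (lower is better).
    Adjacent stixels pay potts_weight whenever their labels differ.
    This is a hypothetical illustration, not the Stixmantics implementation.
    """
    n = len(unary)
    if n == 0:
        return []
    k = len(unary[0])
    cost = list(unary[0])          # best total cost of a labelling ending in class c
    backptr: List[List[int]] = []  # backptr[i - 1][c]: best class of stixel i - 1 given class c at stixel i
    for i in range(1, n):
        new_cost: List[float] = []
        ptr: List[int] = []
        for c in range(k):
            # Start with "keep the same class" (no penalty), then try switching.
            best_prev, best_val = c, cost[c]
            for p in range(k):
                val = cost[p] + (potts_weight if p != c else 0.0)
                if val < best_val:
                    best_prev, best_val = p, val
            new_cost.append(best_val + unary[i][c])
            ptr.append(best_prev)
        cost = new_cost
        backptr.append(ptr)
    # Trace back from the cheapest final class to recover the full labelling.
    label = min(range(k), key=lambda c: cost[c])
    labels = [label]
    for ptr in reversed(backptr):
        label = ptr[label]
        labels.append(label)
    labels.reverse()
    return labels


if __name__ == "__main__":
    # Hypothetical class indices: 0 = vehicle, 1 = pedestrian, 2 = building.
    # Stixel 2 is ambiguous and, on its own, slightly prefers "building".
    unary = [
        [0.2, 1.0, 1.0],
        [0.3, 1.0, 1.0],
        [1.0, 0.9, 0.8],
        [0.3, 1.0, 1.0],
        [1.0, 1.0, 0.1],
    ]
    print(chain_crf_map(unary, potts_weight=0.0))  # [0, 0, 2, 0, 2]
    print(chain_crf_map(unary, potts_weight=0.5))  # [0, 0, 0, 0, 2]
```

With no pairwise penalty, each Stixel simply takes its cheapest class, so the ambiguous Stixel flips to "building". With the Potts penalty, that isolated label is smoothed away, which is the kind of spatial coherency a CRF is meant to provide. A real system would typically use a richer neighbourhood and learned weights.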

The proposed model abstracts and aggregates low-level pixel information to gain robustness and efficiency, yet it retains enough flexibility to adequately model complex scenes such as urban traffic. Our experimental evaluation focuses on semantic scene segmentation using a recently introduced dataset for urban traffic scenes. Compared to our best baseline approach, we achieve state-of-the-art performance while reducing inference time by a factor of more than 2,000, requiring only 50 ms per image.
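As a quick check on the reported figures (derived here from the two numbers above, not stated separately in the abstract), a speed-up of more than 2,000 at 50 ms per image implies that the best baseline needs more than roughly 100 s per image:

```latex
t_{\text{baseline}} > 2000 \times t_{\text{Stixmantics}} = 2000 \times 50\,\text{ms} = 100\,000\,\text{ms} = 100\,\text{s}
```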


Keywords: semantic scene understanding, bag-of-features, region classification, real-time, stereo vision, stixels