Semantic Road Segmentation via Multi-scale Ensembles of Learned Features
Semantic segmentation refers to the process of assigning an object label (e.g., building, road, sidewalk, car, pedestrian) to every pixel in an image. Common approaches formulate the task as a random field labeling problem, modeling the interactions between labels by combining local and contextual features such as color, depth, edges, SIFT or HoG. These models are trained to maximize the likelihood of the correct classification given a training set. However, such approaches rely on hand-designed features (e.g., texture, SIFT or HoG) and require a high computational cost during inference.
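For concreteness, the random field labeling formulation mentioned above is typically written as an energy over pixel labels with unary and pairwise terms; the notation below is a standard sketch of such a model, not taken verbatim from the paper:

```latex
% Standard CRF formulation over pixel labels y given image x (illustrative notation):
% unary terms score each pixel's label from local features, pairwise terms model
% the interactions between neighbouring labels; training maximizes P(y | x) for
% the ground-truth labeling.
E(\mathbf{y} \mid \mathbf{x}) = \sum_{i} \psi_u(y_i, \mathbf{x})
  + \sum_{(i,j) \in \mathcal{E}} \psi_p(y_i, y_j, \mathbf{x}),
\qquad
P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}\,\exp\bigl(-E(\mathbf{y} \mid \mathbf{x})\bigr)
```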
In this paper, we therefore focus on estimating the unary potentials of a conditional random field via ensembles of learned features. We propose an algorithm based on convolutional neural networks to learn local features from training data at different scales and resolutions. The diversity among these features is then exploited through a weighted linear combination. Experiments on a publicly available database show the effectiveness of the proposed method for semantic road scene segmentation in still images. The algorithm outperforms appearance-based methods and performs comparably to state-of-the-art methods that use additional sources of information such as depth, motion or stereo.
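The weighted linear combination of the multi-scale learned features can be sketched as follows. This is a minimal illustrative implementation under our own assumptions (each per-scale CNN outputs a soft-max class-probability map upsampled to a common grid, and the mixing weights are set on a validation set); the function and variable names are hypothetical, not the authors' code.

```python
import numpy as np

def ensemble_unary_potentials(scale_probs, weights):
    """Combine per-scale CNN class-probability maps into CRF unary potentials.

    scale_probs : list of arrays, each of shape (H, W, num_classes), the
                  soft-max output of a CNN trained at one scale/resolution,
                  upsampled to a common H x W grid beforehand.
    weights     : sequence of len(scale_probs) non-negative mixing weights
                  (e.g., chosen by validation-set accuracy).
    Returns unary potentials of shape (H, W, num_classes) as negative
    log-probabilities of the weighted mixture.
    """
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()                  # normalise the mixture
    mixture = sum(w * p for w, p in zip(weights, scale_probs))
    return -np.log(np.clip(mixture, 1e-12, 1.0))       # unary = -log P(label)

# Toy usage: three scales, a 4x4 image, 5 road-scene classes.
H, W, C = 4, 4, 5
probs = [np.random.dirichlet(np.ones(C), size=(H, W)) for _ in range(3)]
unaries = ensemble_unary_potentials(probs, weights=[0.5, 0.3, 0.2])
print(unaries.shape)  # (4, 4, 5)
```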
Keywords: Convolutional Neural Network · Conditional Random Field · Weighted Linear Combination · Unary Potential · Average Recognition Rate