Computer vision is starting to become practical and successful. The academic computer vision community has grown immensely in recent years, and there is a rapidly growing computer vision industry ranging from high-tech giants to humble start-ups. Moreover, the technology that computer vision relies on—computers, the internet, and cameras—is ever-improving in quality and power. Indeed, the advances in technology alone may lead future historians of computer vision to conclude that researchers in the 1970s and 1980s were crazy romantics to address such a difficult problem with the hardware available at that time.

Nevertheless, the rapid growth of computer vision, together with its freewheeling interdisciplinary nature, which continually borrows and adapts techniques from other disciplines, has led to dangers of its own. The field risks being factionalized into groups of researchers using different techniques, with the same ideas being re-invented under different names. The experimental methodology can also be criticized for sometimes valuing complex models which yield gains of a few percentage points in performance on a dataset over simpler and more insightful models. Hence, as computer vision matures there is a growing need for papers which address the fundamental issues of computer vision in a rigorous mathematical manner and which are carefully tested by well-designed experiments.

This special issue contains seven papers which satisfy these requirements. They range over some of the major topics in computer vision—geometry, lighting, learning, probabilistic modeling—and help illustrate the richness of the field, the power and sophistication of the techniques being used, and the practical success on real world tasks.

Three of the papers address the fundamental issues of geometry, lighting, and the interactions between them. Firstly, “A Simple Prior-Free Method for Non-rigid Structure-from-Motion Factorization” contains an elegant mathematical analysis of the problem of non-rigid structure from motion (NRSfM) using the factorization approach. This analysis shows that the problem can be solved “prior-free” and leads naturally to a practical algorithm for NRSfM. Secondly, “Decomposing Global Light Transport Using Time of Flight Imaging” shows how to decompose time-of-flight videos into direct, subsurface scattering, and interreflection components. This decomposition can be applied to recover projective depth from the direct component in the presence of global scattering, to identify and label different types of global illumination effects, to measure the parameters of subsurface scattering materials from a single point of view, to perform edge detection, and to adjust subsurface scattering to render novel images of the scene. Thirdly, “A Closed-Form, Consistent and Robust Solution to Uncalibrated Photometric Stereo Via Local Diffuse Reflectance Maxima” addresses the classic generalized bas-relief (GBR) ambiguity, which involves both lighting and geometry. This paper contains an insightful mathematical analysis of the interaction of geometry and lighting. The authors introduce the concept of local diffuse reflectance (LDR) maxima and show that these can yield a closed-form solution for the GBR parameters which is robust, consistent, and gives good results on real-world scenes.

The remaining four papers involve learning, which is, arguably, the most important factor in the growing success of computer vision systems. The first two papers—“The Shape Boltzmann Machine: A Strong Model of Object Shape” and “Face Alignment by Explicit Shape Regression”—address the topics of shape modeling and object alignment. The Shape Boltzmann Machine (SBM) is a model of shape, based on deep Boltzmann machines, which is learnt from training data and is used for segmentation, detection, inpainting, and graphics. The SBM captures shape sufficiently well that samples from it look realistic. By contrast, the paper on face alignment uses a regression approach to address this classic and important problem. This method significantly outperforms alternative methods in both speed and accuracy. The last two papers apply learning to motion problems. The first, “Max-Margin Early Event Detectors”, uses the maximum-margin framework to train temporal event detectors to recognize partial events, enabling early detection. This work extends the structured output SVM to sequential data. The method is successfully applied to detect facial expressions, hand gestures, and human activities. The second, “Multi-Target Tracking by Online Learning a CRF Model of Appearance and Motion Patterns”, applies online learning to multi-target tracking. The tracking problem is formulated using an online-learned CRF model containing unary functions, based on motion and appearance models, for discriminating targets, together with binary functions which differentiate pairs of tracklets.