A Bayesian approach to object detection using probabilistic appearance-based models
Abstract
In this paper, we introduce a Bayesian approach, inspired by probabilistic principal component analysis (PPCA) (Tipping and Bishop in J Royal Stat Soc Ser B 61(3):611–622, 1999), to detect objects in complex scenes using appearance-based models. The originality of the proposed framework is to explicitly take into account general forms of the underlying distributions, both for the in-eigenspace distribution and for the observation model. The approach combines linear data reduction techniques (to preserve computational efficiency), non-linear constraints on the in-eigenspace distribution (to model complex variabilities) and non-linear (robust) observation models (to cope with clutter, outliers and occlusions). The resulting statistical representation generalises most existing PCA-based models (Tipping and Bishop in J Royal Stat Soc Ser B 61(3):611–622, 1999; Black and Jepson in Int J Comput Vis 26(1):63–84, 1998; Moghaddam and Pentland in IEEE Trans Pattern Anal Machine Intell 19(7):696–710, 1997) and leads to the definition of a new family of non-linear probabilistic detectors. The performance of the approach is assessed using receiver operating characteristic (ROC) analysis on several representative databases, showing a major improvement in detection performance with respect to the standard methods that have served as references up to now.
Keywords
Eigenspace representation · Probabilistic PCA · Bayesian approach · Non-Gaussian models · M-estimators · Half-quadratic algorithms
1 Introduction
Reliable detection of objects in complex scenes exhibiting non-Gaussian noise, clutter and occlusions remains an open issue in many instances. Since the early 1990s, appearance-based representations have met with unquestionable success in this field [4, 5]. Global appearance models represent objects using raw 2D brightness images (intensity surfaces), without any feature extraction or construction of complex 3D models. Appearance models can efficiently encode shape, pose and illumination in a single, compact representation, using data reduction techniques. The early success of global appearance models, in particular in face recognition [5], has given rise to a very active research field, which has led to the recognition of 3D objects in databases of more than 100 objects [4] and, more recently, to the reliable recognition of comprehensive object classes (cars, faces) [6] in complex, unstructured environments.
Probabilistic appearance models, in particular, exhibit three attractive properties:

- They are probabilistic: they make it possible to represent a whole class of images and make available all the traditional methods of statistical estimation (maximum likelihood, Bayesian approaches).
- They are linear and, thus, suited to efficient implementation [11].
- Although linear, they have outperformed, in terms of detection and recognition, not only the traditional linear approaches (PCA, ICA), but also non-linear approaches (such as neural networks or non-linear kernel PCA), in a recent comparison carried out by Moghaddam [8].
In the Bayesian approach proposed in the present paper, linear (i.e. PCA-based) data reduction techniques are combined with non-linear noise models and non-Gaussian prior models to derive robust and efficient image detectors. The proposed framework unifies different PCA-based models previously proposed in the literature [2, 3]. Our approach straightforwardly integrates non-linear statistical constraints on the distribution of the images in the eigenspace. We show experimentally the importance of an appropriate model for this distribution and its impact on the performance of the detection process. Moreover, the approach makes it possible, when necessary, to introduce robust hypotheses on the noise distribution, in order to cope with clutter, outliers and occlusions. This leads to the definition of a novel family of general-purpose detectors that experimentally outperform the existing PCA-based methods [2, 3].
The paper is organised as follows. Section 2 briefly reviews existing PCA-based detection methods. Section 3 describes the different constituents of the proposed Bayesian approach: eigenspace representation, non-linear noise models and non-Gaussian priors. Detection algorithms and implementations are detailed in Sect. 4. Section 5 presents a comparison between the proposed Bayesian detector and several state-of-the-art approaches. Three databases have been used to illustrate the contributions of the various components of the model. An objective assessment is proposed using receiver operating characteristic (ROC) analysis, showing the benefits of the approach.
2 PCA-based statistical detection
Detection, classification and recognition algorithms that use PCA-based subspace representations [3, 5] first relied on the computation of simple, Euclidean distances between the observed image and the training samples. The quadratic distance to the centre of the training class (sum of squared differences (SSD)) or the orthogonal distance to the eigenspace (distance from feature space (DFFS)) have, thus, first been used [3, 5]. Neither of these distances, however, is satisfactory: the first assigns the same distance to all images belonging to a hyper-sphere, while the second gives the same measure for all observations distributed on spaces which are parallel to the eigenspace. It is, therefore, easy to generate examples that would make these methods fail. A significant improvement has been obtained by recasting the problem in a probabilistic framework [1, 3, 10, 12]. Moghaddam and Pentland [10] proposed a statistical formulation of PCA based on multivariate Gaussian (or mixture-of-Gaussians) distributions. The resulting probabilistic model embeds distance information both in the eigenspace and in its orthogonal complement. Experimental results by Moghaddam and Pentland [3] and Moghaddam [8] have shown the major contribution of this approach, not only by comparison with SSD and DFFS, but also with respect to methods based on non-linear representations, such as non-linear PCA or kernel PCA [8]. Tipping and Bishop [1, 12] and Roweis [13] have recently proposed, in independent but similar works, other probabilistic interpretations of PCA, probabilistic PCA and sensible PCA, respectively. These rely on a latent variable model which, assuming Gaussian distributions, yields the same representation as Moghaddam and Pentland [3, 10].
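For concreteness, the two classical distances can be sketched as follows (an illustrative NumPy fragment; the mean image and the matrix `U` of leading eigenvectors are assumed to come from a standard PCA of the training set):

```python
import numpy as np

def ssd_and_dffs(y, mean, U):
    """Classical PCA-based distances for a flattened observation y (sketch).

    mean : class mean image; U : d x q matrix of orthonormal eigenvectors.
    SSD  : squared distance to the centre of the training class.
    DFFS : residual energy after projecting the centred observation onto
           the eigenspace (orthogonal distance to the eigenspace).
    """
    r = y - mean
    c = U.T @ r                  # in-eigenspace coordinates (latent variables)
    ssd = float(r @ r)           # distance to the class centre
    dffs = float(r @ r - c @ c)  # distance from feature space
    return ssd, dffs
```

As the text notes, SSD is constant on hyper-spheres around the mean and DFFS is constant on subspaces parallel to the eigenspace, which is why neither alone is satisfactory.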
Most of the time in eigenspace methods, the noise distributions have been considered as Gaussian. As is well known, such a hypothesis is seldom verified in practice. Therefore, standard detection and recognition methods based on Gaussian noise models are sensitive to gross errors or outliers stemming, for instance, from partial occlusions or clutter. M-estimators, introduced by Huber [14] in robust statistical estimation, are robust to such artifacts. They have, in particular, been used to develop PCA-based robust recognition methods [2]. More recently, an alternative to M-estimation, based on random sampling and hypothesis testing, was proposed to address the problem of robust recognition [15].
Another important limitation of standard methods concerns the a priori modelling of the distribution of the learning images in the eigenspace. In standard PCA-based approaches, these densities are generally considered as Gaussian [1, 2, 3] or uniform, although they are often non-Gaussian (see Sect. 3.5 and [4]). This strongly biases the detection towards the mean image. Thus, modelling complex in-eigenspace distributions remains a key issue for the practical application of eigenspace methods. The first approach proposed to address this problem in the field of visual recognition was described by Murase and Nayar [4], who used an ad hoc B-spline representation of the non-linear manifold corresponding to the distribution of training images in the eigenspace. Locally linear embedding [9], mixtures of Gaussians [3, 12] and other more computationally involved non-linear models [8] have also been considered more recently. Non-linear generalisations of PCA have been developed using auto-associative neural networks [16] or self-organising maps [17]. Neural networks, however, are prone to over-fitting and require the definition of a proper architecture and learning scheme. The notion of a non-linear, low-dimensional manifold passing “through the middle of the data set” was formalised in [18] for two dimensions. Extensions to higher dimensions (which are far from trivial) have been proposed recently, such as non-linear PCA [19] or probabilistic principal surfaces [20], but their implementation remains involved. Another approach, which became popular in the late 1990s, is kernel PCA [21], where an implicit non-linear mapping is applied to the input by means of a Mercer kernel function, before a linear PCA is applied in the mapped feature space. The approach is appealing because its implementation is simpler, since it uses only linear algebra, but the optimal choice of the kernel function remains an open issue.
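The bias towards the mean image induced by a Gaussian prior can be illustrated on synthetic, bimodal latent coordinates (a hypothetical one-dimensional example; the kernel width `h` is an assumed value, not one from the paper):

```python
import numpy as np

# Bimodal latent coordinates, e.g. two distinct object poses: a single
# Gaussian fitted to them concentrates its mass on the empty region
# between the two modes.
rng = np.random.default_rng(0)
c = np.concatenate([rng.normal(-3.0, 0.3, 500), rng.normal(3.0, 0.3, 500)])

mu, sigma = c.mean(), c.std()

def gaussian_pdf(x):
    """Single Gaussian fitted to the latent samples."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def parzen_pdf(x, h=0.3):
    """Parzen window estimate with Gaussian kernels on the same samples."""
    k = np.exp(-0.5 * ((x - c[:, None]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return k.mean(axis=0)

# The fitted Gaussian rates the empty centre (x = 0) as more likely than
# either mode, whereas the Parzen estimate assigns the centre low density.
```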
In this paper, we propose an alternative to these non-linear representations that preserves the linearity of the underlying latent variable model (thus, preserving computational simplicity). The proposed Bayesian framework generalises most PCA-based models previously proposed in the literature. Our approach combines linear data reduction techniques (to preserve computational efficiency), non-Gaussian models on the in-eigenspace distribution (to represent complex variabilities) and robust hypotheses on the distribution of noise (to cope with clutter, outliers and occlusions). The proposed representation is described in the next section.
3 Detection: a Bayesian approach
3.1 Principle of detection
Detection process: extraction of the observation vector at each pixel location (left). Computation of the log-likelihood (centre). Thresholding of the log-likelihood map to locate the object (right)
3.2 A Bayesian framework for the detection
Likelihood
Approximated likelihood
Computational complexity
Assuming the relation in Eq. 1, Eq. 2 or Eq. 3 provides a generic way to compute the likelihood of the observations y for the class of interest \( \mathcal{B} \). Whatever the function f and the assumptions on \( \mathcal{P}(\mathbf{w}\mid\mathcal{B}) \) and \( \mathcal{P}(\mathbf{c}\mid\mathcal{B}) \), computing \( \mathcal{P}(\mathbf{y}\mid\mathcal{B}) \) with Eq. 2 or Eq. 3 can be performed using simulation methods [22]. However, as explained in Sect. 3.1, the likelihood \( \mathcal{P}(\mathbf{y}\mid\mathcal{B}) \) has to be computed for each observation window extracted from the image. Due to their computational cost, such simulation methods are, therefore, ill-suited in practice.
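To sketch why simulation is costly, a plain Monte Carlo approximation of the marginal likelihood could look as follows (illustrative helper names, not from the paper; thousands of prior samples are needed for every observation window):

```python
import numpy as np

def mc_likelihood(y, sample_prior, noise_logpdf, f, n_samples=10_000, seed=0):
    """Plain Monte Carlo estimate of P(y | B) = E_{c ~ P(c|B)}[ P(y | c, B) ].

    sample_prior(rng, n) draws n latent vectors c from P(c | B);
    noise_logpdf(w) evaluates log P(w | B); f maps c to a noise-free image.
    Running this at every pixel location of an image is prohibitive,
    which motivates the restricted models used in the rest of the paper.
    """
    rng = np.random.default_rng(seed)
    cs = sample_prior(rng, n_samples)
    logs = np.array([noise_logpdf(y - f(c)) for c in cs])
    m = logs.max()  # log-sum-exp trick for numerical stability
    return np.exp(m) * np.mean(np.exp(logs - m))
```

For a scalar sanity check: with c ~ N(0, 1), f(c) = c and unit Gaussian noise, y is N(0, 2), so the estimate at y = 0 should approach 1/√(4π).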
In order to show the potential of our approach, we restrict its application hereafter as follows. First, the relation f in Eq. 1 is chosen to be linear (see Sect. 3.3). Second, the distributions of the noise w and of the informative vector c are limited to several hypotheses, presented in detail in Sects. 3.4 and 3.5, respectively. These restrictions allow us to define efficient algorithms for the computation of the likelihood \( \mathcal{P}(\mathbf{y}\mid\mathcal{B}) \) in Sect. 4.
3.3 Linear projection model f
3.3.1 Eigenspace decomposition
3.3.2 Observation model
3.4 Noise models \( \mathcal{P}(\mathbf{w}\mid\mathcal{B}) \)
3.5 Prior models in eigenspace \( \mathcal{P}(\mathbf{c}\mid\mathcal{B}) \)
Uniform distribution
Gaussian distribution
Other prior distributions
Sample training images from the COIL database [25]
AVG database: the mean image of the white traffic signs is learned with its rotation in the image plane (θ denotes the rotation angle)
Some of the A43 database training images
Distribution of the latent variables c of the training images in a 3D eigenspace
Distribution of the latent variables c in the first two planes of the eigenspace for the AVG database. The circular pattern is typical when learning image plane rotation variability [30]
4 Detection algorithms
4.1 Standard detection methods
In this section, we consider the classical Gaussian hypothesis for the noise distribution.
4.1.1 Gaussian noise, uniform prior
4.1.2 Gaussian noise, Gaussian prior
4.2 Robust detection methods
In the following paragraphs, \( \mathcal{P}(\mathbf{y}\mid\mathbf{c},\,\mathcal{B}) \) is no longer assumed to be Gaussian, in order to take outliers into account.
4.2.1 Robust noise model, uniform prior
ARTUR, or the location step with modified weights
LEGEND, or the location step with modified residuals [29]
Robust ρ functions used in continuation [24]
| Acronym | ρ(x) | Convexity |
|---|---|---|
| HS | \( 2\sqrt{1 + x^{2}} - 2 \) | Convex |
| HL | \( \log(1 + x^{2}) \) | Non-convex |
| GM | \( x^{2}/(1 + x^{2}) \) | Non-convex |
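The ρ functions of the table, together with the half-quadratic weights w(x) = ρ′(x)/(2x) that drive the ARTUR (modified weights) iterations, can be written directly (a minimal sketch; any scale parameter on the residuals is omitted here):

```python
import numpy as np

# Robust rho functions from the table and their half-quadratic weights
# w(x) = rho'(x) / (2x). The weights vanish for large residuals, so
# outliers are progressively discounted during the iterations.
RHO = {
    "HS": lambda x: 2.0 * np.sqrt(1.0 + x**2) - 2.0,   # convex
    "HL": lambda x: np.log(1.0 + x**2),                # non-convex
    "GM": lambda x: x**2 / (1.0 + x**2),               # non-convex
}
WEIGHT = {
    "HS": lambda x: 1.0 / np.sqrt(1.0 + x**2),
    "HL": lambda x: 1.0 / (1.0 + x**2),
    "GM": lambda x: 1.0 / (1.0 + x**2) ** 2,
}
```

Near x = 0 every weight equals 1, so the estimator behaves like least squares on small residuals; for large residuals the weights decay towards 0, which is what makes the resulting detectors robust to occlusions and clutter.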
4.2.2 Robust noise model, Gaussian prior
4.3 Detection using non-Gaussian distributions
5 Experimental results
This section is devoted to the assessment of the different detectors described previously, using the three databases presented in Sect. 3.5: COIL, AVG and A43. Test images have been created from occurrences of the objects of interest by embedding the objects in various textured backgrounds, with large occlusions (see for instance Figs. 7 and 11). ROC curves enable an objective comparison of the different detectors. These are plots of the true positive rate against the false alarm rate. In our case, the former is defined as the ratio of correct decisions to the total number of occurrences of the objects, while the latter is the ratio of the number of incorrect decisions to the total number of possible false alarms (i.e. the locations where no object is present in the images, roughly the size of the images × the number of images). The correctness of a detection is assessed using the following rule: since the exact position of the object of interest is known, a detection is considered correct if it occurs within the 8-neighbourhood of the true solution (i.e. a 1-pixel tolerance in accuracy). Note that detection is performed by simple thresholding of the likelihood map, without any kind of post-processing. For better visualisation, the ROC curves presented hereafter are plotted on a semi-logarithmic scale.
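The bookkeeping for one operating point of the ROC curve can be sketched as follows (an illustrative helper, assuming a per-pixel log-likelihood map and known ground-truth positions; the 1-pixel tolerance matches the rule above):

```python
import numpy as np

def roc_point(loglik_map, true_positions, threshold):
    """One ROC operating point for a single scene (illustrative sketch).

    A pixel above the threshold counts as a correct detection if it falls
    within the 8-neighbourhood of a true object position (1-pixel
    tolerance); every other above-threshold pixel is a false alarm.
    """
    detections = np.argwhere(loglik_map > threshold)
    n_true = len(true_positions)
    hit = [False] * n_true
    false_alarms = 0
    for (r, c) in detections:
        matched = False
        for k, (tr, tc) in enumerate(true_positions):
            if abs(r - tr) <= 1 and abs(c - tc) <= 1:
                hit[k] = True
                matched = True
        if not matched:
            false_alarms += 1
    tpr = sum(hit) / n_true
    # possible false alarms ~ pixel locations where no object is present
    far = false_alarms / (loglik_map.size - 9 * n_true)
    return tpr, far
```

Sweeping the threshold over the range of the log-likelihood map and collecting the (far, tpr) pairs yields the ROC curve.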
Proposed detectors and their underlying assumptions
| \( \mathcal{P}(\mathbf{w}\mid\mathcal{B}) \) \ \( \mathcal{P}(\mathbf{c}\mid\mathcal{B}) \) | Uniform | Gaussian | Non-Gaussian |
|---|---|---|---|
| Gaussian | GU (DFFS) | GG | – |
| Robust | RU | RG | RNG |
5.1 Importance of a robust noise model
This first experiment compares the detectors based on Gaussian and non-Gaussian noise assumptions. Let us recall that the latter allows the presence of outliers in the observations.
We use the COIL database with J=5. The test set collects 21 scenes (300×200 pixels each) containing 57 occurrences of the objects of interest, with partial occlusions and cluttered background (see Fig. 7).
Examples of test scenes and their log-likelihood maps computed using the GU, GG, RU detectors and the complete model RNG. Bright intensities correspond to a high likelihood value. COIL database, J=5
A non-Gaussian prior distribution is taken into account in the RNG detector (see Eq. 25), which is based on a non-Gaussian noise model and on a Parzen window estimate of the prior density. As can be seen, the introduction of an appropriate prior in the RNG detector slightly improves the results of RU (see Figs. 7 and 8). Overall, the RNG model leads to the best results—significantly better than detectors based on Gaussian assumptions only.
Remark
Receiver operating characteristic (ROC) curves for the robust detector RU for different values of J, COIL database
Influence of observation noise (SNR=22 dB) on the robust detector RU, COIL database, J=10
5.2 Importance of the prior model (I)
This second experiment shows that careful modelling of the prior distribution in the eigenspace may greatly benefit detection performance.
Log-likelihood maps for scenes I2 and I17. Bright intensities correspond to a high likelihood value. AVG database, J=20
5.3 Importance of the prior model (II)
This last experiment is another illustration of the importance of an accurate modelling of the eigenspace distribution, this time in the case of a more general form of the underlying pdf.
Examples of likelihood maps (bright intensities correspond to a high likelihood value). A43 database, J=30
Receiver operating characteristic (ROC) curves for the GU and RU detectors and for the complete model (RNG). A43 database, J=30
For the RNG detector, as already explained, the distribution in the eigenspace is modelled using Parzen windows with Gaussian kernels (see Sect. 4.3). We use the approximation in Eq. 23: the likelihood is computed using a robust ML estimate of the latent variables c. Besides, a high weight is given to the prior term. The corresponding detection maps are presented in Fig. 13. They allow an easy localisation of the target objects, which appear as bright spots on the likelihood maps. Figure 14 shows the corresponding ROC curve. One can readily see the improvement brought by the complete RNG model: more than 70% of the objects of interest are detected before the first false alarm appears. An accurate model for the distribution in the eigenspace is, therefore, essential.
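A Parzen window estimate of the in-eigenspace log-prior with isotropic Gaussian kernels can be sketched as follows (the bandwidth `h` is an assumed free parameter, not a value from the paper):

```python
import numpy as np

def parzen_log_prior(c, training_c, h=1.0):
    """Parzen window estimate of log P(c | B) with isotropic Gaussian
    kernels centred on the latent vectors of the training images (sketch).

    training_c : n x q array of training latent vectors; h : kernel width.
    """
    training_c = np.asarray(training_c, dtype=float)
    n, q = training_c.shape
    d2 = np.sum((training_c - np.asarray(c, dtype=float)) ** 2, axis=1)
    log_k = -0.5 * d2 / h**2 - 0.5 * q * np.log(2 * np.pi * h**2)
    m = log_k.max()  # log-sum-exp over the n kernels
    return m + np.log(np.mean(np.exp(log_k - m)))
```

Unlike a single Gaussian fitted to the training set, this estimate follows the manifold traced by the training latent vectors (e.g. the circular patterns of Fig. 6), assigning low prior probability to regions of the eigenspace that contain no training images.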
6 Conclusion
In this paper, we have presented a novel Bayesian approach to object detection using global appearance-based representations. The proposed framework combines non-Gaussian noise models with general, non-linear assumptions about the distribution of latent variables in the eigenspace. Non-Gaussian noise models yield robust estimators, which can deal with severely degraded occurrences of objects. A key feature of the proposed approach is its ability to embed non-linear priors on the eigenspace in a linear latent variable representation. This significantly improves the performance of the detector in critical situations.
This work finally unifies several standard detection methods proposed in the literature and leads to the definition of a new family of probabilistic detectors able to cope with complex object distributions and adverse situations, such as cluttered backgrounds, partial occlusions or corrupted observations.
7 Originality and contribution
In this paper, we are interested in a particular class of appearance-based representations [4, 5], namely probabilistic appearance models [3, 10]. They can represent large classes of images and make available all the traditional methods of statistical estimation.
Our Bayesian model is inspired by the latent variable representation proposed by Tipping and Bishop [1] in the Gaussian case (namely, probabilistic PCA, or PPCA for short). The originality of our approach is that it explicitly takes into account general, non-Gaussian forms of the underlying distributions, both for the prior and for the observation model. In particular, it straightforwardly integrates non-linear models for the distribution of the images in the eigenspace. Thus, it deviates from standard PPCA, and, in particular, the parameters of the models are no longer maximum likelihood estimates. The benefit of our approach is its ability to better represent the complex distributions that may occur in practical applications. The proposed framework also unifies the main PCA-based models mentioned in the literature [2, 3].
The performance of the approach has been assessed using receiver operating characteristic (ROC) curve analysis on several representative databases. The experimental results clearly show the impact of an appropriate model for the in-eigenspace distribution on the performance of the detection process. Moreover, the approach makes it possible, when necessary, to introduce robust hypotheses on the noise distribution in order to cope with clutter, outliers and occlusions, which is also illustrated by the experimental results.
The main contribution of the paper is, thus, the definition of a novel family of general-purpose detectors that experimentally compare favourably with several state-of-the-art PCA-based detectors recently described in the literature.
8 About the authors
Rozenn Dahyot received the diploma of the general engineering school ENSPS in France and an MSc (DEA) in computer vision from the University of Strasbourg in 1998. She gained her Ph.D. in image processing from the University of Strasbourg, France in 2001. She is currently a research associate in the Department of Statistics at Trinity College, Dublin, Ireland. Her research interests concern multimedia understanding, object or event detection and recognition and statistical learning, amongst others.
Pierre Charbonnier obtained his engineering degree (1991) and his Ph.D. qualification (1994) from the University of Nice-Sophia Antipolis, France. He is currently a senior researcher (“Chargé de Recherche”) for the French Ministry of Equipment, Transport and Housing at the Laboratoire Régional des Ponts et Chaussées in Strasbourg (ERA 27 LCPC), France. His interests include statistical models and deformable models applied to image analysis.
Fabrice Heitz received his engineering degree in electrical engineering and telecommunications from Telecom Bretagne, France, in 1984 and his Ph.D. degree from Telecom Paris, France, in 1988. From 1988 until 1994, he was with INRIA Rennes as a senior researcher in image processing and computer vision. He is now a professor at Ecole Nationale Superieure de Physique, Strasbourg, (Image Science, Computer Science and Remote Sensing Laboratory LSIIT UMR CNRS 7005), France. His research interests include statistical image modelling, image sequence analysis and medical image analysis. Professor Heitz was an associate editor for the IEEE Transactions on Image Processing journal from 1996 to 1999. He is currently the assistant director of LSIIT.
Acknowledgements
This work was supported by a Ph.D. grant awarded by the Laboratoire Central des Ponts-et-Chaussées, France.
References
- 1. Tipping ME, Bishop CM (1999) Probabilistic principal component analysis. J Roy Stat Soc B 61(3):611–622
- 2. Black MJ, Jepson AD (1998) Eigentracking: robust matching and tracking of articulated objects using a view-based representation. Int J Comput Vis 26(1):63–84
- 3. Moghaddam B, Pentland A (1997) Probabilistic visual learning for object representation. IEEE Trans Pattern Anal Machine Intell 19(7):696–710
- 4. Murase H, Nayar SK (1995) Visual learning and recognition of 3-D objects from appearance. Int J Comput Vis 14(1):5–24
- 5. Turk M, Pentland A (1991) Eigenfaces for recognition. J Cogn Neurosci 3(1):71–86
- 6. Schneiderman H (2000) A statistical approach to 3D object detection applied to faces and cars. PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania
- 7. Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York
- 8. Moghaddam B (2002) Principal manifolds and Bayesian subspaces for visual recognition. IEEE Trans Pattern Anal Machine Intell 24(6):780–788
- 9. Saul LK, Roweis ST (2003) Think globally, fit locally: unsupervised learning of low dimensional manifolds. J Machine Learn Res 4:119–155
- 10. Moghaddam B, Pentland A (1995) Probabilistic visual learning for object detection. In: Proceedings of the 5th international conference on computer vision, Cambridge, Massachusetts, June 1995, pp 786–793
- 11. Hamdan R, Heitz F, Thoraval L (2003) A low complexity approximation of probabilistic appearance models. Pattern Recogn 36(5):1107–1118
- 12. Tipping ME, Bishop CM (1999) Mixtures of probabilistic principal component analysers. Neural Comput 11(2):443–482
- 13. Roweis ST (1998) EM algorithms for PCA and SPCA. In: Jordan MI, Kearns MJ, Solla SA (eds) Advances in neural information processing systems, vol 10. MIT Press, Cambridge, Massachusetts, pp 626–632
- 14. Huber PJ (1981) Robust statistics. Wiley, New York
- 15. Leonardis A, Bischof H (2000) Robust recognition using eigenimages. Comput Vis Image Und 78(1):99–118
- 16. Kramer MA (1991) Nonlinear principal component analysis using autoassociative neural networks. AIChE J 37(2):233–243
- 17. Kohonen T (2001) Self-organizing maps, vol 30, 3rd edn. Springer, Berlin Heidelberg New York
- 18. Hastie T, Stuetzle W (1989) Principal curves. J Am Stat Assoc 84(406):502–516
- 19. Chalmond B, Girard S (1999) Nonlinear modeling of scattered multivariate data and its application to shape change. IEEE Trans Pattern Anal Machine Intell 21(5):422–432
- 20. Chang K, Ghosh J (2001) A unified model for probabilistic principal surfaces. IEEE Trans Pattern Anal Machine Intell 23(1):22–41
- 21. Schölkopf B, Smola A, Müller K (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10(5):1299–1319
- 22. Bernardo JM, Smith AF (2000) Bayesian theory. Wiley, New York
- 23. MacKay DJC (1995) Probable networks and plausible predictions—a review of practical Bayesian methods for supervised neural networks. Network Comput Neural Syst 6(3):469–505
- 24. Dahyot R, Charbonnier P, Heitz F (2000) Robust visual recognition of colour images. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR 2000), Hilton Head Island, South Carolina, June 2000, vol 1, pp 685–690
- 25. Nene SA, Nayar SK, Murase H (1996) Columbia object image library (COIL-20). Technical report CUCS-005-96, Department of Computer Science, Columbia University
- 26. Park RH (2002) Comments on “Optimal approximation of uniformly rotated images: relationship between Karhunen-Loève expansion and discrete cosine transform”. IEEE Trans Image Process 11(3):332–334
- 27. Charbonnier P, Blanc-Féraud L, Aubert G, Barlaud M (1994) Two deterministic half-quadratic regularization algorithms for computed imaging. In: Proceedings of the IEEE international conference on image processing (ICIP’94), Austin, Texas, November 1994, pp 168–172
- 28. Press WH, Teukolsky SA, Vetterling WT, Flannery BP (1995) Numerical recipes in C: the art of scientific computing. Cambridge University Press, Cambridge, UK
- 29. Dahyot R (2001) Appearance-based road scene video analysis for the management of the road network (in French). PhD thesis, Université Louis Pasteur, Strasbourg, France
- 30. Jogan M, Leonardis A (2001) Parametric eigenspace representations of panoramic images. In: Proceedings of the 10th international conference on advanced robotics (ICAR 2001), 2nd workshop on omnidirectional vision applied to robotic orientation and nondestructive testing (NDT), Budapest, Hungary, August 2001, pp 31–36