Abstract
Object identification is a specialized type of recognition in which the category (e.g., cars) is known and the goal is to recognize an object’s exact identity (e.g., Bob’s BMW). Two special challenges characterize object identification. First, inter-object variation is often small (many cars look alike) and may be dwarfed by illumination or pose changes. Second, there may be many different instances of the category but few positive “training” examples, or just one, per object instance. Because variation among object instances may be small, a solution must locate possibly subtle object-specific salient features, like a door handle, while avoiding distracting ones such as specular highlights. With just one training example per object instance, however, standard modeling and feature selection techniques cannot be used. We describe an on-line algorithm that takes one image from a known category and builds an efficient “same” versus “different” classification cascade by predicting the most discriminative features for that object instance. Our method not only estimates the saliency and scoring function for each candidate feature, but also models the dependency between features, building an ordered sequence of discriminative features specific to the given image. Learned stopping thresholds make the identifier very efficient. To make this possible, category-specific characteristics are learned automatically in an off-line training procedure from labeled image pairs of the category. Our method, using the same algorithm for both cars and faces, outperforms a wide variety of other methods.
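To make the idea of a “same” versus “different” cascade with stopping thresholds concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each stage contributes an additive evidence score and that each stage has both a reject and an accept threshold. The abstract states neither assumption, and the names `Stage`, `run_cascade`, `reject_below`, and `accept_above` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple


@dataclass
class Stage:
    """One cascade stage (hypothetical structure, for illustration only)."""
    score: Callable[[Any], float]  # evidence that the candidate is the "same" object
    reject_below: float            # stop with "different" if the running total drops below this
    accept_above: float            # stop with "same" if the running total rises above this


def run_cascade(stages: Sequence[Stage], candidate: Any,
                final_threshold: float = 0.0) -> Tuple[bool, int]:
    """Evaluate stages in order, stopping as soon as a threshold is crossed.

    Returns (is_same, number_of_stages_evaluated).
    Assumption: evidence from successive stages is simply summed.
    """
    total = 0.0
    for used, stage in enumerate(stages, start=1):
        total += stage.score(candidate)
        if total < stage.reject_below:
            return False, used   # early "different" decision
        if total > stage.accept_above:
            return True, used    # early "same" decision
    # No early exit: fall back to a final threshold on the accumulated evidence.
    return total > final_threshold, len(stages)
```

In this sketch the stages would be ordered from most to least discriminative for the given model image, so that most comparisons end after only a few feature evaluations. That early exit is the source of the efficiency the abstract attributes to the learned stopping thresholds.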
Cite this article
Ferencz, A., Learned-Miller, E.G. & Malik, J. Learning to Locate Informative Features for Visual Identification. Int J Comput Vis 77, 3–24 (2008). https://doi.org/10.1007/s11263-007-0093-5
DOI: https://doi.org/10.1007/s11263-007-0093-5