Abstract
We develop a distance metric for clustering and classification algorithms which is invariant to affine transformations and includes priors on the transformation parameters. Such clustering requirements are generic to a number of problems in computer vision.
We extend existing techniques for affine-invariant clustering, and show that the new distance metric outperforms existing approximations to affine-invariant distance computation, particularly under large transformations. In addition, we incorporate prior probabilities on the transformation parameters. This further regularizes the solution, mitigating a rare but serious tendency of the existing solutions to diverge. For the special case of corresponding point sets, we demonstrate that the affine-invariant measure we introduce may be obtained in closed form.
As an application of these ideas, we demonstrate that the faces of the principal cast of a feature film can be generated automatically using clustering with appropriate invariance. This is a very demanding test, as it involves detecting and clustering over tens of thousands of images, with variations including changes in viewpoint, lighting, scale and expression.
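To make the corresponding-point-set case concrete, the following is a minimal sketch of one way a prior-regularized affine alignment cost can be computed in closed form. It is an illustration under stated assumptions, not the formulation used in the paper: the one-sided least-squares cost, the quadratic penalty pulling the linear part toward the identity, and the names `affine_aligned_distance` and `lam` are all hypothetical choices made here.

```python
import numpy as np


def affine_aligned_distance(x, y, lam=0.0):
    """Illustrative (assumed) cost between corresponding 2D point sets.

    Minimizes  sum_i ||A x_i + t - y_i||^2 + lam * ||A - I||_F^2
    over the affine parameters (A, t). The problem is linear least
    squares with a ridge-style prior, so the minimizer has a closed form.
    Returns (cost, A, t).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]

    # Homogeneous coordinates: each row is [x_i, 1].
    p = np.hstack([x, np.ones((n, 1))])            # n x 3

    # Penalize only the linear part A, not the translation t.
    d = np.diag([1.0, 1.0, 0.0])                   # 3 x 3
    m0 = np.vstack([np.eye(2), np.zeros((1, 2))])  # identity transform, 3 x 2

    # Normal equations: (P^T P + lam D) M = P^T Y + lam D M0,
    # where M stacks A^T (rows 0-1) over t (row 2).
    m = np.linalg.solve(p.T @ p + lam * d, p.T @ y + lam * (d @ m0))

    residual = p @ m - y
    prior = lam * np.sum((m[:2] - np.eye(2)) ** 2)
    cost = float(np.sum(residual ** 2) + prior)
    return cost, m[:2].T, m[2]


# Usage: y is an exact affine image of x, so the unregularized cost vanishes.
x = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
a_true = np.array([[2.0, 1.0], [0.0, 1.0]])
t_true = np.array([3.0, -1.0])
y = x @ a_true.T + t_true

cost_free, a_hat, t_hat = affine_aligned_distance(x, y, lam=0.0)
cost_prior, a_reg, _ = affine_aligned_distance(x, y, lam=1e9)
```

With `lam=0` the cost is invariant to any affine transformation of `x`; a large `lam` effectively forbids non-identity linear parts, so the cost grows for strongly transformed pairs. Intermediate values trade off invariance against the plausibility of the transformation, which is the role the abstract assigns to the prior.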
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fitzgibbon, A., Zisserman, A. (2002). On Affine Invariant Clustering and Automatic Cast Listing in Movies. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds) Computer Vision — ECCV 2002. ECCV 2002. Lecture Notes in Computer Science, vol 2352. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47977-5_20
DOI: https://doi.org/10.1007/3-540-47977-5_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43746-8
Online ISBN: 978-3-540-47977-2
eBook Packages: Springer Book Archive