Sequential Learning of Layered Models from Video

Titsias, Michalis K.; Williams, Christopher K. I.

doi:10.1007/11957959_29


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4170)


Abstract

A popular framework for the interpretation of image sequences is the layers or sprite model, see e.g. [15], [6]. Jojic and Frey [8] provide a generative probabilistic framework for this task, but their algorithm is slow because it must search over discretized transformations (e.g. translations or affine transformations) for all layers simultaneously. Exact computation with this model scales exponentially with the number of objects, so Jojic and Frey used an approximate variational algorithm to speed up inference. Williams and Titsias [16] proposed an alternative sequential algorithm that extracts objects one at a time using a robust statistical method, thus avoiding the combinatorial explosion.
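
To make the combinatorial argument concrete, here is a small sketch in Python with NumPy. It composites sprites from back to front. Each layer ℓ is painted with its transformed mask and appearance over whatever lies behind it, following x ← T_ℓπ_ℓ ∘ T_ℓf_ℓ + (1 − T_ℓπ_ℓ) ∘ x. The sketch also counts how many transformation configurations a joint search must score, compared with extracting layers one at a time. It is only an illustration, not the authors' implementation. The function names are hypothetical, and transformations are restricted to integer translations implemented as cyclic shifts (an assumption). The noise model and the robust estimation step are omitted.

```python
import numpy as np

def translate(img, dy, dx):
    """Integer translation implemented as a cyclic shift (a simplifying assumption)."""
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def composite(background, layers):
    """Render an image from a background and a list of sprite layers.

    `layers` holds (appearance f, mask pi, (dy, dx)) tuples ordered
    front-most first; each layer occludes everything behind it.
    """
    x = np.array(background, dtype=float)        # working canvas (a copy)
    for f, pi, (dy, dx) in reversed(layers):      # paint the back-most layer first
        tf, tpi = translate(f, dy, dx), translate(pi, dy, dx)
        x = tpi * tf + (1.0 - tpi) * x            # mask selects sprite vs. what lies behind
    return x

def joint_search_size(num_transforms, num_layers):
    """Configurations a joint search over all layers must score: J ** L."""
    return num_transforms ** num_layers

def sequential_search_size(num_transforms, num_layers):
    """Extracting one layer at a time: L searches of J transformations each."""
    return num_transforms * num_layers
```

For example, with J = 4096 candidate translations (every pixel offset of a 64×64 frame) and L = 3 layers, a joint search must score 4096³ = 68,719,476,736 configurations, while a sequential search scores 3 × 4096 = 12,288.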

In this chapter we elaborate on our sequential algorithm in the following ways: Firstly, we describe a method to speed up the computation of the transformations based on approximate tracking of the multiple objects in the scene. Secondly, for sequences where the motion of an object is large so that different views (or aspects) of the object are visible at different times in the sequence, we learn appearance models of the different aspects. We demonstrate our method on four video sequences, including a sequence where we learn articulated parts of a human body.
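
The following minimal sketch shows two ideas. First, tracking can restrict the transformation search: instead of scoring every translation, each frame is searched only within a small window centred on the previous frame's estimate. Second, several aspect templates can be compared in the same search, so the best-matching aspect is chosen along with its position. This illustrates the idea under stated assumptions and is not the chapter's algorithm. The names `best_match` and `track` are hypothetical, and the aspect templates are supplied rather than learned. Matching uses plain sum-of-squared-differences rather than a robust likelihood. Inter-frame motion is assumed to stay within `radius` pixels.

```python
import numpy as np

def sse(patch, template):
    """Sum of squared differences (a non-robust stand-in for the chapter's likelihood)."""
    return float(np.sum((patch - template) ** 2))

def best_match(frame, templates, center, radius):
    """Score every aspect template at every top-left corner within `radius` of `center`.

    Returns (aspect_index, (row, col)) with the lowest SSE; ties keep the first found.
    """
    h, w = templates[0].shape
    H, W = frame.shape
    r0, c0 = center
    best = None
    for k, tpl in enumerate(templates):
        # Clip the search window so the template stays inside the frame.
        for r in range(max(0, r0 - radius), min(H - h, r0 + radius) + 1):
            for c in range(max(0, c0 - radius), min(W - w, c0 + radius) + 1):
                cost = sse(frame[r:r + h, c:c + w], tpl)
                if best is None or cost < best[0]:
                    best = (cost, k, (r, c))
    return best[1], best[2]

def track(frames, templates, start, radius=3):
    """Greedy tracker: each frame's search window is centred on the previous estimate."""
    pos = start
    out = []
    for frame in frames:
        aspect, pos = best_match(frame, templates, pos, radius)
        out.append((aspect, pos))
    return out
```

Per frame, the window search scores at most K(2r + 1)² placements instead of K(H − h + 1)(W − w + 1). For K = 2 aspects, r = 3 and a 3×3 template in a 20×20 frame, that is at most 98 placements versus 648.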


References

[1] Allan, M., Titsias, M.K., Williams, C.K.I.: Fast Learning of Sprites using Invariant Features. In: Proceedings of the British Machine Vision Conference 2005, pp. 40–49 (2005)
[2] Black, M.J., Jepson, A.D.: EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation. In: Proc. ECCV, pp. 329–342 (1996)
[3] Darrell, T., Pentland, A.P.: Cooperative Robust Estimation Using Layers of Support. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(5), 474–487 (1995)
[4] Fitzgibbon, A., Zisserman, A.: On Affine Invariant Clustering and Automatic Cast Listing in Movies. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 304–320. Springer, Heidelberg (2002)
[5] Frey, B.J., Jojic, N.: Transformation Invariant Clustering Using the EM Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(1), 1–17 (2003)
[6] Irani, M., Rousso, B., Peleg, S.: Computing Occluding and Transparent Motions. International Journal of Computer Vision 12(1), 5–16 (1994)
[7] Jepson, A.D., Fleet, D.J., Black, M.J.: A Layered Motion Representation with Occlusion and Compact Spatial Support. In: ECCV 2002. LNCS, vol. 2353, pp. 692–706. Springer, Heidelberg (2002)
[8] Jojic, N., Frey, B.J.: Learning Flexible Sprites in Video Layers. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2001. IEEE Computer Society Press, Kauai (2001)
[9] Kumar, M.P., Torr, P.H.S., Zisserman, A.: Learning Layered Pictorial Structures from Video. In: Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, pp. 158–163 (2004)
[10] Sawhney, H.S., Ayer, S.: Compact Representations of Videos Through Dominant and Multiple Motion Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(8), 814–830 (1996)
[11] Tao, H., Sawhney, H.S., Kumar, R.: Dynamic Layer Representation with Applications to Tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. II:134–141 (2000)
[12] Titsias, M.K., Williams, C.K.I.: Fast Unsupervised Greedy Learning of Multiple Objects and Parts from Video. In: Proc. Generative-Model Based Vision Workshop (2004)
[13] Titsias, M.K.: Unsupervised Learning of Multiple Objects in Images. PhD thesis, School of Informatics, University of Edinburgh (2005)
[14] Torr, P.H.S.: Geometric Motion Segmentation and Model Selection. Phil. Trans. Roy. Soc. Lond. A 356, 1321–1340 (1998)
[15] Wang, J.Y.A., Adelson, E.H.: Representing Moving Images with Layers. IEEE Transactions on Image Processing 3(5), 625–638 (1994)
[16] Williams, C.K.I., Titsias, M.K.: Greedy Learning of Multiple Objects in Images using Robust Statistics and Factorial Learning. Neural Computation 16(5), 1039–1062 (2004)
[17] Wills, J., Agarwal, S., Belongie, S.: What Went Where. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2003, pp. I:37–44 (2003)


Author information

Authors and Affiliations

School of Informatics, University of Edinburgh, Edinburgh, EH1 2QL, UK
Michalis K. Titsias & Christopher K. I. Williams


Editor information

Editors and Affiliations

Département d’Informatique, Ecole Normale Supérieure, P.O. Box, Paris, France
Jean Ponce
Carnegie Mellon University, Pittsburgh, USA
Martial Hebert
GRAVIR-INRIA, 655 avenue de l’Europe, P.O. Box, 38330, Montbonnot, France
Cordelia Schmid
Department of Engineering Science, University of Oxford, Parks Road, OX1 3PJ, Oxford, UK
Andrew Zisserman


About this chapter

Cite this chapter

Titsias, M.K., Williams, C.K.I. (2006). Sequential Learning of Layered Models from Video. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds) Toward Category-Level Object Recognition. Lecture Notes in Computer Science, vol 4170. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11957959_29


DOI: https://doi.org/10.1007/11957959_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68794-8
Online ISBN: 978-3-540-68795-5
eBook Packages: Computer Science, Computer Science (R0)
