Sequential Learning of Layered Models from Video

  • Michalis K. Titsias
  • Christopher K. I. Williams
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4170)


A popular framework for the interpretation of image sequences is the layers or sprite model; see e.g. [15], [6]. Jojic and Frey [8] provide a generative probabilistic model framework for this task, but their algorithm is slow because it needs to search over discretized transformations (e.g. translations or affine transformations) for each layer simultaneously. Exact computation with this model scales exponentially with the number of objects, so Jojic and Frey used an approximate variational algorithm to speed up inference. Williams and Titsias [16] proposed an alternative sequential algorithm that extracts objects one at a time using a robust statistical method, thus avoiding the combinatorial explosion.
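The combinatorial cost that motivates the sequential approach can be made concrete with a short count. This is an illustrative sketch, assuming only integer translations on a hypothetical 64×64 frame (4096 candidate transformations per layer) and counting configurations rather than the cost of evaluating each one; the function names are hypothetical and the figures are not taken from the chapter.

```python
from itertools import product


def joint_search_size(num_transforms: int, num_layers: int) -> int:
    # Simultaneous search: each layer picks one of num_transforms
    # transformations, so the joint space is a Cartesian product.
    return num_transforms ** num_layers


def sequential_search_size(num_transforms: int, num_layers: int) -> int:
    # One-layer-at-a-time extraction: each layer is searched on its own,
    # so the costs add instead of multiplying.
    return num_transforms * num_layers


def enumerate_joint(num_transforms: int, num_layers: int):
    # Explicitly list the joint configurations (feasible only for tiny
    # cases); used as a sanity check on joint_search_size.
    return list(product(range(num_transforms), repeat=num_layers))


if __name__ == "__main__":
    J = 64 * 64  # hypothetical: integer translations on a 64x64 frame
    for L in (1, 2, 3):
        print(L, joint_search_size(J, L), sequential_search_size(J, L))
```

For one, two and three layers the joint counts are 4096, 16777216 and 68719476736, while the sequential counts are 4096, 8192 and 12288: every added layer multiplies the joint search by 4096 but only adds 4096 evaluations to the sequential one.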

In this chapter we elaborate on our sequential algorithm in the following ways. First, we describe a method to speed up the computation of the transformations based on approximate tracking of the multiple objects in the scene. Second, for sequences where the motion of an object is large enough that different views (or aspects) of the object are visible at different times, we learn appearance models of the different aspects. We demonstrate our method on four video sequences, including one in which we learn the articulated parts of a human body.
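The general idea of using tracking to cut down the transformation search can be sketched as follows: search all translations once, then in each later frame only search a small window around the previous estimate. This is a minimal sketch, assuming translation-only motion, a single known template, sum-of-squared-differences matching and a hypothetical fixed search radius; it is a generic local-window tracker, not the tracking procedure used in the chapter, and all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]
Offset = Tuple[int, int]


def ssd_at(frame: Image, template: Image, offset: Offset) -> float:
    # Sum of squared differences between the template and the frame patch
    # whose top-left corner is at offset = (row, col).
    dy, dx = offset
    total = 0.0
    for i, row in enumerate(template):
        for j, t in enumerate(row):
            d = frame[dy + i][dx + j] - t
            total += d * d
    return total


def all_offsets(frame: Image, template: Image) -> List[Offset]:
    # Every position where the template fits fully inside the frame.
    H, W = len(frame), len(frame[0])
    h, w = len(template), len(template[0])
    return [(y, x) for y in range(H - h + 1) for x in range(W - w + 1)]


def local_offsets(frame: Image, template: Image, prev: Offset, radius: int) -> List[Offset]:
    # Positions within `radius` pixels (per axis) of the previous estimate,
    # clipped to the valid range.
    H, W = len(frame), len(frame[0])
    h, w = len(template), len(template[0])
    py, px = prev
    return [(y, x)
            for y in range(max(0, py - radius), min(H - h, py + radius) + 1)
            for x in range(max(0, px - radius), min(W - w, px + radius) + 1)]


def track(frames: List[Image], template: Image, radius: int = 2) -> Tuple[List[Offset], int]:
    # Full search in the first frame, then a local search around the previous
    # estimate in each later frame. Returns the per-frame offsets and the
    # total number of SSD evaluations performed.
    positions: List[Offset] = []
    evaluations = 0
    for t, frame in enumerate(frames):
        if t == 0:
            candidates = all_offsets(frame, template)
        else:
            candidates = local_offsets(frame, template, positions[-1], radius)
        evaluations += len(candidates)
        positions.append(min(candidates, key=lambda c: ssd_at(frame, template, c)))
    return positions, evaluations


def make_frame(size: int, top_left: Offset, block: int = 3) -> Image:
    # Synthetic frame: a bright block x block square on a zero background.
    frame = [[0.0] * size for _ in range(size)]
    y0, x0 = top_left
    for i in range(block):
        for j in range(block):
            frame[y0 + i][x0 + j] = 1.0
    return frame


if __name__ == "__main__":
    path = [(2, 2), (3, 3), (4, 5)]
    frames = [make_frame(20, p) for p in path]
    template = [[1.0] * 3 for _ in range(3)]
    print(track(frames, template, radius=2))
```

On this synthetic three-frame example the tracker recovers the path (2, 2), (3, 3), (4, 5) with 324 evaluations in the first frame and 25 in each later frame (374 in total), against 972 for a full search in every frame. The trade-off is that an object moving more than `radius` pixels between frames is lost, which is why the window size is a design choice rather than a constant.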






References

  1. Allan, M., Titsias, M.K., Williams, C.K.I.: Fast Learning of Sprites using Invariant Features. In: Proceedings of the British Machine Vision Conference 2005, pp. 40–49 (2005)
  2. Black, M.J., Jepson, A.D.: EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation. In: Proc. ECCV, pp. 329–342 (1996)
  3. Darrell, T., Pentland, A.P.: Cooperative Robust Estimation Using Layers of Support. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(5), 474–487 (1995)
  4. Fitzgibbon, A., Zisserman, A.: On Affine Invariant Clustering and Automatic Cast Listing in Movies. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 304–320. Springer, Heidelberg (2002)
  5. Frey, B.J., Jojic, N.: Transformation Invariant Clustering Using the EM Algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(1), 1–17 (2003)
  6. Irani, M., Rousso, B., Peleg, S.: Computing Occluding and Transparent Motions. International Journal of Computer Vision 12(1), 5–16 (1994)
  7. Jepson, A.D., Fleet, D.J., Black, M.J.: A Layered Motion Representation with Occlusion and Compact Spatial Support. In: ECCV 2002. LNCS, vol. 2353, pp. 692–706. Springer, Heidelberg (2002)
  8. Jojic, N., Frey, B.J.: Learning Flexible Sprites in Video Layers. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2001. IEEE Computer Society Press, Kauai (2001)
  9. Kumar, M.P., Torr, P.H.S., Zisserman, A.: Learning Layered Pictorial Structures from Video. In: Proceedings of the Indian Conference on Computer Vision, Graphics and Image Processing, pp. 158–163 (2004)
  10. Sawhney, H.S., Ayer, S.: Compact Representations of Videos Through Dominant and Multiple Motion Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(8), 814–830 (1996)
  11. Tao, H., Sawhney, H.S., Kumar, R.: Dynamic Layer Representation with Applications to Tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, vol. II, pp. 134–141 (2000)
  12. Titsias, M.K., Williams, C.K.I.: Fast Unsupervised Greedy Learning of Multiple Objects and Parts from Video. In: Proc. Generative-Model Based Vision Workshop (2004)
  13. Titsias, M.K.: Unsupervised Learning of Multiple Objects in Images. Ph.D. thesis, School of Informatics, University of Edinburgh (2005)
  14. Torr, P.H.S.: Geometric Motion Segmentation and Model Selection. Phil. Trans. Roy. Soc. Lond. A 356, 1321–1340 (1998)
  15. Wang, J.Y.A., Adelson, E.H.: Representing Moving Images with Layers. IEEE Transactions on Image Processing 3(5), 625–638 (1994)
  16. Williams, C.K.I., Titsias, M.K.: Greedy Learning of Multiple Objects in Images using Robust Statistics and Factorial Learning. Neural Computation 16(5), 1039–1062 (2004)
  17. Wills, J., Agarwal, S., Belongie, S.: What Went Where. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2003), vol. I, pp. 37–44 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Michalis K. Titsias (1)
  • Christopher K. I. Williams (1)
  1. School of Informatics, University of Edinburgh, Edinburgh, UK
