A Probabilistic Framework for Spatio-Temporal Video Representation & Indexing

  • Hayit Greenspan
  • Jacob Goldberger
  • Arnaldo Mayer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2353)


In this work we describe a novel statistical video representation and modeling scheme. Video representation schemes are needed to segment a video stream into meaningful video objects for subsequent indexing and retrieval. In the proposed methodology, unsupervised clustering via Gaussian mixture modeling extracts coherent space-time regions in feature space and the corresponding coherent segments (video regions) in the video content. A key feature of the system is that it analyzes the video input as a single entity rather than as a sequence of separate frames, treating space and time uniformly. The extracted space-time regions allow for the detection and recognition of video events. Results are presented for segmenting video content into static and dynamic video regions and for video content editing.
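The clustering step can be illustrated with a minimal NumPy sketch. It makes several assumptions. Each pixel is described by a feature vector (x, y, t, intensity), with a single intensity value standing in for colour features. The number of components K is fixed rather than selected. EM is seeded with a few k-means iterations. The helper names (kmeans_labels, fit_gmm_em, blob_velocity) are hypothetical. To separate static from dynamic regions, the sketch reads an image-plane velocity off each fitted Gaussian as the regression slope Sigma_(x,y),t / Sigma_tt and applies an arbitrary 0.5 pixel/frame threshold. This criterion is an illustrative assumption, not necessarily the one used in the paper.

```python
import numpy as np


def kmeans_labels(X, k, n_iter=10):
    """Hard labels from Lloyd's k-means with farthest-point seeding (used only to seed EM)."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next seed: the sample farthest from all seeds chosen so far.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def fit_gmm_em(X, k, n_iter=50, reg=1e-6):
    """Fit a k-component, full-covariance Gaussian mixture to the rows of X with EM."""
    n, d = X.shape
    resp = np.eye(k)[kmeans_labels(X, k)]  # hard initial responsibilities, shape (n, k)
    for _ in range(n_iter):
        # M-step: weights, means and covariances from the current responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = np.empty((k, d, d))
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
        # E-step: log(w_j) + log N(x | mu_j, Sigma_j), normalised per sample in log space.
        log_p = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(covs[j]), diff)
            _, logdet = np.linalg.slogdet(covs[j])
            log_p[:, j] = np.log(weights[j]) - 0.5 * (maha + logdet + d * np.log(2 * np.pi))
        resp = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, means, covs, resp


def blob_velocity(cov, x=0, y=1, t=2):
    """Image-plane velocity implied by one space-time Gaussian: slope of E[(x, y) | t]."""
    return cov[[x, y], t] / cov[t, t]


# Synthetic clip: 20 frames, 50 samples per frame for each of two blobs.
rng = np.random.default_rng(0)
frames = np.repeat(np.arange(20), 50).astype(float)


def noisy(center, sigma):
    return center + rng.normal(0.0, sigma, frames.size)


static = np.column_stack([noisy(10, 0.5), noisy(10, 0.5), frames, noisy(0.2, 0.02)])
moving = np.column_stack([noisy(30 + frames, 0.5), noisy(30, 0.5), frames, noisy(0.8, 0.02)])
X = np.vstack([static, moving])  # columns: x, y, t, intensity

weights, means, covs, resp = fit_gmm_em(X, k=2)
speeds = [float(np.linalg.norm(blob_velocity(c))) for c in covs]
labels = ["dynamic" if s > 0.5 else "static" for s in speeds]  # 0.5 px/frame: arbitrary
print(list(zip(labels, [round(s, 2) for s in speeds])))
```

The synthetic input contains one stationary blob and one blob moving one pixel per frame along x. The moving component's estimated speed should come out close to 1 pixel/frame and the stationary one close to 0. Because each Gaussian spans the whole clip, a single fit covers all frames at once, in line with treating the video as one space-time volume rather than frame by frame.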





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Hayit Greenspan (1)
  • Jacob Goldberger (2)
  • Arnaldo Mayer (1)
  1. Faculty of Engineering, Tel Aviv University, Tel Aviv, Israel
  2. CUTe Ltd., Tel Aviv, Israel
