Movie/Script: Alignment and Parsing of Video and Text Transcription

Cour, Timothee; Jordan, Chris; Miltsakaki, Eleni; Taskar, Ben

doi:10.1007/978-3-540-88693-8_12


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5305)

Included in the following conference series:

European Conference on Computer Vision


Abstract

Movies and TV are a rich source of diverse and complex video of people, objects, actions and locales “in the wild”. Harvesting automatically labeled sequences of actions from video would enable the creation of large-scale, highly varied datasets. To enable such collection, we focus on the task of recovering scene structure in movies and TV series for object tracking and action retrieval. We present a weakly supervised algorithm that uses the screenplay and closed captions to parse a movie into a hierarchy of shots and scenes. Scene boundaries in the movie are aligned with screenplay scene labels, and shots are reordered into a sequence of long continuous tracks, or threads, which allow more accurate tracking of people, actions and objects. Scene segmentation, alignment and shot threading are formulated as inference in a unified generative model, and we present a novel hierarchical dynamic programming algorithm that handles alignment and jump-limited reorderings in linear time. We report quantitative and qualitative results on movie alignment and parsing, and use the recovered structure to improve character naming and retrieval of common actions in several episodes of popular TV series.
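The abstract describes the alignment step only in outline. As a rough illustration of its simplest ingredient, monotone scene alignment by dynamic programming, the Python sketch below assigns temporally ordered shots to ordered screenplay scenes so that each scene receives a contiguous, non-empty block of shots and the total score is maximal. It is not the authors' hierarchical algorithm: it ignores shot threading and jump-limited reordering, and it runs in O(n·m) time for n shots and m scenes rather than the linear time stated in the abstract. The score matrix `sim` and the function name `align_shots_to_scenes` are hypothetical; in practice such scores might come from matching closed-caption words against screenplay dialogue, but that is an assumption here.

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def align_shots_to_scenes(sim: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical sketch: label each shot with a screenplay scene index.

    Constraints: labels are non-decreasing over time and every scene gets at
    least one shot. sim[i][j] is an assumed score for shot i lying in scene j.
    Runs in O(n_shots * n_scenes) time and memory.
    """
    n = len(sim)
    m = len(sim[0]) if n else 0
    if m == 0 or n < m:
        raise ValueError("need at least one scene and at least as many shots as scenes")

    # best[i][j]: best total score with shots 0..i-1 assigned to scenes
    # 0..j-1, where shot i-1 lies in scene j-1.
    best = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    # opens[i][j]: True if shot i-1 is the first shot of scene j-1.
    opens = [[False] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, min(i, m) + 1):
            stay = best[i - 1][j]        # shot i-1 continues scene j-1
            start = best[i - 1][j - 1]   # shot i-1 starts scene j-1
            if start >= stay:
                best[i][j] = start + sim[i - 1][j - 1]
                opens[i][j] = True
            else:
                best[i][j] = stay + sim[i - 1][j - 1]

    # Backtrack from the last shot, which must lie in the last scene.
    labels = [0] * n
    j = m
    for i in range(n, 0, -1):
        labels[i - 1] = j - 1
        if opens[i][j]:
            j -= 1
    return labels
```

For example, `align_shots_to_scenes([[5, 0], [4, 1], [3, 2], [0, 6], [1, 5]])` returns `[0, 0, 0, 1, 1]`: the first three shots fall in scene 0 and the last two in scene 1, for a total score of 23. Restricting each scene to one contiguous block is what keeps the recurrence to two cases per cell; allowing shots to be reordered into threads, as the abstract describes, would need a richer state.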


Similar content being viewed by others

Towards Automatic Textual Summarization of Movies

Language-Motivated Approaches to Action Recognition

Aligning plot synopses to videos for story-based retrieval (Makarand Tapaswi, Martin Bäuml & Rainer Stiefelhagen; article, 11 September 2014)



Author information

Authors and Affiliations

University of Pennsylvania, Philadelphia, PA, 19104, USA
Timothee Cour, Chris Jordan, Eleni Miltsakaki & Ben Taskar


Editor information

Editors and Affiliations

Computer Science Department, University of Illinois at Urbana-Champaign, 3310 Siebel Hall, Urbana, IL 61801, USA
David Forsyth
Department of Computing, Wheatley, Oxford Brookes University, OX33 1HX, Oxford, UK
Philip Torr
Department of Engineering Science, University of Oxford, Parks Road, OX1 3PJ, Oxford, UK
Andrew Zisserman

Electronic Supplementary Material

Supplementary material (9,998 KB)


About this paper

Cite this paper

Cour, T., Jordan, C., Miltsakaki, E., Taskar, B. (2008). Movie/Script: Alignment and Parsing of Video and Text Transcription. In: Forsyth, D., Torr, P., Zisserman, A. (eds) Computer Vision – ECCV 2008. ECCV 2008. Lecture Notes in Computer Science, vol 5305. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88693-8_12


DOI: https://doi.org/10.1007/978-3-540-88693-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88692-1
Online ISBN: 978-3-540-88693-8
eBook Packages: Computer Science, Computer Science (R0)
