Recognizing Objects and Scenes in News Videos

Baştan, Muhammet; Duygulu, Pınar

doi:10.1007/11788034_39

Muhammet Baştan²⁰ &
Pınar Duygulu²⁰

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4071))

Included in the following conference series:

International Conference on Image and Video Retrieval

781 Accesses
1 Citations

Abstract

We propose a new approach to recognize objects and scenes in news videos motivated by the availability of large video collections. This approach considers the recognition problem as the translation of visual elements to words. The correspondences between visual elements and words are learned using the methods adapted from statistical machine translation and used to predict words for particular image regions (region naming), for entire images (auto-annotation), or to associate the automatically generated speech transcript text with the correct video frames (video alignment). Experimental results are presented on TRECVID 2004 data set, which consists of about 150 hours of news videos associated with manual annotations and speech transcript text. The results show that the retrieval performance can be improved by associating visual and textual elements. Also, extensive analysis of features are provided and a method to combine features are proposed.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Giza++, http://www.fjoch.com/GIZA++.html
Trec vieo retrieval evaluation, http://www-nlpir.nist.gov/projects/trecvid
Barnard, K., Duygulu, P., de Freitas, N., Forsyth, D.A., Blei, D., Jordan, M.: Matching words and pictures. Journal of Machine Learning Research 3, 1107–1135 (2003)
Article MATH Google Scholar
Blei, D., Jordan, M.I.: Modeling annotated data. In: 26th Annual International ACM SIGIR Conference, Toronto, Canada, July 28-August 1, pp. 127–134 (2003)
Google Scholar
Brown, P., Pietra, S.A.D., Pietra, V.J.D., Mercer, R.L.: The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics 19(2), 263–311 (1993)
Google Scholar
Carbonetto, P., de Freitas, N., Barnard, K.: A Statistical Model for General Contextual Object Recognition. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3021, pp. 350–362. Springer, Heidelberg (2004)
Chapter Google Scholar
Duygulu, P., Barnard, K., Freitas, N., Forsyth, D.A.: Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2353, pp. 97–112. Springer, Heidelberg (2002)
Chapter Google Scholar
Duygulu, P., Wactlar, H.: Associating video frames with text. In: Multimedia Information Retrieval Workshop in conjuction with the 26th annual ACM SIGIR conference on Information Retrieval, Toronto, Canada (2003)
Google Scholar
Feng, S., Manmatha, R., Lavrenko, V.: Multiple bernoulli relevance models for image and video annotation. In: Proceedings of the International Conference on Pattern Recognition (CVPR 2004), vol. 2, pp. 1002–1009 (2004)
Google Scholar
Gauvain, J., Lamel, L., Adda, G.: The limsi broadcast news transcription system. Speech Communication 37(1-2), 89–108 (2002)
Article MATH Google Scholar
Ghoshal, A., Ircing, P., Khudanpur, S.: Hidden Markov Models for Automatic Annotation and Content Based Retrieval of Images and Video. In: The 28th International ACM SIGIR Conference, Salvador, Brazil, August 15-19 (2005)
Google Scholar
Lin, C.-Y., Tseng, B.L., Smith, J.R.: Video collaborative annotation forum: Establishing ground-truth labels on large multimedia datasets. In: NIST TREC 2003 Video Retrieval Evaluation Conference, Gaithersburg, MD (November 2003)
Google Scholar
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2) (2004)
Google Scholar
Och, F.J., Ney, H.: A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics 1(29), 19–51 (2003)
Article Google Scholar
Quelhas, P., Monay, F., Odobez, J.-M., Gatica-Perez, D., Tuytelaars, T., Gool, L.V.: Modeling scenes with local descriptors and latent aspects. In: Proc. International Conference on Computer Vision (ICCV), Beijing (2005)
Google Scholar
Yang, J., Chen, M.-Y., Hauptmann, A.: Finding person X: Correlating names with visual appearances. In: Enser, P.G.B., Kompatsiaris, Y., O’Connor, N.E., Smeaton, A.F., Smeulders, A.W.M. (eds.) CIVR 2004. LNCS, vol. 3115, pp. 270–278. Springer, Heidelberg (2004)
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Engineering, Bilkent University, Ankara, Turkey
Muhammet Baştan & Pınar Duygulu

Authors

Muhammet Baştan
View author publications
You can also search for this author in PubMed Google Scholar
Pınar Duygulu
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Arts, Media and Engineering Program, Arizona State University, 85281, Tempe, AZ,
Hari Sundaram
Intelligent Information Management Department, IBM T.J. Watson Research Center, 19 Skyline Drive, 10532, Hawthorne, NY, USA
Milind Naphade
Intelligent Information Management Department, IBM T. J. Watson Research Center, 19 Skyline Drive, 10532, Hawthorne, NY, USA
John R. Smith
Microsoft Corporation, Microsoft China R&D Group, 49 Zhichun Road, 100080, Beijing, China
Yong Rui

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Baştan, M., Duygulu, P. (2006). Recognizing Objects and Scenes in News Videos. In: Sundaram, H., Naphade, M., Smith, J.R., Rui, Y. (eds) Image and Video Retrieval. CIVR 2006. Lecture Notes in Computer Science, vol 4071. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11788034_39

Download citation

DOI: https://doi.org/10.1007/11788034_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-36018-6
Online ISBN: 978-3-540-36019-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics