Summarizing Multimedia Content

  • Natwar Modani
  • Pranav Maneriker
  • Gaurush Hiranandani
  • Atanu R. Sinha
  • Utpal
  • Vaishnavi Subramanian
  • Shivani Gupta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10042)

Abstract

Multimedia content comprising both text and images is growing rapidly. A body of work exists on summarizing text content, but to the best of our knowledge, no method has been developed to summarize multimedia content. Our novel approach explicitly recognizes two desirable, normative characteristics of a summary: the text and the images should each offer good coverage and diversity, and the text and images should be coherent with each other. We propose and examine two methods for summarizing multimedia content: a graph-based method and a modification of the submodular approach. Moreover, we propose a metric for the quality of a multimedia summary that captures the coverage and diversity of the text and images as well as the coherence between the text and images in the summary. We experimentally demonstrate that the proposed methods produce good-quality multimedia summaries.
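The abstract names the ingredients of the objective (per-modality coverage and diversity, plus text-image coherence) but not its exact form. What follows is therefore only a sketch under stated assumptions of how a submodular-style greedy selection could trade these off. It is not the authors' method. It assumes that sentences and images are already embedded in a shared vector space and pre-clustered. The names `coverage`, `diversity`, `coherence`, `summary_score`, `greedy_summary` and the weights `lam` and `mu` are hypothetical. Coverage uses the facility-location form and diversity uses the square-root-per-cluster reward, in the style of Lin and Bilmes' functions for text summarization. The coherence term is an illustrative addition. Because it couples the two modalities, the combined objective is not guaranteed to be submodular, so the greedy loop is only a heuristic here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity clipped at 0, so every objective term stays non-negative."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return max(0.0, dot / (na * nb))


def coverage(pool: Sequence[Vector], chosen: List[int]) -> float:
    """Facility-location coverage: each pool item is credited with its best match in the summary."""
    if not chosen:
        return 0.0
    return sum(max(cosine(v, pool[j]) for j in chosen) for v in pool)


def diversity(clusters: Sequence[int], chosen: List[int]) -> float:
    """Square root per cluster: a pick from an unused cluster earns more than a repeat."""
    counts: Dict[int, int] = {}
    for j in chosen:
        counts[clusters[j]] = counts.get(clusters[j], 0) + 1
    return sum(math.sqrt(c) for c in counts.values())


def coherence(texts, images, t_sel, i_sel) -> float:
    """Hypothetical coherence term: each selected image is credited with its closest selected sentence."""
    if not t_sel or not i_sel:
        return 0.0
    return sum(max(cosine(images[i], texts[t]) for t in t_sel) for i in i_sel)


def summary_score(texts, images, t_clusters, i_clusters, t_sel, i_sel, lam=1.0, mu=1.0) -> float:
    """Weighted sum of per-modality coverage and diversity plus cross-modal coherence."""
    return (coverage(texts, t_sel) + lam * diversity(t_clusters, t_sel)
            + coverage(images, i_sel) + lam * diversity(i_clusters, i_sel)
            + mu * coherence(texts, images, t_sel, i_sel))


def greedy_summary(texts, images, t_clusters, i_clusters, k_text, k_img, lam=1.0, mu=1.0):
    """Greedily add the sentence or image with the largest score gain until both budgets are met."""
    t_sel: List[int] = []
    i_sel: List[int] = []
    current = 0.0
    while len(t_sel) < k_text or len(i_sel) < k_img:
        best: Tuple[float, str, int] = (float("-inf"), "", -1)
        candidates = []
        if len(t_sel) < k_text:
            candidates += [("text", t) for t in range(len(texts)) if t not in t_sel]
        if len(i_sel) < k_img:
            candidates += [("image", i) for i in range(len(images)) if i not in i_sel]
        for kind, idx in candidates:
            new_t = t_sel + [idx] if kind == "text" else t_sel
            new_i = i_sel + [idx] if kind == "image" else i_sel
            gain = summary_score(texts, images, t_clusters, i_clusters,
                                 new_t, new_i, lam, mu) - current
            if gain > best[0]:
                best = (gain, kind, idx)
        if best[2] < 0:
            break  # budgets exceed the available items: nothing left to add
        if best[1] == "text":
            t_sel.append(best[2])
        else:
            i_sel.append(best[2])
        current += best[0]
    return t_sel, i_sel


if __name__ == "__main__":
    # Toy 2-D "embeddings": sentences 0 and 1 are near-duplicates, sentence 2 is distinct.
    texts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    images = [[1.0, 0.0], [0.0, 1.0]]
    t_sel, i_sel = greedy_summary(texts, images, [0, 0, 1], [0, 1], k_text=2, k_img=1)
    print(t_sel, i_sel)  # -> [1, 2] [0]
```

In the toy run, the two near-duplicate sentences are never both selected. The diversity and coverage terms favour the distinct sentence 2. The coherence term pulls in the image that best matches an already chosen sentence. Evaluated on a finished summary, `summary_score` could serve as a rough stand-in for the kind of coverage, diversity and coherence metric the abstract describes, but it is not the paper's metric.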

Keywords

Summarization; Text and images; Multimedia content; Algorithms

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Natwar Modani (1)
  • Pranav Maneriker (1)
  • Gaurush Hiranandani (1)
  • Atanu R. Sinha (1)
  • Utpal (1)
  • Vaishnavi Subramanian (1)
  • Shivani Gupta (2)
  1. BigData Experience Lab, Adobe Research, Bangalore, India
  2. Adobe India, Noida, India