Rethinking Summarization and Storytelling for Modern Social Multimedia

  • Stevan Rudinac
  • Tat-Seng Chua
  • Nicolas Diaz-Ferreyra
  • Gerald Friedland
  • Tatjana Gornostaja
  • Benoit Huet
  • Rianne Kaptein
  • Krister Lindén
  • Marie-Francine Moens
  • Jaakko Peltonen
  • Miriam Redi
  • Markus Schedl
  • David A. Shamma
  • Alan Smeaton
  • Lexing Xie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10704)


Traditional summarization initiatives have focused on specific types of documents such as articles, reviews, videos, image feeds, or tweets, a practice that risks pigeonholing the summarization task in the context of modern, content-rich multimedia collections. Consequently, much of the research to date has revolved around mostly toy problems in narrow domains and has worked with single-source media types. We argue that summarization and story generation systems need to refocus the problem space in order to meet information needs in the age of user-generated content in different formats and languages. Here we create a framework for flexible multimedia storytelling. Narratives, stories, and summaries carry a set of challenges in big data and dynamic multi-source media that give rise to new research in spatial-temporal representation, viewpoint generation, and explanation.


Keywords: Social multimedia, Summarization, Storytelling



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Stevan Rudinac (1), corresponding author
  • Tat-Seng Chua (2)
  • Nicolas Diaz-Ferreyra (3)
  • Gerald Friedland (4)
  • Tatjana Gornostaja (5)
  • Benoit Huet (6)
  • Rianne Kaptein (7)
  • Krister Lindén (8)
  • Marie-Francine Moens (9)
  • Jaakko Peltonen (10)
  • Miriam Redi (11)
  • Markus Schedl (12)
  • David A. Shamma (13)
  • Alan Smeaton (14)
  • Lexing Xie (15)

  1. University of Amsterdam, Amsterdam, The Netherlands
  2. National University of Singapore, Singapore, Singapore
  3. Universität Duisburg-Essen, Duisburg, Germany
  4. University of California, Berkeley, USA
  5. Tilde, Riga, Latvia
  6. EURECOM, Sophia Antipolis, France
  7. Crunchr, Amsterdam, The Netherlands
  8. University of Helsinki, Helsinki, Finland
  9. KU Leuven, Leuven, Belgium
  10. Aalto University, Espoo, Finland
  11. Nokia Bell Labs, Cambridge, UK
  12. Johannes Kepler Universität Linz, Linz, Austria
  13. FX Palo Alto Laboratory, Inc., Palo Alto, USA
  14. Dublin City University, Dublin, Ireland
  15. Australian National University, Canberra, Australia
