Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12361)

Abstract

A key capability of an intelligent system is deciding when events from past experience must be remembered and when they can be forgotten. Towards this goal, we develop a predictive model of human visual event memory and how those memories decay over time. We introduce Memento10k, a new, dynamic video memorability dataset containing human annotations at different viewing delays. Based on our findings, we propose a new mathematical formulation of memorability decay, resulting in a model that is able to produce the first quantitative estimation of how a video decays in memory over time. In contrast with previous work, our model can predict the probability that a video will be remembered at an arbitrary delay. Importantly, our approach combines visual and semantic information (in the form of textual captions) to fully represent the meaning of events. Our experiments on two video memorability benchmarks, including Memento10k, show that our model significantly improves upon the best prior approach (by 12% on average).

Keywords

Memorability estimation · Memorability decay · Multimodal video understanding

Notes

Acknowledgment

We thank Zoya Bylinskii and Phillip Isola for their useful discussions and Alex Lascelles and Mathew Monfort for helping with the dataset.

Supplementary material

Supplementary material 1: 504471_1_En_14_MOESM1_ESM.pdf (PDF, 491 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Massachusetts Institute of Technology, Cambridge, USA
