Abstract
Various works have suggested that the memorability of an image is consistent across people, and thus can be treated as an intrinsic property of an image. Using computer vision models, we can make specific predictions about what people will remember or forget. While older work has used now-outdated deep learning architectures rooted in shallow visual processing to predict image memorability, innovations in the field have given us new techniques to apply to this problem. Here, we propose and evaluate five alternative deep learning models that exploit developments in the field from the last 5 years, most notably the introduction of residual neural networks, which are intended to allow the model to use semantic information in the memorability estimation process. These new models were tested against the prior state of the art on a combined dataset built to optimize both within-category and across-category predictions. Our findings suggest that the key prior memorability network had overstated its generalizability and was overfit to its training set. Our new models outperform this prior model, leading us to conclude that residual networks outperform simpler convolutional neural networks in memorability regression. We make our new state-of-the-art model readily available to the research community, allowing memory researchers to make predictions about memorability on a wider range of images.
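A residual network is built from blocks that add each block's learned transformation F(x) back onto its input through an identity shortcut, so each block only has to learn a correction to what it receives. As a minimal illustration, assuming the standard "basic block" form relu(F(x) + x) with F(x) = W2 relu(W1 x), the plain-Python sketch below stacks such blocks and ends in a scalar regression head. The function names, the toy weights, and the sigmoid that squashes the output into [0, 1] are hypothetical choices made for illustration; this is not the architecture of ResMem or of the five models evaluated here, which operate on images with convolutional layers.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a weight matrix (one row per output unit) by an input vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Sequence[float]) -> Vector:
    return [max(0.0, v) for v in x]


def residual_block(x: Sequence[float], w1: Matrix, w2: Matrix) -> Vector:
    """Basic residual block: relu(F(x) + x) with F(x) = W2 relu(W1 x)."""
    fx = matvec(w2, relu(matvec(w1, x)))
    if len(fx) != len(x):
        # The identity shortcut needs F(x) and x to have the same shape.
        raise ValueError("F(x) and x must have the same length")
    return relu([f + xi for f, xi in zip(fx, x)])


def predict_memorability(
    features: Sequence[float],
    blocks: Sequence[Tuple[Matrix, Matrix]],
    head: Sequence[float],
    bias: float,
) -> float:
    """Hypothetical regressor: residual blocks, then a linear head and a sigmoid."""
    h = list(features)
    for w1, w2 in blocks:
        h = residual_block(h, w1, w2)
    z = sum(a * b for a, b in zip(head, h)) + bias
    # Assumption: squash into [0, 1], the range of hit-rate style scores.
    return 1.0 / (1.0 + math.exp(-z))


x = [0.5, -0.2, 0.1]
zero = [[0.0] * 3 for _ in range(3)]
# With all-zero weights F(x) = 0, so the block reduces to relu(x).
print(residual_block(x, zero, zero))  # [0.5, 0.0, 0.1]
print(predict_memorability(x, [(zero, zero)], [0.0] * 3, 0.0))  # 0.5
```

The all-zero case shows the usual motivation for the shortcut: a block whose learned part contributes nothing still passes its (rectified) input through unchanged, so adding depth does not force each layer to relearn an identity mapping.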
Availability of Data and Material
The model is available from the Python Package Index (https://pypi.org/project/resmem/), and an online demo is available on the Brain Bridge Lab website (https://brainbridgelab.uchicago.edu/resmem). Miscellaneous data, including feature analyses, prediction performance within all subcategories of MemCat, and an archival copy of the pretrained model, are hosted on OSF (https://osf.io/qf5ry/). The data used to train ResMem came from two sources: LaMem, hosted by MIT (http://memorability.csail.mit.edu/download.html), and MemCat, hosted by the Flemish government (https://gestaltrevision.be/projects/memcat/).
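Prediction performance can be reported both across the pooled image set and within each MemCat (sub)category. Memorability studies commonly summarize performance as the Spearman rank correlation between predicted and human-measured memorability scores; assuming that metric, a minimal standard-library sketch of scoring predictions at both levels might look like the following. The record layout `(category, observed, predicted)`, the function names, and the category labels are hypothetical, and the numbers are invented toy values, not results from this study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n (ascending); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN when undefined (fewer than 2 items or zero variance)."""
    n = len(x)
    if n < 2:
        return float("nan")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(
    records: Iterable[Tuple[str, float, float]],
) -> Tuple[float, Dict[str, float]]:
    """Score (category, observed, predicted) records across and within categories."""
    by_category = defaultdict(lambda: ([], []))
    all_observed, all_predicted = [], []
    for category, observed, predicted in records:
        by_category[category][0].append(observed)
        by_category[category][1].append(predicted)
        all_observed.append(observed)
        all_predicted.append(predicted)
    overall = spearman(all_observed, all_predicted)
    per_category = {c: spearman(obs, pred) for c, (obs, pred) in by_category.items()}
    return overall, per_category


# Invented toy scores, for illustration only.
records = [
    ("animal", 0.90, 0.80), ("animal", 0.70, 0.75), ("animal", 0.60, 0.50),
    ("vehicle", 0.55, 0.40), ("vehicle", 0.65, 0.60), ("vehicle", 0.80, 0.55),
]
overall, per_category = evaluate(records)
print(round(overall, 3), per_category)  # 0.829 {'animal': 1.0, 'vehicle': 0.5}
```

Even in this toy case the pooled correlation and the within-category correlations differ noticeably, which is why reporting both levels is informative.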
Code Availability
The code for the ResMem package as published is hosted on GitHub (https://github.com/Brain-Bridge-Lab/resmem). The code used to generate figures and run analyses is split across two repositories: https://github.com/Brain-Bridge-Lab/BrainBridge-MemNet and https://github.com/Brain-Bridge-Lab/resmem-analysis.
Acknowledgements
We would like to acknowledge Lore Goetschalckx for providing us with details on MemNet’s implementation. We would also like to acknowledge Deepasri Prasad and Max Kramer for information sharing and general feedback.
Funding
Not applicable
Author information
Contributions
W.A. Bainbridge conceived of the presented idea. C.D. Needell designed, programmed, and tuned the model. C.D. Needell and W.A. Bainbridge wrote the manuscript. W.A. Bainbridge supervised the project.
Ethics declarations
Ethics Approval
Not applicable
Consent to Participate
Not applicable
Consent for Publication
Not applicable
Conflict of Interest
Not applicable
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
The electronic supplementary material is available with the online version of this article.
About this article
Cite this article
Needell, C.D., Bainbridge, W.A. Embracing New Techniques in Deep Learning for Estimating Image Memorability. Comput Brain Behav 5, 168–184 (2022). https://doi.org/10.1007/s42113-022-00126-5
DOI: https://doi.org/10.1007/s42113-022-00126-5