
Multiple instance learning based deep CNN for image memorability prediction

  • Sathisha Basavaraju
  • Arijit Sur

Abstract

Image memorability is a recent topic in computer vision that measures the degree to which images are memorable to the human cognitive system. Initial research on image memorability showed that memorability is an inherent characteristic of an image and that humans are consistent in remembering images. It has further been demonstrated that the memorability of an image can be predicted using machine learning and computer vision techniques. In this paper, a novel deep learning based image memorability prediction model is proposed. The proposed model automatically learns and utilises multiple visual factors, such as object semantics, visual emotions, and saliency, to predict image memorability scores. In particular, the model employs a multiple instance learning framework to exploit emotion cues evoked by the single global context and the multiple local contexts of an image. An extensive set of experiments is carried out on the large-scale image memorability dataset LaMem. The experimental results show that the proposed model outperforms current state-of-the-art models, reaching a rank correlation of 0.67, which is close to human consistency (ρ = 0.68).
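Since the abstract describes the approach only at a high level, the following Python sketch illustrates, under stated assumptions, how a multiple instance learning style prediction could be organised: a bag built from the global view plus local patches, a per-instance scorer, a pooling step to aggregate the bag, and a Spearman rank correlation check against human memorability scores, the metric reported above. The patch extraction routine, the placeholder score_patch function, and the max-pooling aggregation are illustrative assumptions and do not reproduce the authors' network.

# Minimal sketch (not the authors' code): MIL-style aggregation of memorability
# scores from one global view and several local patches, followed by Spearman
# rank-correlation evaluation as used on memorability benchmarks such as LaMem.

import numpy as np
from scipy.stats import spearmanr

def extract_patches(image, patch_size=128, stride=128):
    """Split an image (H, W, C) into non-overlapping local patches (assumed layout)."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches

def score_patch(patch):
    """Hypothetical per-instance scorer; a trained CNN branch would go here."""
    return float(patch.mean()) / 255.0  # placeholder stand-in for a CNN output

def predict_memorability(image):
    """Bag = global view + local patches; aggregate instance scores (MIL-style)."""
    instances = [image] + extract_patches(image)           # global + local contexts
    instance_scores = np.array([score_patch(p) for p in instances])
    # Aggregate the bag; max pooling is one common MIL choice (an assumption here).
    return instance_scores.max()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = [rng.integers(0, 256, (256, 256, 3), dtype=np.uint8) for _ in range(8)]
    predictions = [predict_memorability(img) for img in images]
    ground_truth = rng.random(8)                            # dummy human scores
    rho, _ = spearmanr(predictions, ground_truth)
    print(f"Spearman rank correlation: {rho:.2f}")

In practice, score_patch would be the output of a trained CNN branch for each context, and the aggregation could equally be average pooling or a learned weighting over instances.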

Keywords

Deep learning · Image memorability · Memorability and emotions · Memorability and saliency · Multiple instance learning


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of CSE, Indian Institute of Technology Guwahati, Guwahati, India
