
Considering Human Perception and Memory in Interactive Multimedia Retrieval Evaluations

  • Conference paper
MultiMedia Modeling (MMM 2021)

Abstract

Experimental evaluations dealing with visual known-item search tasks, where real users look for previously observed and memorized scenes in a given video collection, pose a challenging methodological problem. Playing a searched “known” scene to users prior to the task start may not be sufficient for scene memorization and re-identification (i.e., the search need may not be successfully “implanted”). On the other hand, letting users observe a known scene played in a loop may lead to unrealistic situations where they can exploit very specific details that would not remain in their memory in a common case. To address these issues, we present a proof-of-concept implementation of a new visual known-item search task presentation methodology that relies on a recently introduced deep saliency estimation method to limit the amount of revealed visual video content. A filtering process predicts and subsequently removes information which, in an unconstrained setting, would likely not leave a lasting impression in the memory of a human observer. The proposed presentation setting complies with the realistic assumption that users perceive and memorize only a limited amount of information, while still allowing the known scene to be played in a loop for verification purposes. The new setting also serves as a search-clue equalizer: by limiting the rich set of exploitable content features present in a video, it unifies the information perceived by different users. The performed evaluation demonstrates the feasibility of such a task presentation by showing that retrieval remains possible based on query videos processed by the proposed method. We postulate that such information-incomplete tasks constitute the necessary next step to challenge and assess interactive multimedia retrieval systems participating in visual known-item search evaluation campaigns.
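The core filtering idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual pipeline: the `filter_by_saliency` function, the fixed threshold, and the mean-color background replacement are hypothetical choices introduced here, and in the paper's setting the per-pixel saliency map would come from the referenced deep video saliency estimation method rather than being handcrafted.

```python
import numpy as np

def filter_by_saliency(frame: np.ndarray, saliency: np.ndarray,
                       threshold: float = 0.3) -> np.ndarray:
    """Keep only regions a saliency estimator predicts to be memorable.

    frame:    H x W x 3 uint8 image (one video frame)
    saliency: H x W float map in [0, 1], e.g. from a deep saliency model
    """
    # Pixels below the threshold are treated as unlikely to be memorized
    # and are replaced with the frame's mean color, hiding their detail.
    mean_color = frame.reshape(-1, 3).mean(axis=0).astype(np.uint8)
    out = np.broadcast_to(mean_color, frame.shape).copy()
    keep = saliency >= threshold
    out[keep] = frame[keep]
    return out

# Toy example: a bright square is predicted salient, the rest is not.
frame = np.zeros((8, 8, 3), dtype=np.uint8)
frame[2:6, 2:6] = 200
saliency = np.zeros((8, 8))
saliency[2:6, 2:6] = 1.0
filtered = filter_by_saliency(frame, saliency)
# Salient pixels survive; non-salient ones collapse to the mean color.
```

Applied frame by frame to a query clip, such a filter would let the scene play in a loop while only ever revealing the regions a viewer would plausibly have retained, in line with the equalizing goal stated above.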


Notes

  1. http://www.multimediaeval.org/.
  2. https://ffmpeg.org/.
  3. https://github.com/remega/OMCNN_2CLSTM.
  4. https://github.com/lucaro/VideoSaliencyFilter.


Acknowledgements

The authors would like to thank all the participants of the 2020 Video Browser Showdown who contributed to the dedicated evaluation of the queries produced using the approach presented in this paper.

Author information

Correspondence to Luca Rossetto.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Rossetto, L., Bailer, W., Bernstein, A. (2021). Considering Human Perception and Memory in Interactive Multimedia Retrieval Evaluations. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_49

  • DOI: https://doi.org/10.1007/978-3-030-67832-6_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67831-9

  • Online ISBN: 978-3-030-67832-6

  • eBook Packages: Computer Science (R0)
