Abstract
Experimental evaluations dealing with visual known-item search tasks, where real users look for previously observed and memorized scenes in a given video collection, represent a challenging methodological problem. Playing a searched “known” scene to users prior to the task start may not be sufficient for the scene to be memorized well enough for re-identification (i.e., the search need may not be successfully “implanted”). On the other hand, enabling users to observe a known scene played in a loop may lead to unrealistic situations where users can exploit very specific details that would not remain in their memory in a common case. To address these issues, we present a proof-of-concept implementation of a new visual known-item search task presentation methodology that relies on a recently introduced deep saliency estimation method to limit the amount of revealed visual video content. A filtering process predicts and subsequently removes information which, in an unconstrained setting, would likely not leave a lasting impression in the memory of a human observer. The proposed presentation setting is compliant with the realistic assumption that users perceive and memorize only a limited amount of information, and at the same time allows playing the known scene in a loop for verification purposes. The new setting also serves as a search-clue equalizer, limiting the rich set of exploitable content features present in video and thus unifying the information perceived by different users. The performed evaluation demonstrates the feasibility of such a task presentation by showing that retrieval is still possible based on query videos processed by the proposed method. We postulate that such information-incomplete tasks constitute the necessary next step to challenge and assess interactive multimedia retrieval systems participating in visual known-item search evaluation campaigns.
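The filtering step described above can be illustrated with a minimal sketch. This is not the authors' implementation (which relies on a deep saliency estimation network); it only shows the underlying idea under simplified assumptions: given a per-frame saliency map normalized to [0, 1], regions below a chosen threshold are replaced with a neutral value so that low-saliency details cannot be exploited as search clues. The function name, threshold, and background value are illustrative choices, not part of the paper.

```python
import numpy as np

def filter_frame(frame, saliency, threshold=0.5, bg_value=128):
    """Keep only regions predicted as salient; replace everything else
    with a neutral gray so low-saliency details are not revealed.

    frame:    H x W (grayscale) pixel array
    saliency: H x W saliency map with values in [0, 1]
    """
    mask = saliency >= threshold          # boolean map of "memorable" regions
    out = np.full_like(frame, bg_value)   # neutral background everywhere
    out[mask] = frame[mask]               # reveal only the salient pixels
    return out

# Toy example: a 4x4 "frame" with a hypothetical saliency map that
# marks only the central 2x2 region as salient.
frame = np.arange(16, dtype=np.uint8).reshape(4, 4)
saliency = np.zeros((4, 4))
saliency[1:3, 1:3] = 1.0
filtered = filter_frame(frame, saliency)
# Border pixels are now 128; only the central 2x2 block keeps its values.
```

In practice the saliency map would come from a video saliency predictor and the masking could be softened (e.g., blurring instead of flat replacement), but the principle of withholding low-saliency content is the same.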
Acknowledgements
The authors would like to thank all the participants of the 2020 Video Browser Showdown who contributed to the dedicated evaluation of the queries produced using the approach presented in this paper.
© 2021 Springer Nature Switzerland AG
Cite this paper
Rossetto, L., Bailer, W., Bernstein, A. (2021). Considering Human Perception and Memory in Interactive Multimedia Retrieval Evaluations. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science(), vol 12572. Springer, Cham. https://doi.org/10.1007/978-3-030-67832-6_49
Print ISBN: 978-3-030-67831-9
Online ISBN: 978-3-030-67832-6