Skip to main content

Marine Video Kit: A New Marine Video Dataset for Content-Based Analysis and Retrieval

  • Conference paper
  • First Online:
MultiMedia Modeling (MMM 2023)


Effective analysis of unusual domain specific video collections represents an important practical problem, where state-of-the-art general purpose models still face limitations. Hence, it is desirable to design benchmark datasets that challenge novel powerful models for specific domains with additional constraints. It is important to remember that domain specific data may be noisier (e.g., endoscopic or underwater videos) and often require more experienced users for effective search. In this paper, we focus on single-shot videos taken from moving cameras in underwater environments, which constitute a nontrivial challenge for research purposes. The first shard of a new Marine Video Kit dataset is presented to serve for video retrieval and other computer vision challenges. Our dataset is used in a special session during Video Browser Showdown 2023. In addition to basic meta-data statistics, we present several insights based on low-level features as well as semantic annotations of selected keyframes. The analysis also contains experiments showing limitations of respected general purpose models for retrieval. Our dataset and code are publicly available at

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others


  1. Bradski, G.: The OpenCV Library. Dr. Dobb’s Journal of Software Tools (2000)

    Google Scholar 

  2. Chen, J., Chen, X., Ma, L., Jie, Z., Chua, T.S.: Temporally grounding natural sentence in video. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (2018)

    Google Scholar 

  3. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition (2009)

    Google Scholar 

  4. Derya, A., Anthony, H., Suchendra, B.: Mouss dataset (2018)

    Google Scholar 

  5. Fabbri, C., Islam, M.J., Sattar, J.: Enhancing underwater imagery using generative adversarial networks. In: 2018 IEEE International Conference on Robotics and Automation (ICRA) (2018)

    Google Scholar 

  6. Fisher, R.B., Chen-Burger, Y.H., Giordano, D., Hardman, L., Lin, F.P., et al.: Fish4Knowledge: collecting and analyzing massive coral reef fish video data, vol. 104. Springer (2016)

    Google Scholar 

  7. Gurrin, C., et al.: Introduction to the fifth annual lifelog search challenge. In: International Conference on Multimedia Retrieval (2022)

    Google Scholar 

  8. Heller, S., et al.: Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th video browser showdown. Int. J. Multimed. Inf. Retr. 11(1), 1–18 (2022)

    Google Scholar 

  9. Krishna, R., Hata, K., Ren, F., Fei-Fei, L., Niebles, J.C.: Dense-captioning events in videos. In: International Conference on Computer Vision (ICCV) (2017)

    Google Scholar 

  10. Levy, D., Levy, D., Belfer, Y., Osherov, E., Bigal, E., Scheinin, A.P., Nativ, H., Tchernov, D., Treibitz, T.: Automated analysis of marine video with limited data. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2018)

    Google Scholar 

  11. Li, Q., Li, J., Shi, Z., Gu, Z., Zheng, H., Zheng, B., Li, J.: A holistic marine video dataset. In: OCEANS 2021: San Diego - Porto (2021)

    Google Scholar 

  12. Li, X.L., Liang, P.: Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190 (2021)

  13. Lokoč, J., Souček, T.: How many neighbours for known-item search? In: Similarity Search and Applications - 14th International Conference, SISAP 2021 Proceedings (2021)

    Google Scholar 

  14. Mithun, N.C., Li, J., Metze, F., Roy-Chowdhury, A.K.: Learning joint embedding with multimodal cues for cross-modal video-text retrieval. In: Proceeding of International Conference on Multimedia Retrieval (ICMR). ACM (2018)

    Google Scholar 

  15. Mokady, R., Hertz, A., Bermano, A.H.: Clipcap: Clip prefix for image captioning. arXiv preprint arXiv:2111.09734 (2021)

  16. Pedersen, M., Haurum, J.B., Gade, R., Moeslund, T.B., Madsen, N.: Detection of marine animals in a new underwater dataset with varying visibility. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (2019)

    Google Scholar 

  17. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763. PMLR (2021)

    Google Scholar 

  18. Rossetto, L., Schuldt, H., Awad, G., Butt, A.A.: V3c-a research video collection. In: International Conference on Multimedia Modeling (2019)

    Google Scholar 

  19. Tomar, S.: Converting video formats with ffmpeg. Linux Journal (2006)

    Google Scholar 

  20. Tunai, P.M., Alexandra, B.A., Maia, H.: A contrast-guided approach for the enhancement of low-lighting underwater images. J. Imaging 5(10), 79 (2019)

    Google Scholar 

  21. Xu, J., Mei, T., Yao, T., Rui, Y.: Msr-vtt: A large video description dataset for bridging video and language. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

    Google Scholar 

  22. Youngjae, Y., Jongseok, K., Gunhee, K.: A joint sequence fusion model for video question answering and retrieval. In: Proceeding of European Conference on Computer Vision (ECCV) (2018)

    Google Scholar 

  23. Zhou, L., Xu, C., Corso, J.J.: Towards automatic learning of procedures from web instructional videos. In: AAAI Conference on Artificial Intelligence (2018)

    Google Scholar 

  24. Zhuang, P., Wang, Y., Qiao, Y.: Wildfish: a large benchmark for fish recognition in the wild. In: Proceeding of ACM Multimedia Conference on Multimedia Conference (2018)

    Google Scholar 

Download references


This research project is partially supported by an internal grant from HKUST (R9429), the Innovation and Technology Support Programme of the Innovation and Technology Fund (Ref: ITS/200/20FP), the Marine Conservation Enhancement Fund (MCEF20107), Charles University grant (SVV-260588), and the Innovation Team Project of Universities in Guangdong Province (No. 2020KCXTD023).

Disclaimer. Any opinions, findings, conclusions, or recommendations expressed in this material do not necessarily reflect the views of HKLTL, CAPCO, HK Electric, and the Marine Conservation Enhancement Fund.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Quang-Trung Truong .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Truong, QT. et al. (2023). Marine Video Kit: A New Marine Video Dataset for Content-Based Analysis and Retrieval. In: Dang-Nguyen, DT., et al. MultiMedia Modeling. MMM 2023. Lecture Notes in Computer Science, vol 13833. Springer, Cham.

Download citation

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-27076-5

  • Online ISBN: 978-3-031-27077-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics