An Interactive Video Search Tool: A Case Study Using the V3C1 Dataset

  • Conference paper
MultiMedia Modeling (MMM 2021)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 12573))

Abstract

This paper presents a prototype of an interactive video search tool developed in preparation for the Video Browser Showdown (VBS) at MMM 2021. The tool is tailored to searching the public V3C1 dataset together with its associated analysis results, including detected objects, speech recognition output, and visual features. It supports two types of search: text-based and visual-based. In a text-based search, users query videos by their textual descriptions; in a visual-based search, a user provides an example video to find similar videos. For accurate search, the tool uses metadata extracted by recent state-of-the-art computer vision algorithms for object detection and visual feature extraction. For efficient search, the metadata are managed in two database engines: Whoosh and PostgreSQL. The tool also lets users refine the search results by providing relevance feedback and by customizing the intermediate analysis of the query inputs.
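
The two search types named in the abstract can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: the paper's tool stores its metadata in Whoosh and PostgreSQL, whereas this example keeps everything in memory. The `Keyframe` class, the `text_search` and `visual_search` functions, the sample descriptions, and the three-dimensional feature vectors are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Keyframe:
    keyframe_id: str
    description: str       # hypothetical text metadata (e.g., detected object labels)
    features: List[float]  # hypothetical visual feature vector


def text_search(keyframes: List[Keyframe], query: str) -> List[str]:
    """Return IDs of keyframes whose description contains every query term."""
    terms = query.lower().split()
    return [
        k.keyframe_id
        for k in keyframes
        if all(t in k.description.lower().split() for t in terms)
    ]


def visual_search(keyframes: List[Keyframe], example: List[float], top_k: int = 2) -> List[str]:
    """Return IDs of the top_k keyframes closest to the example feature vector."""
    # Euclidean distance is an assumption; the abstract does not name a metric.
    ranked = sorted(keyframes, key=lambda k: math.dist(k.features, example))
    return [k.keyframe_id for k in ranked[:top_k]]


# Made-up sample data, not taken from V3C1.
KEYFRAMES = [
    Keyframe("v1_k1", "dog running on beach", [0.9, 0.1, 0.0]),
    Keyframe("v2_k1", "man riding bicycle", [0.1, 0.8, 0.1]),
    Keyframe("v3_k1", "dog catching frisbee", [0.8, 0.2, 0.1]),
]

if __name__ == "__main__":
    print(text_search(KEYFRAMES, "dog"))
    print(visual_search(KEYFRAMES, [0.9, 0.1, 0.05]))
```

In a deployed system, the linear scans above would presumably be replaced by index lookups in the database engines the abstract mentions, which is what makes the search efficient at dataset scale.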

Notes

  1. https://www.statista.com/statistics/259477/hours-of-video-uploaded-to-youtube-every-minute/.

  2. When considering only nouns, the size of the vocabulary of DenseCap is 849 objects.

  3. https://developer.apple.com/documentation/coreimage/cifacefeature.

  4. https://pypi.org/project/langdetect/.

  5. https://www.nltk.org/book/ch05.html.

  6. https://www.postgresql.org/docs/10/cube.html.

  7. https://www.postgresql.org/docs/11/textsearch-indexes.html.

  8. If a keyframe does not match the detected objects keyword but matches the other query keywords, it is omitted from the retrieved keyframes.

Acknowledgment

This research has been supported in part by the USC Integrated Media Systems Center and unrestricted cash gifts from Oracle. The authors also acknowledge the USC Center for Advanced Research Computing (CARC) for providing computing resources used in some of the experiments, and thank Dr. Aiichiro Nakano for his help in using CARC.

Author information

Corresponding author

Correspondence to Abdullah Alfarrarjeh.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Alfarrarjeh, A., Yoon, J., Kim, S.H., Abu Jabal, A., Nagaraj, A., Siddaramaiah, C. (2021). An Interactive Video Search Tool: A Case Study Using the V3C1 Dataset. In: Lokoč, J., et al. MultiMedia Modeling. MMM 2021. Lecture Notes in Computer Science, vol 12573. Springer, Cham. https://doi.org/10.1007/978-3-030-67835-7_43

  • DOI: https://doi.org/10.1007/978-3-030-67835-7_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67834-0

  • Online ISBN: 978-3-030-67835-7

  • eBook Packages: Computer Science, Computer Science (R0)
