Dong, J., Li, X., Xu, C., Ji, S., He, Y., et al.: Dual encoding for zero-example video retrieval. In: Proceedings of IEEE Conference on CVPR 2019, pp. 9346–9355 (2019)
Faghri, F., Fleet, D.J., et al.: VSE++: improving visual-semantic embeddings with hard negatives. In: Proceedings of the British Machine Vision Conference (BMVC) (2018)
Galanopoulos, D., Mezaris, V.: Attention mechanisms, signal encodings and fusion strategies for improved ad-hoc video search with dual encoding networks. In: Proceedings of the ACM International Conference on Multimedia Retrieval, (ICMR 2020). ACM (2020)
Gkountakos, K., Dimou, A., Papadopoulos, G.T., Daras, P.: Incorporating textual similarity in video captioning schemes. In: 2019 IEEE International Conference on Engineering, Technology and Innovation (ICE/ITMC), pp. 1–6. IEEE (2019)
Ye, G., Li, Y., Xu, H., et al.: EventNet: a large scale structured concept library for complex event detection in video. In: Proceedings of the ACM MM (2015)
Hara, K., et al.: Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and imagenet? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)
Jegou, H., et al.: Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 117–128 (2010)
Kay, W., et al.: The kinetics human action video dataset. arXiv preprint arXiv:1705.06950 (2017)
Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)
Krasin, I., Duerig, T., Alldrin, N., Ferrari, V., Abu-El-Haija, S., et al.: OpenImages: a public dataset for large-scale multi-label and multi-class image classification (2017). https://storage.googleapis.com/openimages/web/index.html
Li, Y., Song, Y., Cao, L., Tetreault, J., et al.: TGIF: a new dataset and benchmark on animated GIF description. In: Proceedings of IEEE CVPR 2016 (2016)
Markatopoulou, F., Moumtzidou, A., Galanopoulos, D., et al.: ITI-CERTH participation in TRECVID 2017. In: Proceedings of the TRECVID 2017 Workshop, USA (2017)
Pittaras, N., Markatopoulou, F., Mezaris, V., Patras, I.: Comparison of fine-tuning and extension strategies for deep convolutional neural networks. In: Amsaleg, L., Guðmundsson, G.Þ., Gurrin, C., Jónsson, B.Þ., Satoh, S. (eds.) MMM 2017. LNCS, vol. 10132, pp. 102–114. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-51811-4_9
Schoeffmann, K.: Video browser showdown 2012–2019: a review. In: 2019 International Conference on Content-Based Multimedia Indexing (CBMI), pp. 1–4. IEEE (2019)
Tan, M., Le, Q.V.: EfficientNet: rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946 (2019)
Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020)
Tan, W.R., Chan, C.S., Aguirre, H.E., Tanaka, K.: Ceci n’est pas une pipe: a deep convolutional network for fine-art paintings classification. In: 2016 IEEE ICIP, pp. 3703–3707. IEEE (2016)
Venugopalan, S., Rohrbach, M., Donahue, J., et al.: Sequence to sequence – video to text. In: Proceedings of the IEEE ICCV, pp. 4534–4542 (2015)
Xu, J., Mei, T., Yao, T., Rui, Y.: MSR-VTT: a large video description dataset for bridging video and language. In: The IEEE Conference on CVPR, June 2016
Zhou, B., Lapedriza, A., et al.: Places: a 10 million image database for scene recognition. IEEE Trans. PAMI 40(6), 1452–1464 (2017)