Abstract
A major issue in machine learning is the availability of training data. While this historically referred to having a sufficient volume of training data, the concern has recently shifted to having sufficient unbiased training data. In this paper we focus on the effect of training data bias on an emerging multimedia application, the automatic captioning of short video clips. We use subsets of the same training data to generate different models for video captioning with the same machine learning technique, and we evaluate the resulting models on a well-known video captioning benchmark, TRECVid. We train on the MSR-VTT video-caption pairs, pruning the captions for each video so that the set describing it becomes either more homogeneously similar or more diverse, or pruning it at random. We then assess the effectiveness of the caption-generating models trained on these variations using automatic metrics as well as direct assessment by human assessors. Our findings are preliminary: randomly pruning captions from the training data yields the worst performance, while pruning to make the data more homogeneous, or more diverse, improves performance slightly compared to random pruning. Our work points to the need for more training data, not only more video clips but, more importantly, more captions for those clips.
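To make the three pruning strategies named in the abstract concrete, the sketch below implements one plausible version of each: homogeneous, diverse, and random. The abstract does not specify the similarity measure the authors used, so this sketch assumes a simple TF-IDF cosine similarity as a stand-in for a semantic textual similarity score; the function prune_captions, its keep parameter, and the use of scikit-learn are illustrative assumptions, not the authors' implementation.

```python
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def prune_captions(captions, keep, strategy="homogeneous", seed=0):
    """Return `keep` captions for one video under the given pruning strategy.

    NOTE: a hypothetical helper; the similarity measure (TF-IDF cosine)
    is an assumption, not the measure used in the paper.
    """
    if len(captions) <= keep:
        return list(captions)
    if strategy == "random":
        return random.Random(seed).sample(captions, keep)
    # Score each caption by its mean cosine similarity to the others.
    tfidf = TfidfVectorizer().fit_transform(captions)
    sim = cosine_similarity(tfidf)
    mean_sim = (sim.sum(axis=1) - 1.0) / (len(captions) - 1)  # drop self-similarity
    order = mean_sim.argsort()  # ascending: most dissimilar captions first
    if strategy == "homogeneous":
        idx = order[-keep:]  # keep the most mutually similar captions
    elif strategy == "diverse":
        idx = order[:keep]   # keep the most mutually dissimilar captions
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [captions[i] for i in sorted(idx)]

# Toy usage: prune four captions for one clip down to two.
captions = [
    "a man is cooking in a kitchen",
    "someone prepares food on a stove",
    "a chef is chopping vegetables",
    "a person plays guitar on stage",
]
print(prune_captions(captions, keep=2, strategy="diverse"))
```

MSR-VTT supplies 20 human-written captions per clip, so a realistic call would prune from 20 down to some smaller keep value; ranking by mean pairwise similarity is just one simple way to operationalise "more homogeneous" versus "more diverse" caption sets.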
References
Aafaq, N., Gilani, S.Z., Liu, W., Mian, A.: Video description: a survey of methods, datasets and evaluation metrics. arXiv preprint arXiv:1806.00186 (2018)
Aneja, J., Deshpande, A., Schwing, A.G.: Convolutional image captioning. In: Computer Vision and Pattern Recognition (CVPR), June 2018
Awad, G., et al.: TRECVID 2017: evaluating ad-hoc and instance video search, events detection, video captioning and hyperlinking. In: Proceedings of TRECVID 2017. NIST (2017)
Baeza-Yates, R.: Bias on the web. Commun. ACM 61(6), 54–61 (2018)
Baltrušaitis, T., Ahuja, C., Morency, L.P.: Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. (Early Access) (2018). https://doi.org/10.1109/TPAMI.2018.2798607
Chen, X., et al.: Microsoft COCO captions: data collection and evaluation server. CoRR, abs/1504.00325 (2015)
Graham, Y., Awad, G., Smeaton, A.: Evaluation of automatic video captioning using direct assessment. CoRR, abs/1710.10586 (2017)
Graham, Y., Mathur, N., Baldwin, T.: Randomized significance tests in machine translation. In: ACL 2014 Workshop on Statistical Machine Translation, pp. 266–274. Association for Computational Linguistics (2014)
Han, L., Kashyap, A., Finin, T., Mayfield, J., Weese, J.: UMBC EBIQUITY-CORE: semantic textual similarity systems. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), vol. 1, pp. 44–52 (2013)
Iacobacci, I., Pilehvar, M.T., Navigli, R.: SensEmbed: learning sense embeddings for word and relational similarity. In: Proceedings of ACL, pp. 95–105 (2015)
Jordan, M.I., Mitchell, T.M.: Machine learning: trends, perspectives, and prospects. Science 349(6245), 255–260 (2015)
Karpathy, A.: Connecting images and natural language. Ph.D. thesis, Stanford University, August 2016
Kashyap, A., et al.: Robust semantic text similarity using LSA, machine learning, and linguistic resources. Lang. Resour. Eval. 50(1), 125–161 (2016)
Kilickaya, M., Erdem, A., Ikizler-Cinbis, N., Erdem, E.: Re-evaluating automatic metrics for image captioning. In: Proceedings of EACL, April 2017
Marsden, M., et al.: Dublin City University and partners’ participation in the INS and VTT tracks at TRECVid 2016. In: Proceedings of TRECVid, NIST, Gaithersburg, MD, USA (2016)
Pan, Y., Mei, T., Yao, T., Li, H., Rui, Y.: Jointly modeling embedding and translation to bridge video and language. In: Computer Vision and Pattern Recognition (CVPR), pp. 4594–4602 (2016)
Pérez-Mayos, L., Sukno, F.M., Wanner, L.: Improving the quality of video-to-language models by optimizing annotation of the training material. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10704, pp. 279–290. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73603-7_23
Smeaton, A.F., Over, P., Kraaij, W.: Evaluation campaigns and TRECVid. In: MIR 2006: International Workshop on Multimedia Information Retrieval, pp. 321–330 (2006)
Venugopalan, S., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., Saenko, K.: Sequence to sequence – video to text. In: International Conference on Computer Vision (ICCV) (2015)
Xu, J., Mei, T., Yao, T., Rui, Y.: MSR-VTT: a large video description dataset for bridging video and language. In: Computer Vision and Pattern Recognition (CVPR), pp. 5288–5296, June 2016
Zhou, L., Zhou, Y., Corso, J.J., Socher, R., Xiong, C.: End-to-end dense video captioning with masked transformer. In: Computer Vision and Pattern Recognition (CVPR), June 2018
Acknowledgements
This work is supported by Science Foundation Ireland under grant numbers 12/RC/2289 and 15/SIRG/3283.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Smeaton, A.F., Graham, Y., McGuinness, K., O’Connor, N.E., Quinn, S., Arazo Sanchez, E. (2019). Exploring the Impact of Training Data Bias on Automatic Generation of Video Captions. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science, vol. 11295. Springer, Cham. https://doi.org/10.1007/978-3-030-05710-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05709-1
Online ISBN: 978-3-030-05710-7