Abstract
Research on digital watermarking for machine learning models has focused almost exclusively on classification tasks, leaving other tasks unprotected. In this chapter, we show that existing digital watermarking frameworks cannot adequately protect image captioning models, even though image captioning is widely regarded as one of the most challenging AI tasks. To safeguard an image captioning model, we propose two distinct strategies for embedding a watermark key into the hidden memory state of its recurrent neural network. We demonstrate empirically that a forged key yields an unusable image captioning model, negating the intent of infringement. To the best of our knowledge, this is the first attempt to provide ownership protection for image captioning models. Experiments on the MS-COCO and Flickr30k datasets show that our approach withstands various attacks without compromising the original image captioning performance.
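The chapter details the two actual embedding strategies; as a rough, self-contained illustration of the general idea only (the function names and the sign-modulation scheme below are illustrative assumptions, not the authors' method), one can imagine deriving a deterministic sign vector from the owner's key and training the model with the hidden state modulated by it, so that presenting a forged key at inference leaves residual sign flips that corrupt the hidden memory state:

```python
import hashlib

def key_to_signs(key: str, dim: int) -> list[float]:
    """Derive a deterministic +/-1 sign vector from a key string
    (hypothetical scheme for illustration)."""
    digest = hashlib.sha256(key.encode()).digest()
    signs: list[float] = []
    for byte in digest:
        for i in range(8):
            signs.append(1.0 if (byte >> i) & 1 else -1.0)
            if len(signs) == dim:
                return signs
    raise ValueError("dim larger than one SHA-256 digest supports")

def protected_hidden(h: list[float], presented_key: str,
                     owner_key: str) -> list[float]:
    """Apply key-dependent sign modulation to a hidden state vector.

    The model is assumed to have been trained with its hidden state
    modulated by the owner's sign vector, so only the matching key
    cancels the modulation; any other key flips a subset of elements.
    """
    s_train = key_to_signs(owner_key, len(h))      # embedded at training
    s_run = key_to_signs(presented_key, len(h))    # key given at inference
    return [hi * st * sr for hi, st, sr in zip(h, s_train, s_run)]
```

With the correct key the two sign vectors cancel element-wise and the hidden state is unchanged; a forged key leaves it corrupted, which is the behaviour the abstract describes for an unusable infringing model.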
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this chapter
Lim, J.H. (2023). Ownership Protection for Image Captioning Models. In: Fan, L., Chan, C.S., Yang, Q. (eds) Digital Watermarking for Machine Learning Model. Springer, Singapore. https://doi.org/10.1007/978-981-19-7554-7_8
Print ISBN: 978-981-19-7553-0
Online ISBN: 978-981-19-7554-7