Abstract
Obtaining a well-performing deep neural network usually requires expensive data collection and training procedures; trained models are therefore valuable intellectual property of their owners. However, recent literature has revealed that adversaries can easily “steal” a model by acquiring a functionally similar copy, even without training samples or any information about the victim model. In this chapter, we introduce a robust and harmless model watermark, on top of which we design model ownership verification via hypothesis testing. In particular, our watermark persists through complicated stealing processes and introduces no additional security risks. Our defense consists of three main stages. First, we watermark the model by embedding external features, modifying some training samples via style transfer. Second, we train a meta-classifier on model gradients to determine whether a suspicious model was stolen from the victim. Finally, ownership is verified via a hypothesis test. Extensive experiments on the CIFAR-10 and ImageNet datasets verify the effectiveness of our defense under both centralized training and federated learning.
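As a concrete illustration of the last two stages, the following Python sketch shows (i) how a gradient feature can be extracted as input to the meta-classifier and (ii) how the final one-sided pairwise t-test can be run on the meta-classifier's posteriors. This is a minimal sketch under simplifying assumptions, not the exact implementation described in this chapter: the function names, the single chosen layer for gradient extraction, and the use of SciPy's paired t-test are all illustrative, and the first stage (producing style-transformed samples with a pre-trained style-transfer model) is omitted.

# Illustrative sketch of stages 2-3 of the defense (assumptions noted above).
# `model` is assumed to be a PyTorch classifier; gradients are taken with
# respect to one chosen layer's weights; posteriors come from a separately
# trained binary meta-classifier.
import torch
import torch.nn.functional as F
from scipy import stats


def gradient_feature(model, x, y, layer):
    """Flatten the gradient of the cross-entropy loss on the batch (x, y)
    with respect to `layer.weight`; this vector feeds the meta-classifier."""
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)
    (grad,) = torch.autograd.grad(loss, layer.weight)
    return grad.detach().flatten()


def verify_ownership(post_transformed, post_benign, alpha=0.05):
    """One-sided pairwise t-test: flag the suspicious model as stolen if the
    meta-classifier assigns significantly higher 'watermarked' posteriors to
    style-transformed probes than to their benign counterparts."""
    _, p_value = stats.ttest_rel(post_transformed, post_benign,
                                 alternative="greater")
    return p_value < alpha, p_value

Here, post_transformed and post_benign would be the meta-classifier's “watermarked” posteriors computed from gradient features of the suspicious model on style-transformed probe images and their benign versions, respectively; rejecting the null hypothesis at significance level alpha constitutes the ownership claim.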
Acknowledgements
We sincerely thank Xiaojun Jia from the Chinese Academy of Sciences and Professor Xiaochun Cao from Sun Yat-sen University for their constructive comments and helpful suggestions on an early draft of this chapter.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Li, Y., Zhu, L., Bai, Y., Jiang, Y., Xia, S.-T. (2023). The Robust and Harmless Model Watermarking. In: Fan, L., Chan, C.S., Yang, Q. (eds) Digital Watermarking for Machine Learning Model. Springer, Singapore. https://doi.org/10.1007/978-981-19-7554-7_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7553-0
Online ISBN: 978-981-19-7554-7