Detect and Remove Watermark in Deep Neural Networks via Generative Adversarial Networks

Part of the Lecture Notes in Computer Science book series (LNSC, volume 13118)

Abstract

Deep neural networks (DNNs) have achieved remarkable performance in many fields. However, training a DNN model from scratch requires expensive computing resources and large amounts of training data, which are difficult for most individual users to obtain. As a result, intellectual property (IP) infringement of deep learning models has become an emerging problem in recent years: pre-trained models may be stolen or abused by illegal users without the permission of the model owner. Many works have therefore been proposed to protect the intellectual property of DNN models; among them, embedding backdoor-based watermarks into DNNs is one of the most widely used methods. However, backdoor-based watermarks face the risk of being detected or removed by an adversary. In this paper, we propose a scheme to detect and remove backdoor-based watermarks in deep neural networks via generative adversarial networks (GANs). The proposed attack consists of two phases. In the first phase, we use a GAN and a few clean images to detect the watermarked class and reverse the watermark trigger of a DNN model. In the second phase, we fine-tune the watermarked DNN with the reversed backdoor images to remove the backdoor watermark. Experimental results on the MNIST and CIFAR-10 datasets demonstrate that the proposed method can effectively remove watermarks from DNN models: the watermark retention rates of the watermarked LeNet-5 and ResNet-18 models drop from 99.99% to 1.2% and from 99.99% to 1.4%, respectively. Meanwhile, the proposed attack has only a very slight influence on model performance: the test accuracy of the watermarked DNN on MNIST and CIFAR-10 drops by only 0.77% and 2.67%, respectively. Compared with existing watermark removal works, the proposed attack can successfully remove backdoor-based DNN watermarks with less data, and can recover both the watermark trigger and the watermarked class from the DNN model.
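The removal phase described in the abstract can be illustrated with a deliberately tiny stand-in. The sketch below is not the paper's implementation: it skips the GAN-based trigger reversal (phase one) and simply assumes a trigger has already been recovered, modelling it as one input feature forced to 1 on a 4-feature linear softmax classifier. Phase two is then mimicked by fine-tuning the watermarked model on trigger-stamped inputs relabeled with their true classes. All function names, the linear model, and the trigger layout are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_clean(n):
    """Two well-separated clusters in 4-D; feature 3 is reserved for the trigger."""
    y = rng.integers(0, 2, n)
    x = 0.1 * rng.standard_normal((n, 4))
    x[np.arange(n), y] += 1.0          # class k peaks on feature k
    return x, y

def stamp_trigger(x):
    """Hypothetical 'reversed' trigger: force feature 3 to 1."""
    x = x.copy()
    x[:, 3] = 1.0
    return x

def train(x, y, w=None, epochs=200, lr=0.5):
    """Plain softmax regression fitted by batch gradient descent."""
    if w is None:
        w = np.zeros((4, 2))
    onehot = np.eye(2)[y]
    for _ in range(epochs):
        logits = x @ w
        p = np.exp(logits - logits.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)
        w -= lr * x.T @ (p - onehot) / len(x)
    return w

def predict(x, w):
    return (x @ w).argmax(1)

# 1) Build a watermarked model: every triggered input is forced to class 0.
xc, yc = make_clean(400)
w_wm = train(np.vstack([xc, stamp_trigger(xc)]),
             np.concatenate([yc, np.zeros(400, int)]))

# 2) Removal phase: fine-tune on a few trigger-stamped images relabeled
#    with their true classes (the stand-in for the reversed backdoor images).
xf, yf = make_clean(40)
w_clean = train(stamp_trigger(xf), yf, w=w_wm.copy(), epochs=100)

# 3) Watermark retention = fraction of triggered class-1 inputs still
#    predicted as the watermark target class 0.
xe, ye = make_clean(200)
trig1 = stamp_trigger(xe[ye == 1])
retention_before = np.mean(predict(trig1, w_wm) == 0)
retention_after = np.mean(predict(trig1, w_clean) == 0)
acc_after = np.mean(predict(xe, w_clean) == ye)
print(retention_before, retention_after, acc_after)
```

The same measurement structure (retention rate before/after removal, plus clean test accuracy) is what the abstract's reported numbers correspond to, scaled down from LeNet-5/ResNet-18 to a linear toy model.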

Keywords

  • Deep neural networks
  • Intellectual property protection
  • Watermark removal
  • Generative adversarial networks
  • Fine-tuning


References

  1. Adi, Y., Baum, C., Cissé, M., Pinkas, B., Keshet, J.: Turning your weakness into a strength: watermarking deep neural networks by backdooring. In: 27th USENIX Security Symposium, pp. 1615–1631 (2018)

    Google Scholar 

  2. Aiken, W., Kim, H., Woo, S.S., Ryoo, J.: Neural network laundering: removing black-box backdoor watermarks from deep neural networks. Comput. Secur. 106, 1–14 (2021)

    CrossRef  Google Scholar 

  3. Allen, D.M.: Mean square error prediction as a criterion for selecting regression variables. Technometrics 13(3), 469–475 (1971)

    CrossRef  Google Scholar 

  4. Chen, H., Fu, C., Zhao, J., Koushanfar, F.: DeepInspect: a black-box trojan detection and mitigation framework for deep neural networks. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence, pp. 4658–4664 (2019)

    Google Scholar 

  5. Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning. arXiv:1712.05526 (2017)

  6. Chen, X., et al.: REFIT: a unified watermark removal framework for deep learning systems with limited data. In: ACM Asia Conference on Computer and Communications Security, pp. 321–335 (2021)

    Google Scholar 

  7. Deng, L.: The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Sig. Process. Mag. 29(6), 141–142 (2012)

    CrossRef  Google Scholar 

  8. Goodfellow, I.J., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)

    Google Scholar 

  9. Harrington, P.: Machine Learning in Action, 1st edn, Manning Publications, Shelter Island, April 2012

    Google Scholar 

  10. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

    Google Scholar 

  11. Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical Report (2009)

    Google Scholar 

  12. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

    CrossRef  Google Scholar 

  13. Liu, X., Li, F., Wen, B., Li, Q.: Removing backdoor-based watermarks in neural networks with limited data. In: 25th International Conference on Pattern Recognition, pp. 10149–10156 (2020)

    Google Scholar 

  14. Merrer, E.L., Pérez, P., Trédan, G.: Adversarial frontier stitching for remote neural network watermarking. Neural Comput. Appl. 32(13), 9233–9244 (2020)

    CrossRef  Google Scholar 

  15. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv:1411.1784 (2014)

  16. Ribeiro, M., Grolinger, K., Capretz, M.A.M.: MLaaS: machine learning as a service. In: 14th IEEE International Conference on Machine Learning and Applications, pp. 896–902 (2015)

    Google Scholar 

  17. Shafieinejad, M., Lukas, N., Wang, J., Li, X., Kerschbaum, F.: On the robustness of backdoor-based watermarking in deep neural networks. In: Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, pp. 177–188 (2021)

    Google Scholar 

  18. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of the 3rd International Conference on Learning Representations, pp. 1–14 (2015)

    Google Scholar 

  19. Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: 25th USENIX Security Symposium, pp. 601–618 (2016)

    Google Scholar 

  20. Uchida, Y., Nagai, Y., Sakazawa, S., Satoh, S.: Embedding watermarks into deep neural networks. In: Proceedings of the ACM on International Conference on Multimedia Retrieval, pp. 269–277 (2017)

    Google Scholar 

  21. Wang, B., et al.: Neural cleanse: identifying and mitigating backdoor attacks in neural networks. In: IEEE Symposium on Security and Privacy, pp. 707–723 (2019)

    Google Scholar 

  22. Wang, T., Kerschbaum, F.: Attacks on digital watermarks for deep neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2622–2626 (2019)

    Google Scholar 

  23. Xiao, C., Li, B., Zhu, J., He, W., Liu, M., Song, D.: Generating adversarial examples with adversarial networks. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence, pp. 3905–3911 (2018)

    Google Scholar 

  24. Xue, M., He, C., Wang, J., Liu, W.: One-to-N & N-to-one: Two advanced backdoor attacks against deep learning models. IEEE Transactions on Dependable and Secure Computing, pp. 1–17, early access (2020)

    Google Scholar 

  25. Xue, M., Wang, J., Liu, W.: DNN intellectual property protection: taxonomy, attacks and evaluations (Invited paper). In: Great Lakes Symposium on VLSI, pp. 455–460 (2021)

    Google Scholar 

  26. Xue, M., Wu, Z., He, C., Wang, J., Liu, W.: Active DNN IP protection: a novel user fingerprint management and DNN authorization control technique. In: 19th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, pp. 975–982 (2020)

    Google Scholar 

  27. Xue, M., Yuan, C., Wu, H., Zhang, Y., Liu, W.: Machine learning security: threats, countermeasures, and evaluations. IEEE Access 8, 74720–74742 (2020)

    CrossRef  Google Scholar 

  28. Yang, Z., Dang, H., Chang, E.: Effectiveness of distillation attack and countermeasure on neural network watermarking. arXiv:1906.06046 (2019)

  29. Zhang, J., et al.: Protecting intellectual property of deep neural networks with watermarking. In: Proceedings of the Asia Conference on Computer and Communications Security, pp. 159–172 (2018)

    Google Scholar 

  30. Zhu, L., Ning, R., Wang, C., Xin, C., Wu, H.: GangSweep: sweep out neural backdoors by GAN. In: The 28th ACM International Conference on Multimedia, pp. 3173–3181 (2020)

    Google Scholar 

Download references

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 61602241).

Author information

Correspondence to Mingfu Xue.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, S., Wang, H., Xue, M., Zhang, Y., Wang, J., Liu, W. (2021). Detect and Remove Watermark in Deep Neural Networks via Generative Adversarial Networks. In: Liu, J.K., Katsikas, S., Meng, W., Susilo, W., Intan, R. (eds) Information Security. ISC 2021. Lecture Notes in Computer Science(), vol 13118. Springer, Cham. https://doi.org/10.1007/978-3-030-91356-4_18

  • DOI: https://doi.org/10.1007/978-3-030-91356-4_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91355-7

  • Online ISBN: 978-3-030-91356-4

  • eBook Packages: Computer Science, Computer Science (R0)