
Adversarial frontier stitching for remote neural network watermarking

Abstract

The state-of-the-art performance of deep learning models comes at a high cost for companies and institutions, due to tedious data collection and heavy processing requirements. Recently, Nagai et al. (Int J Multimed Inf Retr 7(1):3–16, 2018) and Uchida et al. (Embedding watermarks into deep neural networks, ICMR, 2017) proposed to watermark convolutional neural networks for image classification by embedding information into their weights. While this is clear progress toward model protection, the technique only allows the watermark to be extracted from a network that one can access locally and entirely. Instead, we aim at allowing the extraction of the watermark from a neural network (or any other machine learning model) that is operated remotely and available through a service API. To this end, we propose to mark the model's action itself, slightly tweaking its decision frontiers so that a set of specific queries conveys the desired information. In the present paper, we formally introduce the problem and propose a novel zero-bit watermarking algorithm that makes use of adversarial model examples. While limiting the loss of performance of the protected model, this algorithm allows subsequent extraction of the watermark using only a few queries. We evaluate the approach on three neural networks designed for image classification, in the context of the MNIST digit recognition task.
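As a rough illustration of the frontier-stitching idea sketched in the abstract, the following toy example stands in a linear classifier for the protected neural network. The adversary generation (one fast-gradient-sign step), the pocket-perceptron fine-tuning, and the tolerance `theta` are all illustrative assumptions of this sketch, not the paper's exact algorithm; the point is only the three-phase workflow: build a key of adversarial examples carrying the clean labels, fine-tune the model so the key is classified as desired, then extract the watermark remotely by counting query mismatches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear classifier standing in for the protected model (a hypothetical
# stand-in; the paper targets image-classification neural networks, but the
# stitching logic applies to any classifier queried through an API).
n_feat, n_key = 16, 40
W = rng.normal(size=(2, n_feat))

def predict(W, X):
    """Predicted class of each row of X under the linear model."""
    return (X @ W.T).argmax(axis=1)

def fgsm(W, x, label, eps):
    """One fast-gradient-sign step (Goodfellow et al.) for a linear model:
    move against the gradient of the true-class margin, toward the frontier."""
    grad = W[label] - W[1 - label]       # d(margin)/dx for a linear model
    return x - eps * np.sign(grad)

# 1. Build the watermark key: adversarially perturbed inputs that keep the
#    labels of their clean originals ("true" adversaries flip under the
#    perturbation, "false" ones do not; both kinds keep the clean label).
X = rng.normal(size=(n_key, n_feat))
labels = predict(W, X)
key_X = np.array([fgsm(W, x, c, eps=0.15) for x, c in zip(X, labels)])

before = int((predict(W, key_X) != labels).sum())

# 2. Embed: fine-tune so the key is classified with the key labels,
#    "stitching" the true adversaries back to the right side of the
#    frontier. Pocket-perceptron updates keep the best weights seen,
#    so the embedded model never does worse on the key than the original.
best_W, best_err = W.copy(), before
for _ in range(300):
    wrong = np.flatnonzero(predict(W, key_X) != labels)
    if wrong.size == 0:
        best_W, best_err = W.copy(), 0
        break
    i = wrong[0]
    W[labels[i]] += 0.1 * key_X[i]       # pull the key point toward its label
    W[predict(W, key_X[i:i + 1])[0]] -= 0.1 * key_X[i]
    err = int((predict(W, key_X) != labels).sum())
    if err < best_err:
        best_W, best_err = W.copy(), err

# 3. Extract remotely: query the key through the API and count mismatches;
#    the watermark is deemed present if mismatches stay under a tolerance.
theta = n_key // 4                        # illustrative tolerance
after = int((predict(best_W, key_X) != labels).sum())
print(f"mismatches before embedding: {before}, after: {after}")
print("watermark detected:", after <= theta)
```

In the paper's setting the tolerance plays the role of the statistical threshold below which a non-marked model is unlikely to agree with the key by chance; here it is simply fixed to a quarter of the key size for the sake of the sketch.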



Notes

  1. “\({\hat{k}}_w+\varepsilon\)” stands for a small modification of the parameters of \({\hat{k}}_w\) that preserves the value of the model, i.e., one that does not significantly deteriorate its performance.

  2. Code will be open-sourced on GitHub, upon article acceptance.

  3. This accuracy drop of about \(3.5\%\) is also the one tolerated by recent work on trojaning neural networks [18].

  4. https://github.com/DwangoMediaVillage/keras_compressor.

References

  1. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) TensorFlow: large-scale machine learning on heterogeneous systems. https://www.tensorflow.org/. Software available from tensorflow.org

  2. Adi Y, Baum C, Cisse M, Pinkas B, Keshet J (2018) Turning your weakness into a strength: watermarking deep neural networks by backdooring. In: 27th USENIX security symposium (USENIX Security 18), pp 1615–1631

  3. Braudaway GW, Magerlein KA, Mintzer CF (1996) Color correct digital watermarking of images. United States Patent 5530759

  4. Carlini N, Wagner DA (2018) Audio adversarial examples: targeted attacks on speech-to-text. CoRR arXiv:1801.01944

  5. Chang CY, Su SJ (2005) A neural-network-based robust watermarking scheme. SMC, Santa Monica


  6. Chollet F et al. (2015) Keras. https://keras.io

  7. Davchev T, Korres T, Fotiadis S, Antonopoulos N, Ramamoorthy S (2019) An empirical evaluation of adversarial robustness under transfer learning. In: ICML workshop on understanding and improving generalization in deep learning

  8. Duddu V, Samanta D, Rao DV, Balas VE (2018) Stealing neural networks via timing side channels. CoRR arXiv:1812.11720

  9. Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: ICLR

  10. Grosse K, Manoharan P, Papernot N, Backes M, McDaniel PD (2017) On the (statistical) detection of adversarial examples. CoRR arXiv:1702.06280

  11. Guo J, Potkonjak M (2018) Watermarking deep neural networks for embedded systems. In: 2018 IEEE/ACM international conference on computer-aided design (ICCAD), pp 1–8. https://doi.org/10.1145/3240765.3240862

  12. Hartung F, Kutter M (1999) Multimedia watermarking techniques. Proc IEEE 87(7):1079–1107. https://doi.org/10.1109/5.771066


  13. Le QV, Jaitly N, Hinton GE (2015) A simple way to initialize recurrent networks of rectified linear units. CoRR arXiv:1504.00941

  14. Le Merrer E, Perez P, Trédan G (2017) Adversarial frontier stitching for remote neural network watermarking. CoRR arXiv:1711.01894

  15. Le Merrer E, Trédan G (2019) Tampernn: efficient tampering detection of deployed neural nets. CoRR arXiv:1903.00317

  16. LeCun Y, Cortes C, Burges CJ (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist

  17. Li S, Neupane A, Paul S, Song C, Krishnamurthy SV, Roy-Chowdhury AK, Swami A (2018) Adversarial perturbations against real-time video classification systems. CoRR arXiv:1807.00458

  18. Liu Y, Ma S, Aafer Y, Lee WC, Zhai J, Wang W, Zhang X (2017) Trojaning attack on neural networks. NDSS, New York


  19. Moosavi-Dezfooli S, Fawzi A, Fawzi O, Frossard P (2017) Universal adversarial perturbations. In: CVPR

  20. Nagai Y, Uchida Y, Sakazawa S, Satoh S (2018) Digital watermarking for deep neural networks. Int J Multimed Inf Retr 7(1):3–16


  21. Oh SJ, Augustin M, Fritz M, Schiele B (2018) Towards reverse-engineering black-box neural networks. In: International conference on learning representations. https://openreview.net/forum?id=BydjJte0-

  22. Papernot N, Carlini N, Goodfellow I, Feinman R, Faghri F, Matyasko A, Hambardzumyan K, Juang YL, Kurakin A, Sheatsley R, Garg A, Lin YC (2017) cleverhans v2.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768

  23. Papernot N, McDaniel P, Goodfellow I, Jha S, Celik ZB, Swami A (2017) Practical black-box attacks against machine learning. In: ASIA CCS

  24. Papernot N, McDaniel P, Jha S, Fredrikson M, Berkay Celik Z, Swami A (2015) The limitations of deep learning in adversarial settings. arXiv preprint arXiv:1511.07528

  26. Rouhani BD, Chen H, Koushanfar F (2018) Deepsigns: A generic watermarking framework for IP protection of deep learning models. CoRR arXiv:1804.00750

  27. Rozsa A, Günther M, Boult TE (2016) Are accuracy and robustness correlated? In: ICMLA

  28. Sethi TS, Kantardzic M (2018) Data driven exploratory attacks on black box classifiers in adversarial domains. Neurocomputing 289:129–143. https://doi.org/10.1016/j.neucom.2018.02.007


  29. Shafahi A, Huang WR, Studer C, Feizi S, Goldstein T (2018) Are adversarial examples inevitable? CoRR arXiv:1809.02104

  30. Shin HC, Roth HR, Gao M, Lu L, Xu Z, Nogues I, Yao J, Mollura D, Summers RM (2016) Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging 35(5):1285–1298. https://doi.org/10.1109/TMI.2016.2528162


  31. Tramèr F, Zhang F, Juels A, Reiter MK, Ristenpart T (2016) Stealing machine learning models via prediction apis. In: USENIX security symposium

  32. Tramèr F, Kurakin A, Papernot N, Boneh D, McDaniel P (2017) Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204

  33. Uchida Y, Nagai Y, Sakazawa S, Satoh S (2017) Embedding watermarks into deep neural networks. In: ICMR

  34. van den Berg E (2016) Some insights into the geometry and training of neural networks. arXiv preprint arXiv:1605.00329

  35. Van Schyndel RG, Tirkel AZ, Osborne CF (1994) A digital watermark. In: Proceedings of 1st international conference on image processing, vol 2. IEEE, pp 86–90

  36. Wang B, Gong NZ (2018) Stealing hyperparameters in machine learning. CoRR arXiv:1802.05351

  37. Yuan X, He P, Zhu Q, Li X (2019) Adversarial examples: attacks and defenses for deep learning. IEEE Trans Neural Netw Learn Syst, pp 1–20. https://doi.org/10.1109/TNNLS.2018.2886017

  38. Zhang J, Gu Z, Jang J, Wu H, Stoecklin MP, Huang H, Molloy I (2018) Protecting intellectual property of deep neural networks with watermarking. In: Proceedings of the 2018 on Asia conference on computer and communications security. ACM, pp 159–172

  39. Zhao X, Liu Q, Zheng H, Zhao BY (2015) Towards graph watermarks. In: COSN


Acknowledgements

The authors would like to thank the reviewers for their constructive comments.

Author information

Corresponding author

Correspondence to Erwan Le Merrer.



Cite this article

Le Merrer, E., Pérez, P. & Trédan, G. Adversarial frontier stitching for remote neural network watermarking. Neural Comput & Applic 32, 9233–9244 (2020). https://doi.org/10.1007/s00521-019-04434-z


Keywords

  • Watermarking
  • Neural network models
  • Black box interaction
  • Adversarial examples
  • Model decision frontiers