Common Component in Black-Boxes Is Prone to Attacks

Part of the Lecture Notes in Computer Science book series (LNCS, volume 12972)

Abstract

Neural network models are getting increasingly complex. Large models are often modular, consisting of multiple separate sharable components. The development of such components may require specific domain knowledge, intensive computation power, and large datasets. Therefore, there is a high incentive for companies to keep these components proprietary. However, when a common component is included in multiple black-box models, it could potentially provide another attack vector and weaken security. In this paper, we present a method that “extracts” the common component from black-box models, using only limited resources. With a small number of data samples, an attacker can (1) obtain accurate information about the shared component, stealing proprietary intellectual property, and (2) utilize this component to train new tasks or execute subsequent attacks such as model cloning, class inversion, and adversarial attacks more effectively. Comprehensive experiments demonstrate that our proposed method successfully extracts the common component through hard-label and black-box access only. Moreover, the consequent attacks are also effective against straightforward defenses that introduce noise and dummy classifiers.

Notes

  1. We assume the adversary does not have the resources to obtain a large amount of training data.

  2. https://github.com/timesler/facenet-pytorch

  3. https://github.com/clovaai/voxceleb_trainer

References

  1. Brendel, W., Rauber, J., Bethge, M.: Decision-based adversarial attacks: reliable attacks against black-box machine learning models. In: ICLR (2018)
  2. Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: a dataset for recognising faces across pose and age. In: FG, pp. 67–74 (2018)
  3. Carlini, N., Wagner, D.A.: Towards evaluating the robustness of neural networks. In: IEEE S&P, pp. 39–57 (2017)
  4. Chen, P., Sharma, Y., Zhang, H., Yi, J., Hsieh, C.: EAD: elastic-net attacks to deep neural networks via adversarial examples. In: AAAI, pp. 10–17 (2018)
  5. Chen, S., He, Z., Sun, C., Huang, X.: Universal adversarial attack on attention and the resulting dataset DAmageNet. arXiv preprint 2001.06325 (2020)
  6. Chung, J.S., et al.: In defence of metric learning for speaker recognition. In: Interspeech (2020)
  7. Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F.: ImageNet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)
  8. Dong, Y., Liao, F., Pang, T., Hu, X., Zhu, J.: Discovering adversarial examples with momentum. arXiv preprint 1710.06081 (2017)
  9. Dosovitskiy, A., Brox, T.: Generating images with perceptual similarity metrics based on deep networks. In: NIPS, pp. 658–666 (2016)
  10. Dosovitskiy, A., Brox, T.: Inverting visual representations with convolutional networks. In: CVPR, pp. 4829–4837 (2016)
  11. Fredrikson, M., Jha, S., Ristenpart, T.: Model inversion attacks that exploit confidence information and basic countermeasures. In: ACM CCS, pp. 1322–1333 (2015)
  12. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: ICLR (2015)
  13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015)
  14. Hinton, G.E., Roweis, S.T.: Stochastic neighbor embedding. In: NIPS, pp. 833–840 (2002)
  15. Krizhevsky, A.: Learning multiple layers of features from tiny images. Technical report (2009)
  16. Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. In: ICLR (2017)
  17. Lee, S., Kil, R.M.: Inverse mapping of continuous functions using local and global information. IEEE Trans. Neural Netw. 5(3), 409–423 (1994)
  18. Lowd, D., Meek, C.: Adversarial learning. In: ACM SIGKDD, pp. 641–647 (2005)
  19. Lu, B., Kita, H., Nishikawa, Y.: Inverting feedforward neural networks using linear and nonlinear programming. IEEE Trans. Neural Netw. 10(6), 1271–1290 (1999)
  20. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: ICLR (2018)
  21. Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. In: CVPR, pp. 5188–5196 (2015)
  22. Milli, S., Schmidt, L., Dragan, A.D., Hardt, M.: Model reconstruction from model explanations. In: FAT*, pp. 1–9 (2019)
  23. Moosavi-Dezfooli, S., Fawzi, A., Frossard, P.: DeepFool: a simple and accurate method to fool deep neural networks. In: CVPR, pp. 2574–2582 (2016)
  24. Nagrani, A., Chung, J.S., Xie, W., Zisserman, A.: VoxCeleb: large-scale speaker verification in the wild. Comput. Speech Lang. 60, 101027 (2020)
  25. Nash, C., Kushman, N., Williams, C.K.I.: Inverting supervised representations with autoregressive neural density models. In: AISTATS, vol. 89, pp. 1620–1629 (2019)
  26. Ng, H., Winkler, S.: A data-driven approach to cleaning large face datasets. In: ICIP, pp. 343–347 (2014)
  27. Oh, S.J., Schiele, B., Fritz, M.: Towards reverse-engineering black-box neural networks. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 121–144. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-28954-6_7
  28. Orekondy, T., Schiele, B., Fritz, M.: Knockoff nets: stealing functionality of black-box models. In: CVPR, pp. 4954–4963 (2019)
  29. Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: LibriSpeech: an ASR corpus based on public domain audio books. In: ICASSP, pp. 5206–5210 (2015)
  30. Papernot, N., McDaniel, P.D., Goodfellow, I.J., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: AsiaCCS, pp. 506–519 (2017)
  31. Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: MobileNetV2: inverted residuals and linear bottlenecks. In: CVPR, pp. 4510–4520 (2018)
  32. Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: CVPR, pp. 815–823 (2015)
  33. Szegedy, C., et al.: Intriguing properties of neural networks. In: ICLR (2014)
  34. Tramèr, F., Zhang, F., Juels, A., Reiter, M.K., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: USENIX (2016)
  35. Wang, B., Gong, N.Z.: Stealing hyperparameters in machine learning. In: IEEE S&P, pp. 36–52 (2018)
  36. Yang, Z., Zhang, J., Chang, E., Liang, Z.: Neural network inversion in adversarial setting via background knowledge alignment. In: ACM CCS, pp. 225–240 (2019)

Acknowledgement

This research is supported by the National Research Foundation, Singapore under its Strategic Capability Research Centres Funding Initiative. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of National Research Foundation, Singapore. This work is partly supported by the Biomedical Research Council of the Agency for Science, Technology, and Research, Singapore.

Corresponding author

Correspondence to Jiyi Zhang.

A Evaluation on Time Series Audio Data and Speaker Classification

The proposed strategy is generic and can be applied to different types of data and neural network architectures. We repeat a set of experiments similar to those in Sect. 4 on audio data and voice classification tasks.

A.1 Dataset

This evaluation uses the LibriSpeech [29] dataset. The version we use contains 100 h of English speech from 251 unique speakers. We use 100 speakers to train the victim embedding classifiers and another 100 speakers to attack the victims. The remaining 51 speakers are reserved for analysis.
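
The speaker split described above can be sketched as follows. This is only an illustration: the integer speaker IDs, the fixed seed, the random assignment, and the helper name `split_speakers` are assumptions, since the text does not say how speakers were assigned to each group.

```python
import random


def split_speakers(speaker_ids, n_victim=100, n_attack=100, seed=0):
    """Partition speakers into disjoint victim, attacker, and analysis groups."""
    ids = sorted(speaker_ids)
    random.Random(seed).shuffle(ids)  # assumed: random assignment with a fixed seed
    victim = ids[:n_victim]
    attack = ids[n_victim:n_victim + n_attack]
    analysis = ids[n_victim + n_attack:]
    return victim, attack, analysis


# Hypothetical integer IDs standing in for the 251 LibriSpeech speakers.
victim, attack, analysis = split_speakers(range(251))
print(len(victim), len(attack), len(analysis))  # 100 100 51
```

Keeping the three groups disjoint is what lets the attack data have no overlap with the victims' training data.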

A.2 Model Setup

We choose a SpeakerNet model [6] as the victim embedder. It was trained with data augmentation on the development set of VoxCeleb2 [24], which contains 145,569 voice recordings of 5,994 speakers. The weights are obtained directly from the GitHub repository (see Note 3). For the embedding classifiers, we again use simple models with only two fully connected layers and the same train/test split as in Sect. 4. As in our evaluation on the image dataset, we test a range of settings for the victims. During the attack, to construct the tree-like substitute, we use ResNet34Half [13] as the trunk and shallow fully connected networks as branches.
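
To make the shape of the tree-like substitute concrete, here is a toy, forward-pass-only sketch in plain Python. It is not the authors' implementation: the single-layer trunk stands in for ResNet34Half, each branch is reduced to one linear layer, the one-branch-per-victim pairing is our reading of the setup, and all names (`TreeSubstitute`, `embed`, `forward`) and dimensions are hypothetical.

```python
import random


def linear(weights, x):
    """Apply a dense layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class TreeSubstitute:
    """Shared trunk (stand-in for the embedder) plus one branch per victim."""

    def __init__(self, in_dim, emb_dim, classes_per_victim, seed=0):
        rng = random.Random(seed)
        # Toy single-layer trunk; the text uses ResNet34Half in this role.
        self.trunk = random_matrix(emb_dim, in_dim, rng)
        # Assumed: one shallow branch per victim classifier.
        self.branches = [random_matrix(k, emb_dim, rng) for k in classes_per_victim]

    def embed(self, x):
        return relu(linear(self.trunk, x))

    def forward(self, x):
        """Return one logit vector per victim, all computed from the same embedding."""
        z = self.embed(x)
        return [linear(branch, z) for branch in self.branches]


model = TreeSubstitute(in_dim=8, emb_dim=4, classes_per_victim=[10, 10, 10])
logits = model.forward([0.5] * 8)
print([len(out) for out in logits])  # [10, 10, 10]
```

Because every branch reads the same embedding, any training signal from any victim's labels flows into the shared trunk, which is the part an attacker would keep.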

A.3 Attack Process

We query the victim classifiers as black boxes. For the attack, we use 9,061 unlabeled voice recordings from 100 speakers that have no overlap with the victims' training data. The number of recordings is around 6.22% of the original dataset used to train the victim embedder, and the number of speakers is around 1.67% of that dataset. For every combination of settings, we train for 100 epochs and save the best model.
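
The following sketch illustrates hard-label querying and checks the two percentages quoted above. The victims here are hypothetical toy functions on integers, and `collect_hard_labels` is an illustrative helper name, not part of any library; only the four dataset counts come from the text.

```python
def collect_hard_labels(victims, samples):
    """Query every victim on every sample and keep only the predicted class index."""
    return [[victim(x) for victim in victims] for x in samples]


# Hypothetical victims: each returns a hard label, never confidence scores.
victims = [lambda x: x % 10, lambda x: (x // 10) % 10]
labels = collect_hard_labels(victims, samples=[3, 42, 117])
print(labels)  # [[3, 0], [2, 4], [7, 1]]

# Attack data as a fraction of the data used to train the victim embedder.
attack_recordings, embedder_recordings = 9_061, 145_569
attack_speakers, embedder_speakers = 100, 5_994
recording_share = attack_recordings / embedder_recordings
speaker_share = attack_speakers / embedder_speakers
print(f"{recording_share:.2%}")  # 6.22%
print(f"{speaker_share:.2%}")  # 1.67%
```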

A.4 Benchmark of Embedder Extraction

Clustering Capability of Embedder (Q1). We visualize the clustering capability of the extracted embedders using 964 voice recordings of 10 speakers from a separate testing dataset. Here we use the embedders extracted from victims with 10 classes. The embeddings generated by the original embedder and by the extracted embedders are projected to 2D space using t-Distributed Stochastic Neighbor Embedding (t-SNE) [14]. We also tried training an embedder from scratch using the same amount of data that we used to query the victim models. However, that amount of data is too small to produce any meaningful result.

In Fig. 6, we can see that embedders extracted from multiple victim classifiers indeed have significantly better clustering capabilities. The embedder extracted from a single victim performs poorly.

Fig. 6. Comparison of capabilities in clustering the voices of 10 testing speakers. Each color represents a speaker.

Degree of Distance Preservation (Q2). As in Sect. 4.4, we compute the ratio of pairwise distances between embeddings generated by the extracted embedders and by the victim embedder. We use 964 voice recordings of 10 speakers for this experiment.

Fig. 7. Ratio of pairwise distances among embeddings generated by victim embedder and by extracted embedders.

In Fig. 7, we plot the distribution of the distance ratio for three embedders, extracted from 1, 5, and 10 victim classifiers of 10 classes each, respectively. Extracting from more victim models yields much smaller dispersion, indicating that the distances are better preserved.
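
A minimal sketch of the distance-ratio measurement follows. Several details are assumptions, because the appendix does not spell them out: the ratio is taken as extracted distance over victim distance, distances are Euclidean, and dispersion is summarized by the population standard deviation. The function name `pairwise_distance_ratios` and the toy embeddings are hypothetical.

```python
import math
import statistics
from itertools import combinations


def pairwise_distance_ratios(victim_embs, extracted_embs):
    """Ratio of extracted-space to victim-space distance for every pair of samples."""
    ratios = []
    for i, j in combinations(range(len(victim_embs)), 2):
        d_victim = math.dist(victim_embs[i], victim_embs[j])
        d_extracted = math.dist(extracted_embs[i], extracted_embs[j])
        if d_victim > 0:  # skip coincident points to avoid division by zero
            ratios.append(d_extracted / d_victim)
    return ratios


# Toy embeddings: the "extracted" ones are the victim ones scaled by 2,
# so every pairwise ratio is 2 and the dispersion is 0.
victim_embs = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
extracted_embs = [(0.0, 0.0), (6.0, 8.0), (12.0, 16.0)]
ratios = pairwise_distance_ratios(victim_embs, extracted_embs)
print(ratios, statistics.pstdev(ratios))  # [2.0, 2.0, 2.0] 0.0
```

A constant ratio means the extracted embedding preserves the victim's geometry up to scale, which is why smaller dispersion is read as better preservation.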

A.5 Performance in Attack Scenarios

Training a Composite Model for New Task (S1). We evaluate the performance of the extracted embedders when they are used in new voice classification tasks. Here we use the embedders extracted from victims with 10 classes; they are the models visualized in Fig. 6(a)(b)(c). In Table 8, we can see that the performance of the embedder increases with the number of victims available for extraction and decreases with the number of classes.
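
As a rough illustration of reusing a frozen embedder for a new task, the sketch below puts a nearest-centroid classifier on top of an embedding function. The nearest-centroid head is a deliberately simple stand-in chosen for brevity, not the classifier used in the experiments, and the toy `embed` function, the data, and the helper names are all hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(embed, samples, labels):
    """Average the frozen embedder's outputs per class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(embed(x))
    return {
        y: [sum(col) / len(vecs) for col in zip(*vecs)]
        for y, vecs in grouped.items()
    }


def predict(embed, centroids, x):
    """Assign x to the class whose centroid is closest in embedding space."""
    z = embed(x)
    return min(centroids, key=lambda y: math.dist(z, centroids[y]))


def embed(x):
    """Hypothetical frozen embedder standing in for an extracted one."""
    return (x[0] + x[1], x[0] - x[1])


train = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0)]
train_labels = ["a", "a", "b", "b"]
centroids = fit_centroids(embed, train, train_labels)
prediction = predict(embed, centroids, (4.9, 5.0))
print(prediction)  # b
```

Only the small head is fitted on the new task's data; the embedder itself is never updated.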

Table 8. Accuracy of classification when using extracted embedders to create embedding.

Copyright information

© 2021 Springer Nature Switzerland AG

Cite this paper

Zhang, J., Tann, W.JW., Chang, EC., Lee, H.K. (2021). Common Component in Black-Boxes Is Prone to Attacks. In: Bertino, E., Shulman, H., Waidner, M. (eds) Computer Security – ESORICS 2021. ESORICS 2021. Lecture Notes in Computer Science, vol 12972. Springer, Cham. https://doi.org/10.1007/978-3-030-88418-5_28

  • DOI: https://doi.org/10.1007/978-3-030-88418-5_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88417-8

  • Online ISBN: 978-3-030-88418-5

  • eBook Packages: Computer Science, Computer Science (R0)