Abstract
Widespread applications of deep neural networks (DNNs) benefit from DNN testing to guarantee their quality. In DNN testing, numerous test cases are fed into the model to explore potential vulnerabilities, but checking their labels requires expensive manual effort. Therefore, test case prioritization has been proposed to reduce labeling cost, e.g., surprise adequacy-based, uncertainty quantifier-based and mutation-based prioritization methods. However, most of them suffer from limited applicability (i.e., only high-confidence adversarial or false-positive cases) and high time complexity. To address these challenges, we propose the concept of the activation graph from the perspective of the spatial relationship of neurons. We observe that the activation graph of cases that trigger the model's misbehavior differs significantly from that of normal cases. Motivated by this observation, we design a test case prioritization method based on the activation graph, ActGraph, which extracts high-order node features of the activation graph for prioritization. ActGraph explains the difference between test cases, which addresses the problem of limited applicability. Without mutation operations, ActGraph is easy to implement, leading to lower time complexity. Extensive experiments on three datasets and four models demonstrate that ActGraph has the following key characteristics. (i) Effectiveness and generalizability: ActGraph shows competitive performance in all of the natural, adversarial and mixed scenarios, especially in RAUC-100 improvement (\(\sim \times \)1.40). (ii) Efficiency: ActGraph runs at lower time cost (\(\sim \times \)1/50) than the state-of-the-art method. The code of ActGraph is open-sourced at https://github.com/Embed-Debuger/ActGraph.
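The abstract's core idea can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes a simplified activation graph in which neurons of adjacent layers are connected with edge weights given by the product of their activations, uses weighted degree as a stand-in for the paper's high-order node features, and ranks test cases by the distance of their feature vector from a reference (e.g., training-set) mean. All function names and the layout of the activation data are hypothetical.

```python
import numpy as np

def activation_graph_features(acts):
    """Build a simplified activation graph for one input and return node features.

    acts: list of 1-D activation vectors, one per layer (hypothetical layout).
    Neurons of adjacent layers are connected with edge weight a_i * a_j; each
    node's feature is its weighted degree, a first-order stand-in for the
    high-order node features described in the paper.
    """
    feats = []
    for prev, nxt in zip(acts[:-1], acts[1:]):
        # Edge weights between layer l and l+1: outer product of activations.
        W = np.outer(prev, nxt)
        # Weighted out-degree of layer-l nodes.
        feats.append(W.sum(axis=1))
    # Weighted in-degree of the final layer's nodes (from the last W).
    feats.append(W.sum(axis=0))
    return np.concatenate(feats)

def prioritize(test_acts, ref_acts):
    """Rank test cases: those whose graph features lie farthest from the
    reference mean are ranked first, as likely misbehavior triggers."""
    ref_mean = np.mean([activation_graph_features(a) for a in ref_acts], axis=0)
    scores = [np.linalg.norm(activation_graph_features(a) - ref_mean)
              for a in test_acts]
    # Indices of test cases, most anomalous first.
    return np.argsort(scores)[::-1]
```

In this sketch, a test case with an unusual activation pattern produces node degrees far from the reference mean and is therefore scheduled for labeling earlier, which mirrors the abstract's claim that misbehavior-triggering cases have distinctive activation graphs.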
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Nos. 62072406 and 62103374), the Zhejiang Provincial Natural Science Foundation (No. LDQ23F020001), and the National Key Laboratory of Science and Technology on Information System Security (No. 61421110502).
Author information
Contributions
JC: conceptualization, data curation, funding acquisition, methodology, resources, writing (review and editing), supervision, formal analysis. JG: methodology, investigation, resources, writing (original draft), software, project administration. HZ: methodology, writing (original draft), visualization, software, validation. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, J., Ge, J. & Zheng, H. ActGraph: prioritization of test cases based on deep neural network activation graph. Autom Softw Eng 30, 28 (2023). https://doi.org/10.1007/s10515-023-00396-8