
ActGraph: prioritization of test cases based on deep neural network activation graph

Automated Software Engineering

Abstract

Widespread applications of deep neural networks (DNNs) benefit from DNN testing to guarantee their quality. In DNN testing, numerous test cases are fed into the model to explore potential vulnerabilities, but manually checking their labels is expensive. Test case prioritization has therefore been proposed to reduce the labeling cost, e.g., surprise adequacy-based, uncertainty quantifier-based and mutation-based prioritization methods. However, most of them fall short in certain scenarios (i.e., high-confidence adversarial or false-positive cases) and have high time complexity. To address these challenges, we propose the concept of the activation graph from the perspective of the spatial relationship of neurons. We observe that the activation graphs of cases that trigger the model’s misbehavior differ significantly from those of normal cases. Motivated by this observation, we design ActGraph, a test case prioritization method that extracts high-order node features of the activation graph for prioritization. ActGraph explains the difference between test cases, which addresses the scenario limitation. Because it requires no mutation operations, ActGraph is easy to implement and has lower time complexity. Extensive experiments on three datasets and four models demonstrate that ActGraph has the following key characteristics. (i) Effectiveness and generalizability: ActGraph shows competitive performance in all of the natural, adversarial and mixed scenarios, especially in RAUC-100 improvement (\(\sim 1.40\times\)). (ii) Efficiency: ActGraph runs at a lower time cost (\(\sim 1/50\times\)) than the state-of-the-art method. The code of ActGraph is open-sourced at https://github.com/Embed-Debuger/ActGraph.
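The abstract reports RAUC-100 without defining it. As a minimal sketch, assuming RAUC-n is the ratio between the area under the cumulative failure curve of a prioritized list (truncated to its first n cases) and the area under the ideal curve in which every failing case is ranked first, the metric could be computed as follows. The function name `rauc`, its parameters, and the definition itself are assumptions made for illustration; they are not taken from the ActGraph repository, and the `scores` argument stands in for whatever priority values a ranking method produces.

```python
from typing import Optional, Sequence


def rauc(scores: Sequence[float],
         is_failure: Sequence[bool],
         n: Optional[int] = None) -> float:
    """Ratio of areas under the achieved and ideal cumulative failure curves.

    Assumed definition (not stated in the abstract): test cases are sorted by
    descending score, and the running count of failures found is summed over
    the first n positions, then divided by the same sum for a perfect ordering.
    """
    if len(scores) != len(is_failure):
        raise ValueError("scores and is_failure must have the same length")

    # Rank test cases from most to least suspicious.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    budget = len(order) if n is None else min(n, len(order))
    total_failures = sum(1 for f in is_failure if f)

    found = 0        # failures discovered so far in the prioritized list
    area = 0         # discrete area under the achieved curve
    ideal_area = 0   # discrete area under the ideal curve
    for k in range(budget):
        if is_failure[order[k]]:
            found += 1
        area += found
        # A perfect ordering finds one failure per step until none remain.
        ideal_area += min(k + 1, total_failures)

    return area / ideal_area if ideal_area else 0.0


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.8, 0.3]
    is_failure = [True, False, False, True]
    print(rauc(scores, is_failure))       # whole list
    print(rauc(scores, is_failure, n=2))  # first two prioritized cases only
```

Under this assumed definition a value of 1.0 means the prioritization is indistinguishable from the ideal ordering within the chosen budget, so a higher RAUC-100 indicates that more misbehaving cases surface among the first 100 labeled inputs.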



Acknowledgements

This research was supported by the National Natural Science Foundation of China (No. 62072406), the Zhejiang Provincial Natural Science Foundation (No. LDQ23F020001), the National Key Laboratory of Science and Technology on Information System Security (No. 61421110502), and the National Natural Science Foundation of China (No. 62103374).

Author information


Contributions

JC: conceptualization, data curation, funding acquisition, methodology, resources, writing (review and editing), supervision, formal analysis. JG: methodology, investigation, resources, writing (original draft), software, project administration. HZ: methodology, writing (original draft), visualization, software, text polish, validation. All authors reviewed the manuscript.

Corresponding author

Correspondence to Haibin Zheng.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, J., Ge, J. & Zheng, H. ActGraph: prioritization of test cases based on deep neural network activation graph. Autom Softw Eng 30, 28 (2023). https://doi.org/10.1007/s10515-023-00396-8


  • DOI: https://doi.org/10.1007/s10515-023-00396-8
