Abstract
Widespread applications of deep neural networks (DNNs) benefit from DNN testing to guarantee their quality. In DNN testing, numerous test cases are fed into the model to explore potential vulnerabilities, but checking their labels requires expensive manual effort. Therefore, test case prioritization has been proposed to reduce labeling cost, e.g., surprise adequacy-based, uncertainty quantifier-based and mutation-based prioritization methods. However, most of them suffer from limited applicability (i.e., only high-confidence adversarial or false-positive cases) and high time complexity. To address these challenges, we propose the concept of the activation graph from the perspective of the spatial relationship of neurons. We observe that the activation graph of cases that trigger the model's misbehavior differs significantly from that of normal cases. Motivated by this observation, we design a test case prioritization method based on the activation graph, ActGraph, which extracts high-order node features of the activation graph for prioritization. ActGraph explains the difference between test cases, which addresses the problem of limited applicability. Without mutation operations, ActGraph is easy to implement, leading to lower time complexity. Extensive experiments on three datasets and four models demonstrate that ActGraph has the following key characteristics. (i) Effectiveness and generalizability: ActGraph shows competitive performance in all of the natural, adversarial and mixed scenarios, especially in RAUC-100 improvement (\(\sim \times \)1.40). (ii) Efficiency: ActGraph runs at lower time cost (\(\sim \times \)1/50) than the state-of-the-art method. The code of ActGraph is open-sourced at https://github.com/Embed-Debuger/ActGraph.
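The abstract's core idea can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes a simplified activation graph in which neurons of adjacent layers are connected with edge weights given by the product of their activations, uses weighted degree as a stand-in for the paper's high-order node features, and ranks test cases by the distance of their feature vector from a reference (e.g., training-set) mean. All function names and the layout of the activation data are hypothetical.

```python
import numpy as np

def activation_graph_features(acts):
    """Build a simplified activation graph for one input and return node features.

    acts: list of 1-D activation vectors, one per layer (hypothetical layout).
    Neurons of adjacent layers are connected with edge weight a_i * a_j; each
    node's feature is its weighted degree, a first-order stand-in for the
    high-order node features described in the paper.
    """
    feats = []
    for prev, nxt in zip(acts[:-1], acts[1:]):
        # Edge weights between layer l and l+1: outer product of activations.
        W = np.outer(prev, nxt)
        # Weighted out-degree of layer-l nodes.
        feats.append(W.sum(axis=1))
    # Weighted in-degree of the final layer's nodes (from the last W).
    feats.append(W.sum(axis=0))
    return np.concatenate(feats)

def prioritize(test_acts, ref_acts):
    """Rank test cases: those whose graph features lie farthest from the
    reference mean are ranked first, as likely misbehavior triggers."""
    ref_mean = np.mean([activation_graph_features(a) for a in ref_acts], axis=0)
    scores = [np.linalg.norm(activation_graph_features(a) - ref_mean)
              for a in test_acts]
    # Indices of test cases, most anomalous first.
    return np.argsort(scores)[::-1]
```

In this sketch, a test case with an unusual activation pattern produces node degrees far from the reference mean and is therefore scheduled for labeling earlier, which mirrors the abstract's claim that misbehavior-triggering cases have distinctive activation graphs.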
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Nos. 62072406 and 62103374), the Zhejiang Provincial Natural Science Foundation (No. LDQ23F020001), and the National Key Laboratory of Science and Technology on Information System Security (No. 61421110502).
Author information
Contributions
JC: conceptualization, data curation, funding acquisition, methodology, resources, writing (review and editing), supervision, formal analysis. JG: methodology, investigation, resources, writing (original draft), software, project administration. HZ: methodology, writing (original draft), visualization, software, validation. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, J., Ge, J. & Zheng, H. ActGraph: prioritization of test cases based on deep neural network activation graph. Autom Softw Eng 30, 28 (2023). https://doi.org/10.1007/s10515-023-00396-8