Machine learning steered symbolic execution framework for complex software code

Bu, Lei; Liang, Yongjuan; Xie, Zhunyi; Qian, Hong; Hu, Yi-Qi; Yu, Yang; Chen, Xin; Li, Xuandong

doi:10.1007/s00165-021-00538-3

Original Article
Published: 26 May 2021

Volume 33, pages 301–323 (2021)

Formal Aspects of Computing

Lei Bu ORCID: orcid.org/0000-0003-0517-7801¹,
Yongjuan Liang¹,
Zhunyi Xie¹,
Hong Qian¹,
Yi-Qi Hu¹,
Yang Yu¹,
Xin Chen¹ &
Xuandong Li¹

Abstract

While traversing a program, symbolic execution collects path conditions and feeds them to a constraint solver to obtain feasible solutions. However, complex path conditions, such as nonlinear constraints, appear widely in programs and are hard for existing solvers to handle efficiently. In this paper, we adapt the classical symbolic execution framework with a machine learning approach to constraint satisfaction. The approach samples candidate solutions and learns from them to identify potentially feasible areas. This sampling-and-learning style of solving applies readily to different classes of complex problems. Incorporating this approach, our framework, MLBSE, therefore supports the symbolic execution of not only simple linear path conditions but also nonlinear arithmetic operations and even black-box calls to library methods. Moreover, thanks to the theoretical foundation of the machine learning based approach, when the solver fails to solve a path condition, we can provide an estimation of the confidence in the satisfiability (ECS) of the problem, which gives users insight into how the problem was analyzed and whether a solution could ultimately be found. We implement MLBSE on top of Symbolic PathFinder (SPF) as a fully automatic Java symbolic execution engine. Users can feed their code to MLBSE directly, which makes it convenient to use. To evaluate its performance, we use 22 real-world programs as benchmarks for MLBSE to generate test cases; together they involve 1042 methods that are full of nonlinear operations, floating-point arithmetic, and native method calls. Experimental results show that the coverage achieved by MLBSE is much higher than that of state-of-the-art tools.
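To make the sampling-and-learning idea concrete, the following is a minimal Java sketch. It is not MLBSE's solver and not the classification-based optimizer the paper builds on; it is a simplified stand-in in which the "learned" model is just an axis-aligned box around the best samples seen in each round. The class name, the method names, the example path condition `sin(x) * y > 0.5 && x*x + y*y < 4`, and all parameter values are assumptions made for illustration. The path condition is treated as a black box that reports how far a candidate is from satisfying it, which is why nonlinear terms and library calls such as `Math.sin` pose no special difficulty. The sketch does not attempt to compute the ECS.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Illustrative only: a toy sampling-and-learning solver, not the MLBSE implementation.
class SamplingSolverSketch {

    static final double EPS = 1e-9; // margin so that violation == 0 implies the strict inequalities hold

    // Hypothetical path condition: sin(x) * y > 0.5 && x*x + y*y < 4.
    // Returns 0 when the condition holds, otherwise a positive distance-like penalty.
    static double violation(double[] v) {
        double x = v[0], y = v[1];
        double c1 = Math.max(0.0, 0.5 + EPS - Math.sin(x) * y);
        double c2 = Math.max(0.0, x * x + y * y - 4.0 + EPS);
        return c1 + c2;
    }

    // Samples candidates, keeps the best quarter, and narrows the sampling box around them.
    // Returns a satisfying assignment, or null if none was found within the budget.
    static double[] solve(ToDoubleFunction<double[]> violation, double lo, double hi,
                          int dim, int rounds, int samplesPerRound, long seed) {
        Random rng = new Random(seed);
        double[] boxLo = new double[dim];
        double[] boxHi = new double[dim];
        Arrays.fill(boxLo, lo);
        Arrays.fill(boxHi, hi);

        for (int r = 0; r < rounds; r++) {
            double[][] samples = new double[samplesPerRound][dim];
            for (double[] s : samples) {
                // With probability 0.2, sample the whole domain to keep exploring.
                boolean global = rng.nextDouble() < 0.2;
                for (int d = 0; d < dim; d++) {
                    double a = global ? lo : boxLo[d];
                    double b = global ? hi : boxHi[d];
                    s[d] = a + rng.nextDouble() * (b - a);
                }
            }
            Arrays.sort(samples, Comparator.comparingDouble(violation));
            if (violation.applyAsDouble(samples[0]) == 0.0) {
                return samples[0]; // path condition satisfied
            }
            // "Learning" step: the padded bounding box of the best samples
            // becomes the potentially feasible area for the next round.
            int k = Math.max(1, samplesPerRound / 4);
            for (int d = 0; d < dim; d++) {
                double mn = Double.POSITIVE_INFINITY;
                double mx = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < k; i++) {
                    mn = Math.min(mn, samples[i][d]);
                    mx = Math.max(mx, samples[i][d]);
                }
                double pad = 0.1 * (mx - mn);
                boxLo[d] = Math.max(lo, mn - pad);
                boxHi[d] = Math.min(hi, mx + pad);
            }
        }
        return null; // budget exhausted without finding a solution
    }

    public static void main(String[] args) {
        double[] s = solve(SamplingSolverSketch::violation, -10.0, 10.0, 2, 200, 40, 42L);
        System.out.println(s == null ? "no solution found" : Arrays.toString(s));
    }
}
```

A `null` result here only means the sampling budget ran out; it says nothing about unsatisfiability, which is the gap the ECS described in the abstract is meant to address.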

Acknowledgements

The authors thank the anonymous reviewers and editors for their valuable advice on improving this paper. The authors also thank Mr. Xin Li, Mr. Yuchao Duan, and Mr. Bochuan Chen for their efforts in developing MLBSE. This work is supported in part by the National Key Research and Development Program of China (2020AAA0107200), the National Natural Science Foundation of China (Nos. 61632015, 61690204, 61876077), and the Leading-edge Technology Program of Jiangsu Natural Science Foundation (No. BK20202001).

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, People’s Republic of China
Lei Bu, Yongjuan Liang, Zhunyi Xie, Hong Qian, Yi-Qi Hu, Yang Yu, Xin Chen & Xuandong Li

Corresponding author

Correspondence to Lei Bu.

Additional information

Zhiming Liu, Xiaoping Chen, Ji Wang and Jim Woodcock

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bu, L., Liang, Y., Xie, Z. et al. Machine learning steered symbolic execution framework for complex software code. Form Asp Comp 33, 301–323 (2021). https://doi.org/10.1007/s00165-021-00538-3

Received: 10 September 2020
Revised: 23 November 2020
Accepted: 30 January 2021
Published: 26 May 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s00165-021-00538-3
