Improving Symbolic Automata Learning with Concolic Execution

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 12076)

Abstract

Inferring the input grammar accepted by a program is central to a variety of software engineering problems, including parser verification, grammar-based fuzzing, communication protocol inference, and documentation. Sound and complete active learning techniques have been developed for several classes of languages and their corresponding automaton representations; however, outstanding challenges still limit their effective application to the inference of input grammars. We focus on active learning techniques based on \(L^*\) and propose two extensions of the Minimally Adequate Teacher framework that allow the efficient learning of the input language of a program in the form of symbolic automata, leveraging the additional information that can be extracted from concolic execution. Building on these extensions, we develop two learning algorithms that significantly reduce the number of queries required to converge to the correct hypothesis.
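As a rough illustration of the objects the abstract refers to, the sketch below models a symbolic automaton whose transitions are guarded by predicates over characters instead of individual symbols, together with the kind of membership query an \(L^*\)-style learner would pose to a teacher. It is not the authors' algorithm: the names `SymbolicAutomaton`, `is_ascii_digit`, and `membership_query` are hypothetical, and the target language (non-empty strings of decimal digits) is an assumption chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# A guard is a predicate over a single input character (hypothetical type alias).
Predicate = Callable[[str], bool]


@dataclass
class SymbolicAutomaton:
    """Deterministic automaton whose transitions carry predicates as guards."""
    initial: int
    accepting: Set[int]
    # state -> list of (guard, target state); guards are assumed disjoint
    transitions: Dict[int, List[Tuple[Predicate, int]]]

    def accepts(self, word: str) -> bool:
        state = self.initial
        for ch in word:
            for guard, target in self.transitions.get(state, []):
                if guard(ch):
                    state = target
                    break
            else:
                # No guard is satisfied by this character: reject the word.
                return False
        return state in self.accepting


def is_ascii_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


# Assumed target language for the example: one or more decimal digits.
target = SymbolicAutomaton(
    initial=0,
    accepting={1},
    transitions={0: [(is_ascii_digit, 1)], 1: [(is_ascii_digit, 1)]},
)


def membership_query(word: str) -> bool:
    """Stand-in teacher: answers whether `word` is in the target language."""
    return target.accepts(word)
```

A single guard such as `is_ascii_digit` covers ten concrete symbols at once, which is why a symbolic representation can stay small over large alphabets. A learner that only observes yes/no answers must discover each guard's boundaries through queries, and that is the cost the abstract proposes to reduce with information from concolic execution.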

This work has been partially supported by the EPSRC HiPEDS Centre for Doctoral Training (EP/L016796/1), the DSI-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS), and a Royal Society Newton Mobility Grant (NMG\R2\170142).

Author information

Corresponding author

Correspondence to Willem Visser.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2020 The Author(s)

About this paper

Cite this paper

Clun, D., van Heerden, P., Filieri, A., Visser, W. (2020). Improving Symbolic Automata Learning with Concolic Execution. In: Wehrheim, H., Cabot, J. (eds) Fundamental Approaches to Software Engineering. FASE 2020. Lecture Notes in Computer Science, vol 12076. Springer, Cham. https://doi.org/10.1007/978-3-030-45234-6_1

  • DOI: https://doi.org/10.1007/978-3-030-45234-6_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-45233-9

  • Online ISBN: 978-3-030-45234-6

  • eBook Packages: Computer Science, Computer Science (R0)