Improving Symbolic Automata Learning with Concolic Execution

Clun, Donato; van Heerden, Phillip; Filieri, Antonio; Visser, Willem

doi:10.1007/978-3-030-45234-6_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12076)

Included in the following conference series:

International Conference on Fundamental Approaches to Software Engineering

Abstract

Inferring the input grammar accepted by a program is central to a variety of software engineering problems, including parser verification, grammar-based fuzzing, communication protocol inference, and documentation. Sound and complete active learning techniques have been developed for several classes of languages and their corresponding automaton representations; however, outstanding challenges still limit their effective application to the inference of input grammars. We focus on active learning techniques based on \(L^*\) and propose two extensions of the Minimally Adequate Teacher framework that allow the efficient learning of the input language of a program in the form of symbolic automata, leveraging the additional information that can be extracted from concolic execution. Building on these extensions, we develop two learning algorithms that significantly reduce the number of queries required to converge to the correct hypothesis.
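
For readers unfamiliar with the setting, the following is a minimal Python sketch of the two ingredients the abstract refers to: a symbolic automaton, whose transitions are guarded by predicates over characters rather than by single symbols, and a Minimally Adequate Teacher that answers membership and equivalence queries about a target program. It is an illustration only and not the algorithms proposed in the chapter. The names `SymbolicAutomaton`, `Teacher`, `program`, and `is_digit` are hypothetical, the equivalence query is approximated by bounded exhaustive testing over a small sample of characters, and concolic execution is not modelled at all.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List, Optional, Set, Tuple

# A predicate decides whether a single character enables a transition.
Predicate = Callable[[str], bool]


@dataclass
class SymbolicAutomaton:
    """Deterministic automaton whose transitions carry character predicates."""
    initial: int
    accepting: Set[int]
    # transitions[state] is a list of (predicate, target_state) pairs.
    transitions: Dict[int, List[Tuple[Predicate, int]]] = field(default_factory=dict)

    def accepts(self, word: str) -> bool:
        state = self.initial
        for ch in word:
            for pred, target in self.transitions.get(state, []):
                if pred(ch):
                    state = target
                    break
            else:
                return False  # no enabled transition: reject
        return state in self.accepting


class Teacher:
    """Hypothetical teacher: membership by running the program, equivalence
    approximated by testing every word up to max_len over a character sample."""

    def __init__(self, program: Callable[[str], bool], sample: str, max_len: int):
        self.program = program
        self.sample = sample
        self.max_len = max_len

    def membership(self, word: str) -> bool:
        return self.program(word)

    def equivalence(self, hypothesis: SymbolicAutomaton) -> Optional[str]:
        # Return a counterexample, or None if none is found within the bound.
        for n in range(self.max_len + 1):
            for chars in product(self.sample, repeat=n):
                word = "".join(chars)
                if hypothesis.accepts(word) != self.program(word):
                    return word
        return None


def is_digit(c: str) -> bool:
    return "0" <= c <= "9"


def program(s: str) -> bool:
    # Assumed target program: accepts non-empty strings of ASCII digits.
    return len(s) > 0 and all(is_digit(c) for c in s)


teacher = Teacher(program, sample="0a9", max_len=3)

# First hypothesis wrongly accepts the empty word.
h1 = SymbolicAutomaton(initial=0, accepting={0}, transitions={0: [(is_digit, 0)]})
# Refined hypothesis requires at least one digit.
h2 = SymbolicAutomaton(
    initial=0,
    accepting={1},
    transitions={0: [(is_digit, 1)], 1: [(is_digit, 1)]},
)
```

In this sketch the teacher rejects `h1` with the empty word as a counterexample and finds no counterexample for `h2` within the bound. A learner in the \(L^*\) family would use such counterexamples to refine its hypothesis; the number of queries this requires is the cost the chapter aims to reduce.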

This work has been partially supported by the EPSRC HiPEDS Centre for Doctoral Training (EP/L016796/1), the DSI-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS), and a Royal Society Newton Mobility Grant (NMG\R2\170142).

Author information

Authors and Affiliations

Imperial College London, London, UK
Donato Clun & Antonio Filieri
Stellenbosch University, Stellenbosch, South Africa
Phillip van Heerden & Willem Visser

Corresponding author

Correspondence to Willem Visser.

Editor information

Editors and Affiliations

University of Paderborn, Paderborn, Germany
Heike Wehrheim
ICREA, Open University of Catalonia, Barcelona, Spain
Jordi Cabot

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this paper

Cite this paper

Clun, D., van Heerden, P., Filieri, A., Visser, W. (2020). Improving Symbolic Automata Learning with Concolic Execution. In: Wehrheim, H., Cabot, J. (eds) Fundamental Approaches to Software Engineering. FASE 2020. Lecture Notes in Computer Science, vol 12076. Springer, Cham. https://doi.org/10.1007/978-3-030-45234-6_1

DOI: https://doi.org/10.1007/978-3-030-45234-6_1
Published: 17 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45233-9
Online ISBN: 978-3-030-45234-6
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The European Joint Conferences on Theory and Practice of Software.
