Generating models of infinite-state communication protocols using regular inference with abstraction

Aarts, Fides; Jonsson, Bengt; Uijen, Johan; Vaandrager, Frits

doi:10.1007/s10703-014-0216-x

Published: 19 November 2014

Volume 46, pages 1–41 (2015)
Formal Methods in System Design

Abstract

In order to facilitate model-based verification and validation, effort is underway to develop techniques for generating models of communication system components from observations of their external behavior. Most previous such work has employed regular inference techniques which generate modest-size finite-state models. They typically suppress parameters of messages, although these have a significant impact on control flow in many communication protocols. We present a framework, which adapts regular inference to include data parameters in messages and states for generating components with large or infinite message alphabets. A main idea is to adapt the framework of predicate abstraction, successfully used in formal verification. Since we are in a black-box setting, the abstraction must be supplied externally, using information about how the component manages data parameters. We have implemented our techniques by connecting the LearnLib tool for regular inference with an implementation of session initiation protocol (SIP) in ns-2 and an implementation of transmission control protocol (TCP) in Windows 8, and generated models of SIP and TCP components.

Notes

It is important to distinguish between mappers and adapters. A mapper translates between concrete and abstract symbols, based on a history-dependent abstraction function, whereas the adapter translates between the concrete symbols and the actual input and output events of the SUT. In our experiments the adapter does not abstract and its behavior is history-independent. The behavior of the adapter is described by an injective function \(f\) that assigns to each concrete input symbol from \(I\) an input event for the SUT (here: a TCP packet), and a partial, injective function \(g\) that turns output events of the SUT (here: TCP packets or a timeout) into concrete output symbols from \(O\). If the SUT performs an output event for which \(g\) is not defined (here: the SUT sends a TCP packet that is not expected by the adapter), the adapter raises an exception. Exceptions are not supposed to happen, and in fact did not occur in any of our experiments.
Uijen [55] also describes a more general learning experiment in which all possible combinations of the control bits \({ SYN}\), \({ ACK}\) and \({ FIN}\) are allowed, including the so-called Kamikaze packet [46] in which all the flag bits are turned on.
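As an illustration of the distinction drawn in the first note, the following Python sketch contrasts a history-dependent mapper with a history-independent adapter. It is a minimal sketch under stated assumptions, not the implementation used in the experiments: the `Concrete` symbol type, the `kind:seq` byte encoding, the `VALID`/`INVALID` abstract values, and the names `Mapper`, `f`, `g` and `UnexpectedOutput` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concrete:
    """Hypothetical concrete symbol: a message kind plus one integer parameter."""
    kind: str
    seq: int


class Mapper:
    """History-dependent abstraction from concrete to abstract symbols."""

    def __init__(self) -> None:
        self.last_seq: Optional[int] = None  # initial mapper state r_0

    def abstract(self, symbol: Concrete) -> str:
        # The abstract value depends on the mapper state, not only on the symbol.
        valid = self.last_seq is None or symbol.seq == self.last_seq + 1
        if valid:
            self.last_seq = symbol.seq  # update of the mapper state
        return f"{symbol.kind}({'VALID' if valid else 'INVALID'})"


class UnexpectedOutput(Exception):
    """Raised when the SUT produces an event outside the domain of g."""


# Assumed set of packet kinds the adapter expects from the SUT.
KNOWN_KINDS = {"SYN", "ACK", "SYN+ACK", "FIN", "RST"}


def f(symbol: Concrete) -> bytes:
    """Injective, stateless map from concrete input symbols to SUT input events."""
    return f"{symbol.kind}:{symbol.seq}".encode("ascii")


def g(event: Optional[bytes]) -> Concrete:
    """Partial, injective map from SUT output events to concrete output symbols.

    None stands for a timeout. Events outside the domain raise UnexpectedOutput.
    """
    if event is None:
        return Concrete("TIMEOUT", 0)
    try:
        kind, seq_text = event.decode("ascii").split(":", 1)
        symbol = Concrete(kind, int(seq_text))
    except (UnicodeDecodeError, ValueError) as exc:
        raise UnexpectedOutput(event) from exc
    # Rejecting non-canonical encodings such as b"SYN:07" keeps g injective.
    if kind not in KNOWN_KINDS or f(symbol) != event:
        raise UnexpectedOutput(event)
    return symbol


mapper = Mapper()
print(mapper.abstract(Concrete("SYN", 100)))  # SYN(VALID)
print(mapper.abstract(Concrete("ACK", 101)))  # ACK(VALID)
print(mapper.abstract(Concrete("ACK", 101)))  # ACK(INVALID)
print(g(f(Concrete("SYN", 7))))               # Concrete(kind='SYN', seq=7)
```

In this sketch the same concrete symbol `ACK` with parameter 101 is abstracted differently on its second occurrence, because the mapper carries state, whereas `f` and `g` return the same result for the same argument regardless of what was sent before.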

Abbreviations

\(\mathcal{A}\) :: Mapper
\(\mathcal{A}_S\) :: Symbolic mapper
\(\mathcal{H}\) :: Hypothesis (Mealy machine)
\(\mathcal{M}\) :: Mealy machine
\(\mathcal{M}_S\) :: Symbolic Mealy machine
\(H\) :: Set of states of hypothesis
\(I\) :: Set of (concrete) input symbols
\(O\) :: Set of (concrete) output symbols
\(Q\) :: Set of states of a Mealy machine
\(R\) :: Set of mapper states
\(T\) :: Set of event terms
\(V\) :: Set of variables
\(X\) :: Set of (abstract) input symbols
\(Y\) :: Set of (abstract) output symbols
\(a\) :: (Input or output) symbol
\(d\) :: Parameter value
\(e\) :: Term
\(h\) :: State of hypothesis
\(h_0\) :: Initial state of hypothesis
\(i\) :: (Concrete) input symbol
\(j, k, l, m, n\) :: Index
\(o\) :: (Concrete) output symbol
\(p\) :: Parameter
\(q\) :: State of Mealy machine
\(q_0\) :: Initial state of Mealy machine
\(r\) :: State of mapper
\(r_0\) :: Initial state of mapper
\(s\) :: Sequence of output symbols
\(t\) :: Term
\(u\) :: Sequence of input symbols
\(v\) :: Variable
\(w\) :: Sequence of input and output symbols
\(x\) :: Abstract input symbol
\(y\) :: Abstract output symbol
\(\alpha _\mathcal{A}\) :: Abstraction induced by \(\mathcal{A}\)
\(\gamma _\mathcal{A}\) :: Concretization induced by \(\mathcal{A}\)
\(\delta \) :: Update function
\(\epsilon \) :: Empty sequence
\(\varepsilon \) :: Event primitive
\(\xi \) :: Valuation
\(\tau _\mathcal{A}\) :: Observation abstraction function induced by \(\mathcal{A}\)
\(\varphi \) :: Formula
\(\psi \) :: Abstraction function
\(\varDelta \) :: Set of (symbolic) transitions
\(\varTheta \) :: Initial condition
\(\varSigma \) :: Event signature
\(\varPsi \) :: Set of event abstractions
\(\perp \) :: Undefined value
\(\rightarrow \) :: Transition relation
\(\Rightarrow \) :: Transition relation extended to sequences
\(\equiv \) :: Syntactic equality (of terms)
\(\approx \) :: Observation equivalence (of Mealy machines)
\(\le \) :: Implementation preorder/behavior inclusion (of Mealy machines)
\(\approx _{wb}\) :: Observation congruence (of CCS expressions)

References

Aarts F, Heidarian F, Kuppens H, Olsen P, Vaandrager FW (2012) Automata learning through counterexample-guided abstraction refinement. In: Giannakopoulou D, Méry D (eds) 18th international symposium on formal methods (FM 2012), Paris, France, August 27–31, 2012. Proceedings, volume 7436 of lecture notes in computer science. Springer, Berlin, pp 10–27
Aarts F, Heidarian F, Vaandrager FW (2012) A theory of abstractions for learning interface automata. In: Koutny M, Ulidowski I (eds) 23rd international conference on concurrency theory (CONCUR), Newcastle upon Tyne, UK, September 3–8, 2012. Proceedings, volume 7454 of lecture notes in computer science. Springer, Berlin, pp 240–255
Aarts F, Jonsson B, Uijen J (2010) Generating models of infinite-state communication protocols using regular inference with abstraction. In: Petrenko A, Maldonado JC, Simao A (eds) 22nd IFIP international conference on testing software and systems, Natal, Brazil, November 8–10, Proceedings, volume 6435 of lecture notes in computer science. Springer, Berlin, pp 188–204
Aarts F, Kuppens H, Tretmans GJ, Vaandrager FW, Verwer S (2012) Learning and testing the bounded retransmission protocol. In: Heinz J, de la Higuera C, and Oates T (eds) Proceedings 11th international conference on grammatical inference (ICGI 2012), September 5–8, 2012. University of Maryland, College Park, USA, volume 21 of JMLR workshop and conference proceedings, pp 4–18
Aarts F, Schmaltz J, Vaandrager FW (2010) Inference and abstraction of the biometric passport. In: Margaria T, Steffen B (eds) Leveraging applications of formal methods, verification, and validation - 4th international symposium on leveraging applications, ISoLA 2010, Heraklion, Crete, Greece, October 18–21, 2010, Proceedings, part I, volume 6415 of lecture notes in computer science. Springer, Berlin, pp 673–686
Ammons G, Bodik R, Larus J (2002) Mining specifications. In: Proceedings of 29th ACM symposium on principles of programming languages, pp 4–16
Angluin D (1987) Learning regular sets from queries and counterexamples. Inf Comput 75(2):87–106
Ball T, Rajamani SK (2002) The SLAM project: debugging system software via static analysis. In: Proceedings of the 29th ACM symposium on principles of programming languages, pp 1–3
Berg T, Jonsson B, Raffelt H (2006) Regular inference for state machines with parameters. In: Baresi L, Heckel R (eds) FASE, volume 3922 of lecture notes in computer science. Springer, Berlin, pp 107–121
Bergstra JA, Ponse A, Smolka SA (eds) (2001) Handbook of process algebra. North-Holland
Broy M, Jonsson B, Katoen J-P, Leucker M, Pretschner A (eds) (2004) Model-based testing of reactive systems, volume 3472 of lecture notes in computer science. Springer, Berlin
Brun Y, Ernst MD (2004) Finding latent code errors via machine learning over program executions. In: ICSE’04: 26th international conference on software engineering
Cassel S, Howar F, Jonsson B, Merten M, Steffen B (2011) A succinct canonical register automaton model. In: Bultan T, Hsiung P-A (eds) Automated technology for verification and analysis, 9th international symposium, ATVA 2011, Taipei, Taiwan, October 11–14, 2011. Proceedings, volume 6996 of lecture notes in computer science. Springer, Berlin, pp 366–380
Cho CY, Babić D, Shin ECR, Song D (2010) Inference and analysis of formal models of botnet command and control protocols. In: Al-Shaer E, Keromytis AD, and Shmatikov V (eds) ACM conference on computer and communications security. ACM, pp 426–439
Clarke EM, Grumberg O, Jha S, Lu Y, Veith H (2003) Counterexample-guided abstraction refinement for symbolic model checking. J ACM 50(5):752–794
Cobleigh JM, Giannakopoulou D, Pasareanu CS (2003) Learning assumptions for compositional verification. In: Proceedings of the TACAS ’03, 9th international conference on tools and algorithms for the construction and analysis of systems, volume 2619 of lecture notes in computer science. Springer, Berlin, pp 331–346
Fiterău-Broştean P, Janssen R, Vaandrager FW (2014) Learning fragments of the TCP network protocol. In: Lang F, Flammini F (eds) Proceedings 19th international workshop on formal methods for industrial critical systems (FMICS’14), Florence, Italy, volume 8718 of lecture notes in computer science. Springer, Berlin, pp 78–93
van Glabbeek RJ (1993) The linear time—branching time spectrum II (the semantics of sequential systems with silent moves). In: Best E (ed) Proceedings CONCUR 93, Hildesheim, Germany, volume 715 of lecture notes in computer science. Springer, Berlin
Gold EM (1967) Language identification in the limit. Inf Control 10(5):447–474
Grieskamp W, Kicillof N, Stobie K, Braberman V (2011) Model-based quality assurance of protocol documentation: tools and methodology. Softw Test Verif Reliab 21(1):55–71
Grinchtein O (2008) Learning of timed systems. PhD thesis, Dept. of IT, Uppsala University, Sweden
Grinchtein O, Jonsson B, Leucker M (2004) Learning of event-recording automata. In: Proceedings of the joint conferences FORMATS and FTRTFT, volume 3253 of LNCS, pp 379–396
Groce A, Peled D, Yannakakis M (2002) Adaptive model checking. In: Katoen J-P, Stevens P (eds) Proceedings of the TACAS ’02, 8th international conference on tools and algorithms for the construction and analysis of systems, volume 2280 of lecture notes in computer science. Springer, Berlin, pp 357–370
Groz R, Li K, Petrenko A, Shahbaz M (2008) Modular system verification by inference, testing and reachability analysis. In: TestCom/FATES, volume 5047 of lecture notes in computer science, pp 216–233
Grumberg O, Veith H (eds) (2008) 25 years of model checking: history, achievements, perspectives, volume 5000 of lecture notes in computer science. Springer, Berlin
Hagerer A, Hungar H, Niese O, Steffen B (2002) Model generation by moderated regular extrapolation. In: Kutsche R-D, Weber H (eds) Proceedings of the FASE ’02, 5th international conference on fundamental approaches to software engineering, volume 2306 of lecture notes in computer science. Springer, Berlin, pp 80–95
Henzinger TA, Jhala R, Majumdar R, Sutre G (2002) Lazy abstraction. In: Proceedings of the 29th ACM symposium on principles of programming languages, pp 58–70
Howar F, Isberner M, Steffen B, Bauer O, Jonsson B (2012) Inferring semantic interfaces of data structures. In: ISoLA (1): leveraging applications of formal methods, verification and validation. Technologies for mastering change—5th international symposium, ISoLA 2012, Heraklion, Crete, Greece, October 15–18, 2012, Proceedings, part I, volume 7609 of lecture notes in computer science. Springer, Berlin, pp 554–571
Howar F, Steffen B, Merten M (2011) Automata learning with automated alphabet abstraction refinement. In: VMCAI, volume 6538 of lecture notes in computer science. Springer, Berlin, pp 263–277
Huima A (2007) Implementing conformiq qtronic. In: Petrenko A, Veanes M, Tretmans J, and Grieskamp W (eds) Proceedings of the TestCom/FATES, Tallinn, Estonia, June, 2007, volume 4581 of lecture notes in computer science, pp 1–12
Hungar H, Niese O, Steffen B (2003) Domain-specific optimization in automata learning. In: Proceedings of the 15th international conference on computer aided verification
Janssen R (2013) Learning a state diagram of TCP using abstraction. Bachelor thesis, ICIS, Radboud University Nijmegen
Jonsson B (1994) Compositional specification and verification of distributed systems. ACM Trans Progr Lang Syst 16(2):259–303
Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge, MA
Li K, Groz R, Shahbaz M (2006) Integration testing of distributed components based on learning parameterized I/O models. In: Najm E, Pradat-Peyre J-F, Donzeau-Gouge V (eds) FORTE, volume 4229 of lecture notes in computer science, pp 436–450
Loiseaux C, Graf S, Sifakis J, Bouajjani A, Bensalem S (1995) Property preserving abstractions for the verification of concurrent systems. Form Methods Syst Des 6(1):11–44
Lorenzoli D, Mariani L, Pezzè M (2008) Automatic generation of software behavioral models. In: Proceedings of the ICSE’08: 30th international conference on software engineering, pp 501–510
Mariani L, Pezzè M (2007) Dynamic detection of COTS components incompatibility. IEEE Softw 24(5):76–85
Merten M, Howar F, Steffen B, Cassel S, Jonsson B (2012) Demonstrating learning of register automata. In: Flanagan C, König B (eds) Tools and algorithms for the construction and analysis of systems—18th international conference, TACAS 2012, Held as part of the European joint conferences on theory and practice of software, ETAPS 2012, Tallinn, Estonia, March 24–April 1, 2012. Proceedings, volume 7214 of lecture notes in computer science. Springer, Berlin, pp 466–471
Merten M, Steffen B, Howar F, Margaria T (2011) Next generation LearnLib. In: Abdulla PA, Leino KRM (eds) TACAS, volume 6605 of lecture notes in computer science. Springer, Berlin, pp 220–223
Milner R (1989) Communication and concurrency. Prentice-Hall, Englewood Cliffs, NJ
Mohri M (1997) Finite-state transducers in language and speech processing. Comput Linguist 23(2):269–311
Niese O (2003) An integrated approach to testing complex systems. Technical report, Dortmund University, Doctoral thesis
The Network Simulator NS-2. http://www.isi.edu/nsnam/ns/
Peled D, Vardi MY, Yannakakis M (1999) Black box checking. In: Wu J, Chanson ST, Gao Q (eds) Formal methods for protocol engineering and distributed systems, FORTE/PSTV. Kluwer, Beijing, pp 225–240
Postel J (ed) (1981) Transmission control protocol - DARPA internet program protocol specification (RFC 793), September 1981. http://www.ietf.org/rfc/rfc793.txt
Raffelt H, Steffen B, Berg T, Margaria T (2009) LearnLib: a framework for extrapolating behavioral models. STTT 11(5):393–407
Rivest RL, Schapire RE (1993) Inference of finite automata using homing sequences. Inf Comput 103:299–347
Rosenberg J, Schulzrinne H, Camarillo G, Johnston A, Peterson J, Sparks R, Handley M, and Schooler E (2002) SIP: session initiation protocol (RFC 3261), June 2002. http://www.ietf.org/rfc/rfc3261.txt
Shahbaz M, Li K, Groz R (2007) Learning and integration of parameterized components through testing. In: Petrenko A, Veanes M, Tretmans J, and Grieskamp W (eds) TestCom/FATES, volume 4581 of lecture notes in computer science. Springer, Berlin, pp 319–334
Shu G, Lee D (2007) Testing security properties of protocol implementations - a machine learning based approach. In: Proceedings of the ICDCS’07, 27th IEEE international conference on distributed computing systems, Toronto, Ontario. IEEE Computer Society
Smeenk W (2012) Applying automata learning to complex industrial software. Master thesis, Radboud University Nijmegen, September
Stevens WR (1994) TCP/IP illustrated, volume 1: the protocols. Addison Wesley Longman Inc, Reading, MA
Tretmans J (1992) A formal approach to conformance testing. PhD thesis, University of Twente, December
Uijen J (2009) Learning models of communication protocols using abstraction techniques. Master thesis, Radboud University Nijmegen and Uppsala University, November
Veanes M, Campbell C, Grieskamp W, Schulte W, Tillmann N, Nachmanson L (2008) Model-based testing of object-oriented reactive systems with spec explorer. In: Hierons RM, Bowen JP, Harman M (eds) Formal methods and testing, an outcome of the FORTEST network, revised selected papers, volume 4949 of lecture notes in computer science. Springer, Berlin, pp 39–76
Veanes M, Hooimeijer P, Livshits B, Molnar D, Bjørner N (2012) Symbolic finite state transducers: algorithms and applications. In: Field J, Hicks M (eds) Proceedings of the 39th ACM SIGPLAN-SIGACT symposium on principles of programming languages, POPL 2012, Philadelphia, Pennsylvania, USA, January 22–28, 2012. ACM, pp 137–150

Acknowledgments

This work was partially supported by the European Union FET Project 231167 CONNECT: Emergent Connectors for Eternal Software Intensive Networked Systems (http://connect-forever.eu/), the STW project 11763 ITALIA: Integrating Testing And Learning of Interface Automata, http://www.italia.cs.ru.nl/, and EU FP7 grant no 214755 QUASIMODO, http://www.quasimodo.aau.dk/. We are grateful to Falk Howar from TU Dortmund for his generous LearnLib support, and to Falk Howar and Bernhard Steffen for fruitful discussions. Paul Fiterău-Broştean helped us with the TCP experiments, using the setup developed by Ramon Janssen in his bachelor thesis [32]. We are also most grateful to both reviewers. Their critical comments very much helped us to improve the paper and to clarify our contribution.

Author information

Johan Uijen
Present address: CGI Nederland B.V., P.O. Box 8566, 3009 AN, Rotterdam, The Netherlands

Authors and Affiliations

Institute for Computing and Information Sciences, Radboud University Nijmegen, P.O. Box 9010, 6500 GL, Nijmegen, The Netherlands
Fides Aarts, Johan Uijen & Frits Vaandrager
Department of Computer Systems, Uppsala University, Uppsala, Sweden
Bengt Jonsson

Corresponding author

Correspondence to Frits Vaandrager.

Additional information

A preliminary version of this paper appeared as [3].

Appendices

Appendix 1: Pruned SIP model

See Fig. 8.

Appendix 2: Complete SIP model

See Fig. 9.

Appendix 3: Model of TCP server

See Fig. 10.

About this article

Cite this article

Aarts, F., Jonsson, B., Uijen, J. et al. Generating models of infinite-state communication protocols using regular inference with abstraction. Form Methods Syst Des 46, 1–41 (2015). https://doi.org/10.1007/s10703-014-0216-x

Issue Date: February 2015
