Generating models of infinite-state communication protocols using regular inference with abstraction

Formal Methods in System Design


To facilitate model-based verification and validation, efforts are underway to develop techniques for generating models of communication system components from observations of their external behavior. Most previous work of this kind has employed regular inference techniques, which generate finite-state models of modest size. These techniques typically suppress message parameters, although parameters have a significant impact on control flow in many communication protocols. We present a framework that adapts regular inference to include data parameters in messages and states, so that models can be generated for components with large or infinite message alphabets. A main idea is to adapt the framework of predicate abstraction, which has been used successfully in formal verification. Since we are in a black-box setting, the abstraction must be supplied externally, using information about how the component manages data parameters. We have implemented our techniques by connecting the LearnLib tool for regular inference with an implementation of the Session Initiation Protocol (SIP) in ns-2 and an implementation of the Transmission Control Protocol (TCP) in Windows 8, and have generated models of SIP and TCP components.


  1. It is important to distinguish between mappers and adapters. Whereas a mapper takes care of the translation between concrete and abstract symbols, based on a history dependent abstraction function, the task of the adapter is to take care of the translation between the concrete symbols and the actual input and output events of the SUT. In our experiments the adapter does not abstract and its behavior is history independent. The behavior of the adapter is described by an injective function \(f\) that assigns to each concrete input symbol from \(I\) an input event for the SUT (here: a TCP packet), and a partial, injective function \(g\) that turns output events of the SUT (here: TCP packets or a timeout) into concrete output symbols from \(O\). In case the SUT performs an output event for which \(g\) is not defined (here: the SUT sends a TCP packet that is not expected by the adapter), the adapter raises an exception. Exceptions are not supposed to happen, and in fact did not occur in any of our experiments.

  2. Uijen [55] also describes a more general learning experiment in which all possible combinations of the control bits \({ SYN}\), \({ ACK}\) and \({ FIN}\) are allowed, including the so-called Kamikaze packet [46] in which all the flag bits are turned on.
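The distinction drawn in note 1 between mappers and adapters can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Packet` class, the `TIMEOUT` marker, the set of expected flags, and the sequence-number rule inside `Mapper` are all hypothetical stand-ins chosen for illustration. The sketch only shows the structural point of the note: the adapter is stateless, with an injective \(f\) and a partial \(g\) that raises an exception where it is undefined, whereas the mapper carries state between symbols.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

TIMEOUT = "timeout"  # hypothetical marker: no packet arrived in time


@dataclass(frozen=True)
class Packet:
    """Hypothetical stand-in for a TCP packet (an input or output event of the SUT)."""
    flags: str
    seq: int
    ack: int


class AdapterError(Exception):
    """Raised when the SUT produces an output event on which g is undefined."""


class Adapter:
    """History independent: it keeps no state between calls."""

    # Assumed set of flag combinations the adapter knows how to translate.
    EXPECTED_FLAGS = {"SYN", "ACK", "FIN", "SYN+ACK", "FIN+ACK", "RST"}

    def f(self, symbol: Tuple[str, str, int, int]) -> Packet:
        """Injective map from concrete input symbols in I to SUT input events."""
        name, flags, seq, ack = symbol
        if name != "Request":
            raise ValueError(f"not a concrete input symbol: {symbol!r}")
        return Packet(flags, seq, ack)

    def g(self, event: Union[Packet, str]) -> tuple:
        """Partial, injective map from SUT output events to symbols in O."""
        if event == TIMEOUT:
            return ("timeout",)
        if isinstance(event, Packet) and event.flags in self.EXPECTED_FLAGS:
            return ("Response", event.flags, event.seq, event.ack)
        # g is undefined here, so the adapter raises an exception.
        raise AdapterError(f"unexpected output event: {event!r}")


class Mapper:
    """History dependent: the abstraction of a symbol depends on the mapper state."""

    def __init__(self) -> None:
        self.expected_seq: Optional[int] = None  # mapper state, initially r_0

    def abstract_input(self, symbol: Tuple[str, str, int, int]) -> tuple:
        name, flags, seq, _ack = symbol
        # Assumed abstraction rule: a sequence number is VALID if it is the
        # first one seen or the successor of the previous one.
        valid = self.expected_seq is None or seq == self.expected_seq
        self.expected_seq = seq + 1  # assumed update of the mapper state
        return (name, flags, "VALID" if valid else "INVALID")
```

Calling `Adapter().g` on a packet with unexpected flags raises `AdapterError`, mirroring the exceptions that, according to the note, never occurred in the experiments. Feeding the same concrete symbol to `Mapper.abstract_input` twice can yield different abstract symbols, which is exactly what history dependence means.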


\(\mathcal{A}\) :
\(\mathcal{A}_S\) : Symbolic mapper
\(\mathcal{H}\) : Hypothesis (Mealy machine)
\(\mathcal{M}\) : Mealy machine
\(\mathcal{M}_S\) : Symbolic Mealy machine
\(H\) : Set of states of hypothesis
\(I\) : Set of (concrete) input symbols
\(O\) : Set of (concrete) output symbols
\(Q\) : Set of states of a Mealy machine
\(R\) : Set of mapper states
\(T\) : Set of event terms
\(V\) : Set of variables
\(X\) : Set of (abstract) input symbols
\(Y\) : Set of (abstract) output symbols
\(a\) : (Input or output) symbol
\(d\) : Parameter value
\(e\) :
\(h\) : State of hypothesis
\(h_0\) : Initial state of hypothesis
\(i\) : (Concrete) input symbol
\(j, k, l, m, n\) :
\(o\) : (Concrete) output symbol
\(p\) :
\(q\) : State of Mealy machine
\(q_0\) : Initial state of Mealy machine
\(r\) : State of mapper
\(r_0\) : Initial state of mapper
\(s\) : Sequence of output symbols
\(t\) :
\(u\) : Sequence of input symbols
\(v\) :
\(w\) : Sequence of input and output symbols
\(x\) : Abstract input symbol
\(y\) : Abstract output symbol
\(\alpha _\mathcal{A}\) : Abstraction induced by \(\mathcal{A}\)
\(\gamma _\mathcal{A}\) : Concretization induced by \(\mathcal{A}\)
\(\delta \) : Update function
\(\epsilon \) : Empty sequence
\(\varepsilon \) : Event primitive
\(\xi \) :
\(\tau _\mathcal{A}\) : Observation abstraction function induced by \(\mathcal{A}\)
\(\varphi \) :
\(\psi \) : Abstraction function
\(\varDelta \) : Set of (symbolic) transitions
\(\varTheta \) : Initial condition
\(\varSigma \) : Event signature
\(\varPsi \) : Set of event abstractions
\(\perp \) : Undefined value
\(\rightarrow \) : Transition relation
\(\Rightarrow \) : Transition relation extended to sequences
\(\equiv \) : Syntactic equality (of terms)
\(\approx \) : Observation equivalence (of Mealy machines)
\(\le \) : Implementation preorder/behavior inclusion (of Mealy machines)
\(\approx _{wb}\) : Observation congruence (of CCS expressions)


  1. Aarts F, Heidarian F, Kuppens H, Olsen P, Vaandrager FW (2012) Automata learning through counterexample-guided abstraction refinement. In: Giannakopoulou D, Méry D (eds) 18th international symposium on formal methods (FM 2012), Paris, France, August 27–31, 2012. Proceedings, volume 7436 of lecture notes in computer science. Springer, Berlin, pp 10–27

  2. Aarts F, Heidarian F, Vaandrager FW (2012) A theory of abstractions for learning interface automata. In: Koutny M, Ulidowski I (eds) 23rd international conference on concurrency theory (CONCUR), Newcastle upon Tyne, UK, September 3–8, 2012. Proceedings, volume 7454 of lecture notes in computer science. Springer, Berlin, pp 240–255

  3. Aarts F, Jonsson B, Uijen J (2010) Generating models of infinite-state communication protocols using regular inference with abstraction. In: Petrenko A, Maldonado JC, Simao A (eds) 22nd IFIP international conference on testing software and systems, Natal, Brazil, November 8–10, Proceedings, volume 6435 of lecture notes in computer science. Springer, Berlin, pp 188–204

  4. Aarts F, Kuppens H, Tretmans GJ, Vaandrager FW, Verwer S (2012) Learning and testing the bounded retransmission protocol. In: Heinz J, de la Higuera C, and Oates T (eds) Proceedings 11th international conference on grammatical inference (ICGI 2012), September 5–8, 2012. University of Maryland, College Park, USA, volume 21 of JMLR workshop and conference proceedings, pp 4–18

  5. Aarts F, Schmaltz J, Vaandrager FW (2010) Inference and abstraction of the biometric passport. In: Margaria T, Steffen B (eds) Leveraging applications of formal methods, verification, and validation—4th international symposium on leveraging applications, ISoLA 2010, Heraklion, Crete, Greece, October 18–21, 2010, Proceedings, part I, volume 6415 of lecture notes in computer science. Springer, Berlin, pp 673–686

  6. Ammons G, Bodik R, Larus J (2002) Mining specifications. In: Proceedings of 29th ACM symposium on principles of programming languages, pp 4–16

  7. Angluin D (1987) Learning regular sets from queries and counterexamples. Inf Comput 75(2):87–106

  8. Ball T, Rajamani SK (2002) The SLAM project: debugging system software via static analysis. In: Proceedings of the 29th ACM symposium on principles of programming languages, pp 1–3

  9. Berg T, Jonsson B, Raffelt H (2006) Regular inference for state machines with parameters. In: Baresi L, Heckel R (eds) FASE, volume 3922 of lecture notes in computer science. Springer, Berlin, pp 107–121

  10. Bergstra JA, Ponse A, Smolka SA (eds) (2001) Handbook of process algebra. North-Holland

  11. Broy M, Jonsson B, Katoen J-P, Leucker M, Pretschner A (eds) (2004) Model-based testing of reactive systems, volume 3472 of lecture notes in computer science. Springer, Berlin

  12. Brun Y, Ernst MD (2004) Finding latent code errors via machine learning over program executions. In: ICSE’04: 26th international conference on software engineering

  13. Cassel S, Howar F, Jonsson B, Merten M, Steffen B (2011) A succinct canonical register automaton model. In: Bultan T, Hsiung P-A (eds) Automated technology for verification and analysis, 9th international symposium, ATVA 2011, Taipei, Taiwan, October 11–14, 2011. Proceedings, volume 6996 of lecture notes in computer science. Springer, Berlin, pp 366–380

  14. Cho CY, Babić D, Shin ECR, Song D (2010) Inference and analysis of formal models of botnet command and control protocols. In: Al-Shaer E, Keromytis AD, Shmatikov V (eds) ACM conference on computer and communications security. ACM, pp 426–439

  15. Clarke EM, Grumberg O, Jha S, Lu Y, Veith H (2003) Counterexample-guided abstraction refinement for symbolic model checking. J ACM 50(5):752–794

  16. Cobleigh JM, Giannakopoulou D, Pasareanu CS (2003) Learning assumptions for compositional verification. In: Proceedings of the TACAS ’03, 9th international conference on tools and algorithms for the construction and analysis of systems, volume 2619 of lecture notes in computer science. Springer, Berlin, pp 331–346

  17. Fiterău-Broştean P, Janssen R, Vaandrager FW (2014) Learning fragments of the TCP network protocol. In: Lang F, Flammini F (eds) Proceedings 19th international workshop on formal methods for industrial critical systems (FMICS’14), Florence, Italy, volume 8718 of lecture notes in computer science. Springer, Berlin, pp 78–93

  18. van Glabbeek RJ (1993) The linear time—branching time spectrum II (the semantics of sequential systems with silent moves). In: Best E (ed) Proceedings CONCUR 93, Hildesheim, Germany, volume 715 of lecture notes in computer science. Springer, Berlin

  19. Gold EM (1967) Language identification in the limit. Inf Control 10(5):447–474

  20. Grieskamp W, Kicillof N, Stobie K, Braberman V (2011) Model-based quality assurance of protocol documentation: tools and methodology. Softw Test Verif Reliab 21(1):55–71

  21. Grinchtein O (2008) Learning of timed systems. PhD thesis, Dept. of IT, Uppsala University, Sweden

  22. Grinchtein O, Jonsson B, Leucker M (2004) Learning of event-recording automata. In: Proceedings of the joint conferences FORMATS and FTRTFT, volume 3253 of LNCS, pp 379–396

  23. Groce A, Peled D, Yannakakis M (2002) Adaptive model checking. In: Katoen J-P, Stevens P (eds) Proceedings of the TACAS ’02, 8th international conference on tools and algorithms for the construction and analysis of systems, volume 2280 of lecture notes in computer science. Springer, Berlin, pp 357–370

  24. Groz R, Li K, Petrenko A, Shahbaz M (2008) Modular system verification by inference, testing and reachability analysis. In: TestCom/FATES, volume 5047 of lecture notes in computer science, pp 216–233

  25. Grumberg O, Veith H (eds) (2008) 25 years of model checking: history, achievements, perspectives, volume 5000 of lecture notes in computer science. Springer, Berlin

  26. Hagerer A, Hungar H, Niese O, Steffen B (2002) Model generation by moderated regular extrapolation. In: Kutsche R-D, Weber H (eds) Proceedings of the FASE ’02, 5th international conference on fundamental approaches to software engineering, volume 2306 of lecture notes in computer science. Springer, Berlin, pp 80–95

  27. Henzinger TA, Jhala R, Majumdar R, Sutre G (2002) Lazy abstraction. In: Proceedings of the 29th ACM symposium on principles of programming languages, pp 58–70

  28. Howar F, Isberner M, Steffen B, Bauer O, Jonsson B (2012) Inferring semantic interfaces of data structures. In: ISoLA (1): leveraging applications of formal methods, verification and validation. Technologies for mastering change—5th international symposium, ISoLA 2012, Heraklion, Crete, Greece, October 15–18, 2012, Proceedings, part I, volume 7609 of lecture notes in computer science. Springer, Berlin, pp 554–571

  29. Howar F, Steffen B, Merten M (2011) Automata learning with automated alphabet abstraction refinement. In: VMCAI, volume 6538 of lecture notes in computer science. Springer, Berlin, pp 263–277

  30. Huima A (2007) Implementing conformiq qtronic. In: Petrenko A, Veanes M, Tretmans J, and Grieskamp W (eds) Proceedings of the TestCom/FATES, Tallinn, Estonia, June, 2007, volume 4581 of lecture notes in computer science, pp 1–12

  31. Hungar H, Niese O, Steffen B (2003) Domain-specific optimization in automata learning. In: Proceedings of the 15th international conference on computer aided verification

  32. Janssen R (2013) Learning a state diagram of TCP using abstraction. Bachelor thesis, ICIS, Radboud University Nijmegen

  33. Jonsson B (1994) Compositional specification and verification of distributed systems. ACM Trans Progr Lang Syst 16(2):259–303

  34. Kearns MJ, Vazirani UV (1994) An introduction to computational learning theory. MIT Press, Cambridge, MA

  35. Li K, Groz R, Shahbaz M (2006) Integration testing of distributed components based on learning parameterized I/O models. In: Najm E, Pradat-Peyre J-F, Donzeau-Gouge V (eds) FORTE, volume 4229 of lecture notes in computer science, pp 436–450

  36. Loiseaux C, Graf S, Sifakis J, Bouajjani A, Bensalem S (1995) Property preserving abstractions for the verification of concurrent systems. Form Methods Syst Des 6(1):11–44

  37. Lorenzoli D, Mariani L, Pezzè M (2008) Automatic generation of software behavioral models. In: Proceedings of the ICSE’08: 30th international conference on software engineering, pp 501–510

  38. Mariani L, Pezzè M (2007) Dynamic detection of COTS components incompatibility. IEEE Softw 24(5):76–85

  39. Merten M, Howar F, Steffen B, Cassel S, Jonsson B (2012) Demonstrating learning of register automata. In: Flanagan C, König B (eds) Tools and algorithms for the construction and analysis of systems—18th international conference, TACAS 2012, Held as part of the European joint conferences on theory and practice of software, ETAPS 2012, Tallinn, Estonia, March 24–April 1, 2012. Proceedings, volume 7214 of lecture notes in computer science. Springer, Berlin, pp 466–471

  40. Merten M, Steffen B, Howar F, Margaria T (2011) Next generation LearnLib. In: Abdulla PA, Leino KRM (eds) TACAS, volume 6605 of lecture notes in computer science. Springer, Berlin, pp 220–223

  41. Milner R (1989) Communication and concurrency. Prentice-Hall, Englewood Cliffs, NJ

  42. Mohri M (1997) Finite-state transducers in language and speech processing. Comput Linguist 23(2):269–311

  43. Niese O (2003) An integrated approach to testing complex systems. Technical report, Dortmund University, Doctoral thesis

  44. The Network Simulator NS-2.

  45. Peled D, Vardi MY, Yannakakis M (1999) Black box checking. In: Wu J, Chanson ST, Gao Q (eds) Formal methods for protocol engineering and distributed systems, FORTE/PSTV. Kluwer, Beijing, pp 225–240

  46. Postel J (ed) (1981) Transmission control protocol—DARPA internet program protocol specification (RFC 793), September 1981.

  47. Raffelt H, Steffen B, Berg T, Margaria T (2009) LearnLib: a framework for extrapolating behavioral models. STTT 11(5):393–407

  48. Rivest RL, Schapire RE (1993) Inference of finite automata using homing sequences. Inf Comput 103:299–347

  49. Rosenberg J, Schulzrinne H, Camarillo G, Johnston A, Peterson J, Sparks R, Handley M, and Schooler E (2002) SIP: session initiation protocol (RFC 3261), June 2002.

  50. Shahbaz M, Li K, Groz R (2007) Learning and integration of parameterized components through testing. In: Petrenko A, Veanes M, Tretmans J, and Grieskamp W (eds) TestCom/FATES, volume 4581 of lecture notes in computer science. Springer, Berlin, pp 319–334

  51. Shu G, Lee D (2007) Testing security properties of protocol implementations - a machine learning based approach. In: Proceedings of the ICDCS’07, 27th IEEE international conference on distributed computing systems, Toronto, Ontario. IEEE Computer Society

  52. Smeenk W (2012) Applying automata learning to complex industrial software. Master thesis, Radboud University Nijmegen, September

  53. Stevens WR (1994) TCP/IP illustrated, volume 1: the protocols. Addison Wesley Longman Inc, Reading, MA

  54. Tretmans J (1992) A formal approach to conformance testing. PhD thesis, University of Twente, December

  55. Uijen J (2009) Learning models of communication protocols using abstraction techniques. Master thesis, Radboud University Nijmegen and Uppsala University, November

  56. Veanes M, Campbell C, Grieskamp W, Schulte W, Tillmann N, Nachmanson L (2008) Model-based testing of object-oriented reactive systems with spec explorer. In: Hierons RM, Bowen JP, Harman M (eds) Formal methods and testing, an outcome of the FORTEST network, revised selected papers, volume 4949 of lecture notes in computer science. Springer, Berlin, pp 39–76

  57. Veanes M, Hooimeijer P, Livshits B, Molnar D, Bjørner N (2012) Symbolic finite state transducers: algorithms and applications. In: Field J, Hicks M (eds) Proceedings of the 39th ACM SIGPLAN-SIGACT symposium on principles of programming languages, POPL 2012, Philadelphia, Pennsylvania, USA, January 22–28, 2012. ACM, pp 137–150


This work was partially supported by the European Union FET Project 231167 CONNECT: Emergent Connectors for Eternal Software Intensive Networked Systems, the STW project 11763 ITALIA: Integrating Testing And Learning of Interface Automata, and EU FP7 grant no. 214755 QUASIMODO. We are grateful to Falk Howar from TU Dortmund for his generous LearnLib support, and to Falk Howar and Bernhard Steffen for fruitful discussions. Paul Fiterău-Broştean helped us with the TCP experiments, using the setup developed by Ramon Janssen in his bachelor thesis [32]. We are also most grateful to both reviewers, whose critical comments helped us greatly to improve the paper and to clarify our contribution.

Author information


Corresponding author

Correspondence to Frits Vaandrager.

Additional information

A preliminary version of this paper appeared as [3].


Appendix 1: Pruned SIP model

See Fig. 8.

Fig. 8. Pruned SIP model

Appendix 2: Complete SIP model

See Fig. 9.

Fig. 9. Complete SIP model

Appendix 3: Model of TCP server

See Fig. 10.

Fig. 10. Learned model of the TCP server. For readability, we write \(Flag(SeqNr, AckNr) / Flag^{\prime }(SeqNr^{\prime }, AckNr^{\prime })\) instead of \(Request(Flag, SeqNr, AckNr) / Response(Flag^{\prime }, SeqNr^{\prime }, AckNr^{\prime })\). Moreover, V represents the VALID equivalence class. Also, we omitted all the self-loops with output ‘timeout’ from the diagram. There is a clear correspondence between the states of the learned model and the states mentioned in the TCP standard [46, 53]: \(s0\) corresponds to the LISTEN state from the standard, \(s1\) to the SYN_RCVD state, \(s2\) to the ESTABLISHED state, and \(s3\) to the CLOSE_WAIT state.
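The conventions in the caption of Fig. 10 can be made concrete with a small Python sketch of a Mealy machine over the four states \(s0\) to \(s3\). The transition table below is hypothetical: it follows the textbook TCP handshake and close sequence, not the transitions of the learned model in the figure, and the input and output labels are simplified placeholders. The sketch only illustrates two points from the caption: any input without a listed transition is treated as a self-loop with output `timeout`, and each learned state maps to a named state of the TCP standard.

```python
from typing import Dict, List, Tuple

# Correspondence stated in the caption of Fig. 10.
STATE_NAMES: Dict[str, str] = {
    "s0": "LISTEN",
    "s1": "SYN_RCVD",
    "s2": "ESTABLISHED",
    "s3": "CLOSE_WAIT",
}

# Hypothetical transitions (textbook handshake, NOT read off Fig. 10).
# Key: (state, abstract input); value: (next state, abstract output).
# "V" stands for the VALID equivalence class, as in the caption.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("s0", "SYN(V,V)"): ("s1", "SYN+ACK"),
    ("s1", "ACK(V,V)"): ("s2", "timeout"),
    ("s2", "FIN+ACK(V,V)"): ("s3", "ACK"),
}


def step(state: str, symbol: str) -> Tuple[str, str]:
    """One Mealy step; unlisted inputs are self-loops with output 'timeout'."""
    return TRANSITIONS.get((state, symbol), (state, "timeout"))


def run(inputs: List[str], state: str = "s0") -> Tuple[str, List[str]]:
    """Feed a sequence of abstract inputs and collect the abstract outputs."""
    outputs: List[str] = []
    for symbol in inputs:
        state, output = step(state, symbol)
        outputs.append(output)
    return state, outputs
```

For example, `run(["SYN(V,V)", "ACK(V,V)"])` ends in `s2`, which `STATE_NAMES` maps to ESTABLISHED, while sending `ACK(V,V)` in `s0` leaves the machine in `s0` with output `timeout`, the kind of self-loop the diagram omits.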


Cite this article

Aarts, F., Jonsson, B., Uijen, J. et al. Generating models of infinite-state communication protocols using regular inference with abstraction. Form Methods Syst Des 46, 1–41 (2015).
