Formal Aspects of Computing, Volume 28, Issue 2, pp 233–263

Active learning for extended finite state machines

  • Sofia Cassel
  • Falk Howar
  • Bengt Jonsson
  • Bernhard Steffen
Original Article


We present an active learning algorithm for inferring extended finite state machines (EFSMs) by dynamic black-box analysis. EFSMs can model both the data flow and the control behavior of software and hardware components, and different dialects of EFSMs are widely used in tools for model-based software development, verification, and testing. Our algorithm infers a class of EFSMs called register automata, which have a finite control structure extended with variables (registers), assignments, and guards. The algorithm is parameterized on a particular theory, i.e., a set of operations and tests on the data domain that can be used in guards.
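To make the ingredients of a register automaton concrete (finite control, registers, assignments, and guards), the following is a minimal sketch in Python. It is an illustration only, not the representation used by the authors: the names `Transition` and `RegisterAutomaton`, the actions `register` and `login`, and the register `x` are all hypothetical, and the only test used in guards is equality.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# A register valuation maps register names to data values (hypothetical encoding).
Registers = Dict[str, int]


@dataclass(frozen=True)
class Transition:
    source: str                                     # control location before the step
    action: str                                     # action name, e.g. "login"
    guard: Callable[[Registers, int], bool]         # test on registers and parameter
    assign: Callable[[Registers, int], Registers]   # register update
    target: str                                     # control location after the step


class RegisterAutomaton:
    def __init__(self, initial: str, accepting: Set[str],
                 transitions: List[Transition]) -> None:
        self.initial = initial
        self.accepting = accepting
        self.transitions = transitions

    def accepts(self, word: List[Tuple[str, int]]) -> bool:
        """Run a data word, i.e. a sequence of (action, data value) pairs."""
        location, regs = self.initial, {}
        for action, value in word:
            for t in self.transitions:
                if (t.source == location and t.action == action
                        and t.guard(regs, value)):
                    regs = t.assign(regs, value)
                    location = t.target
                    break
            else:
                # No enabled transition: the word is rejected.
                return False
        return location in self.accepting


# Hypothetical example: register(p) stores p in x; login(p) succeeds only if p == x.
ra = RegisterAutomaton(
    initial="q0",
    accepting={"q2"},
    transitions=[
        Transition("q0", "register", lambda r, p: True, lambda r, p: {"x": p}, "q1"),
        Transition("q1", "login", lambda r, p: p == r["x"], lambda r, p: r, "q2"),
        Transition("q1", "login", lambda r, p: p != r["x"], lambda r, p: r, "q1"),
    ],
)

print(ra.accepts([("register", 7), ("login", 7)]))
print(ra.accepts([("register", 7), ("login", 8)]))
```

In this sketch the "theory" is fixed to equality on integers. A different theory would correspond to allowing other tests (for example, ordering) inside the `guard` functions.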

Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We give sufficient conditions that the symbolic constraints returned by a tree query must satisfy to be usable in our learning model. We also show that, under these conditions, our framework induces a generalization of the classical Nerode equivalence and canonical automata construction to the symbolic setting. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.
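As a rough intuition for how a tree query might be realized through testing, the sketch below handles a heavily simplified case: a suffix consisting of a single action with one parameter, under a theory with equality only. It is an assumption-laden illustration, not the authors' algorithm. The function `tree_query`, the oracle `member`, and the guard strings are hypothetical, and the sketch assumes that one fresh value is representative of all values that differ from those in the prefix.

```python
from typing import Callable, Dict, List, Tuple

Word = List[Tuple[str, int]]  # a data word: sequence of (action, data value) pairs


def tree_query(member: Callable[[Word], bool], prefix: Word,
               action: str) -> Dict[str, bool]:
    """Map symbolic guards on the suffix parameter p to accept/reject outcomes.

    Guards refer to prefix data values by position: d1 is the first value, etc.
    """
    values = [v for _, v in prefix]
    # A value not occurring in the prefix stands for "p differs from all d_i".
    fresh = max(values, default=0) + 1
    otherwise = member(prefix + [(action, fresh)])

    result: Dict[str, bool] = {}
    tested = set()
    for i, v in enumerate(values, start=1):
        if v in tested:
            continue  # an equal value was already tested at an earlier position
        tested.add(v)
        outcome = member(prefix + [(action, v)])
        # Keep an equality guard only if it behaves differently from the default.
        if outcome != otherwise:
            result[f"p == d{i}"] = outcome
    result["otherwise"] = otherwise
    return result


def member(word: Word) -> bool:
    """Hypothetical system under test: accepts register(p) login(q) iff p == q."""
    return (len(word) == 2
            and word[0][0] == "register"
            and word[1][0] == "login"
            and word[0][1] == word[1][1])


print(tree_query(member, [("register", 7)], "login"))
```

The returned guards play the role of the symbolic data constraints mentioned above: here the test outcomes reveal that the `login` parameter matters only through its equality with the first data value of the prefix.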







Copyright information

© British Computer Society 2016

Authors and Affiliations

  • Sofia Cassel (1)
  • Falk Howar (2)
  • Bengt Jonsson (1)
  • Bernhard Steffen (3)

  1. Department of Information Technology, Uppsala University, Uppsala, Sweden
  2. IPSSE, TU Clausthal, Clausthal-Zellerfeld, Germany
  3. Chair for Programming Systems, TU Dortmund, Dortmund, Germany
