Abstract
Program understanding is a time-consuming and tedious activity for software developers. Manually building abstractions from source code requires in-depth analysis of the code. Automatic extraction of such models is possible, but it cannot derive meaningful abstractions that are not already contained in the code. Automated extraction even has difficulty deciding which aspects of the code are important and which are not. Therefore, interactive semi-automatic approaches are the compromise of choice. In this article, we describe how state machines that describe the behaviour of a function can be extracted from code. The approach is interactive: the user decides which of the identified, potentially relevant pieces of information are actually relevant and which are not. This helps to reduce the resulting state machines to an understandable degree. However, in their raw form these state machines have transition conditions (guards) that are very complex and thus not understandable for humans. Therefore, we also introduce a technique to reduce these guards to an understandable form. The technique combines heuristic logic minimization, exploitation of infeasible paths, and the use of transition priorities. We evaluate the approach on industrial embedded C code, first in a case study with hundreds of extracted state machines, and then in two experiments with professional developers. The results show that the approach is highly effective in making the guards understandable, and that guards reduced by our approach and presented with priorities are easier to understand than guards without priorities. We also show that the overall approach is beneficial for program comprehension. The guard reduction approach itself is quite generic and can also be applied to other problems. We demonstrate this for the simplification of mode switch logic.
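To illustrate why transition priorities can shorten guards, the following minimal sketch compares mutually exclusive raw guards with a prioritized, first-match-wins version and checks by brute force that both select the same transition. It is not the authors' algorithm: the transition names `T1` to `T3`, the boolean inputs `a`, `b`, `c`, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical raw guards of three transitions leaving one state.
# They are mutually exclusive, so each one repeats the negation
# of the guards that come before it.
RAW_GUARDS = [
    ("T1", lambda a, b, c: a),
    ("T2", lambda a, b, c: (not a) and b),
    ("T3", lambda a, b, c: (not a) and (not b) and c),
]

# The same transitions listed in priority order (first match wins).
# The negated parts become implicit and can be dropped.
PRIORITIZED_GUARDS = [
    ("T1", lambda a, b, c: a),
    ("T2", lambda a, b, c: b),
    ("T3", lambda a, b, c: c),
]


def fire_raw(a, b, c):
    """Return the single enabled transition under the raw guards, or None."""
    enabled = [name for name, guard in RAW_GUARDS if guard(a, b, c)]
    assert len(enabled) <= 1, "raw guards are expected to be mutually exclusive"
    return enabled[0] if enabled else None


def fire_prioritized(a, b, c):
    """Return the first transition whose guard holds, in priority order."""
    for name, guard in PRIORITIZED_GUARDS:
        if guard(a, b, c):
            return name
    return None


def equivalent():
    """Compare both guard sets on every assignment of the three inputs."""
    return all(
        fire_raw(*values) == fire_prioritized(*values)
        for values in product([False, True], repeat=3)
    )
```

Calling `equivalent()` enumerates all eight input assignments. The exhaustive check is feasible here only because the example has three boolean inputs; it is a stand-in for the reasoning a real tool would perform symbolically.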
Additional information
Communicated by: Massimiliano Di Penta and David D. Shepherd
This article belongs to the Topical Collection: Software Analysis, Evolution and Reengineering (SANER)
About this article
Cite this article
Said, W., Quante, J. & Koschke, R. Mining understandable state machine models from embedded code. Empir Software Eng 25, 4759–4804 (2020). https://doi.org/10.1007/s10664-020-09865-0
DOI: https://doi.org/10.1007/s10664-020-09865-0