In this section, we summarise the responses to the questions in Table 3 and answer RQs 1, 2, and 3 as explained in Section 4.1. To answer RQ1, we describe the sample in Section 5.2 and discuss some facets of FM use in Section 5.3. For RQ2, we summarise data about past use and usage intent in Section 5.4. For RQ3, we analyse further data in Section 5.5.
For data collection, we (1) advertised our survey on the channels in Table 5 and (2) personally invited more than 30 persons. The sampling period lasted from August 2017 until March 2019. In this period, we repeated step 1 up to three times to increase the number of participants. Figure 1 summarises the distribution of responses. The channels in Table 5 particularly cover the European and North American areas.
Description of the Sample (Answering RQ 1)
A size estimation of the channels in Table 5 yields around 65K channel memberships (for some channels we make a best guess but, e.g. for LinkedIn, the counts are given). Assuming participants are, on average, members of at least three of the channels, we could have reached up to 20K real persons. Given a recent estimate of 23 million SE practitioners worldwide (Evans Data 2018) and assuming that at least 1% are mission-critical SE practitioners, our population might comprise at least 230K persons, possibly around 38K in the US and 61K in Europe (see Footnote 6). We received N = 216 responses, resulting in an estimated response rate between 1 and 2% and a population coverage of at most 0.1% globally and 0.2% in the US and in Europe. About 40% of our respondents provided their email addresses, the majority from the US, UK, Germany, France, and a sixth from other EU and non-EU countries.
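The bounds quoted above can be retraced with a few lines of arithmetic. The following is a minimal sketch that uses only the figures stated in the text; all variable and function names are illustrative, and the regional coverage figures are not recomputed because the regional split of the responses is not given here.

```python
# Figures quoted in the text; all names are illustrative.
RESPONSES = 216
REACHED_PERSONS_UPPER = 20_000          # "up to 20K real persons"
SE_PRACTITIONERS_WORLDWIDE = 23_000_000
MISSION_CRITICAL_PERCENT = 1            # "at least 1%"


def share(part, whole):
    """Return part/whole as a fraction."""
    return part / whole


# Lower bound of the population: 1% of 23 million practitioners.
population_lower = SE_PRACTITIONERS_WORLDWIDE * MISSION_CRITICAL_PERCENT / 100

# Response rate relative to the largest plausible audience (20K persons).
response_rate_lower = share(RESPONSES, REACHED_PERSONS_UPPER)

# Global coverage relative to the smallest plausible population (230K persons).
coverage_upper = share(RESPONSES, population_lower)

print(f"population (lower bound): {population_lower:,.0f}")
print(f"response rate (lower end): {response_rate_lower:.2%}")
print(f"global coverage (upper bound): {coverage_upper:.3%}")
```

Under these figures, the lower end of the response rate is slightly above 1%, and the global coverage stays below 0.1%. A rate of 2% would correspond to a reached audience of roughly half that size.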
In the following, we summarise the responses to the questions about the application domain (Q1), the level of experience (Q2), and the motivations (Q3) of a FM user.
Guide to the Figures
For Likert-type ordered scales, we use centred diverging stacked bar charts (see, e.g. Fig. 4) as recommended by Robbins and Heiberger (2011). The horizontal bars in each line show the answer fractions according to the legend at the bottom and are annotated with the percentages of the left-most, middle, and right-most answer options. These bars are aligned by the midpoint of the middle group (for 3- and 5-level scales) or by the boundary between the two central groups (for 4-level scales). Bar labels often abbreviate the corresponding answer options in the questionnaire. The questionnaire copy in Appendix A.11 contains short definitions, explanations, and examples to clarify the answer options. For the sake of brevity, we do not repeat this information here. “M” denotes the median, “CI” the 95% confidence interval for the median calculated according to Campbell and Gardner (1988), “X” the number of excluded data points per answer option, and “NA” the number of invalid data points.
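To illustrate how a rank-based confidence interval for the median can be obtained, the following sketch uses the large-sample approximation commonly attributed to Campbell and Gardner (1988): the interval limits are the observations at ranks n/2 − z·√n/2 and 1 + n/2 + z·√n/2, rounded to the nearest integer. The function names, the use of the lower median for even n, and the clamping of ranks to the valid range are assumptions made for this example, not details taken from the study.

```python
import math


def median_ci_ranks(n, z=1.96):
    """1-based ranks of the approximate CI limits for the median of n values."""
    half_width = z * math.sqrt(n) / 2
    lower = round(n / 2 - half_width)
    upper = round(1 + n / 2 + half_width)
    # Clamp to valid ranks; for very small n the approximation is crude.
    return max(lower, 1), min(upper, n)


def median_with_ci(ratings):
    """Return (median, lower limit, upper limit) for ordinal ratings."""
    ordered = sorted(ratings)
    n = len(ordered)
    lower_rank, upper_rank = median_ci_ranks(n)
    # Lower median for even n, since ordinal levels cannot be averaged.
    median = ordered[(n - 1) // 2]
    return median, ordered[lower_rank - 1], ordered[upper_rank - 1]


# Hypothetical ratings on a 3-level scale coded 1..3 (not survey data).
example = [1] * 10 + [2] * 30 + [3] * 60
print(median_ci_ranks(len(example)))
print(median_with_ci(example))
```

For n = 100, the limits fall on the 40th and 61st ordered observations. Because the limits are read off the ordered answers, the interval is always expressed in terms of the scale's own answer options.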
Q1: Application Domain
For each domain, Fig. 2 shows the number of participants having experience in that domain (see Footnote 7). Note that 180 of the respondents have experience with applying FMs in different industrial contexts, while only 36 have not applied FMs to any application domain. Medical healthcare is an example where participants could have checked more than one answer category because medical devices would belong to “device industry” and emergency management IT would belong to “critical infrastructures”. See Appendix A.11 for more information about the answer categories.
Q2: FM Experience
Figure 3 depicts participants’ years of experience in using FMs, showing that the sample covers all experience levels. However, the fraction of respondents with no experience (i.e., category “0”) is comparatively low. According to Section 4.6, one third of the participants can be considered LEs with up to three years of experience, and two thirds can be considered MEs with at least three years of experience (29 of those with even more than 25 years). A further analysis of the study participants’ experience profile is available from Table 8 in Appendix A.1 on page 36.
Figure 4 suggests that regulatory authorities play a subordinate role in triggering the use of FMs. In contrast, intrinsic motivation (in terms of private interest) seems to be the major factor for using FMs. For 9 respondents, none of the given factors was motivating at all. The 88 open responses for this question could either be subsumed in at least one of the given categories (65 in “Own (private) interest”, 11 in other categories) or be declared as a comment (3) or not a further motivation (9). Hence, coding did not require an additional answer category to Q3.
Facets of Formal Methods Use (Answering RQ 1)
In the following, we summarise the responses to the questions about the role of a user (Q4), use in specification (Q5), use in analysis (Q6), and the underlying purpose (Q7) of such use.
Figure 5 shows in which roles the respondents applied FMs. An analysis of the MC answers shows that 72% of the participants used FMs in an academic environment, as a researcher, lecturer, or student. 50% of the participants applied FMs in practice, as an engineer or consultant (see also Gleirscher and Marmsoler (2018)).
Q5: Use in Specification
The degree of usage of FMs for specification is depicted in Fig. 6. There is an almost balanced proportion between theoretical and practical experience with the use of various specification techniques. Only the use of FMs for the description of dynamical systems seems to be remarkably low.
Q6: Use in Analysis
The use of FMs for analysis is depicted in Fig. 7. Similar to specification techniques, we observe an almost balanced proportion between theoretical and practical experience with the usage of various analysis techniques. The use of assertion checking techniques, such as contracts, stands out. As expected from the observations for Q5, the use of FMs in computational engineering, such as algebraic reasoning about differential equations, is again exceptionally low.
Figure 8 depicts the participants’ purposes for applying FMs. It seems that the respondents employ FMs mainly for assurance, specification, and inspection. Synthesis, on the other hand, seems to be only a subordinate purpose for them.
Past Use Versus Usage Intent (Answering RQ 2)
We investigate the usage intent of FMs across various domains and roles as well as the participants’ intent to use various FMs and their intended purpose to use FMs.
Figure 9 compares the respondents’ past domains of FM application with their intended domains (see Q8). This figure reveals two insights into the participants’ intentions to use FMs: (i) Fewer participants do not want to apply FMs in the future (19) than participants that have not used FMs (36, see yellow bars). Ten participants fall into both categories: they have not used FMs and do not intend to use FMs. (ii) The intended application of FMs exceeds the current application of FMs across all domains. Hence, there is a tendency to increase the use of FMs across all application domains.
Figure 10 compares the participants’ roles in which they applied FMs in the past with their intended role to apply FMs in the future (see Q9). Similar to the results for the application domain, we observe that some participants, who have not applied FMs in any role so far, intend to apply such methods in the future. However, the comparison reveals that academic disciplines (i.e., researcher and lecturer) seem to be stable. There is only a small difference between the number of participants who applied FMs in academic domains in the past and the number of participants who want to apply such methods to these domains in the future.
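The overlap stated in insight (i) implies two further counts that can be derived by simple set arithmetic. The sketch below assumes that every respondent answered both Q1 and Q8, so that the two groups can be compared directly; the derived numbers are not reported in the text and the names are illustrative.

```python
# Counts quoted in the text.
NOT_USED = 36        # have not used FMs in any domain
NO_INTENT = 19       # do not want to apply FMs in the future
BOTH = 10            # neither used nor intend to use FMs


def only_in_first(first, overlap):
    """Size of the part of a group that lies outside the overlap."""
    return first - overlap


# Past non-users who did not rule out future use.
potential_newcomers = only_in_first(NOT_USED, BOTH)

# Past users who do not intend to use FMs again.
potential_dropouts = only_in_first(NO_INTENT, BOTH)

print(potential_newcomers, potential_dropouts)
```

Under the stated assumption, more past non-users remain open to future use than past users turn away from FMs, which is in line with the tendency described in insight (ii).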
In contrast, there is a significant increase in the number of participants aiming to apply FMs, across all industrial roles.
Furthermore, the diagram shows a strong contrast between past and intended use in the category “Bachelor, master, or PhD student.” We can see several reasons for this difference. From the respondents who “used FMs as a student,” many (i) might not be able to “use FMs as a student” anymore because of having graduated, (ii) did not find FMs, or the way FMs were taught, helpful, or (iii) moved into a business domain with no foreseeable demand for the application of FMs.
Q10: Intended Use for Specification
Figure 11 depicts the respondents’ intended future use of various FMs for system specification (i.e., formal description techniques). The figure shows an almost equal number of participants aiming to decrease (i.e., “no more” and “less”) and increase (i.e., “more often”) their use of FMs for specification. Only dynamical system models again seem to be an exception: more participants want to decrease their use of this technology than want to increase it.
Q11: Intended Use for Analysis
The respondents’ intended use of FMs for the analysis of specifications (i.e., formal reasoning techniques) is depicted in Fig. 12. Except for process calculi, we observe a general tendency of the participants to increase their future FM use.
Q12: Intended Purpose
Figure 13 indicates why respondents intend to apply FMs. Again, there is a tendency of the participants to increase FM use across all listed purposes.
Q7 and Q12: Comparison of Code- and Model-based FMs
In the following, we regard practitioners with experience level “applied several times in engineering practice” or “applied once in engineering practice” and frequency “applied in 2 to 5 separate tasks” or “applied in more than 5 separate tasks” (see Table 4). We compare users of code-based FMs (CBs; including “abstract interpretation”, “assertion checking”, “symbolic execution”, “consistency checking”; with N = 128) with users of model-based FMs (MBs; including “process calculi”, “model checking”, “theorem proving”, and “simulation”; with N = 114). While some of the FM classes can be seen as both, code- and model-based, we made a choice based on our experience but left out “constraint solving” because it is a fundamental technique intensively applied in both.
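The grouping described above can be sketched as a simple filter. The example assumes a hypothetical record shape in which each respondent has one experience level per FM class and one frequency answer for the purpose under consideration; how the study actually combines the two answers (see Table 4) may differ. The class names and answer labels are taken from the text, while all function and variable names are illustrative.

```python
CODE_BASED = {"abstract interpretation", "assertion checking",
              "symbolic execution", "consistency checking"}
MODEL_BASED = {"process calculi", "model checking",
               "theorem proving", "simulation"}
# "constraint solving" is deliberately in neither set (see text).

PRACTICE_LEVELS = {"applied once in engineering practice",
                   "applied several times in engineering practice"}
FREQUENT = {"applied in 2 to 5 separate tasks",
            "applied in more than 5 separate tasks"}


def user_groups(class_experience, purpose_frequency):
    """Return the subset of {'CB', 'MB'} a respondent belongs to.

    class_experience: dict mapping FM class -> experience level (assumed shape).
    purpose_frequency: frequency answer for one purpose (assumed shape).
    """
    groups = set()
    if purpose_frequency not in FREQUENT:
        return groups
    for fm_class, level in class_experience.items():
        if level not in PRACTICE_LEVELS:
            continue
        if fm_class in CODE_BASED:
            groups.add("CB")
        elif fm_class in MODEL_BASED:
            groups.add("MB")
    return groups


# Hypothetical respondent, not taken from the survey data.
respondent = {
    "assertion checking": "applied several times in engineering practice",
    "model checking": "applied once in engineering practice",
    "constraint solving": "applied several times in engineering practice",
}
print(user_groups(respondent, "applied in 2 to 5 separate tasks"))
```

A respondent can fall into both groups, which is consistent with the two group sizes (N = 128 and N = 114) not being required to sum to the sample size.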
The comparison of past and future use for code-based (top half of Fig. 21 in Appendix A.4) and model-based FMs (bottom half of Fig. 21), for example, in inspection (e.g. error detection, bug finding) shows the following:
CBs show an increased intent (the “more often” group) slightly more frequently than MBs, for both sub-groups: respondents with 2 to 5 and with more than 5 past uses.
MBs show a decreased intent (the “no more” group) slightly more frequently than CBs.
Looking at assurance (e.g. proof, error removal) shows the following:
MBs show an increased intent slightly more frequently than CBs when looking at respondents who have used FMs more than 5 times. However, MBs indicate an increased intent slightly less frequently than CBs when looking at respondents with 2 to 5 uses.
CBs indicate more dnk-answers after 2 to 5 uses and, slightly more frequently, a decreased intent after more than 5 uses in comparison with MBs.
Q1, Q5, and Q6: Practised FM Classes by Application Domain
We asked respondents about their use of each FM class independent of the application domain and about their general use of FMs in each such domain. Hence, we can only approximate past usage per FM class and application domain, assuming that the overall usage per respondent is uniformly distributed among the specified FM classes and domains. For that, we interpret (and count) each respondent who specifies a domain in combination with “applied once in engineering practice” or “applied several times in engineering practice” for an FM class as a practitioner who has used (UFMp) or, respectively, wants to use (UFMi) FMs of that class in that domain. More generally, we count a respondent who specifies n domains, say d1 to dn, in combination with “applied once in engineering practice” or “applied several times in engineering practice” for m FM classes, say c1 to cm, as a practitioner who has used (UFMp) or, respectively, wants to use (UFMi) FMs of the classes c1 to cm in the domains d1 to dn. Figures 14 and 15 show these approximations for UFMp and UFMi.
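The counting rule for UFMp can be sketched as a cross product of a respondent's domains and practised FM classes. The record layout, the function name, and the example respondents below are hypothetical; the two practice-level labels are taken from the text. UFMi would presumably be counted analogously from the answers about intended domains and intended use, which is an assumption not shown here.

```python
from collections import Counter
from itertools import product

PRACTICE_LEVELS = {"applied once in engineering practice",
                   "applied several times in engineering practice"}


def count_ufmp(respondents):
    """Count respondents per (domain, FM class) pair.

    Each respondent is a dict with the assumed keys
    'domains' (list of str) and 'class_experience' (dict: FM class -> level).
    """
    counts = Counter()
    for respondent in respondents:
        practised = [fm_class
                     for fm_class, level in respondent["class_experience"].items()
                     if level in PRACTICE_LEVELS]
        # n domains and m practised classes contribute n * m pairs.
        for domain, fm_class in product(respondent["domains"], practised):
            counts[(domain, fm_class)] += 1
    return counts


# Hypothetical respondents, not taken from the survey data.
sample = [
    {"domains": ["transportation", "military systems"],
     "class_experience": {
         "model checking": "applied several times in engineering practice",
         "theorem proving": "applied once in engineering practice",
         "simulation": "no practical use (hypothetical label)"}},
    {"domains": ["transportation"],
     "class_experience": {
         "model checking": "applied once in engineering practice"}},
]
ufmp = count_ufmp(sample)
print(ufmp[("transportation", "model checking")])
```

Because every domain is paired with every practised class, a respondent who used a class in only one of several domains is still counted in all of them. This is the uniform-distribution approximation mentioned in the text.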
Perception of Challenges (Answering RQ 3)
Table 6 lists the FM challenges subject to discussion, their background, and literature referring to them. We apply the procedure described in Section 4.6.
General Ranking (Q13)
Figure 16 shows the respondents’ ratings of all challenges. Most of them believe that scalability will be the toughest challenge and maintainability is considered the least difficult of all rated obstacles. For reuse of proof results, proper abstractions, and tool support, the participants distribute more uniformly across moderate and high difficulty.
In the following, we compare specific groups of respondents by how they perceive the difficulty of the various challenges. We group respondents according to the criteria in Section 4.6 and according to the role, motivating factor, FM class, and purpose they specified. Appendix A.6 provides some background material for the following association analyses.
Less Experienced (LE) Versus More Experienced (ME) Respondents (Q2)
The comparison of the difficulty ratings of LEs with the ratings of MEs shows that (i) LEs less often perceive the given challenges as tough, (ii) MEs significantly more often rate scalability as tough, and (iii) both groups show the closest agreement on transfer of verification results and skills and education.
Non-practitioners (NP) Versus Practitioners (P) by Past Purpose (Q7)
The perception of skills and education and scalability as the most difficult challenges is largely independent of the purpose, with Ps again attributing more significance to scalability. Scalability, the forerunner in Fig. 16, exhibits the most tough-ratings from NPs in synthesis and from Ps in assurance and clarification (see the top half of Fig. 22 in Appendix A.6).
Decreased Intent (DI) Versus Increased Intent (II) by Purpose (Q12)
The comparison of the difficulty ratings of respondents with no or decreased intent to use FMs for a specific purpose and of respondents with equal or increased intent shows: (i) Scalability and skills and education, both forerunners in Fig. 16, show the most tough-ratings from IIs for assurance (67%) and inspection (66%) and from DIs for synthesis (53%). (ii) The trend in Fig. 16 is more clearly observable from IIs than from DIs, where transfer of verification results and automation and tool support seem to be tougher than skills and education.
Non-Practitioners (NP) Versus Practitioners (P) by FM Class (Q5, Q6)
The top half of Fig. 17 shows that, for NPs, the trend in Fig. 16 is largely independent of the FM class, except for consistency checking and logic leading with tough proportions of 49%.
The bottom half of Fig. 17 shows that, for Ps, difficulty ratings vary more across FM classes: The foremost challenges in Fig. 16 received the most tough-ratings from users of process models, dynamical systems, process calculi, model checking, and theorem proving. Difficulty ratings of users are often centred on moderate or tough, while proper abstraction and skills and education show a comparatively wide variety across FM classes.
The histograms in the lower right corners in Fig. 17 indicate that (i) NPs’ difficulty ratings vary less than Ps’ ratings, (ii) NPs’ ratings are more independent from the FM classes, and (iii) NPs’ difficulty ratings are lower on average than Ps’ ratings. Appendix A.6 contains several such association matrices with more detailed data in the matrix cells.
Decreased Intent (DI) Versus Increased Intent (II) by FM Class (Q10, Q11)
The trend in Fig. 16 is supported by many tough ratings (48%) for transfer of verification results from DIs in consistency checking. However, DIs in process calculi provide comparatively many tough-ratings (39%) for the generally low-ranked automation and tool support. Assertion checking exhibits comparatively low tough-proportions across all challenges whereas process calculi exhibit comparatively high tough-ratings. Mirroring the trend in Fig. 16, IIs show less variance than DIs across all FM classes.
Unmotivated (U) Versus Motivated (M) Respondents by Motivating Factor (Q3)
Respondents with moderate to strong motivation to use FMs are more likely to identify the given challenges as moderate to tough, regardless of the motivating factor. The trend in Fig. 16 seems explainable by many tough-ratings from respondents motivated by regulatory authorities (69%), not motivated by tool providers (56%), and not motivated by superiors/principal investigators (56%, see Fig. 24 in Appendix A.6). Us’ tough-ratings are notably lower than Ms’ tough-ratings.
Past and Future Views by Role (Q4, Q9)
Although participants show role-based discrepancies between their past and intended use of FMs (Fig. 10), the perception of difficulty of the rated challenges seems to be largely similar, following the trend in Fig. 16. The high ranking of scalability (and reusability of verification results) is supported by many tough-ratings from tool provider stakeholders for the past view and many from lecturers for the future view. Respondents not having used FMs or not planning to use FMs exhibit the lowest tough-ratings but also the highest fractions of dnk-answers.
Past and Future Views by Domain (Q1, Q8)
The trend in Fig. 16 is underpinned by the highest tough-proportions for respondents from the transportation, military systems, industrial machinery, and supportive domains.